Showing posts with label deserialize. Show all posts
Showing posts with label deserialize. Show all posts

Sunday, March 2, 2014

Utilizing the .Net 4.0 DataContractJsonSerializer Class to Deserialize a JSON Feed in SSIS 2012

     I've written a couple of posts on how to use the JavaScriptSerializer class to deserialize a JSON feed using SSIS. Starting with the 3.5 library we get a new class that can deal with JSON, the DataContractJsonSerializer class. According to performance metrics run by the guys running the JSON.net project, this class outperforms the JavaScriptSerializer class, but under performs the JSON.net version. If you want to stay away from 3rd party libraries, the DataContractJsonSerializer seems like your best choice.

     This class uses something called DataContracts which according to Microsoft " is a formal agreement between a service and a client that abstractly describes the data to be exchanged. That is, to communicate, the client and the service do not have to share the same types, only the same data contracts." These tie into Microsoft's introduction of WCF, so this class is favored in use over the JavaScriptSerializer which was deprecated in .Net 3.5 to be resurrected again in .Net 3.5.1. There are some pros and cons to working with each of these, the most important to users of SSIS would be the ease of use. 

     For this example were going to deserialize a stream of JSON containing attributes of recently published bio-med articles at http://www.biomedcentral.com/about/api. The JSON we want to deserialize is going to look like this:

{
   "articles":[
      {
         "citation":"<i>Journal of the International Society of Sports Nutrition<\/i> 2014, <b>11<\/b>:8",
         "is_subscription":"false",
         "is_open_access":"true",
         "is_highly_accessed":"false",
         "image_credit":null,
         "article_type":"Research article",
         "article_url":"/content/11/1/8",
         "authors":"        \r\n    <span class=\"author-names\">    Roberts JD, Tarpey MD, Kass LS, Tarpey RJ and Roberts MG<\/span>\r\n    ",
         "title":"<p>Assessing a commercially available sports drink on exogenous carbohydrate oxidation, fluid delivery and sustained exercise performance<\/p>",
         "blurbText":null,
         "rss_url":"/rss",
         "thumbnailUrl":"/content/figures/1550-2783-11-8-toc.gif",
         "publication_date":"4 March 2014",
         "article_host":"www.jissn.com",
         "is_free":true
      },
      {
         "citation":"<i>BMC Genomics<\/i> 2014, <b>15<\/b>:173",
         "is_subscription":"false",
         "is_open_access":"true",
         "is_highly_accessed":"false",
         "image_credit":null,
         "article_type":"Research article",
         "article_url":"/1471-2164/15/173",
         "authors":"        \r\n    <span class=\"author-names\">    Khabudaev KV, Petrova DP, Grachev MA and Likhoshway YV<\/span>\r\n    ",
         "title":"<p>A new subfamily LIP of the major intrinsic proteins<\/p>",
         "blurbText":null,
         "rss_url":"/bmcgenomics/rss",
         "thumbnailUrl":"/content/figures/1471-2164-15-173-toc.gif",
         "publication_date":"4 March 2014",
         "article_host":"www.biomedcentral.com",
         "is_free":true
      },
      {
         "citation":"<i>BMC Medical Education<\/i> 2014, <b>14<\/b>:42",
         "is_subscription":"false",
         "is_open_access":"true",
         "is_highly_accessed":"false",
         "image_credit":null,
         "article_type":"Research article",
         "article_url":"/1472-6920/14/42",
         "authors":"        \r\n    <span class=\"author-names\">    Ma F, Li J, Liang H, Bai Y and Song J<\/span>\r\n    ",
         "title":"<p>Baccalaureate nursing Students' perspectives on learning about caring in China: a qualitative descriptive study<\/p>",
         "blurbText":null,
         "rss_url":"/bmcmedicaleducation/rss",
         "thumbnailUrl":"/content/figures/1472-6920-14-42-toc.gif",
         "publication_date":"4 March 2014",
         "article_host":"www.biomedcentral.com",
         "is_free":true
      },
      {
         "citation":"<i>BMC Cancer<\/i> 2014, <b>14<\/b>:147",
         "is_subscription":"false",
         "is_open_access":"true",
         "is_highly_accessed":"false",
         "image_credit":null,
         "article_type":"Research article",
         "article_url":"/1471-2407/14/147",
         "authors":"        \r\n    <span class=\"author-names\">    Wang J, Wei B, Albarracin CT, Hu J, Abraham SC and Wu Y<\/span>\r\n    ",
         "title":"<p>Invasive neuroendocrine carcinoma of the breast: a population-based Study from the Surveillance, Epidemiology and End Results (SEER) database<\/p>",
         "blurbText":null,
         "rss_url":"/bmccancer/rss",
         "thumbnailUrl":"/content/figures/1471-2407-14-147-toc.gif",
         "publication_date":"4 March 2014",
         "article_host":"www.biomedcentral.com",
         "is_free":true
      },
      {
         "citation":"<i>BMC Cancer<\/i> 2014, <b>14<\/b>:146",
         "is_subscription":"false",
         "is_open_access":"true",
         "is_highly_accessed":"false",
         "image_credit":null,
         "article_type":"Research article",
         "article_url":"/1471-2407/14/146",
         "authors":"        \r\n    <span class=\"author-names\">        Peng B, Zhang X, Cao F, Wang Y, Xu L, Cao L, Yang C, Li M <em>et al.<\/em><\/span>\r\n    ",
         "title":"<p>Peptide deformylase inhibitor actinonin reduces celastrol's HSP70 induction while synergizing proliferation inhibition in tumor cells<\/p>",
         "blurbText":null,
         "rss_url":"/bmccancer/rss",
         "thumbnailUrl":"/content/figures/1471-2407-14-146-toc.gif",
         "publication_date":"4 March 2014",
         "article_host":"www.biomedcentral.com",
         "is_free":true
      }
   ]
}

     The data flow for SSIS package were going to use for this example is going to look like this:

Figure 1. Data Flow Task

Ignore the union all component, I added that so I could put a data viewer on the buffer normally this would go to a data destination. By taking this JSON and pasting it in Json2Csharp.com it gave me these classes:

public class Article
{
    public string citation { get; set; }
    public string is_subscription { get; set; }
    public string is_open_access { get; set; }
    public string is_highly_accessed { get; set; }
    public object image_credit { get; set; }
    public string article_type { get; set; }
    public string article_url { get; set; }
    public string authors { get; set; }
    public string title { get; set; }
    public object blurbText { get; set; }
    public string rss_url { get; set; }
    public string thumbnailUrl { get; set; }
    public string publication_date { get; set; }
    public string article_host { get; set; }
    public bool is_free { get; set; }
}

public class RootObject
{
    public List<Article> articles { get; set; }
}

I changed all attributes that start with is_ to bool and changed image_credit and blurbText to string. These are the classes we are going to deserialize our JSON into.

     To begin lets drag a Script Component onto the design surface of the data flow task. When prompted, select source:

Figure 2. Script Component Type
     Now we need to set up our output based on the classes we generated. Open up the Script Component and go the Inputs and Outputs screen. Remove the default output and add a new output by clicking the Add Output button. Name our new output "Article". Under Article add the fields for this output by clicking on the Add Column button and applying data types that match the attributes of the Article class:

Figure 3. Script Component Inputs and Outputs Screen

     With our output columns declared were reading to start coding. Go back the the Script screen and click on the Edit Script button. Before we start any coding we need to add a reference to the System.Runtime.Serialization assembly so that we can use the DataContractJsonSerializer class. In the C# environment click on Project-->Add Reference:

Figure 4. Project Menu


On the next screen, click on the .net tab and highlight System.Runtime.Serialization, then click OK.
Figure 5. .NET Tab
With this done we can start coding. Paste the following into main.cs:

#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Net;
using Microsoft.SqlServer.Dts.Runtime;
using System.IO;
using System.Runtime.Serialization.Json;
using System.Runtime.Serialization;
using System.Collections.Generic;
using System.Text;
#endregion
#region Class
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    #region Methods
  
    /// <summary>Outputs records to the output buffer</summary>
    public override void CreateNewOutputRows()
    {
        //Set Webservice URL
        string wUrl = "http://www.biomedcentral.com/webapi/1.0/latest_articles.json";
        try
        {
            //Call getWebServiceResult to return our Article attributes
             RootObject outPutResponse = GetWebServiceResult(wUrl);
            //If we get data back
            if (outPutResponse != null)
            {
                foreach (Article ar in outPutResponse.articles)
                {
                    //Output main attributes of Article
                    ArticleBuffer.AddRow();
                    ArticleBuffer.citation = ar.citation;
                    ArticleBuffer.issubscription = ar.is_subscription;
                    ArticleBuffer.isopenaccess = ar.is_open_access;
                    ArticleBuffer.ishighlyaccessed = ar.is_highly_accessed;
                    ArticleBuffer.imagecredit = ar.image_credit;
                    ArticleBuffer.articletype = ar.article_type;
                    ArticleBuffer.articleurl = ar.article_url;
                    ArticleBuffer.authors = ar.authors;
                    ArticleBuffer.title = ar.title;
                    ArticleBuffer.blurbText = ar.blurbText;
                    ArticleBuffer.rssurl = ar.rss_url;
                    ArticleBuffer.thumbnailUrl = ar.thumbnailUrl;
                    ArticleBuffer.publicationdate = ar.publication_date;
                    ArticleBuffer.articlehost = ar.article_host;
                    ArticleBuffer.isfree = ar.is_free;
                }
            }
        }
        catch (Exception e)
        {
            FailComponent(e.ToString());
        }
    }
    /// <summary>
    /// Method to return our list articles
    /// </summary>
    /// <param name="wUrl">The web service URL to call</param>
    /// <returns>An object that contains a list of Articles</returns>
  
    private RootObject GetWebServiceResult(string wUrl)
    {
      
        HttpWebRequest httpWReq = (HttpWebRequest)WebRequest.Create(wUrl);
        httpWReq.Method = "GET";
        httpWReq.ContentType = "application/json";
        HttpWebResponse httpWResp = (HttpWebResponse)httpWReq.GetResponse();
        RootObject jsonResponse = null;
        try
        {
            //Get the stream of JSON
            Stream responseStream = httpWResp.GetResponseStream();
             
                //Deserialize the JSON stream
                using (StreamReader reader = new StreamReader(responseStream))
                {
                  
                //Deserialize our JSON
                DataContractJsonSerializer sr = new DataContractJsonSerializer(typeof(RootObject));
                jsonResponse = (RootObject)sr.ReadObject(responseStream);
                }
         
        }
        //Output JSON parsing error
        catch (Exception e)
        {
            FailComponent(e.ToString());
        }
        return jsonResponse;
    }
    /// <summary>
    /// Outputs error message
    /// </summary>
    /// <param name="errorMsg">Full error text</param>
    private void FailComponent(string errorMsg)
    {
        bool fail = false;
        IDTSComponentMetaData100 compMetadata = this.ComponentMetaData;
        compMetadata.FireError(1, "Error Getting Data From Webservice!"errorMsg"", 0, out fail);
    }
    #endregion
}
#endregion
#region JSON Classes
//Class to hold attributes of the Article
[DataContract]
public class Article
{
    [DataMember(Name = "citation")]
    public string citation { getset; }
    [DataMember(Name = "is_subscription")]
    public bool is_subscription { getset; }
    [DataMember(Name = "is_open_access")]
    public string is_open_access { getset; }
    [DataMember(Name = "is_highly_accessed")]
    public bool is_highly_accessed { getset; }
    [DataMember(Name = "image_credit")]
    public string image_credit { getset; }
    [DataMember(Name = "article_type")]
    public string article_type { getset; }
    [DataMember(Name = "article_url")]
    public string article_url { getset; }
    [DataMember(Name = "authors")]
    public string authors { getset; }
    [DataMember(Name = "title")]
    public string title { getset; }
    [DataMember(Name = "blurbText")]
    public string blurbText { getset; }
    [DataMember(Name = "rss_url")]
    public string rss_url { getset; }
    [DataMember(Name = "thumbnailUrl")]
    public string thumbnailUrl { getset; }
    [DataMember(Name = "publication_date")]
    public string publication_date { getset; }
    [DataMember(Name = "article_host")]
    public string article_host { getset; }
    [DataMember(Name = "received_events_url")]
    public string received_events_url { getset; }
    [DataMember(Name = "is_free")]
    public bool is_free { getset; }
}
//Root object that contains a List of Articles
[DataContract]
public class RootObject
{
    [DataMember(Name = "articles")]
    public List<Article> articles { getset; }
   
}
#endregion
     Now lets step through and explain a little of whats going on here. We call GetWebServiceResult and pass the URL of the web service that will deliver our JSON feed. Then we created a HttpWebRequest object that will make the request to the web service for our JSON. The stream of JSON will be returned through a HttpWebResponse object and deserialized using our DataContractJsonSerializer object. The deserialized JSON is then passed back to the calling code and looped through and sent to the ArticleBuffer for output.
     In the JSON Classes region we pasted the classes that were generated for us at Json2Csharp.com. We apply the [DataContract] attribute to the classes we have and the [DataMemberto the attributes of each class.
     Lets save, build and run the package to see our results:
Figure 6. JSON in Data Viewer