Joseph Michael Pesch
VP Programming

SSIS Missing Data when Importing from Excel File

by 30. July 2009 16:38

Typically in SSIS when we use a file connection (i.e. flat file, or as in this case Excel file), there is a built in feature that reads a certain number of rows in the file and tries to “guess” the data type based on the data it sees in those rows.  In the case of a flat file connection you can set the number of rows in the SSIS advanced editor.  In the case of Excel file connection you cannot set that setting anywhere in SSIS (at least I could not find anywhere in SSIS to do so).  The default value is 8, meaning that only the first 8 rows are scanned for data type mapping.   In my testing all the columns with null dates have the first occurrence of a value in a row higher than 8.   In summary, to fix the issue for my testing I changed this registry entry value (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows) from decimal 8 to decimal 1000. 

http://msdn.microsoft.com/en-us/library/ms141683.aspx

Excerpt from the link above:

The Excel Connection Manager uses the Microsoft OLE DB Provider for Jet 4.0 and its supporting Excel ISAM (Indexed Sequential Access Method) driver to connect and read and write data to Excel data sources.

Many existing Microsoft Knowledge Base articles document the behavior of this provider and driver, and although these articles are not specific to Integration Services or its predecessor Data Transformation Services, you may want to know about certain behaviors that can lead to unexpected results. For general information on the use and behavior of the Excel driver, see HOWTO: Use ADO with Excel Data from Visual Basic or VBA.

The following behaviors of the Jet provider with the Excel driver can lead to unexpected results when reading data from an Excel data source.

  • Data sources. The source of data in an Excel workbook can be a worksheet, to which the $ sign must be appended (for example, Sheet1$), or a named range (for example, MyRange). In a SQL statement, the name of a worksheet must be delimited (for example, [Sheet1$]) to avoid a syntax error caused by the $ sign. The Query Builder automatically adds these delimiters. When you specify a worksheet or range, the driver reads the contiguous block of cells starting with the first non-empty cell in the upper-left corner of the worksheet or range. Therefore you cannot have empty rows in the source data, or an empty row between title or header rows and the data rows.
  • Missing values. The Excel driver reads a certain number of rows (by default, 8 rows) in the specified source to guess at the data type of each column. When a column appears to contain mixed data types, especially numeric data mixed with text data, the driver decides in favor of the majority data type, and returns null values for cells that contain data of the other type. (In a tie, the numeric type wins.) Most cell formatting options in the Excel worksheet do not seem to affect this data type determination. You can modify this behavior of the Excel driver by specifying Import Mode. To specify Import Mode, add IMEX=1 to the value of Extended Properties in the connection string of the Excel connection manager in the Properties window. For more information, see PRB: Excel Values Returned as NULL Using DAO OpenRecordset.
  • Truncated text. When the driver determines that an Excel column contains text data, the driver selects the data type (string or memo) based on the longest value that it samples. If the driver does not discover any values longer than 255 characters in the rows that it samples, it treats the column as a 255-character string column instead of a memo column. Therefore, values longer than 255 characters may be truncated. To import data from a memo column without truncation, you must make sure that the memo column in at least one of the sampled rows contains a value longer than 255 characters, or you must increase the number of rows sampled by the driver to include such a row. You can increase the number of rows sampled by increasing the value of TypeGuessRows under the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel registry key. For more information, see PRB: Transfer of Data from Jet 4.0 OLEDB Source Fails w/ Error.
  • Data types. The Excel driver recognizes only a limited set of data types. For example, all numeric columns are interpreted as doubles (DT_R8), and all string columns (other than memo columns) are interpreted as 255-character Unicode strings (DT_WSTR). Integration Services maps the Excel data types as follows:
    • Numeric – double-precision float (DT_R8)
    • Currency – currency (DT_CY)
    • Boolean – Boolean (DT_BOOL)
    • Date/time – datetime (DT_DATE)
    • String – Unicode string, length 255 (DT_WSTR)
    • Memo – Unicode text stream (DT_NTEXT)
  • Data type and length conversions. Integration Services does not implicitly convert data types. As a result, you may need to use Derived Column or Data Conversion transformations to convert Excel data explicitly before loading it into a non-Excel destination, or to convert non-Excel data before loading it into an Excel destination. In this case, it may be useful to create the initial package by using the Import and Export Wizard, which configures the necessary conversions for you. Some examples of the conversions that may be required include the following:
    • Conversion between Unicode Excel string columns and non-Unicode string columns with specific codepages
    • Conversion between 255-character Excel string columns and string columns of different lengths
    • Conversion between double-precision Excel numeric columns and numeric columns of other types

Tags: ,

SSIS

Books to Read

by 30. July 2009 13:28
  • A Colossal Failure of Common Sense - The Inside Story of the Collapse of Lehman Brothers - Written by Lawrence G. McDonald and Patrick Robinson

Tags: ,

[None]

RegEx: Take a URL and Replace the Page with New Page (isolate page from path and query)

by 14. July 2009 00:35

publicstaticstring ReplaceURLPageName(string URL, string PageNameExtension, string NewPageName)
{
 

// This method takes a URL and replaces the page (with given extension).
 
// Capture the root URL (i.e. everything up to the page
 
// Page is defined by first word with a period in it between slashes
 
string RootURL = "";  

foreach (Match m inRegex.Matches(URL, @"^.*/"))  
{
 
RootURL = m.ToString();
 
}
 

//Strip off everything before the page
 
URL = Regex.Replace(URL, @"^.*/", "");  

//Replace the current page name with new page name (keeping the querystring)
 
URL = RootURL + Regex.Replace(URL, "(.+)." + PageNameExtension, NewPageName);  
System.Diagnostics.Debug.WriteLine(URL);  


return URL;  }

Tags: ,

ASP.Net | C#

MSI Version Reader

by 7. July 2009 16:18

Tags: ,

[None]

Sample Code to Get a Workflow Instance from Persistence DB

by 6. July 2009 03:30

static MyWorkflow GetMyWorkflowInstance(WorkflowRuntime workflowRuntime, Guid instanceID)


{


  /* ================================================================================ */


  // This method is a workaround to get an instance of a workflow from the persistence


  // data store, since there is no direct method to get the workflow instance (i.e.


  // the workflowRuntime.GetWorkflow(instanceId) method gets the workflow model not


  // the actual workflow in it's persisted state.


  /* ================================================================================ */


  // DESERIALIZE FROM PERSISTENCE DB INTO OBJECT.....


  MyWorkflow activity;


  SqlConnection cn = new SqlConnection(


    ConfigurationManager.ConnectionStrings["WorkflowPersistence"].ConnectionString);


  SqlCommand cmd = new SqlCommand(


    "select state from InstanceState where uidInstanceID = @InstanceID", cn);


  cmd.Parameters.AddWithValue("@InstanceID", instanceID.ToString());


  cn.Open();


  byte[] byt = (byte[])cmd.ExecuteScalar();


  cn.Close();


  System.Runtime.Serialization.Formatters.Binary.BinaryFormatter formatter


    = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();


  formatter.SurrogateSelector


    = System.Workflow.ComponentModel.Serialization.ActivitySurrogateSelector.Default;


  System.IO.MemoryStream stream = new System.IO.MemoryStream(byt);


  stream.Position = 0;


  using (System.IO.Compression.GZipStream stream2


    = new System.IO.Compression.GZipStream(stream,


      System.IO.Compression.CompressionMode.Decompress, true))


  {


    // Here we can finally do the real work to deserialize... 


    activity


      = (MyWorkflow)System.Workflow.ComponentModel.Activity.Load(stream2


      , workflowRuntime.CreateWorkflow(typeof(MyWorkflow)).GetWorkflowDefinition()


      , formatter);


  }


  return activity;


  /* ================================================================================ */


}


 

Tags: ,

ASP.Net | C# | Workflow Foundation

Get PublicKeyToken in Visual Studio

by 4. July 2009 18:43

The easiest way to get the PublicKeyToken of a .Net assembly is to use the SN.EXE which is typically installed in the SDK folder of the .Net framework (as shown below).

VS2005: C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin\sn.exe

VS2008: C:\Program Files\Microsoft Visual Studio 9.0\SDK\v3.5\Bin\sn.exe

Sample usage: SN.exe -T C:\samplepath\sampleassembly.dll

Add the following to the Post Build events: "C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin\sn.exe" -T "$(TargetPath)"

Then when you build the assembly it will print the public key token to the output window.

Also, you could run the assembly and put the following line in source code:

System.Diagnostics.Debug.WriteLine(System.Reflection.Assembly.GetExecutingAssembly().FullName);

Tags: ,

ASP.Net | C# | Visual Studio