Archive

Archive for the ‘SSIS’ Category

String or binary data would be truncated

Doing an INSERT INTO a SQL Server 2014 table from a complex join with UNION i got this error.
There were many fields, very difficult to understand at a first sight where was the problem.
Fortunately in SQL Management Studio right clicking the database with Tasks->Import data i inserted the SQL with UNION in the Data Source, and the destination table in the Destination.
The curious thing: in SQL Management Studio the INSERT INTO failed and no records imported, the same operation in the SQL Server Import and Export Wizard instead succeed but with warnings, which are useful to understand which fields causes truncation:
01-03-2016 14-49-13

Categories: SQL Server, SSIS

xp_cmdshell strange problem

I was working on this request: from a ship are unloaded many containers, it is needed to track the ones not still invoiced; in this case must be sent an email with an Excel file as attachment, containing the list of these containers.
The containers are register in a table of an SQL Server 2008 SP3 10.0.5520.0 (X64).
I tried OPENQUERY but without success (it seems that read and write Excel with OPENQUERY is a sort of black magic) so , giving also the fact that in every case there were many steps too complex to achieve with TSQL (CreateObject and CLR stored procedures were not acceptable options) i used an SSIS package , without using a SSIS catalog: it must be used dtexec.
I shoud try with SQL Server 2014, but with SQL 2008 there is this problem and double quoted strings are tipically requested in a dtexec command line: so i use the trick of the old MSDOS command ECHO to create a batch file and then execute this .bat from xp_cmdshell.
My TSQL is :

DECLARE @varSql nvarchar(500)
DECLARE @varFileName nvarchar(20)
DECLARE @varEcho nvarchar(1000)
DECLARE @XlsFileName nvarchar(500) 
DECLARE @intRes int
SET @varFileName = 'D:\SSIS\ContainersWithoutBilling\batches\' + CAST(@p_Voyage AS varchar) + '.bat'
SET @XlsFileName = 'D:\SSIS\ContainersWithoutBilling\Voyages\' + CAST(@p_Voyage AS varchar) + '_NotInvoiced.xlsx'
SET @varSql = '' + CHAR(34) + 'C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\dtexec' + CHAR(34) + ' /FILE ' + CHAR(34) + 
				'D:\SSIS\ContainersWithoutBilling\SSISPkg\bin\Package.dtsx' + CHAR(34) + ' /CHECKPOINTING OFF  /REPORTING EW  /SET \Package.Variables[IDVoyage].Value;' + 
				CAST(@p_Voyage AS varchar)
SELECT @varEcho = 'ECHO ' + @varSql  + ' > ' + @varFileName 
EXEC @intRes =  master..xp_cmdshell @varEcho , no_output	

At this point we have an batch file, for example is generated something as this 116243.bat:

"C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\dtexec" /FILE "D:\SSIS\ContainersWithoutBilling\SSISPkg\bin\Package.dtsx" /CHECKPOINTING OFF  /REPORTING EW  /SET \Package.Variables[IDVoyage].Value;116243

This example works without problems launched from the File Explorer, and you can think that with

EXEC @intRes = master..xp_cmdshell @varFileName, no_output

your problems are resolved, instead i was surprised that sometimes the .bat file, that was perfectly working launched by hand, under xp_cmdshell the process hangs: and i was constrained to kill both the cmd and the dtexec processes.
Tipically a xp_cmdshell is blocked if is requested an user input , but this was not the case.
After some head scratching, i tried with an temporary job:

DECLARE @jobID uniqueidentifier
DECLARE @cmd varchar(1000) 
SET @cmd = 'D:\SSIS\ContainersWithoutBilling\batches\' + CAST(@p_Voyage AS varchar) + '.bat'
EXEC msdb.dbo.sp_add_job @job_name = '_tmp_batch', @enabled  = 1, @start_step_id = 1, @owner_login_name='sa', @job_id = @jobID OUTPUT 
EXEC msdb.dbo.sp_add_jobstep @job_id = @jobID, @step_name = 'launch batch', @step_id = 1, @subsystem = 'CMDEXEC', @command = @cmd
EXEC msdb.dbo.sp_add_jobserver @job_id = @jobID
EXEC msdb.dbo.sp_start_job @job_id = @jobID, @output_flag = 0 
WAITFOR DELAY '00:00:30' -- wait an reasonable time of 30 seconds until the job should be securely completed
IF EXISTS (SELECT name FROM msdb.dbo.sysjobs WHERE name = '_tmp_batch')
BEGIN
	EXEC msdb.dbo.sp_delete_job @job_name = '_tmp_batch'
END
-- or you could delete if exists the temp job before the creation, and avoid the
-- WAITFOR if there are not other instructions that depends from the job work

and this is working also for the .bat files that hangs under xp_xmdshell.
I think that using an SSIS catalog there aren’t these problems, but in this case the dtexec was mandatory.
At the end it seems that xp_cmdshell has some bug, and an temporary job is always a better choice.

Categories: SQL Server, SSIS

BI for Visual Studio 2013

Still unavailable…someday at this address.

Update 05/23 : available !

Categories: BI, SQL Server, SSIS

Execute SSIS package from SQL Server Job

In order to execute a SSIS package from a SQL Server job there are some security issues.

You develop the package in Sql Server Data Tools (or the old SQL Server Business Intelligence Development Studio) and it works; in the package for example there are some trasformations from CSV files to an SQL Server server instance , and in the package the Connection Manager for SQL Server uses the SQL Server authentication.

The first step that seems logical to try is to insert a step in the SQL Server job as this:


But the job fails ,and in the job log is written :Failed to decrypt protected XML node “DTS:Password”

The problem is that in the SSIS project all sensitive informations are available , instead when the package is executed from the SQL job this uses the dtsexec utility, for which you need to pass in the command line the credentials.

If the security is a concern, an approach can be to open the SSIS project and in the project properties (open the dtsx on Control Flow visual editor, clic on the empty space, press F4) , where there is the security section


By default the ProtectionLevel is EncryptSensitiveWithUserKey , but in this case we use EncryptSensitiveWithPassword.

Then clicking on PackagePassword we are requested for the password:


Which is important to remember, the password will be requested opening the solution:


The package will be opened if you press Cancel, but is not possible to save changes:


At this point , going on our SQL Job definitions we will be required for the password by clicking on the “Command Line” tab:


We see an dtexec command line, in this case by default

/FILE “E:\PkgSSISHecPool\PkgSSISHecPool\PkgSSISHecPool\HecPool.dtsx” /DECRYPT /CHECKPOINTING OFF /REPORTING E

Clicking on “Edit the command line manually” we can change as


Save , and the package step works when called from the SQL Server job.

Categories: SQL Server, SSIS

Microsoft SQL Server 2012 Integration Services: An Expert Cookbook


I worked already with DTS, then with the previous SSIS versions (2005, 2008) but it was not my main activity, only occasional work items.

Now SSIS 2012 is becomed an huge , very complete product and i’m more involved in this argument, so i buyed some books on the argument.

The typical problem of a newcomer to this argument is that after opening SSDT you are staring at the screen observing the icons and wondering what are these items and how to correctly use them , when and where.

For example many icons are intuitive, a “Send Mail Task” is clear what it do, a “Data Flow Task” is intuitive that is something related to data transformations, but inside a “Data Flow Task” begins to appear strange things: no more the intuitive “Script Task” where i code immediately something in VB or C# but an “Script Component” that asks if should a be a Data Source or Destination (a script..as Data Destination?) , the “DQS Cleansing”, “Slowly Changing Dimension”…help…

This book is a valid starting point, even if is not updated to the latest SSIS version: for the examples is used the Visual Studio 2010 shell, but is valid also for the SSDT version working inside Visual Studio 2012.

I began to follow all of the chapters in June in the spare time, and in the meantime i migrated from the Visual Studio 2010 shell to 2012 , without encounter big differences.

This Expert Cookbook is developer oriented , so is not writing about items as “Message Queue Task” , “Maintenance Cleanup Task” , “Maintenance Cleanup Task”, etc. and inside a Data Flow Task arguments as the “Data Mining Query” are not mentioned in the book.

Anyway the book is a very valid introduction to the core tasks that a developer typically does in the everyday work, with some arguments as the DQS Cleansing (the Data Quality functionality of SQL Server 2012) well explained.

Sometimes the book is verbose (some SQL repeated instead of send the reader to a previous page) and sometimes following the samples you can encounter errors (for example a field that cannot be null) , but are minor issues that a developer can easily solve.

Some samples are showed as easily working when instead there are little problems, for example the fact that does not exists an 64 bit driver for Excel (see this previous post) and the connection to mySQL is not simple as showed in the book (origin of another post).

There also other minimal logical errors in the explanation flow that a reader which follows with attention can easily fix.

Arguments as Merge Join Transform. Text Analysis, Fuzzy Transformations , ForEach loops, Asynchronous Script, the entire Deployment chapter, Data Tap, the Restartability and Robustness chapter are very valuable and i learned some very useful techniques for my job.

A very useful chapter is the 14, “Programming SSIS”, in which is explained how to create custom components.

Thw downloadable code is very valuable but i recommend to develop the solutions step by step , so it will be very clear how the things are working.

Categories: Books, SQL Server, SSIS, Vs2010, Vs2012

SSIS Lookup as SQL NOT EXISTS

First of all: IF NOT EXISTS is not the best optimized TSQL : IF EXISTS is much more efficient.

After i have remembered this (how many stored i’m constrained to read with NOT EXISTS in the text…) a typical job is to import updates from an Excel file or another external source, verifying that are not imported duplicated data.

The external source could be a SQL temp table where some program insert new records.

In a stored procedure using TSQL we can use IF EXISTS , for example we can read from a temp table and insert with a syntax as

INSERT INTO Table
SELECT <fields> FROM TempTable WHERE NOT EXISTS (SELECT <fields> FROM Table WHERE <TableKeyFields> = <ImportTableKeyFields>

If we must use a SSIS package that reads for example from an Excel file we can implement a NOT EXISTS logic with the Lookup transformation.

Sometimes is required some treatment to the Excel file: for example i have an destination field for the Litres field in the database table (for trucks refuels) which is numeric and in the Excel file the corresponding field is padded with zeroes to the left in order to have a fixed length ( for example 0099.50 , 0119.67) and this is a problem with Derived Columns that i will search to resolve.

I have tried in the Advanced Editor of the Excel Source to specify a numeric DataType for the Output column, but there were still the leading zeroes (a bug?).

For the moment i found an simple solution using a macro in the Excel file with a “Sheet1” sheet and 179 rows to import :

Public Sub ChangeLiters()
    Dim i As Long
    Dim objRange As Range
    Dim strToSubst As String
    ' in this case i'm sure to always have values...
    For i = 2 To 180
        Set objRange = Worksheets("Sheet1").Range("E" + CStr(i) + ":E" + CStr(i))
        strToSubst = objRange.Text
        While Mid$(strToSubst, 1, 1) = "0"
            strToSubst = Mid$(strToSubst, 2)
        Wend
        objRange.Value = Trim(strToSubst)
    Next i
End Sub

With the pre-treatment my Excel file is ready for the import and the Liters column causes no more errors at the Derived Column transformation , that we see now.

After placing an Excel source is very useful an Derived Column transformation , for example we need to specify fixed values for some columns and we need to create columns that can be joined with the ones in the real db table having the same type.

For example in my case i created these Derived Columns


After this i placed an Lookup Transformation, which is the core trick.

First of all i joined the Derived columns from Excel to the real db columns (the ones with GTIASG010_ prefix)


And in the General properties i specified this:


This means that the rows from Excel that are NOT present in the final db table(we have joined the fields , in this sample 9 joins, in order to to find the existing rows that are equal to the incoming row from Excel so these are already existing and must be NOT imported) are “redirected” in the SSIS flow to the error output of the Lookup transformation.

The error output in this case has not the meaning of exception (as true runtime errors for data types or communications errors and so on) management, but is simply an alternate routing.

With an Conditional Split we can emulate additional SQL WHERE clauses:


So at the end this is the complete package:


Running the package a first time and then after adding a line in the Excel source, with another run we can see


In the real package there is a OLEDB destination, the Union All that in this post serves only to place the final Data Viewer, in order to show the result in the image.

Categories: SQL Server, SSIS

SSIS not so obvious

Sometimes there are things that are not so obvious at a first sight.

For example in Microsoft SSIS the Cache Transform component can use a disk file as cache.

But the thing that is not so widely explained is how to initially create this file.

We put an Cache Transform into our project and then link it to a previous source, the first thing is to define a Connection Manager:


The trick is to define a non existing .caw file in a folder, for example G:\Work\FILES\mysql2.caw when clicking New (above picture) .


In the column section the columns from the source are automatically recognized


Do not click on “Refresh Metadata” button on the General tab , because the file is not still generated so you have only an error.

After a first run of the package the cache file is generated so the “Refresh Metadata” button works if the data source is changed and we need to update the data structure in the Cache Connection Manager .

Categories: SQL Server, SSIS

MySQL and SSIS 64bit

It was a long time that i was working on MYSQL, and now i needed to transfer data from an MySQL instance to SQL Server.

In some blogs there are samples about, in theory downloading the .NET connector you should see in a ADO Net Source (inserted into an Data Flow Task as in these images) the MySQL item in the .NET providers


 
 

But no MySQL :


This because i’m working on a Windows 7 64 bit , the MySQL provider should be listed working in a 32 bit environment.

This even specifying False for the 64bit runtime (see this post).

I have tried to reboot, etc. but without success.

So i tried the x86, 64 bit msi installer for the ODBC connector, it should be slower than the native .NET provider but…


Note: the installation of the .NET connector causes modifications on the machine.config files (working on ASP.NET 4 for example C:\Windows\Microsoft.NET\Framework\v4.0.30319\Config\machine.config).
Launching an project from Visual Studio you can hit the error “Could not load file or assembly ‘MySql.Web, Version=6.7.4.0…..”
Edit the maching.config for your ASP.NET version from Notepad (or other text editor) launched as administrator then comment the lines containing references to “MySql”, clear the Visual Studio solution then recompile.

Once installed the ODBC connection , we can define for our Connection Manager an Odbc data provider (see above image)

But before this , in the system Control Panel -> Administrative tools we need to define an ODBC data source by clicking on the ODBC icon.

In the User DSN tab add a new connection


Here we can finally see something related to MySQL…clicking on finish we can see the dialog box:


We need to specify a DataSource Name (possibly meaningful..) and specify IP – port.

The port is tipically 3306, so is needed to verify firewalls (on the local pc and the server which is running mySQL) that this port is not blocked.

By default MySQL has a “root” user with no password, but this in the real case must be veryfied.

Note about the Database name: don’t write it, it seems that causes problems with the ODBC connection , you the must write your SQL with the complete alias for table (prefixed with the db name).

Clicking on Test does not work , if MySQL is not running in the same pc :


We need to give the perms.

On the MySQL instance open the web interface logging as root or other account with GRANT permissions , click on a db name (is not requested to click the specific db) and then on the SQL tab


In this case i created a “test” db (the real db is hidden for privacy reasons) with a Users table and issued the command

GRANT ALL ON *.* TO 'remoteaccount'@'192.168.8.1' IDENTIFIED BY "testpwd";

This creates a new account “remoteaccount” with password “testpwd” (ok, in the real case is better if more complicated) related to the specified IP (see this page).

This time specifying remoteaccount with the related password our connection works:


Now we have a working DSN


to place in our SSIS : add an ADO.NET connection an choose the news DSN:


Note that we cannot have a list of the tables


And also trying to use the Query Builder, no listed tables:


We need to insert by hand the query, eventually trying before on the MySQL web interface.

I inserted a test “Users” table in MySQL ,the interface is easy : the only problem was to understand how to insert an autoincrement value for the ID field, which is done by clicking on the A_I checkbox


So, after a test on the web interface


We can insert the SQL on SSIS


Categories: MySQL, SSIS

WMI Event Watcher Task tips

I had a problem with a “inherited” (aka: no one knows something , i must discover what is running and what it does…) IIS 7.5 server with an SQL 2008 instance running Reporting services, SSIS jobs (that i should transfer to another server) where sometimes the processor was at 100% for minutes.

Obviously the users are complaining about the lack of service : is a mission critical server; so a first step was to know how many times the processor was over 80% along the work day.

On the same server was needed to copy files in a directory , where a “watchdog” should read and process when a files was uploaded.

Giving the fact that during the night on this server are running some heavy jobs and the directory watchdog is not needed in the day hours, it was requested that the job surveilling the processor was running only during the day and the file watchdog only the night, in order to avoid unuseful workloads.

I don’t know server management software (as Nagios for example) and similar so i thinked an quick&dirty solution for the processor running at 100% .

The WMI Event Watcher seems perfect for the job , but the SSIS world is very complex and i have discovered some things about the WMI Event Watcher.

The first step is to create from the SQL Server Data Tools a new project , it creates automatically a new .dtsx package , here in the Control Flow we can drap&drop from the SSIS toolbox an WMI Event Watcher:


Double clicking the new object we define the properties, where for the first thing we define a new WMI connection:


The WqlQuerySource is the earth of the WMI procedure, a Wql query is a SQL-like language; for example the query

SELECT * from __InstanceModificationEvent WITHIN 20 WHERE TargetInstance ISA 'Win32_Processor' and TargetInstance.LoadPercentage > 80

gives to us a notification when the CPU use is over 80%

In order to help us to the construction of these queries is better to use some software for help.

The first is the WMI Code Creator from Microsoft.

This software is like a wizard with 4 main tabs where to test the queries, that can be constructed browsing the very huge amount of namespaces and WMI classes .

Naturally is requested an understanding of the WMI structure, this page can be of help.

In our case we use the third tab , “Receive an event”:


Using our example for the CPU use , we use the wizard for the code dinamic construction:


Selecting items in the dropdowns and inserting values (as in figure) we generate the code, in this case our final generated code is :

strComputer = "." 
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\CIMV2") 
Set objEvents = objWMIService.ExecNotificationQuery _
("SELECT * FROM __InstanceModificationEvent WITHIN 60 WHERE " & _
                    "PreviousInstance ISA 'Win32_Processor'" & _
                    " AND TargetInstance.LoadPercentage > 80")
Wscript.Echo "Waiting for events ..."
Do While(True)
    Set objReceivedEvent = objEvents.NextEvent
    'report an event
    Wscript.Echo "__InstanceModificationEvent event has occurred."
Loop

Clicking on the “Execute Code” button is started an command line :


Is the condition is reached (we drop a file in that directory), is launched an MessageBox with text “__InstanceModificationEvent event has occurred.”

This is a simple case , with the case of the directory watchdog the things goes complicated.

The Event Class in this case is __InstanceOperationEvent, the Target Instance CIM_DirectoryContainsFile, for which we should insert the directory name with the syntax “Win32_Directory.Name='<foldername>'”.

The problem is that the wizard is creating string delimited from ” , and we have the slash char \ (directory separator in ms-dos paths) that must be escaped, in SSIS we escape a single \ with \\\\ (4 characters).

The problem is complex because we must insert string with a mix of delimiters (‘ and “”) , for example my first attempt was:

(“SELECT * FROM __InstanceOperationEvent WITHIN 60 WHERE TargetInstance.GroupComponent = “Win32_Directory.Name=’g:\\\\WMIFileWatcher'””)

But this causes an error launching the code (lacking of a parentesis); note that after we must insert the working code in the WqlQuerySource property of the WMI Event Watcher .

In order to be not driven mad we could correct the generated code (is not readonly) as

Dim strComputer
Dim objWMIService 
Dim varSrc 
Dim objEvents 
' better do declare the variables , even if in VBScript are all  of Variant type
strComputer = "."  ' the local pc name , otherwise a NETBIOS name
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\CIMV2") 
' below we specify the directory to be watched
varSrc = "g:\WMIFileWatcher"
varSrc = Replace(varSrc, "\", "\\\\")
Set objEvents = objWMIService.ExecNotificationQuery _
    ("SELECT * FROM __InstanceOperationEvent WITHIN 10 WHERE " _
        & "Targetinstance ISA 'CIM_DirectoryContainsFile' and " _
            & "TargetInstance.GroupComponent= " _
                & "'Win32_Directory.Name=""" & varSrc & """'")
Wscript.Echo "Waiting for events ..."
Do While(True)
    Set objReceivedEvent = objEvents.NextEvent
    Wscript.Echo "A file operation has occurred in the directory."
Loop

This works , so we can follow the same approach for our SSIS package.

Note for WITHIN : is the number of seconds between each WMI polling of a directory, a low value causes an high CPU load, an high value could lead to problems if the file in the meantime is changed or deleted: it is necessary to find an adequate value for the specific case.

But the problem with the WMI Event Watcher is related to the event : when it fires our WMI Event Watcher “dies” and the dtsx is terminated.

A first approach could be to insert the Watcher inside a loop:


The loop espression could be


And our dtsx is running forever.

Now we can construct the Wql query using first Script Task linked via a Precedence Constraint to the WMI Event Watcher:


In the Script Task the code is (Vb2010):

Public Sub Main()
    Dim strPathWatch As String
    strPathWatch = "g:\WMIFileWatcher"
    strPathWatch = Replace(strPathWatch, "\", "\\\\")
    Dim strCode As String = "SELECT * FROM __InstanceOperationEvent WITHIN 10 WHERE " _
    & "Targetinstance ISA 'CIM_DirectoryContainsFile' and " _
        & "TargetInstance.GroupComponent= " _
            & "'Win32_Directory.Name=""" & strPathWatch & """'"
    Dts.Variables("User::varScriptSrc").Value = strCode
    Dts.TaskResult = ScriptResults.Success
End Sub

And using the variable varScriptSrc as source of the WMI Event Watcher:


The variable must be , obvious, specified as a read write variable:


But our problem is that we need to stop the procedure at a certain hour, so we change the EvalEspression in

GETDATE() < DATEADD(“Mi”,30,DATEADD(“Hh”,18,(DT_DBDATE)GETDATE()))

What is this , that seems TSQL but is not TSQL? Is an example of the Wql syntax , a SQL dialect that supports the WMI infrastructure.

If this case the condition is true when the system date is over 18:30

How to construct these queries ? In MSDN there is an reference , but we need something to test the Wql queries , we can’t debug the EvalExpression…

So is very useful a software as the Expression Tester Tool , that can be downloaded from Codeplex.

Here we can create our Wql with the treeview wizard or write directly a Wql string and try what happens :


Now that we have a tested Wql query we can use it in our Loop:


But there is a problem : until does not happens that a file is copied in our guarded directory, the dtsx is running forever despite the EvalExpression, which is tested only at the first launch !

We can be resolve the problem using the Timeout property :


Note that is specified an timeout (20 seconds) that must be >= of the WITHIN time in the Wql query (in our sample 10 seconds).

What happens at this point? If a file is dropped at the end of the polling (WITHIN 10) interval the dtsx flows outside the WMI Event Watcher , we can do something with the event via a Script Task, for example, after the WMI Event Watcher:


otherwise is nothing happens (no file dropped in that directory) after 20 seconds the code goes in every case to the Script Task as when a file is dropped .

Here there is the problem : apparently with this approach there is nothing that we can test in order to understand if the Script Task is running because was happened an Timeout or a file was dropped in the surveilled directory.

And beware of the Expressions, you can create something as:


Using some test variable as the varResult of the example , the sample expression is compiled without errors but is absolutely wrong this thing (should be provided an warning…)! the expression serves to calculate the value of the property instead of a fixed value chosen drom the dropdown : in this case the AfterTimeout dropdown


 
 

is no more working (is not possible to change the value) because its value is calculated from the Expression; its result should be one of the enumerated values so the sample expression is a no sense (even if it is correct, from the syntax point of view).

In practice the correct approach is to use the Event Handlers, not a Script Task or other after the WMI Event Watcher:


where we can correctly manage if the WMI Watcher is exited because a timeout or because (with the WMIEventWatcherEventOccurred event) a file was dropped.

In any case (timeout or file dropped) our loop condition is tested at least every 20 seconds (the defined timeout) so the Loop condition is evaluate ; at least 20 seconds after 18:30 our final package:


is terminated; the same approach can be taken for the CPU surveillance.

For SSIS expressions some useful references here and from Microsoft, which has a dedicated blog.

Categories: SQL Server, SSIS, Vs2010