At work, I often find myself exporting data in .csv format for our clients. I am talking about thousands of data, to the point that I never really noticed this issue.
I am talking about data that contain special characters like a comma for example.
So once in a while, I’d encounter names with commas in them. Names with suffixes like “Smith, Jr”, “Donald, Jr”, or “Elizabeth, III” and so on.
When exporting to a comma separated value .csv file using SQL Server Integration Services (SSIS), the flat file destination connection manager assumes that this comma is the end of the data and therefore “spills” the rest of the data into the next column. The problem with this is that it disrupts the arrangement of the output data thereby causing inconsistent and even incorrect output.
In the following example, I will be showing how to correct this issue using a text qualifier:
Before I continue, I will create my sample data below:
CREATE TABLE dbo.Employee
INSERT INTO dbo.Employee (EmpID, FirstName, LastName, MiddleName, BirthDate,
HireDate, Salary, EndDate)
(1, 'John', 'Brown, Jr.','K', '01-01-1960', '10-01-1985', 85000.00, GETDATE()),
(2, 'Pete', 'Dawn, III', NULL,'12-03-1965', '12-01-2000', 60000.00, NULL),
(3, 'Amanda', 'Lopez', NULL,'06-01-1980', '12-01-2005', 45000.00, NULL),
(4, 'Adam', 'Smith, Sr.', NULL, '02-26-1971', '04-23-2001', 70000.00, GETDATE()),
(5, 'Taylor', 'Ramsey, Jr', NULL, '03-01-1984', '11-11-2004', 60000.00, NULL)
SELECT * FROM dbo.Employee
My result is shown below:
Notice how the last name has some suffix in it.
Next I will show the typical approach to exporting data from an OLEDB source into a .csv file in a serious of steps
Step One: Create a data flow task
Step Two: Create your Source and destination tasks
Step Three: Connect the data source. In my case, I am connecting to the employee table I created earlier which contains the last names including commas.
Step Four: Configure the flat file connection manager.
Notice that the highlighter text qualifier section has a default value <none>
Step Five: Complete and run your package
Step Six: Review the generated .csv file
You will notice that everything looks fine until column D (MiddleName). The suffix has “spilled” over to that column and the middle name has over shifted over to the BirthDate. You will also notice that a new column has been added (column I) which has no header and was not part of the original employee table.
This is a problem and it is quite easy to solve.
The key is to utilize the text qualifier option in the flat file connection manager as shown in a couple of steps below.
Step Seven: Repeat the same steps as above (Step One to Step Three).
Step Eight: This time around, when you want to configure the flat file connection manager, replace the <none> in the text qualifier option with a quote (“) as highlighted in the image below.
Step Nine: Complete and run your package
Step Ten: Review the generated .csv file
You will notice that the data is returned as it was in the source table. The last name is displayed correctly without and spilling over.
Although this is a simple process, it is often overlooked by many developers. This is sometimes due to the volume of data we work with and sometimes because we often do not encounter names with suffix or maybe because we just never noticed.
I will be very happy to hear your feedback on this. Let me know your thoughts by adding them in the comment section.
Thanks for reading and I hope it helps someone out there.