
Create ETL Project for Collecting Sales Data using SSIS

By Raymond · last modified about 5 years ago

Series: An Introduction to SQL Server Features

Case Scenario

For this ETL project, the requirements are listed below:

  • Sales data will be pushed to a specified shared folder regularly.
  • Data is stored in CSV files with columns: Sale Number, Product Name, Product Color, Sale Amount, Sale Area and Sale Date.
  • Millions of sales records are stored in each file.
  • The flat files need to be archived after processing.
  • The process can be scheduled to run automatically and periodically.
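
For illustration, a file matching this layout might begin like the following (the values are made up, mirroring the generator program below; the exact date format depends on the machine's culture settings):

```
SaleNo,ProductName,ProductColor,Amount,AreaName,SaleDate
10000000,myPhone,R,1234,Earth,2012-05-17 00:00:00
10000001,ourPhone,Y,4321,Mars,2013-11-02 00:00:00
```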


To create dummy sales files, we can write a small C# console application that generates them with random data.

    using System;
    using System.IO;
    using System.Text;

    class Program
    {
        static int SaleNo = 10000000;
        // Share one Random instance; creating a new Random per call can repeat
        // values because instances created in quick succession use the same
        // time-based seed.
        static readonly Random Gen = new Random();

        static void Main(string[] args)
        {
            var folder = @"E:\Temp\Archive";

            var fileCount = 10;
            var saleCountInEachFile = 100000;

            for (int i = 0; i < fileCount; i++)
            {
                var fileName = Path.Combine(folder, string.Format("Sales_{0}.csv", i));
                using (var writer = new StreamWriter(fileName, false, Encoding.UTF8))
                {
                    // Header row first, then the random sales records.
                    writer.WriteLine("SaleNo,ProductName,ProductColor,Amount,AreaName,SaleDate");
                    for (int j = 0; j < saleCountInEachFile; j++)
                    {
                        writer.WriteLine(string.Format("{0},{1},{2},{3},{4},{5}",
                            GetSaleNo(), GetRandomProduct(), GetRandomColor(),
                            GetRandomPrice(), GetRandomArea(), GetRandomDay()));
                    }
                }
                Console.WriteLine("{0} is generated.", fileName);
            }
        }

        static int GetSaleNo()
        {
            return SaleNo++;
        }

        static string GetRandomColor()
        {
            var colors = new string[] { "R", "Y" };
            return colors[Gen.Next(1000) % colors.Length];
        }

        static string GetRandomArea()
        {
            var areas = new string[] { "Earth", "Mars", "Saturn" };
            return areas[Gen.Next(1000) % areas.Length];
        }

        static int GetRandomPrice()
        {
            var minPrice = 888;
            var maxPrice = 5999;
            return Gen.Next(minPrice, maxPrice);
        }

        static string GetRandomProduct()
        {
            var products = new string[] { "myPhone", "yourPhone", "ourPhone" };
            return products[Gen.Next(1000) % products.Length];
        }

        static DateTime GetRandomDay()
        {
            DateTime start = new DateTime(2011, 1, 1);
            int range = (DateTime.Today - start).Days;
            return start.AddDays(Gen.Next(range));
        }
    }

Run the program and 10 CSV files will be created.

Now move these files into the parent folder. In the next step, we will create an SSIS package to load these files into table dbo.Sales and then move them to the Archive folder.

Create SSIS Package to Load Files

SSIS (SQL Server Integration Services) provides components and tasks to implement complex ETL projects. The following steps illustrate how to use it to build a package rapidly.

1) Create SSIS Project ‘ETL-Sample’

2) Add package ‘Package-HiSqlServer-LoadSales.dtsx’

3) Add two variables


4) Create a connection manager ‘RAYMOND-PC\MSSQL2012.HiSqlServer’ that connects to the database using a SQL account.


5) Create another flat file connection manager ‘Flat File Connection Manager - Sales File’ connecting to one of the sales files we created. Use an expression to bind the connection string to the variable FileName. The settings are as follows:
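
The screenshots are not reproduced here, but the property expression (set on the connection manager's Expressions page) would plausibly look like the following, assuming the variable created in step 3 is named FileName:

```
ConnectionString : @[User::FileName]
```

With this binding, each iteration of the loop repoints the flat file connection at the file currently being processed.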



6) Create Flat File Connection Manager ‘Sales_0.csv’.


Set the file path using expression:
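
The expression itself is not visible in this extract; based on the script task later in the package, which populates the ArchiveFileName variable, it would plausibly bind the connection string to that variable:

```
ConnectionString : @[User::ArchiveFileName]
```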


7) Create Foreach Loop Container ‘Foreach Loop Container - Sales File’ to process each file one by one. The settings are listed below:



(* Map the variable FileName to the file path of the current loop iteration.)

8) Create tasks in the container to import the data, set variable values, and move each file to the archive folder once processed.


The data flow task loads data from the flat file source (Flat File Connection Manager - Sales File); the data is then typed, transformed, and joined to dbo.Areas and dbo.Products in database ‘HiSqlServer-Sample’ through connection ‘RAYMOND-PC\MSSQL2012.HiSqlServer’. Finally, the data is loaded into the OLE DB destination (table dbo.Sales).


The script task changes the variable value for the following step.


The script is:

public void Main()
{
    // Build the archive folder path from the current file's directory.
    var filepath = Dts.Variables["User::FileName"].Value.ToString();
    var dir = Path.GetDirectoryName(filepath);
    var newPath = Path.Combine(dir, "Archive");
    Dts.Variables["User::ArchiveFileName"].Value = newPath;

    Dts.TaskResult = (int)ScriptResults.Success;
}

The final task is a File System Task, which moves the file processed in the current loop iteration to the archive folder. Use ‘Sales_0.csv’ as the destination connection; its file path has already been updated by the preceding script task.


9) Execute the package and all the data will be loaded into the database.


(* Files are being loaded one by one.)

Using a row-count query, you can verify that 1,000,000 records (10 files × 100,000 rows each) are loaded into the database.
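
The original query screenshot is not available; a simple count against the destination table from step 8 would look like this:

```sql
SELECT COUNT(*) AS LoadedRows
FROM dbo.Sales;
```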



Using a SQL Server Agent job, you can set up schedules for running the above package. Please reference <>.

