
Series: An Introduction to SQL Server Features

Case Scenario

For this ETL project, the requirements are listed below:

  • Sales data will be pushed to a specified shared folder regularly.
  • Data is stored in CSV files with the columns Sale Number, Product Name, Product Color, Sale Amount, Sale Area and Sale Date.
  • Millions of sales records are stored in each file.
  • The flat files need to be archived after processing.
  • The process can be scheduled to run automatically and periodically.

Preparation

To create dummy sales files, we can write a small C# console application that generates them with random values.

    using System;
    using System.IO;
    using System.Text;

    class Program
    {
        static int SaleNo = 10000000;

        // Share one generator. Creating a new Random on every call can return
        // repeated values, because instances created close together may get the same seed.
        static readonly Random Gen = new Random();

        static void Main(string[] args)
        {
            var folder = @"E:\Temp\Archive";
            // Make sure the output folder exists before writing to it.
            Directory.CreateDirectory(folder);

            var fileCount = 10;
            var saleCountInEachFile = 100000;

            for (int i = 0; i < fileCount; i++)
            {
                var fileName = Path.Combine(folder, string.Format("Sales_{0}.csv", i));
                // StreamWriter creates the file if it is missing; append = false overwrites an existing one.
                using (var writer = new StreamWriter(fileName, false, Encoding.UTF8))
                {
                    // Header row
                    writer.WriteLine(string.Format("{0},{1},{2},{3},{4},{5}", "SaleNo", "ProductName", "ProductColor", "Amount", "AreaName", "SaleDate"));
                    // Data rows
                    for (int j = 0; j < saleCountInEachFile; j++)
                    {
                        writer.WriteLine(string.Format("{0},{1},{2},{3},{4},{5}", GetSaleNo(), GetRandomProduct(), GetRandomColor(), GetRandomPrice(), GetRandomArea(), GetRandomDay()));
                    }
                }
                Console.WriteLine("{0} is generated.", fileName);
            }
            Console.ReadKey();
        }

        // Sale numbers are sequential and unique across all files.
        static int GetSaleNo()
        {
            return SaleNo++;
        }

        static string GetRandomColor()
        {
            var colors = new string[] { "R", "Y" };
            return colors[Gen.Next(colors.Length)];
        }

        static string GetRandomArea()
        {
            var areas = new string[] { "Earth", "Mars", "Saturn" };
            return areas[Gen.Next(areas.Length)];
        }

        static int GetRandomPrice()
        {
            var minPrice = 888;
            var maxPrice = 5999;

            // The upper bound of Random.Next is exclusive, so prices range from 888 to 5998.
            return Gen.Next(minPrice, maxPrice);
        }

        static string GetRandomProduct()
        {
            var products = new string[] { "myPhone", "yourPhone", "ourPhone" };
            return products[Gen.Next(products.Length)];
        }

        // Returns a random date between 2011-01-01 and yesterday.
        static DateTime GetRandomDay()
        {
            DateTime start = new DateTime(2011, 1, 1);
            int range = (DateTime.Today - start).Days;
            return start.AddDays(Gen.Next(range));
        }
    }

Run the program and 10 CSV files (Sales_0.csv to Sales_9.csv) will be created, each containing 100,000 sales records.
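Before building the package, it can be useful to confirm that each file has the expected number of rows. The following is a minimal sketch and not part of the original project: the class `SalesFileCheck` and its method `CountDataRows` are hypothetical names, and the code assumes every file starts with exactly one header row, as the generator above writes it.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original project.
static class SalesFileCheck
{
    // Counts the data rows in one CSV file, excluding the header row.
    public static int CountDataRows(string path)
    {
        // File.ReadLines streams the file line by line,
        // so a large file is not loaded into memory at once.
        return File.ReadLines(path).Skip(1).Count();
    }
}
```

For example, `SalesFileCheck.CountDataRows(@"E:\Temp\Archive\Sales_0.csv")` should return 100000 with the generator settings used above.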

Now move these files into the parent folder (E:\Temp in this example). In the next step, we are going to create an SSIS package that loads these files into the table dbo.Sales and moves them to the Archive folder.

Create SSIS Package to Load Files

SSIS (SQL Server Integration Services) provides components and tasks for implementing complex ETL projects. The following steps illustrate how to use it to build a package quickly.

1) Create SSIS Project ‘ETL-Sample’

2) Add package ‘Package-HiSqlServer-LoadSales.dtsx’

3) Add two variables: FileName and ArchiveFileName.

[Screenshot: the two package variables]

4) Create a connection manager ‘RAYMOND-PC\MSSQL2012.HiSqlServer’ that connects to the database using a SQL Server account.

[Screenshot: database connection manager]

5) Create a flat file connection manager ‘Flat File Connection Manager - Sales File’ that connects to one of the sales files we created. Use an expression to bind the connection to the variable FileName. The settings are as follows:

[Screenshot: flat file connection manager settings]

[Screenshot: flat file connection manager settings, continued]

6) Create a second flat file connection manager ‘Sales_0.csv’.

[Screenshot: ‘Sales_0.csv’ connection manager]

Set the file path using an expression:

[Screenshot: file path expression]

7) Create a Foreach Loop Container ‘Foreach Loop Container - Sales File’ to process the files one by one. The settings are listed below:

[Screenshot: Foreach Loop Container settings]

[Screenshot: Foreach Loop Container variable mapping]

(* On each iteration, the variable FileName is set to the path of the current file.)
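For readers more familiar with code than with the SSIS designer, the container's behavior can be approximated in plain C#. This is only a sketch under stated assumptions, not SSIS code: the class `SalesFileLoop` is hypothetical, and the file specification passed by the caller (for example `Sales_*.csv`) is an assumption, since the actual container settings appear only in the screenshots.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the Foreach Loop Container; not SSIS code.
static class SalesFileLoop
{
    // Calls processFile once per matching file and passes the full path,
    // much like the container sets User::FileName on each iteration.
    // Returns the number of files processed.
    public static int Run(string folder, string fileSpec, Action<string> processFile)
    {
        // Take a snapshot of the file list first, because processFile
        // may move files out of the folder while the loop is running.
        // TopDirectoryOnly keeps the Archive subfolder out of the results.
        var files = new List<string>(
            Directory.GetFiles(folder, fileSpec, SearchOption.TopDirectoryOnly));
        files.Sort(StringComparer.OrdinalIgnoreCase);

        foreach (var fileName in files)
        {
            processFile(fileName);
        }
        return files.Count;
    }
}
```

A call such as `SalesFileLoop.Run(@"E:\Temp", "Sales_*.csv", path => Console.WriteLine(path));` would list each file that the package is expected to pick up.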

8) Create tasks in the container to import the data, set variable values and move the file to the archive folder once it is processed.

[Screenshot: tasks inside the Foreach Loop Container]

The data flow task loads data from the file source (the current file of ‘Foreach Loop Container - Sales File’), converts the columns to the appropriate data types, and joins the rows with dbo.Areas and dbo.Products in the database ‘HiSqlServer-Sample’ through the connection ‘RAYMOND-PC\MSSQL2012.HiSqlServer’. Finally, the data is loaded into the OLE DB destination (table dbo.Sales).

[Screenshot: data flow task]
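The typing and joining that the data flow performs can be illustrated outside SSIS. The sketch below is an approximation under assumptions, not the package's actual logic: the `SaleRow` type, the `SalesTransform` class, and the idea that dbo.Products and dbo.Areas map names to integer keys are all hypothetical, because the table schemas are not shown in this article. How the real package treats rows without a match is also not stated; here they simply yield null.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical row type; the real dbo.Sales schema is not shown in the article.
class SaleRow
{
    public int SaleNo;
    public int ProductId;
    public string ProductColor;
    public decimal Amount;
    public int AreaId;
    public DateTime SaleDate;
}

// Hypothetical helper, not part of the SSIS package.
static class SalesTransform
{
    // Converts one CSV line into a typed row and resolves the product and
    // area names to keys. Returns null when a name has no match.
    public static SaleRow ToRow(string line,
        IDictionary<string, int> productIds, IDictionary<string, int> areaIds)
    {
        // A plain Split is enough here because the generated
        // values contain no commas or quotes.
        var f = line.Split(',');
        if (f.Length != 6) throw new FormatException("Expected 6 columns: " + line);

        int productId, areaId;
        if (!productIds.TryGetValue(f[1], out productId)) return null;
        if (!areaIds.TryGetValue(f[4], out areaId)) return null;

        return new SaleRow
        {
            SaleNo = int.Parse(f[0], CultureInfo.InvariantCulture),
            ProductId = productId,
            ProductColor = f[2],
            Amount = decimal.Parse(f[3], CultureInfo.InvariantCulture),
            AreaId = areaId,
            // Assumes the date was written with the same culture that reads it.
            SaleDate = DateTime.Parse(f[5], CultureInfo.CurrentCulture)
        };
    }
}
```

In the package itself this work is done by data flow components rather than hand-written code, so the sketch is meant only to make the sequence of steps concrete.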

The script task sets the variable ArchiveFileName, which is used by the following step.

[Screenshot: script task settings]

The script is:

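    // Requires "using System.IO;" at the top of the script file.
    // User::FileName must be listed in the task's ReadOnlyVariables and
    // User::ArchiveFileName in its ReadWriteVariables.
    public void Main()
    {
        // Full path of the file currently being processed by the loop.
        var filepath = Dts.Variables["User::FileName"].Value.ToString();
        var dir = Path.GetDirectoryName(filepath);
        // The archive folder sits directly under the folder that holds the file.
        var newPath = Path.Combine(dir, "Archive");
        Dts.Variables["User::ArchiveFileName"].Value = newPath;

        Dts.TaskResult = (int)ScriptResults.Success;
    }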

The final task is a File System Task, which moves the file processed in the current iteration to the archive folder. Use ‘Sales_0.csv’ as the destination connection. The file path of this connection has already been set by the previous script task.

[Screenshot: File System Task settings]
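Taken together, the script task and the File System Task amount to "compute the archive folder, then move the file there". A rough C# equivalent is sketched below for illustration. The class `SalesArchiver` and its methods are hypothetical, and creating the archive folder when it is missing is an added assumption, not something the package is stated to do.

```csharp
using System.IO;

// Hypothetical helper that mirrors the script task plus the File System Task.
static class SalesArchiver
{
    // Same path logic as the script task: an "Archive" folder
    // directly under the folder that holds the file.
    public static string GetArchiveFolder(string filePath)
    {
        var dir = Path.GetDirectoryName(filePath);
        return Path.Combine(dir, "Archive");
    }

    // Moves the processed file into the archive folder and returns its new path.
    public static string MoveToArchive(string filePath)
    {
        var archiveFolder = GetArchiveFolder(filePath);
        // Assumption: create the folder if needed (no effect when it already exists).
        Directory.CreateDirectory(archiveFolder);

        var target = Path.Combine(archiveFolder, Path.GetFileName(filePath));
        // File.Move throws an IOException if the target file already exists.
        File.Move(filePath, target);
        return target;
    }
}
```

Because `File.Move` fails when a file of the same name is already archived, a rerun over previously processed file names would need its own handling, for example a timestamp suffix.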

9) Execute the package and all the data will be loaded into the database.

[Screenshot: package execution]

(* Files are being loaded one by one.)

The following query confirms that 1,000,000 records (10 files × 100,000 rows) have been loaded into the database.

    SELECT COUNT(*) FROM dbo.Sales;

Schedule

Using a SQL Server Agent job, you can schedule the above package to run automatically. For details, refer to <http://technet.microsoft.com/en-us/library/ms139805(v=SQL.90).aspx>.

Last modified by Raymond 6 years ago.
