Loading Data from Microsoft SQL Server to Snowflake

Before going into the finer details of loading data from Microsoft SQL Server to Snowflake, it helps to get an overview of both platforms. The steps for transferring data from one to the other are easier to follow once each is understood.

Choosing the right tools to load data from SQL Server to Snowflake is critical. The selected tool should be able to load data automatically and with minimal intervention from DBAs. On the SQL Server side, Microsoft's own tooling, such as SQL Server Management Studio (SSMS), speeds up the export step even for large databases.

Microsoft SQL Server

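SQL Server is a relational database management system (RDBMS) developed by Microsoft. It can be managed both through graphical tools and from the command line. It originally ran only on Windows; since SQL Server 2017 it also runs on Linux. It uses Transact-SQL (T-SQL), Microsoft's extension of SQL, the query language that originated at IBM as SEQUEL. SQL keywords are case-insensitive, while case sensitivity in data comparisons depends on the database collation.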

The software has a wide range of uses: creating and maintaining databases, analyzing data through SQL Server Analysis Services (SSAS), generating reports through SQL Server Reporting Services (SSRS), and running ETL operations through SQL Server Integration Services (SSIS).

One of the main benefits of Microsoft SQL Server is that applications can connect to it from the same machine or from other machines across a local area network or the web.

Snowflake

Snowflake is a relatively new cloud-based data warehousing solution, delivered as a Software-as-a-Service (SaaS) product. It addresses many limitations of traditional database solutions, which is one reason organizations now choose to load data from SQL Server to Snowflake.

There are several unique benefits of Snowflake.

  • Compute and storage are separate. Users can scale either one up or down independently, paying only for the resources they actually use.
  • Snowflake loads both structured and semi-structured data, including Avro, JSON, ORC, Parquet, and XML.
  • Multiple workgroups can run different workloads at the same time, each on its own virtual warehouse, without contending for resources or slowing one another down.
  • Snowflake handles many tasks automatically, from column encoding to scaling compute up or down. Data is clustered automatically and no indexes need to be defined. For very large tables, however, users can define clustering keys to co-locate related table data.
  • Snowflake runs on several cloud platforms: Amazon Web Services, Microsoft Azure, and Google Cloud Platform. This is a boon for users, as the same tools can be used to analyze and process data held with different cloud vendors.
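
The separation of compute and storage and the optional clustering keys described above can be illustrated by the Snowflake statements a setup script might issue. The following is a minimal Python sketch, assuming the statements are generated as strings and executed elsewhere (for example through a Snowflake connector, not shown). The warehouse name `load_wh`, the table `orders`, its columns, and the subset of accepted warehouse sizes are illustrative assumptions.

```python
# Minimal sketch: generate Snowflake statements for compute and clustering.
# All object names and the size subset below are illustrative assumptions.

# Assumed subset of values accepted by Snowflake's WAREHOUSE_SIZE parameter.
WAREHOUSE_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"}


def _check_size(size: str) -> str:
    """Normalize a warehouse size and reject values outside the assumed subset."""
    size = size.upper()
    if size not in WAREHOUSE_SIZES:
        raise ValueError(f"unsupported warehouse size {size!r}")
    return size


def create_warehouse_sql(name: str, size: str = "XSMALL", auto_suspend_s: int = 60) -> str:
    """Compute is a separate virtual warehouse; a suspended warehouse uses no credits."""
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = '{_check_size(size)}' "
        f"AUTO_SUSPEND = {auto_suspend_s} AUTO_RESUME = TRUE;"
    )


def resize_warehouse_sql(name: str, size: str) -> str:
    """Scale compute up or down; the stored table data is not affected."""
    return f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = '{_check_size(size)}';"


def cluster_by_sql(table: str, keys: list[str]) -> str:
    """Define an optional clustering key to co-locate rows of a very large table."""
    if not keys:
        raise ValueError("at least one clustering key column is required")
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(keys)});"


if __name__ == "__main__":
    print(create_warehouse_sql("load_wh"))
    print(resize_warehouse_sql("load_wh", "large"))
    print(cluster_by_sql("orders", ["order_date", "region"]))
```

Resizing the warehouse changes only compute; the table's stored data is untouched, which is what allows the two to be scaled and billed separately.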

With both Microsoft SQL Server and Snowflake covered, the next question is how to load data from SQL Server to Snowflake.

Loading data from SQL Server to Snowflake

There are several steps in this process.

  • The first step is to extract data from SQL Server, usually with queries. SELECT statements are used to sort, filter, and limit the data to be retrieved. When bulk data or entire databases have to be exported in formats such as text, CSV, or SQL scripts, the SQL Server Management Studio tool is used.
  • The next step is to process and prepare the extracted data for loading into Snowflake. Snowflake supports a specific set of data types, so before loading, it has to be verified that the structure and column types of the extracted data match types Snowflake supports; this keeps the load from failing. This check is not necessary for semi-structured data such as JSON or XML, which Snowflake can load into a VARIANT column.
  • Even processed and prepared data is not loaded directly into Snowflake tables. It is first placed in a temporary location called a stage. Snowflake has two kinds of stage: internal and external.

An internal stage stores files inside Snowflake and can be created entirely with SQL statements. It offers users a great deal of flexibility while loading data, since a file format and other options can be assigned to the stage.

For external stages, Snowflake supports Amazon S3, Microsoft Azure storage, and Google Cloud Storage.

  • The final step loads the staged data into Snowflake tables; Snowflake's Data Loading Overview documentation describes the available options. Smaller data sets can be loaded with the data loading wizard in the Snowflake web interface. For very large databases, the PUT command uploads files from a local machine to an internal stage, and the COPY INTO <table> command loads the staged data into the target table.
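
The four steps above can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not a production loader: `sqlite3` stands in for SQL Server so the example runs anywhere (with a real server, any DB-API 2.0 connection, for instance one from pyodbc, could be passed instead), the SQL Server to Snowflake type map is a hypothetical subset, and the stage name `sqlserver_stage` and table `orders` are illustrative. The script only generates the Snowflake statements; executing PUT and COPY INTO would require SnowSQL or a Snowflake connector.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical subset of SQL Server -> Snowflake type mappings; verify each
# column against Snowflake's data type documentation before relying on it.
TYPE_MAP = {
    "int": "INTEGER",
    "bigint": "BIGINT",
    "bit": "BOOLEAN",
    "decimal": "NUMBER",
    "numeric": "NUMBER",
    "float": "FLOAT",
    "varchar": "VARCHAR",
    "nvarchar": "VARCHAR",
    "date": "DATE",
    "datetime": "TIMESTAMP_NTZ",
    "datetime2": "TIMESTAMP_NTZ",
    "uniqueidentifier": "VARCHAR(36)",
}


def map_type(sqlserver_type: str) -> str:
    """Step 2: translate a type such as 'decimal(10,2)', failing loudly on gaps."""
    base, _, params = sqlserver_type.strip().lower().partition("(")
    if base not in TYPE_MAP:
        raise ValueError(f"no Snowflake mapping for SQL Server type {sqlserver_type!r}")
    target = TYPE_MAP[base]
    # Keep precision/length arguments, e.g. decimal(10,2) -> NUMBER(10,2), but drop MAX.
    if params and base in {"decimal", "numeric", "varchar", "nvarchar"} and params != "max)":
        return f"{target}({params}"
    return target


def create_table_sql(table: str, columns: list[tuple[str, str]]) -> str:
    """Step 2: build a Snowflake CREATE TABLE from (column, SQL Server type) pairs."""
    cols = ", ".join(f"{name} {map_type(sql_type)}" for name, sql_type in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols});"


def extract_to_csv(conn, query: str, path: Path) -> int:
    """Step 1: run a SELECT on a DB-API connection and write the rows to CSV with a header."""
    cur = conn.cursor()
    cur.execute(query)
    header = [d[0] for d in cur.description]
    rows = cur.fetchall()
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


def load_statements(local_file: Path, stage: str, table: str) -> list[str]:
    """Steps 3-4: create an internal stage, upload with PUT, load with COPY INTO."""
    return [
        f"CREATE STAGE IF NOT EXISTS {stage} "
        "FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '\"' SKIP_HEADER = 1);",
        f"PUT file://{local_file.as_posix()} @{stage};",
        f"COPY INTO {table} FROM @{stage};",
    ]


if __name__ == "__main__":
    # sqlite3 stands in for SQL Server so the sketch runs without a server.
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
    src.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "Acme", 10.5), (2, "Globex", 99.0), (3, "Initech", 5.25)],
    )
    out = Path(tempfile.gettempdir()) / "orders.csv"
    n = extract_to_csv(src, "SELECT id, customer, amount FROM orders WHERE amount > 6 ORDER BY id", out)
    print(n, "rows written to", out)
    print(create_table_sql("orders", [("id", "int"), ("customer", "nvarchar(100)"), ("amount", "decimal(10,2)")]))
    for stmt in load_statements(out, "sqlserver_stage", "orders"):
        print(stmt)
```

PUT compresses files with gzip by default, so the staged file ends up as `orders.csv.gz`; COPY INTO reads it from the stage and skips files it has already loaded.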

This is not the end, though. Incremental data, that is, the changes and modifications made in SQL Server after the initial load, has to be applied to Snowflake continually to keep the data fresh.
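
One common way to handle incremental data is a high-watermark pattern: extract only rows whose last-modified timestamp is newer than the previous run, stage them, and MERGE them into the target table. The sketch below assumes each source table has a reliably maintained `modified_at` column and an `id` primary key (both hypothetical here), that changed rows land in a staging table called `orders_changes` (also hypothetical), and again uses `sqlite3` in place of SQL Server. It does not capture deleted rows; SQL Server's Change Data Capture or Change Tracking features are the usual route for that.

```python
import sqlite3
from typing import Any, Sequence


def extract_changes(conn, table: str, columns: Sequence[str], watermark_col: str,
                    last_watermark: Any) -> tuple[list[tuple], Any]:
    """Fetch rows modified after last_watermark; return them and the new watermark."""
    if watermark_col not in columns:
        raise ValueError("watermark column must be among the selected columns")
    query = (
        f"SELECT {', '.join(columns)} FROM {table} "
        f"WHERE {watermark_col} > ? ORDER BY {watermark_col}"
    )
    cur = conn.cursor()
    cur.execute(query, (last_watermark,))
    rows = cur.fetchall()
    if not rows:
        return [], last_watermark  # nothing changed; keep the old watermark
    # Rows are ordered by the watermark column, so the last row holds the newest value.
    return rows, rows[-1][list(columns).index(watermark_col)]


def merge_sql(target: str, source: str, key: str, columns: Sequence[str]) -> str:
    """Snowflake MERGE that updates matching keys and inserts new ones from a staging table."""
    non_key = [c for c in columns if c != key]
    if not non_key:
        raise ValueError("need at least one non-key column to update")
    updates = ", ".join(f"{target}.{c} = {source}.{c}" for c in non_key)
    values = ", ".join(f"{source}.{c}" for c in columns)
    return (
        f"MERGE INTO {target} USING {source} ON {target}.{key} = {source}.{key} "
        f"WHEN MATCHED THEN UPDATE SET {updates} "
        f"WHEN NOT MATCHED THEN INSERT ({', '.join(columns)}) VALUES ({values});"
    )


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")  # stand-in for SQL Server
    src.execute("CREATE TABLE orders (id INTEGER, customer TEXT, modified_at TEXT)")
    src.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "Acme", "2024-01-01T09:00"), (2, "Globex", "2024-01-02T10:00"),
         (3, "Initech", "2024-01-03T11:00")],
    )
    cols = ["id", "customer", "modified_at"]
    changed, watermark = extract_changes(src, "orders", cols, "modified_at", "2024-01-01T12:00")
    print(len(changed), "changed rows; next watermark:", watermark)
    # The changed rows would be staged and loaded into orders_changes (not shown), then:
    print(merge_sql("orders", "orders_changes", "id", cols))
```

With a strict `>` comparison, rows committed later with the same timestamp as the saved watermark would be skipped, so the watermark is often moved back slightly and the MERGE relied on to absorb the overlapping rows.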