Integration Services is a platform for building high performance data integration and workflow solutions, including extraction, transformation, and loading (ETL) operations for data warehousing.
Integration Services includes graphical tools and wizards for building and debugging packages; tasks for performing workflow functions such as FTP operations, SQL statement execution, and e-mail messaging; data sources and destinations for extracting and loading data; transformations for cleaning, aggregating, merging, and copying data; a management service, the Integration Services service, for administering Integration Services packages; and application programming interfaces (APIs) for programming the Integration Services object model.
Merging Data from Heterogeneous Data Stores
Integration Services can connect to a wide variety of data sources, including multiple sources in a single package. A package can connect to relational databases by using .NET and OLE DB providers, and to many legacy databases by using ODBC drivers. It can also connect to flat files, Excel files, and Analysis Services projects.
Integration Services includes source components that perform the work of extracting data from flat files, Excel spreadsheets, XML documents, and tables and views in relational databases from the data source to which the package connects.
Next, the data is typically transformed by using the transformations that Integration Services includes. After the data is transformed to compatible formats, it can be merged physically into one dataset.
After the data is merged successfully and transformations are applied to data, the data is usually loaded into one or more destinations. Integration Services includes destination for loading data into flat files, raw files, and relational databases. The data can also be loaded into an in-memory recordset and accessed by other package elements.
Populating Data Warehouses and Data Marts
The data in data warehouses and data marts is usually updated frequently, and the data loads are typically very large.
Integration Services includes a task that bulk loads data directly from a flat file into SQL Server tables and views, and a destination component that bulk loads data into a SQL Server database as the last step in a data transformation process.
An SSIS package can be configured to be restartable. This means you can rerun the package from a predetermined checkpoint, either a task or container in the package. The ability to restart a package can save a lot of time, especially if the package processes data from a large number of sources.
You can use SSIS packages to load the dimension and fact tables in the database. If the source data for a dimension table is stored in multiple data sources, the package can merge the data into one dataset and load the dimension table in a single process, instead of using a separate process for each data source.
Updating data in data warehouses and data marts can be complex, because both types of data stores typically include slowly changing dimensions that can be difficult to manage through a data transformation process. The Slowly Changing Dimension Wizard automates support for slowly changing dimensions by dynamically creating the SQL statements that insert and update records, update related records, and add new columns to tables.
Additionally, tasks and transformations in Integration Services packages can process Analysis Services cubes and dimensions. When the package updates tables in the database that a cube is built on, you can use Integration Services tasks and transformations to automatically process the cube and to process dimensions as well. Processing the cubes and dimensions automatically helps keep the data current for users in both environments; users who access information in the cubes and dimensions, and users who access data in a relational database.
Integration Services can also compute functions before the data is loaded into its destination. If your data warehouses and data marts store aggregated information, the SSIS package can compute functions such as SUM, AVERAGE, and COUNT. An SSIS transformation can also pivot relational data and transform it into a less-normalized format that is more compatible with the table structure in the data warehouse.
Building Business Intelligence into a Data Transformation Process
A data transformation process requires built-in logic to respond dynamically to the data it accesses and processes.
The data may need to be summarized, converted, and distributed based on data values. The process may even need to reject data, based on an assessment of column values.
To address this requirement, the logic in the SSIS package may need to perform the following types of tasks:
- Merging data from multiple data sources
- Evaluating data and applying data conversions
- Splitting a dataset into multiple datasets based on data values
- Applying different aggregations to different subsets of a dataset
- Loading subsets of the data into different or multiple destinations