What is data integration?
Data integration is a set of practices, tools, and architectural procedures that allow companies to consume, combine, and leverage all types of data. Along with consolidating data from disparate systems, the process ensures data is clean and free of errors to optimize its usefulness to the business.
Integrated data is especially helpful for organizations with a diverse and distributed landscape, with a range of data sources and assets generating information. In these instances, data is often siloed and disconnected from other business data, leaving the organization without a unified view of its business.
Data integration allows the business to achieve its true potential. Important decisions are based on accurate information, and new technology that relies on clean data can be implemented and optimized, helping the company to innovate and prosper.
Data integration history
Combining different data sources has been a problem since business systems started collecting data. It wasn’t until the early 1980s that computer scientists began designing systems that supported the interoperability of heterogeneous or different databases.
One of the first data integration systems was launched by the University of Minnesota in 1991 – its objective was to make thousands of population databases interoperable. The system used a data warehousing approach that extracted, transformed, and loaded data from disparate sources into a view schema to make the data compatible.
In the intervening years, different challenges arose, including issues with data quality, data governance, data modeling, and, importantly, with data isolation or siloed data.
Integrated data became a business imperative in the early 2010s with the advent of the Internet of Things (IoT). Suddenly a wide range of devices, applications, and platforms were generating enormous amounts of data – companies were drowning in it. Big Data became a thing, and businesses needed to find a way to harness the power of all the information.
Today companies of all sizes and industries use data integration to extract value from data that is stored across applications and platforms within the enterprise.
Data integration use cases
If a company generates data, that data can be integrated and used to build real-time insights that benefit the business. An organization that spans diverse geographies can consolidate views across its entire operation to understand what’s working and what’s not. A single view of the business makes it easier to understand cause and effect, allowing organizations to course-correct in real time and minimize risk.
Data integration allows companies to:
- Optimize analytics: Access, queue, or extract data from operational systems – commonly known as data warehousing – then transform and deliver it to the business in the form of trusted analytics.
- Drive consistency between operational applications: Ensure database-level consistency across applications (intra- and interenterprise), on a bi- and unidirectional basis.
- Share data outside your organization: Provide trusted data to external parties such as customers, suppliers, and partners.
- Orchestrate data services: Deploy all runtime data integration functionality as data services to ensure speed and accuracy.
- Support data migration and consolidation: Address data movement and transformation needs relative to data migration and consolidation, for example, when replacing legacy applications or migrating to new environments.
Benefits of integrated data
Data integration is a critical element of any organization’s overall data management strategy. It helps deliver the right information and bring the organization together – coordinating all activities and decisions in support of the enterprise’s purpose, which is to effectively and efficiently deliver quality products and services to customers.
After data is gathered from across the enterprise, it is cleansed and validated to ensure it is free of errors and inconsistencies before it is integrated into a single data set or orchestrated across numerous data sets – which is often referred to as a data fabric methodology.
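The cleanse, validate, and consolidate sequence described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Customer` record, the field names, the two sample sources (`crm`, `billing`), and the "later source wins" merge rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    email: str
    country: str


def cleanse(raw: dict) -> dict:
    """Normalize whitespace and casing so equivalent values compare equal."""
    return {
        "customer_id": str(raw.get("customer_id", "")).strip(),
        "email": str(raw.get("email", "")).strip().lower(),
        "country": str(raw.get("country", "")).strip().upper(),
    }


def is_valid(rec: dict) -> bool:
    """Reject records that lack an ID or a plausible email address."""
    return bool(rec["customer_id"]) and "@" in rec["email"]


def integrate(*sources):
    """Merge cleansed, valid records into one data set, one row per customer_id."""
    merged = {}
    for source in sources:
        for raw in source:
            rec = cleanse(raw)
            if is_valid(rec):
                # Assumed merge rule: a later source overwrites an earlier one.
                merged[rec["customer_id"]] = Customer(**rec)
    return list(merged.values())


# Hypothetical records from two siloed systems.
crm = [{"customer_id": " 42 ", "email": "Ana@Example.com ", "country": "de"}]
billing = [
    {"customer_id": "42", "email": "ana@example.com", "country": "DE"},
    {"customer_id": "", "email": "no-id@example.com", "country": "US"},
]

unified = integrate(crm, billing)
print(unified)
```

Here the two records for customer 42 collapse into one after cleansing, and the record without an ID is rejected during validation. Real projects typically need richer matching and survivorship rules than this sketch assumes.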
A comprehensive, accurate source of integrated data helps a business support the innovative processes and technologies it needs to succeed. For example, artificial intelligence, machine learning, and Industry 4.0 initiatives would not be sustainable without access to large stores of integrated data.
Without data integration, data remains siloed within disparate applications and platforms. This hinders the operational and strategic capabilities of the organization. For example, important business decisions would be based on inaccurate analytics due to limited data sets.
See how these organizations are reaping the benefits of data integration:
- Federal Mogul: A leading producer of original equipment and spare parts in the automotive industry, Federal Mogul manufactures the technology that lies at the heart of prestigious vehicle brands such as Mercedes-Benz, Bentley, and Caterpillar. Learn how they established one source of data and enabled rapid decision-making with access to real-time information.
- The Costain Group: A partner to government agencies in the UK, the Costain Group consolidates and accesses siloed data to make transportation projects more efficient while lowering emissions and saving public funds. The group relies on data integration to access more of its data, providing faster data-driven decisions to maximize outcomes.
How does data integration work?
The most commonly used data integration models rely on an extract, transform, load (ETL) process.
- Extract: Data is moved from a source system to a temporary staging data repository where it is cleaned and the quality is assured.
- Transform: Data is structured and converted to match the format of the target system.
- Load: The structured data is loaded into a data warehouse or some other storage entity.
After the information is integrated, data analysis is carried out, providing business users with information they need to make informed decisions.
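The three ETL steps can be illustrated with a small, self-contained Python sketch. It assumes an in-memory SQLite database as a stand-in for the data warehouse, and the `orders` table, the sample rows, and the quality rule (the amount must parse as a number) are hypothetical choices for the example only.

```python
import sqlite3

# Hypothetical source rows, standing in for an operational system.
SOURCE_ROWS = [
    {"order_id": "1001", "amount": "19.90", "currency": "eur"},
    {"order_id": "1002", "amount": "n/a", "currency": "usd"},  # fails the quality check
    {"order_id": "1003", "amount": "5.00", "currency": "USD"},
]


def extract(rows):
    """Copy source rows into a staging list, keeping only rows that pass a quality check."""
    staging = []
    for row in rows:
        try:
            float(row["amount"])
        except ValueError:
            continue  # amount is not numeric; keep the row out of staging
        staging.append(dict(row))
    return staging


def transform(staging):
    """Convert staged rows to the types and conventions of the target table."""
    return [
        (int(r["order_id"]), round(float(r["amount"]), 2), r["currency"].upper())
        for r in staging
    ]


def load(conn, records):
    """Insert the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(warehouse, transform(extract(SOURCE_ROWS)))

loaded = warehouse.execute(
    "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping each step as a separate function mirrors the staged flow described above: quality checks happen before transformation, and only records that match the target structure reach the load step.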
Types of data integration
There are different types of data integration, often depending on the source and kind of data.
- Bulk/batch data movement: This is the most common style, involving data extraction, data transformation, and data load.
- Data replication: Data is copied from one database to another. After the initial copy, only changed data is captured and replicated into the secondary database.
- Data virtualization: A virtual abstraction layer presents a single view of data from multiple sources, providing real-time access to the data regardless of its location, source system, or type.
- Stream data integration: This is used for data created in a constant flow or stream where transformation must occur on the fly.
- Message-oriented data movement: Chunks of data are grouped into messages that are read by applications, with data exchange happening in real time.
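As one illustration of these styles, the "changed data only" idea behind data replication can be sketched in Python. This is a simplified model under stated assumptions: the source and replica are plain dictionaries, each row carries a hypothetical `version` counter that increases on every change, and deletions are not handled.

```python
def replicate_changes(source, replica, since_version):
    """Copy only rows changed after `since_version`.

    Returns the keys that were copied and the new high-water mark.
    """
    copied = []
    high_water = since_version
    for key, row in source.items():
        if row["version"] > since_version:
            replica[key] = dict(row)  # copy so the replica is independent
            copied.append(key)
            high_water = max(high_water, row["version"])
    return copied, high_water


# Hypothetical source table keyed by row ID.
source = {
    "A": {"value": "alpha", "version": 1},
    "B": {"value": "beta", "version": 2},
}
replica = {}

# First run: nothing has been synced yet, so every row is copied.
first_copied, mark = replicate_changes(source, replica, since_version=0)

# One row changes at the source; the next run copies only that row.
source["B"] = {"value": "beta-2", "version": 3}
second_copied, mark = replicate_changes(source, replica, mark)

print(first_copied, second_copied, mark)
```

Production replication tools usually read changes from database logs rather than comparing version columns, so treat the version counter here as a teaching device rather than a description of any particular product.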
The challenge is choosing the right data integration style for your unique landscape and business needs. Most organizations need more than one. Understanding how to bring these data integration tools together into a coherent whole is critical.
Data integration trends and technologies
Transforming and harnessing the value of data is the key to businesses being resilient and agile in today’s environment. It’s also critical to digital transformation and adopting new technologies. Emerging trends are taking data integration to the next level and delivering that all-important value.
As the business landscape becomes more distributed, data sources proliferate, and information types diversify, companies are turning to data orchestration to help organize large volumes of data.
Data orchestration takes a more comprehensive approach than traditional data integration and the ETL model, integrating, enriching, and transforming all types of data, including unstructured and streaming data, from on-premise, cloud, and external sources. Data orchestration produces smarter insights while lowering the complexity of data integration and associated costs.
In recent years, standard data integration methods have struggled with new and expanding challenges such as complex data sources and connectivity limitations. Data fabric provides a more agile and resilient approach to data integration, minimizing complexity by automating processes, workflows, and pipelines.
Hybrid data integration
Today, many enterprises support cloud and on-premise systems, with data from these systems distributed across a range of applications and locations. Hybrid data integration allows users to access and share data via any application, regardless of the location of the data.
In this fast-paced, digital economy, business agility is a strategic priority. A holistic approach to integration is essential to achieving this result. By combining the separate data and application integration disciplines into a comprehensive effort, all flavors of integration are supported across a hybrid landscape.
Data integration FAQs
Data intelligence is the value an organization gets from data integration. During the integration process, data is consumed, combined, and provisioned into data sets to satisfy the requirements of all business processes and applications that rely on access to data. Innovative and new technologies such as artificial intelligence and machine learning tools can analyze and transform these massive data sets into intelligent data insights, which are used to inform strategic business decisions.
Data orchestration extends beyond data integration, combining data discovery, preparation, integration, processing, and the connection of data across multiple and complex landscapes. Data integration is used for data in one place, while data orchestration processes and combines data in a flexible manner to enable new and/or improved business processes.
Big Data, by its very name, is composed of massive sets of unstructured data spread across disparate sources within and outside of the enterprise. Traditional databases and integration mechanisms are not equal to handling these volumes. Instead, in-memory databases, software, and storage solutions built for Big Data are necessary to acquire, store, and analyze the data. These powerful components support the velocity needed to ensure Big Data insights are actionable and valuable.