What is data mesh?
Data mesh represents a new way of looking at information. It is born from the growing concept that data is actually itself a product, a tool, a means to an end – not simply something businesses gather and analyze later in a backward-looking attempt to understand things that have already happened.
Data mesh definition
Data mesh is an approach to data management that uses a distributed architectural framework. In other words: it spreads ownership and responsibility for specific data sets across the business, to those users who have the specialist expertise to understand what that data means and how to make the best use of it.
Data mesh architecture connects and draws data from various sources like data lakes and warehouses and distributes the relevant data sets to the appropriate human experts and domain teams across the business. Essentially, a voluminous jumble of data in a central data lake is sorted and distributed into manageable chunks to those best suited to understand and leverage it.
Data mesh principles for data lake challenges
When we talk about data lakes and data mesh, we’re essentially talking about Big Data. What makes data “Big” is not simply its huge volume. Among other criteria, Big Data is also defined by being complex, variable, rapidly generated, and unstructured.
A relational database is like a spreadsheet: it has columns and rows and fixed categories into which all the data components must fit. Some of the data generated from machinery, sensors, and industrial sources is structured and fits neatly into a relational database. No matter how much data volume you have to deal with, if it’s 100% structured it doesn’t meet Big Data criteria and can be housed in a relational database, making it relatively straightforward to filter and extract.
But increasingly, modern Big Data is unstructured and consists of visual components, open-ended text, and even video and rich media. This crucial data can comprise thousands of terabytes of information for many companies, and it simply can’t be stored in a standard relational database.
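To make the structured case concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name, column names, and sensor readings are invented for illustration. It shows why fully structured data is straightforward to filter and extract: every reading fits the same fixed columns, so a single query can select what is needed.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# A fixed schema: every row must fit these columns (names are hypothetical).
conn.execute(
    """
    CREATE TABLE sensor_readings (
        sensor_id     TEXT NOT NULL,
        recorded_at   TEXT NOT NULL,
        temperature_c REAL NOT NULL
    )
    """
)

# Illustrative, made-up readings from industrial sensors.
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?, ?)",
    [
        ("press-01", "2024-01-01T08:00:00", 71.5),
        ("press-02", "2024-01-01T08:00:00", 88.2),
        ("oven-01", "2024-01-01T08:00:00", 93.0),
    ],
)

# Because the data is structured, filtering is a single declarative query.
hot_sensors = conn.execute(
    "SELECT sensor_id, temperature_c FROM sensor_readings "
    "WHERE temperature_c > ? ORDER BY temperature_c DESC",
    (80.0,),
).fetchall()

print(hot_sensors)
```

Open-ended text, images, or video have no equivalent set of fixed columns to query against, which is the gap that data lakes were created to fill.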
Enter the data lake. As Big Data volumes began to increase, data lakes were developed as a central repository in which complex data could be stored and accessed in its raw format. While data lakes represent an excellent solution to the Big Data problem, they nonetheless have weaknesses. Data lakes lack certain analytic features, making them dependent on other services for retrieval, indexing, transformation, querying, and analytics functionality. And from a business management point of view, data lakes also present three additional challenges:
1. Complex ownership
Ownership in data lakes is complex to define when too many players generate and access data. In the absence of clearly defined roles and responsibilities, the same set of data can be managed differently by different parties, creating inconsistencies that make it difficult to use. Likewise, other data ends up being neglected when it is not actively managed by those who will ultimately be using it. Data mesh architecture ensures data governance is clearly distributed by domain so that each team or domain expert governs the data they produce and use. To back this up, data meshes also use a federated governance structure that allows for central control of data modeling, security policies, and compliance.
2. Data quality
Data lakes can fail to ensure data quality when the volume of data becomes too large or when central data managers themselves do not understand it. Data mesh architecture fundamentally treats data as a valuable product, which puts the quality and completeness of data at the forefront of data management. Presumably, each team knows the most important criteria and issues that they wish to extract from the data they are collecting. By integrating these criteria and priorities into the architecture, data mesh can help to ensure the continuous and prioritized delivery of clean, fresh, and complete data, even when larger datasets are involved. And when machine learning algorithms are applied, these criteria and the resultant data sets can become increasingly accurate and useful over time.
3. Bottlenecks
Data lakes can create bottlenecks because of their centralized architecture and traditionally difficult data retrieval processes and protocols. This typically means that the control of a large amount of consolidated data comes down to a single IT or data management team. And, as volumes of data (and demand for its retrieval) increase, these IT teams get over-taxed.
Furthermore, the data must be reviewed and structured properly to ensure compliance and adherence to data governance principles. When teams face undue pressure, there can be a tendency to rush through these compliance stages, which creates potential risk and loss for the company. Data mesh architecture, on the other hand, gives access and control to authorized specialized users who have a greater vested interest in the data – all while employing stringent, baked-in security protocols.
Data mesh principles arose in direct response to these growing data lake challenges. Decentralized and democratized data management architecture has made businesses smarter, more agile, and more accurate by ensuring the right data is immediately available to the right people, wherever and whenever they need it. Data mesh makes data-as-a-product an actual reality, reducing barriers and prioritizing the value of information so that teams can get faster, unobstructed access to essential data.
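The idea of a domain team owning its data as a product, including its own definition of quality, can be sketched in a few lines of Python. This is an illustrative model only: the `DataProduct` class, the rule names, and the sample lead records are all hypothetical and do not come from any particular data mesh tool.

```python
from dataclasses import dataclass, field
from typing import Callable

Record = dict[str, object]
Rule = Callable[[Record], bool]


@dataclass
class DataProduct:
    """A dataset published by the domain team that owns it (hypothetical model)."""

    name: str
    owner_domain: str
    quality_rules: dict[str, Rule] = field(default_factory=dict)

    def validate(self, records: list[Record]) -> dict[str, int]:
        """Count, per rule name, how many records fail that rule."""
        failures = {rule_name: 0 for rule_name in self.quality_rules}
        for record in records:
            for rule_name, rule in self.quality_rules.items():
                if not rule(record):
                    failures[rule_name] += 1
        return failures


def has_contact_email(record: Record) -> bool:
    return bool(record.get("email"))


def score_in_range(record: Record) -> bool:
    score = record.get("score")
    return isinstance(score, (int, float)) and 0 <= score <= 100


# The sales domain decides what "clean and complete" means for its own leads.
sales_leads = DataProduct(
    name="qualified_leads",
    owner_domain="sales",
    quality_rules={
        "has_contact_email": has_contact_email,
        "score_in_range": score_in_range,
    },
)

# Made-up sample records: the second lacks an email, the third has a bad score.
records = [
    {"email": "a@example.com", "score": 72},
    {"email": "", "score": 55},
    {"email": "c@example.com", "score": 140},
]

report = sales_leads.validate(records)
print(report)
```

The design point is that the quality rules live with the owning domain rather than with a central team, so the people who understand the data are the ones who define and monitor its fitness for use.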
Data mesh architecture explained
We've discussed how data mesh is a decentralized form of data architecture that treats data as an essential business management tool. Importantly, independent teams are responsible for handling the data within their domains of work and expertise, while still ensuring compliance with centrally determined data management practices. This change in mindset is at the core of data mesh.
To better understand how this is accomplished, we can view the data mesh architecture as having three main components:
1. Data sources represent the repository (like a data lake) into which the primary raw data is being fed. Whether it’s collected from cloud IIoT networks, customer feedback forms, or scraped web data, this is the raw input data that will be referenced and processed as needed by users across the network. While a data lake approach would funnel all this data into one central place, the data mesh methodology instead distributes the responsibility for intake, storage, processing, and extraction of this raw data within a series of responsible domains.
2. Data mesh infrastructure means that this information is not isolated within individual departmental domains but can also be shared at will across the organization’s operational network while remaining compliant with established data governance guidelines. This is a direct result of two of the key pillars of data mesh: a self-serve data platform and federated governance. The self-serve data platform provides the tooling and infrastructure each domain needs to ingest, transform, process, and serve its data. Meanwhile, the federated governance principles ensure standardization across an organization, allowing for interoperability of data between all domain teams.
3. Data owners are the final component of a data mesh and are responsible for applying the compliance, governance, and categorization protocols for their departments’ data. For example, HR files must be stored using certain security protocols, must not be used for certain purposes, and must only be released to specific people. Of course, each department will have categories and types of data unique to its own purposes. In a data lake system, IT teams must grapple with all these different protocols and categories for all the different data owners who have deposited data into the lake. Data mesh architecture, by contrast, gives domain owners full authority and control over these matters because, again, who better than subject area experts to manage their own data and ensure that it meets quality standards?
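The three components above can be sketched together as a small Python model. This is a simplified illustration under stated assumptions, not a reference implementation: the `MeshCatalog` and `ProductEntry` names, the classification labels, and the role names are all hypothetical. The catalog stands in for the self-serve platform, the checks inside `register` stand in for federated governance rules applied to every domain, and the `allowed_roles` set is chosen by each data owner.

```python
from dataclasses import dataclass

# Federated governance: organization-wide labels every product must use
# (the label set itself is an assumption for this sketch).
ALLOWED_CLASSIFICATIONS = {"public", "internal", "restricted"}


@dataclass(frozen=True)
class ProductEntry:
    name: str
    owner_domain: str              # the domain team responsible for the data
    classification: str            # must be one of ALLOWED_CLASSIFICATIONS
    allowed_roles: frozenset[str]  # decided by the owning domain


class MeshCatalog:
    """Hypothetical self-serve catalog shared by all domains."""

    def __init__(self) -> None:
        self._products: dict[str, ProductEntry] = {}

    def register(self, product: ProductEntry) -> None:
        # Central policy checks, applied identically to every domain.
        if product.classification not in ALLOWED_CLASSIFICATIONS:
            raise ValueError(f"unknown classification: {product.classification}")
        if product.classification == "restricted" and not product.allowed_roles:
            raise ValueError("restricted products must name at least one allowed role")
        self._products[product.name] = product

    def can_read(self, product_name: str, role: str) -> bool:
        # Public data is open to all; otherwise the owner's role list decides.
        product = self._products[product_name]
        if product.classification == "public":
            return True
        return role in product.allowed_roles


catalog = MeshCatalog()
catalog.register(
    ProductEntry("employee_records", "hr", "restricted", frozenset({"hr_manager"}))
)
catalog.register(
    ProductEntry("product_catalog", "marketing", "public", frozenset())
)

print(catalog.can_read("employee_records", "hr_manager"))  # HR owner's own role
print(catalog.can_read("employee_records", "sales_rep"))   # not on HR's list
print(catalog.can_read("product_catalog", "sales_rep"))    # public product
```

In this sketch the HR domain alone decides who may read its records, while the central rules only enforce that every product is classified consistently, which mirrors the split between domain ownership and federated governance described above.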
Data mesh in practice: Who’s using it and why
For data management solutions to evolve and become more successful, they have to be usable and relevant for a wide range of applications and operations. As data mesh architecture and user friendliness improve, we are seeing an increased range of business functions that can be enhanced with a secure and distributed approach to data as a product and a tool.
Here are some common business use cases:
- Sales: For sales teams, it all comes down to acquiring, nurturing, and closing leads. The more time your sales team members spend at their desks doing administrative tasks, the less time they have to build relationships with new customers. With data mesh architecture, sales team users don’t need to be data management and retrieval experts to have the most powerful and relevant data sets and combinations at their fingertips. When sales departments have all the right data to analyze, it translates into more actionable insights and strategies.
- Supply chain and logistics: Modern supply chains are vulnerable to an enormous range of disruptions. A competitive edge comes when companies can pivot quickly and respond to both threats and opportunities with equal agility. Today’s global supply chain data is coming in thick and fast – from customer feedback, to IIoT networks, and digital twins. When experienced and savvy supply chain managers are themselves able to curate and drill into any of those data sets in real-time, businesses get a powerful source of insight and acumen.
- Manufacturing: As part of the supply chain, a company’s manufacturing operations are equally vulnerable to rapid market shifts and volatile customer demands. In the past, design and R&D teams would have to rely on historical customer data, fed to them from other departments. Today, the data mesh brings live data access to users behind the drafting table, on the R&D and testing teams, and all the way to the manufacturing floor. Real-time customer feedback can inform product development in an instant, and up-to-the-minute intel from IIoT networks and digital simulations can help factories run safer, faster, and more efficiently.
- Marketing: Today, customer demands and expectations are shaping the future and changing and growing at an unprecedented pace. A single brand typically has myriad consumer touchpoints across social media, targeted digital ads, and online and omnichannel shopping portals. The current market sees the growing desire for rapid customization, shorter product lifecycles, and enormous levels of choice and competition. To understand and leverage these trends, modern marketers need to have real-time and simultaneous access to a wide variety of data sets. In the past, this has meant requesting (and waiting for) this data from other departments. With a data mesh setup, however, marketers can curate and access this data in the moment, on their own terms.
- Human Resources: HR teams must manage large amounts of extremely complex and sensitive data. And with the growing trend toward remote and hybrid workplaces, that data is getting more complicated and geographically diverse every day. Not to mention the ever-changing set of compliance and legal issues that HR teams must so urgently stay on top of. From hire to retire, HR leaders must be able to validate, assess, and analyze some of the most broadly disparate data sets in any organization. Data mesh architecture allows for the appropriate security protocols and tightly restricted access – while at the same time, making it possible for authorized HR users to access data and information quickly and without dependence upon complex internal protocols and multi-departmental bureaucracy.
- Finance: As with HR, finance and accounting teams are also responsible for enormously crucial and sensitive data. Modern ERP systems are revolutionizing finance, using in-memory database technology to customize up-to-the-moment reports, analyses, and projections. Yet even when finance teams are using the best databases and ERPs, they often still face obstacles because they are plagued by longstanding and rigid cultures, heavy silos, and bureaucratic, old-school processes. Data mesh architecture brings a fundamental shift in how finance data is looked at and managed, and it can even shake up stagnant thinking by giving teams the opportunity to own and revise their own aging data processes.
More than just hype: Data mesh as a new approach to increase agility in value creation from data.
It's clear that data mesh is not just another buzzword but a data strategy trend that needs to be taken seriously. Companies of all sizes and industries are using data mesh, looking for ways to use data to create insights and value.
Benefits of data mesh
In the past, legacy databases and limited data management infrastructures have contributed to the sense that data is something to be held in a single vault and meted out at the discretion of a few data managers. Now, data is the fuel that drives your business, and it should be given freely to those subject specialists who best know how to make it work and drive profit in competitive times.
The main advantages of data mesh architecture can be summarized as:
- Increased data accessibility. Data mesh ensures that all the right people across your organization can access the data they need – to be the absolute best at their jobs.
- Improved analytics capabilities. When data is looked at as a product to be used every day, teams start to take a data-first approach to planning and strategy. This leads to a reduction in errors and a more objective, less opinion-driven approach to business development.
- Customizable data pipelines and processes. Many of the best and potentially most profitable projects get shelved due to the enormous hassle of curating the unique and customized data sets needed to achieve success. With a data mesh, teams can quickly access and test new project models without the traditional loss of time or resources.
- Reduced bottlenecks. This is an obvious win/win for both IT teams and data owners. Furthermore, by reducing a source of frustration and irritation, businesses can help to break down silos that stand in the way of healthy business development.
- Reduced strain on central data management teams. This means not only reducing backlogs and frustration but also freeing up countless hours for your talented IT teams to devote to more specialized, interesting, and profitable pursuits.
Data Mesh FAQs
What is data democratization?
At its core, data democratization is all about solving the data challenges that people face in their day-to-day work. More details about the definition, principles and how to help employees feel comfortable about asking data-related questions and getting answers are listed in this blog.
What is data interoperability?
Interoperability is defined as the ability of a system or a product to work with other systems or products without special effort on the part of the user. TechTarget adds that it helps organizations achieve higher efficiency and a more holistic view of information and data. For more detailed information, this Open MOOC lesson provides the basics of data interoperability as well as the different types and layers of interoperability of data.
What is the difference between data mesh and data fabric?
Data mesh and data fabric are different architectural approaches within a company’s data management strategy.
Data fabric is a technocentric approach that seeks to find increasingly seamless ways to manage complex metadata and unstructured information by merging AI, machine learning, and advanced analytics. Data mesh on the other hand, while dependent upon all the technological developments within the data fabric, is more focused on integrating data management processes with the human users who depend upon them – and finding ways to streamline and simplify data access and usefulness from a people perspective.
There is something of a chicken-and-egg relationship between data mesh and data fabric: Ever-advancing data fabric technologies are needed if data management is to evolve at the speed it needs to. Yet, without an accompanying evolution in human processes and organizational strategies, people will not be able to properly leverage the advancing data fabric technologies. Just as DOS and complex interfaces gave way to the more seamless computer operating systems we enjoy today, data mesh and data fabric architectures are destined to grow increasingly seamless as these processes and technologies advance.