Data management glossary
Explore a range of data-related terms and definitions in our data management glossary.
What is a database?
A database is a facility for organising, storing, managing, safeguarding, and controlling access to data. Databases are designed according to a number of different schemes, or schemas, many of which adhere to the relational model for ease of access by programs and data queries. Common types of databases include relational database management systems (RDBMS), in-memory databases, object-oriented databases (OODBMS), NoSQL databases, and NewSQL databases, each with its own advantages.
What is data management?
Data management refers to all the functions necessary to collect, control, safeguard, manipulate, and deliver data. Data management systems include databases, data warehouses, and data marts; tools for data collection, storage, and retrieval; and utilities to assist with validation, quality, and integration with applications and analytical tools. Businesses need a data strategy to establish accountability for data that originates in, or is endemic to, particular areas of responsibility.
What is database management?
Database management refers to the processes and procedures that are required to store, handle, manipulate, and safeguard data. In many organisations, establishing and overseeing these procedures is the primary responsibility of a database administrator (DBA) or a similar role. Most organisations rely on a commercial database management system (DBMS) as the primary tool for managing their database.
What is a database management system (DBMS)?
A database management system (DBMS) is the software toolkit that provides a storage structure and data management facility for database management. The DBMS may be an integral part of a licensed enterprise resource planning (ERP) system, a required separate purchase, a part of the system software (operating system), or a separately licensed software product. Whatever the source, it is essential that applications are built around, or completely integrated with, the DBMS: the two are mutually dependent, and each needs the other to function effectively.
What is an SQL database?
An SQL database is a relational database that stores data in tables and rows. Data items (rows) are linked based on common data items to enable efficiency, avoid redundancy, and facilitate easy, flexible retrieval. The name SQL derives from Structured Query Language, the standardised query language that users can learn once and apply to any compliant database for data storage, manipulation, and retrieval.
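To make this concrete, here is a minimal sketch using Python's built-in sqlite3 module; the customers table, its columns, and the sample rows are invented for illustration.

```python
import sqlite3

# A minimal in-memory SQL database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "London"), ("Grace", "New York")],
)

# SQL retrieves rows declaratively: describe what you want, not how to fetch it.
for row in conn.execute("SELECT name FROM customers WHERE city = ?", ("London",)):
    print(row)  # ('Ada',)

conn.close()
```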
What is a NoSQL database?
NoSQL databases were developed to handle unstructured data, which SQL cannot support because it lacks a fixed structure. NoSQL overcomes this limitation with creative techniques, including dynamic schemata and various pre-processing approaches. The most common types of databases for unstructured data are key-value, document, column, and graph databases, and the data they hold often includes things like video, graphics, free text, and raw sensor output.
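The key-value and document styles can be illustrated with a toy Python sketch; the store and its records are hypothetical, and a real NoSQL system adds persistence, distribution, and indexing on top of this idea.

```python
# A toy key-value/document store: each record is a schemaless document,
# so different records can carry different fields (a "dynamic schema").
store: dict[str, dict] = {}

store["user:1"] = {"name": "Ada", "tags": ["admin"]}
store["user:2"] = {"name": "Grace", "transcript": "free text of any length..."}

# Retrieval is by key rather than by table/row/column coordinates.
print(store["user:2"]["transcript"])
```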
What is a relational database management system (RDBMS)?
A relational database management system is a database management system (DBMS) that is based on the relational data model. The contents of the RDBMS are stored in tables, made up of rows and columns, with each table representing a specific object, or entity, in the database that can be related to another. An RDBMS typically contains multiple tables and includes additional functions that maintain the accuracy, consistency, integrity, and security of the data, as well as an SQL interface for retrieving related data across tables through complex queries.
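As a rough illustration of related tables, the following Python/sqlite3 sketch joins two invented tables through a shared key; the schema is purely hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two related tables; orders.customer_id references customers.id.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL,
                         FOREIGN KEY (customer_id) REFERENCES customers (id));
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# A join retrieves related rows from both tables in a single query.
query = """
    SELECT c.name, SUM(o.total)
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
"""
for name, spend in conn.execute(query):
    print(name, spend)  # Ada 65.0, then Grace 15.0
conn.close()
```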
What is a CDBMS?
CDBMS, short for cloud database management system, is a term coined by Gartner that mainly describes a cloud deployment model for the RDBMS described above.
What is structured data?
Structured data is neatly formatted into rows and columns and mapped to predefined fields. It is typically stored in Excel spreadsheets or relational databases; examples include financial transactions, demographic information, and machine logs. Until recently, structured data was the only type of data that most businesses could readily store and analyse.
What is unstructured data?
Unstructured data is not organised into rows and columns—making it more difficult to store, analyse, and search. Examples include raw Internet of Things (IoT) data, video and audio files, social media comments, and call centre transcripts. Unstructured data is usually stored in data lakes, NoSQL databases, or modern data warehouses.
What is semi-structured data?
Semi-structured data has some organisational properties, such as semantic tags or metadata, but does not conform to the rows and columns of a spreadsheet or relational database. A good example of semi-structured data is e-mail—which includes some structured data, like the sender and recipient addresses, but also unstructured data, like the message itself.
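A small Python sketch, using the standard library's email module on an invented message, shows the mix: structured header fields alongside a free-form body.

```python
from email import message_from_string

raw = """From: ada@example.com
To: grace@example.com
Subject: Quarterly figures

Hi Grace, the message body itself is free-form, unstructured text.
"""

msg = message_from_string(raw)
print(msg["From"], "->", msg["To"])  # structured header fields
print(msg.get_payload())             # unstructured body
```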
What is data mapping?
Data mapping is the process of matching fields between different data structures or databases. This is a necessary step if databases are to be combined, if data is being migrated from one system or database to another, or if different data sources are to be used within a single application or analytical tool—as happens frequently in data warehousing. Data mapping will identify unique, conflicting, and duplicate information so that a set of rules can be developed for bringing all the data into a coordinated schema or format.
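One simple way to express such rules is a field-mapping table per source; the following Python sketch, with invented source schemas (CRM_MAP and SHOP_MAP), is only meant to illustrate the idea.

```python
# Hypothetical field mappings from two source schemas to one target schema.
CRM_MAP = {"cust_name": "name", "cust_mail": "email"}
SHOP_MAP = {"fullName": "name", "emailAddress": "email"}

def map_record(record: dict, mapping: dict) -> dict:
    """Rename source fields to the target schema, dropping unmapped fields."""
    return {target: record[source] for source, target in mapping.items() if source in record}

crm_row = {"cust_name": "Ada", "cust_mail": "ada@example.com"}
shop_row = {"fullName": "Ada", "emailAddress": "ada@example.com"}

# Both sources now land in one coordinated format, ready for duplicate detection.
print(map_record(crm_row, CRM_MAP) == map_record(shop_row, SHOP_MAP))  # True
```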
What is data modelling?
In creating a new or alternative database structure, the designer starts with a diagram of how data will flow into and out of the database. Diagramming the data flows is called data modelling. From this flow diagram, software engineers can define the characteristics of the data formats, structures, and database handling functions to efficiently support the data flow requirements.
What is data warehousing?
A data warehouse provides a single, comprehensive storage facility for data from many different sources—both internal and external. Its main purpose is to supply the data for business intelligence (BI), reporting, and analytics. Modern data warehouses can store and manage all data types, structured and unstructured, and are typically deployed in the cloud for greater scalability and ease of use.
What is a data lake?
A data lake is a vast pool of data stored in its raw or natural format. Data lakes are typically used to store Big Data, including structured, unstructured, and semi-structured data.
What is Big Data?
Big Data is a term that describes extremely large datasets of structured, unstructured, and semi-structured data. Big Data is often characterised by the five Vs: the sheer volume of data collected, the variety of data types, the velocity at which the data is generated, the veracity of the data, and the value it holds. With Big Data management systems and analytics, companies can mine Big Data for deep insights that guide decision-making and actions.
What is small data?
In contrast to Big Data, which is hugely voluminous and complex, small data is easy for humans to understand. Small data sets can include anything from marketing surveys to everyday spreadsheets—and can even be as “small” as a single social media post or e-mail. Increasingly, companies are using small data, in addition to Big Data, to train their AI and machine learning algorithms, for even deeper insights.
What is thick data?
Thick data is qualitative information that provides insight into the everyday emotional lives of consumers. It includes observations, feelings, and reactions—things that are typically difficult to quantify. When combined with Big Data, a very comprehensive picture emerges about a consumer’s preferences and requirements.
What is data integration?
Data integration is the practice of ingesting, transforming, combining, and provisioning data where and when it's needed. This integration takes place in the enterprise and beyond, across partners as well as third-party data sources and use cases, to meet the data consumption requirements of all applications and business processes. Techniques include bulk/batch data movement; extract, transform, load (ETL); change data capture; data replication; data virtualisation; streaming data integration; data orchestration; and more.
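As a rough sketch of the ETL pattern, the following Python example extracts rows from an invented CSV sample, transforms them into typed values, and loads them into an in-memory SQLite table; real integration tools add scale, scheduling, and error handling.

```python
import csv, io, sqlite3

# Extract: read raw rows from a CSV source (an in-memory sample here).
raw = io.StringIO("id,amount\n1, 25.50 \n2, 40.00 \n")
rows = list(csv.DictReader(raw))

# Transform: cast types and strip stray whitespace.
cleaned = [(int(r["id"]), float(r["amount"].strip())) for r in rows]

# Load: write the transformed rows into the target store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO payments VALUES (?, ?)", cleaned)
print(conn.execute("SELECT SUM(amount) FROM payments").fetchone())  # (65.5,)
```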
What is data virtualisation?
Data virtualisation provides companies with a unified view of all enterprise data—across disparate systems and formats—in a virtual data layer. Instead of duplicating data, data virtualisation leaves the data in its source systems and simply exposes a virtual representation of it to users and applications in real time. Data virtualisation is a modern approach to data integration that lets users discover and manipulate data regardless of its physical location, format, or protocol.
What is data fabric?
A data fabric is a customised combination of architecture and technology. It uses dynamic data integration and orchestration to connect different locations, sources, and types of data. With the right structures and flows as defined within the data fabric platform, companies can quickly access and share data regardless of where it is or how it was generated.
What is data mesh?
Data mesh is an approach to data management that uses a distributed architectural framework. In other words: it distributes ownership and responsibility for specific data sets across the business, to those users who have the specialist expertise to understand what that data means and how to make the best use of it.
What is a data pipeline?
A data pipeline describes a set of automated and repeatable processes for finding, cleansing, transforming, and analysing any type of data at its source. Because data is analysed near where it’s generated, business users can quickly analyse and share the information they need at a lower cost to the organisation. Data pipelines can also be enhanced by technologies such as machine learning to make them faster and more effective.
What are data silos?
"Data silo" is an informal term for a situation in which individual departments or functional areas within an enterprise do not share data and information with other departments. This isolation prevents coordinated efforts towards company goals and results in poor performance (and poor customer service), high costs, and a general inability to respond to market demands and changes. Duplicate and redundant data is difficult to reconcile, further preventing any attempt to coordinate activities and effectively manage the business.
What is data wrangling?
Data wrangling is the process of taking raw data and transforming it into a format that is compatible with established databases and applications. The process may include structuring, cleaning, enriching, and validating data as necessary to make raw data useful.
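A minimal Python sketch of the idea, assuming invented raw lines of comma-separated text, might structure, clean, and validate each record like this:

```python
RAW = ["  Ada ,LONDON,1815", "Grace,new york,1906", ",,"]  # messy source lines

def wrangle(line: str) -> dict | None:
    """Structure, clean, and validate one raw line; drop it if unusable."""
    name, city, year = (part.strip() for part in line.split(","))
    if not name or not year.isdigit():
        return None  # validation: reject incomplete rows
    return {"name": name, "city": city.title(), "born": int(year)}  # cleaning

records = [r for r in map(wrangle, RAW) if r is not None]
print(records)  # [{'name': 'Ada', 'city': 'London', 'born': 1815}, ...]
```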
What is data security?
Data security is the practice of keeping data safe from unauthorised access or exposure, disaster, or system failure, while keeping it readily accessible to legitimate users and applications. Methods and tools include data encryption, key management, redundancy and backup practices, and access controls. Data security is a requirement for organisations of all sizes and types to safeguard customer and organisational data against the ever-increasing threat of data breaches and privacy risks. Redundancy and backups are also important for business continuity and disaster recovery.
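As one illustration of encryption and key management, the sketch below uses the third-party cryptography package (not part of standard Python); the data being protected is invented, and real deployments pair this with secure key storage and access controls.

```python
# Requires the third-party `cryptography` package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # key management: this secret must be stored safely
cipher = Fernet(key)

token = cipher.encrypt(b"customer record")  # the stored data is unreadable...
print(cipher.decrypt(token))                # ...except to holders of the key
```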
What is data privacy?
Data privacy refers to the policies and practices for handling data in ways that protect it from unauthorised access or disclosure. Data privacy policies and practices cover how information is collected and stored according to the organisation’s data strategy, how it may or may not be shared with third parties, and how to comply with regulatory restrictions. Data privacy is a business imperative that satisfies client expectations while protecting the integrity and safety of stored information.
What is data quality?
Data quality describes the suitability and reliability of data. High-quality data is accurate (truly representative of what it describes), reliable (consistent, auditable, properly managed, and protected), and as complete as users and applications require. Data quality can only be ensured by a properly devised and executed data strategy, carried out with industrial-strength tools and systems and scrupulously followed data management policies and procedures.
What is data validation?
Data validation is the process of determining the quality, accuracy, and validity of data before importing or using it. Validation can consist of a series of activities and processes for authenticating the data and generally “cleaning up” data items, including removal of duplicates, correction of obvious errors or missing items, and possible formatting changes (data cleansing). Data validation ensures the information you need for making important decisions is accurate and reliable.
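A toy Python validator over invented records might flag obvious problems before import, for example:

```python
def validate(record: dict) -> list[str]:
    """Return a list of problems found in one record before import."""
    problems = []
    if not record.get("email", "").count("@"):
        problems.append("missing or malformed email")
    if record.get("age") is not None and not (0 <= record["age"] <= 130):
        problems.append("age out of plausible range")
    return problems

batch = [{"email": "ada@example.com", "age": 36}, {"email": "grace", "age": -1}]
for rec in batch:
    print(rec, "->", validate(rec) or "ok")
```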
What is data cleansing?
Data cleansing is the process of removing or correcting errors from a dataset, table, or database. These errors can include corrupt, inaccurate, irrelevant, or incomplete information. This process, also called data scrubbing, finds duplicate data and other inconsistencies, such as typos and numerical sets that don’t add up. Data cleansing may remove incorrect information or rectify obvious mistakes, such as empty fields or missing codes.
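The following Python sketch, over an invented "dirty" dataset, shows the flavour of scrubbing: fixing obvious formatting errors, filling empty fields, and removing exact duplicates.

```python
dirty = [
    {"id": 1, "name": "Ada Lovelace", "country": "UK"},
    {"id": 1, "name": "Ada Lovelace", "country": "UK"},   # exact duplicate
    {"id": 2, "name": "  grace hopper ", "country": ""},  # messy and incomplete
]

seen: set[tuple] = set()
clean = []
for row in dirty:
    fixed = {
        "id": row["id"],
        "name": row["name"].strip().title(),     # correct obvious formatting errors
        "country": row["country"] or "UNKNOWN",  # flag empty fields with a marker
    }
    fingerprint = tuple(fixed.values())
    if fingerprint not in seen:                  # scrub exact duplicates
        seen.add(fingerprint)
        clean.append(fixed)
print(clean)
```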
What is data integrity?
Data integrity refers to the trustworthiness of data over the long term. Once data is entered or imported, wrangled, validated, cleansed, and stored, data integrity is the assurance that data quality is maintained: the data that went in has not changed and will not change, and what is retrieved is the same as what was originally stored. Sometimes used as a synonym for data quality, data integrity is more about reliability and trustworthiness.
What is data governance?
Data governance is a set of policies and practices for ensuring proper data management across an organisation. It establishes the IT infrastructure and names the individuals (or positions) that have the authority and responsibility for the handling and safeguarding of specific types of data. Effective data governance ensures that data is available, trustworthy, secure, and compliant—and that it doesn’t get misused.
What is data stewardship?
Data stewardship is the implementation of data governance policies and procedures for establishing data accuracy, reliability, integrity, and security. Individuals assigned with data stewardship responsibilities manage and oversee the procedures and tools used to handle, store, and protect data.
What is data architecture?
Data architecture is the overall design for the structure, policies, and rules that define an organisation’s data and how it will be used and managed. Data architecture includes the details of how the data strategy is implemented in support of business needs and goals—and serves as the foundation for development of databases, procedures, safeguards, security, and data governance.
What is master data management?
Master data management (MDM) is the practice of creating one single, “master” reference source for all important business data. It includes policies and procedures for defining, managing, and controlling (or governing) the handling of master data. Centralised master data management eliminates conflict and confusion that stems from scattered databases with duplicate information and uncoordinated data that might be out-of-date, corrupted, or displaced in time—updated in one place but not in another. Having one version to serve the entire enterprise means that all parts of the organisation are working with the same definitions, standards, and assumptions.
What are analytics?
The term analytics refers to the systematic analysis of data. Analytics applications and toolkits contain mathematical algorithms and computational engines that can manipulate large datasets to uncover patterns, trends, relationships, and other intelligence that allow users to ask questions and gain useful insights about their business, operations, and markets. Many modern analytics toolkits are designed for use by non-technical business people, allowing them to perform these analyses with minimal assistance from data scientists or IT specialists.
What are augmented analytics?
Augmented analytics are analytics that are “augmented” with artificial intelligence technologies, including machine learning and natural language processing (NLP). Not only can augmented analytics help users uncover deeper insights, faster—they can automate many complicated steps in the process and allow even non-technical users to query data in a natural, conversational way.
What is data mining?
Data mining is the act of extracting useful information from large datasets. Data mining is often done by business users employing analytics tools to uncover patterns, trends, anomalies, relationships, dependencies, and other useful intelligence. Data mining has a broad range of applications, from detecting fraud and cybersecurity concerns to improving forecasts and finding performance improvement opportunities.
What is data profiling?
Data profiling is the practice of collecting statistics and traits about a dataset, such as its accuracy, completeness, and validity. Data profiling is one of the techniques used in data validation and data cleansing efforts, as it can help detect data quality issues such as redundancies, missing values, and inconsistencies.
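A minimal Python sketch over an invented dataset shows the kind of statistics a profiler collects, such as completeness and distinct values per column:

```python
from collections import Counter

dataset = [
    {"name": "Ada", "city": "London", "age": 36},
    {"name": "Grace", "city": None, "age": 37},
    {"name": "Ada", "city": "London", "age": 36},
]

# Profile each column: how complete it is and how many distinct values it holds.
for column in ["name", "city", "age"]:
    values = [row[column] for row in dataset]
    present = [v for v in values if v is not None]
    print(column,
          f"complete={len(present)}/{len(values)}",
          f"distinct={len(set(present))}",
          f"top={Counter(present).most_common(1)}")
```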