Democratize Data Access with Data Products Approach

March 9, 2022 by Editor's Desk

For a data-intensive business, data is its most important asset. Data products assure the highest levels of data quality and accessibility.

The Data Explosion

As digital business grows, so does the amount of enterprise data. Add the proliferation of devices and interconnectivity to all the new digital products, services, and business models, and you get an explosion of data. With more than 90% of the globe’s data created in the past couple of years, enterprises are now data-intensive by definition.

According to a McKinsey Global Institute report, companies that are data-intensive are 23X more likely to acquire customers, 6X more likely to retain customers, and 19X more likely to be profitable.

 

What is a Data-Intensive Company?

A data-intensive company gets the most out of its data by treating it as a product. For example, it might divide up its data into different “product lines” based on quality (e.g., completeness, freshness, and accessibility), or by target market:

  • A telco rep talking to a customer and judging the likelihood to churn
  • A media company providing personalized content to its subscribers
  • A bank promoting a new financial product to a targeted client segment

 

What are Data Products?

Data products are a core concept of the data mesh model. Data products are produced to accomplish a particular objective. They contain everything a data consumer needs in order to generate value from the data. Here are a few examples:

  • Providing a 360° customer view to a CRM system, including all interactions, transactions, and master data
  • Obfuscating sensitive personal information, for use by operational and analytical workloads
  • Pipelining inventory data from outlets into a central data repository for AI/ML analysis
  • Masking and then integrating test data with a CI/CD pipeline, to support quick response times for a wealth management application

Data products typically correspond to business entities – such as a customer, order, device, loan, or location. Since a business entity’s data is usually fragmented across many different sources, a data product integrates, unifies, and constantly synchronizes its data with all underlying systems.
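As a rough illustration of how a data product might unify a business entity's fragmented data, the Python sketch below merges records from three sources into one record per customer. The source names (crm_rows, billing_rows, support_rows), the field names, and the function build_customer_products are all hypothetical, invented for this example. Ongoing synchronization with the source systems is left out.

```python
from collections import defaultdict

# Hypothetical extracts from three source systems, all keyed by customer_id.
crm_rows = [{"customer_id": 1, "name": "Sally Jones", "email": "sally@example.com"}]
billing_rows = [{"customer_id": 1, "balance": 42.50}]
support_rows = [
    {"customer_id": 1, "ticket": "T-100"},
    {"customer_id": 1, "ticket": "T-101"},
]


def build_customer_products(crm, billing, support):
    """Merge per-source fragments into one unified record per customer."""
    products = defaultdict(lambda: {"tickets": []})
    for row in crm:
        products[row["customer_id"]].update(name=row["name"], email=row["email"])
    for row in billing:
        products[row["customer_id"]]["balance"] = row["balance"]
    for row in support:
        products[row["customer_id"]]["tickets"].append(row["ticket"])
    return dict(products)


customers = build_customer_products(crm_rows, billing_rows, support_rows)
```

Keying every fragment on the same entity identifier is the design choice that matters here: it is what lets the consumer ask for "customer 1" instead of querying each source separately.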

The data product consists of its definition and its data, as described below.

Data product definition:

  • Static metadata, including the data schema, integration and delivery methods, processing logic, sync rules, access controls, and more
  • Sync rules, detailing when and how data is synchronized with the source systems
  • Algorithms, responsible for transforming, processing, enriching, and masking the raw data
  • Data ingestion, delivery, and access methods, such as JDBC, web services, Kafka, CDC, messaging, and virtualization
  • Data pipelining, flows and iterations
  • Data lineage, from the data product back to the source systems
  • Access controls, such as user validation and authentication

Data product data:

  • Managed, as a single unit, for easier processing and access
  • Unified, cleansed, masked, and enriched
  • Stored, in-memory, virtualized, or cached
  • Differentiated, by active metadata on usage and performance
  • Tracked, via data changes logged in an audit log

A data product’s definition and data are managed independently, in the sense that it has a single definition, but multiple instances of its data.
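One way to picture the split between a single definition and many data instances is the Python sketch below. It is a simplified model under stated assumptions, not a description of any particular platform: the class names DataProductDefinition and DataProductInstance, the mask helper, and the role names are hypothetical, and only a few of the definition elements listed above (schema, sync rule, masking, access control, audit log) are represented.

```python
import hashlib
from dataclasses import dataclass, field


def mask(value: str) -> str:
    """Illustrative masking rule: replace a value with a short one-way digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class DataProductDefinition:
    """Static metadata shared by every instance of the data product."""
    name: str
    schema: tuple                      # field names exposed to consumers
    sync_interval_seconds: int         # simplified stand-in for sync rules
    masked_fields: frozenset = frozenset()
    allowed_roles: frozenset = frozenset()


@dataclass
class DataProductInstance:
    """The data for one business entity, governed by the shared definition."""
    definition: DataProductDefinition
    entity_id: str
    data: dict
    audit_log: list = field(default_factory=list)

    def read(self, role: str) -> dict:
        allowed = role in self.definition.allowed_roles
        # Every attempt is logged, whether or not it is granted.
        self.audit_log.append(("read" if allowed else "denied", role))
        if not allowed:
            raise PermissionError(f"role {role!r} may not read {self.definition.name}")
        return {
            name: (mask(str(self.data[name]))
                   if name in self.definition.masked_fields
                   else self.data[name])
            for name in self.definition.schema
        }


customer_def = DataProductDefinition(
    name="customer",
    schema=("name", "email"),
    sync_interval_seconds=60,
    masked_fields=frozenset({"email"}),
    allowed_roles=frozenset({"analyst"}),
)
sally = DataProductInstance(customer_def, "c-1",
                            {"name": "Sally Jones", "email": "sally@example.com"})
bob = DataProductInstance(customer_def, "c-2",
                          {"name": "Bob Lee", "email": "bob@example.com"})
```

Both instances point at the same customer_def object, so a change to the masking or access rules would be made once and would apply to every customer.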

5 Steps to Producing and Using Data Products

To take a data product approach, data teams must adapt cross-functional product lifecycle methodology to data. The data product delivery lifecycle should follow agile principles, by being short and iterative, to deliver quick, incremental value to data consumers. Here are 5 steps to success:

  1. Define and Design the Data Products
    Define all data requirements in terms of the business objectives, the limitations of data privacy and governance, and all relevant existing data assets. Design the data structure, and how it will be served as a product, for consumption among authorized data consumers and services.
  2. Engineer the Data Products
    Make sure the data products address their requirements, by identifying, integrating, and collating the data from all sources, and then masking it as necessary. Create web services APIs to ensure that consuming applications have the proper credentials to access the data product, and assemble pipelines to securely provide the data to subscribers. 
  3. QA the Data Products
    Automatically test and validate the data to ascertain that it’s complete, compliant, and current – and ready for agile delivery and high-scale consumption. 
  4. Maintain the Data Products
    Track data usage, pipeline performance, and reliability on an ongoing basis – and collaborate with data engineering to resolve issues as per predefined SLAs.
  5. Appoint a Data Product Manager
    In the case of software product development, a software product manager is responsible for gathering user requirements, prioritizing them, and working together with software development and QA to produce the right product, at the right time.
    A similar position is needed on the data team. A data product manager is responsible for collecting all data requirements from all data consumers (data analysts, data scientists, application owners), prioritizing them, and working closely with data engineering to deliver the data product on time, and within budget.

Just like any other product, a data product must deliver business value, in the form of better decision making, faster application development, and more. For this to happen, data delivery must be contingent upon a particular timeline – sort of like a service level agreement between business and IT.
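The QA step above asks that data be complete, compliant, and current. A minimal Python sketch of such an automated check follows. The required fields, the one-hour freshness limit, and the rule that emails must already be masked are assumptions chosen for illustration; real requirements would come from the product's definition and its agreed service levels.

```python
import re
from datetime import datetime, timedelta, timezone

REQUIRED_FIELDS = ("customer_id", "name", "email", "synced_at")  # hypothetical schema
MAX_AGE = timedelta(hours=1)                                     # hypothetical freshness limit
RAW_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")              # looks like an unmasked email


def check_record(record: dict, now: datetime) -> list:
    """Return a list of QA failures; an empty list means the record passes."""
    failures = []

    # Complete: every required field is present and non-empty.
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    if missing:
        failures.append(f"incomplete: missing {missing}")

    # Compliant (assumed rule): emails must be masked before delivery.
    if RAW_EMAIL.fullmatch(record.get("email") or ""):
        failures.append("non-compliant: email is not masked")

    # Current: the last sync must fall within the freshness limit.
    synced_at = record.get("synced_at")
    if synced_at is not None and now - synced_at > MAX_AGE:
        failures.append("stale: last sync exceeds freshness limit")

    return failures


now = datetime(2022, 3, 9, 12, 0, tzinfo=timezone.utc)
good = {"customer_id": 1, "name": "Sally Jones", "email": "a1b2c3d4e5f6",
        "synced_at": now - timedelta(minutes=5)}
bad = {"customer_id": 2, "name": "", "email": "bob@example.com",
       "synced_at": now - timedelta(hours=3)}
```

Returning a list of failures, instead of stopping at the first one, lets a pipeline report everything wrong with a record in a single pass.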

 

4 Best Practices for Data Products 

  1. Cross-functional collaboration 
    Data collectors must work closely with data consumers on experimentation, product evolution, and the rollout of new features (or the rollback of changes, as required). 
  2. Agile development 
    Data products must be developed quickly and reliably, with data assets decoupled as much as possible. A well-designed data catalog would be an invaluable asset here. 
  3. Relentless QA 
    Producing data products is a process, by definition. Data teams should have a strong CI/CD backbone, and make sure to identify issues through automatic testing and data quality checks. And be sure to learn from mistakes, in an effort to constantly improve the product. 
  4. Real-time access 
    Data products must be used by consumers in order to gauge their value, so data engineers must make them available as quickly and easily as possible. And standard interfaces should be the rule, to accommodate the needs of diverse teams.

 

Business Entities – the Rationale Behind Data Products

The most obvious way to engineer data products is to model them around the business entities that they support, such as customers, vendors, devices, outlets, or anything else that’s important to the business. 

Each business entity (e.g., Sally Jones) should be complete in its attributes, enriched via analytics (e.g., likelihood to churn), and easily accessible to any authorized data consumer (e.g., application or person). 

The use of the business entity should be measurable. How was the data accessed, and how much time did it take to get to it? How often was it accessed, and by whom? Who tried to access it, but didn’t have the right credentials? Which insights were derived from it? And the list goes on and on. 
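Several of these questions could be answered from a simple log of access events. The Python sketch below is one possible shape for that, with hypothetical names throughout (AccessEvent, summarize, and the consumer identifiers are invented for this example); it counts reads per consumer, lists consumers who were refused, and averages the latency of granted reads.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class AccessEvent:
    consumer: str       # application or person requesting the entity
    entity_id: str
    latency_ms: float   # time taken to serve the request
    authorized: bool    # False when credentials were insufficient


def summarize(events):
    """Aggregate access events into basic usage metrics."""
    granted = [e for e in events if e.authorized]
    return {
        "reads_per_consumer": Counter(e.consumer for e in granted),
        "denied_consumers": sorted({e.consumer for e in events if not e.authorized}),
        "mean_latency_ms": mean([e.latency_ms for e in granted]) if granted else None,
    }


events = [
    AccessEvent("crm-app", "c-1", 12.0, True),
    AccessEvent("crm-app", "c-1", 8.0, True),
    AccessEvent("analyst-7", "c-1", 20.0, True),
    AccessEvent("intern-3", "c-1", 0.0, False),
]
report = summarize(events)
```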

The quality of the data product must be assured, in terms of completeness, integrity, and freshness.

 

Needed – a Platform Based on Data Products

A data product approach to managing enterprise data democratizes data access, assures data trust, and drives data-driven innovation and decision-making. 

A data product platform can manage millions of data products at the same time, and ensure that they’re constantly in sync with their sources, and available to operational and analytical workloads in real time. It empowers enterprises to proactively adopt the data product approach necessary to sustain data-driven leadership.

This article has been contributed by the Author:

Ian Tick