FROM DATA TO INSIGHT WITH MDW
With all the news about COVID and the US elections, the announcement almost went unnoticed: in early December, Microsoft launched a new data governance service known as Azure Purview. Microsoft CEO Satya Nadella presented the product at a digital event called “Shape Your Future with Azure Data and Analytics.” The presentation reviewed Microsoft’s vision for the future of data in the coming years. In short, it addressed how Azure can help organizations make better strategic business decisions based on data and analytics.
But the most important news was Microsoft’s announcement of a new data governance service.
Does your company need a data governance brick?
Until now, Microsoft did not have a data governance brick in their modern data warehouse architecture. With Azure Purview, this gap is now filled. While some may argue that Microsoft previously addressed data governance with solutions like Microsoft Data Catalog and Power BI Data Source Management, these lack a precise tool for data lineage and are poorly adapted to modern data warehouse architecture.
In most projects on which we have worked, when the architecture starts to reach a critical size, business users start lacking a number of features such as:
– A unified service capable of managing and governing on-premises, multi-cloud, and software-as-a-service (SaaS) data;
– A tool to classify data sources and label sensitive data;
– A data dictionary where users can search the data using technical or business terms;
– A graphical interface (graph) showing the data lineage.
Challenges when building your modern data warehouse
At MDW, we believe that a project as important as defining your organization’s modern data warehouse architecture must be accompanied by a change management project, because of its impact across the whole organization. End users include C-level management, data analysts, data scientists, and report consumers. When calculating the return on investment (ROI) of your project, you need to consider the benefits, some of them intangible, of a tool capable of addressing the needs listed in the previous section. This kind of project requires changes and investments in two main domains:
People will need to be trained and receive support, as some individuals will have a new role (data steward) in the organization, while others will need to share data ownership.
Business processes will benefit from the new architecture, since your organization will pass from a data silo approach to a data-driven approach.
One way to turn all these changes and investments into tangible assets is by implementing Azure Purview to take advantage of data governance. To name but a few of these tangible assets:
Data monetization / data as an asset
A data governance tool like Azure Purview can also be used as a data monetization tool. In this way, you can easily calculate the price of your data assets.
Business user productivity
You can have the most elaborate and powerful analytical tool. But if you lack a data governance tool, it will be difficult for business users to rapidly find the information that they are looking for.
Operation productivity
With better data quality, a connected data-driven organization will generate operational benefits. Defining roles and responsibilities will ensure that you avoid a ping-pong game between users.
Risk mitigation
Data governance is crucial for compliance and audit purposes. Visibility into data lineage and ownership, together with tracking of data consumption, is essential for complying with data protection regulations such as the GDPR.
Purview as a data governance tool
This is not a technical blog in which we explain how to build a business glossary, classify your data, or create data lineage – there are many excellent tutorials on the internet that describe these steps. Rather, this blog presents a conceptual overview of the available tools.
In Microsoft’s words, Azure Purview is a unified data governance platform that automates the discovery, cataloging, and mapping of data in addition to lineage tracking. It aims to give customers a clear understanding of the breadth of their data estate and to help ensure that regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) are respected across that entire estate.
Azure Purview has three main components.
Data discovery, classification, and mapping
Azure Purview uses Purview Data Map to automate and manage metadata from 15 hybrid sources. You can then classify and label them consistently across all your platforms: SQL Server, Azure, Microsoft 365, and Power BI.
The Azure Purview Data Catalog provides a large set of default classifications representing typical personal data types that you might have in your data estate.
You also can create custom classifications if the default classifications fail to meet your needs.
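To make the idea of a custom classification more concrete, here is a minimal Python sketch of a pattern-based rule: a column is given a classification when a large enough share of its sampled values match a regular expression. This is only an illustration of the concept, not Purview's implementation or API; the function name, the "EMP-" identifier format, and the 60% threshold are all assumptions made for the example.

```python
import re


def classify_column(values, pattern, min_match_ratio=0.6):
    """Return True when enough non-empty sample values fully match the pattern.

    Illustrative only: the function name and default threshold are assumptions.
    """
    samples = [v for v in values if v]  # ignore empty cells
    if not samples:
        return False
    regex = re.compile(pattern)
    matches = sum(1 for v in samples if regex.fullmatch(v))
    return matches / len(samples) >= min_match_ratio


# Hypothetical custom rule: an internal employee ID such as "EMP-004217".
EMPLOYEE_ID_PATTERN = r"EMP-\d{6}"

# Sampled values from a column, including an empty cell and a non-matching one.
column_samples = ["EMP-004217", "EMP-113002", "", "EMP-000045", "n/a"]
is_employee_id = classify_column(column_samples, EMPLOYEE_ID_PATTERN)
```

Working with a match ratio rather than requiring every value to match keeps the rule tolerant of dirty data, which is usually what you want when scanning real sources.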
Data catalog
A data consumer can discover data using the familiar hierarchical namespace for each of the data sources using an explorer view. Once the data source is registered and scanned, the data map extracts information about the data source’s structure (hierarchical namespace). This information is used to build the browsing experience for data discovery.
Data lineage
After connecting Azure Purview to Azure Data Factory instances, you can automatically collect data lineage and quickly determine which analytics and reports already exist.
In Azure Purview, you can register and scan source types. Once the scan is complete, you can view the asset distribution in Asset Insights, which details the state of your data estate by classification and resource sets. It also summarizes any changes in the data size.
Purview as a data monetization tool
Azure Purview can also be used as a powerful tool for data monetization. Since the data map is exposed through the open Apache Atlas APIs, you can interact with it in two different ways:
– After registering data in Purview, you can create a Data Factory pipeline that pushes metadata from any data system into the lineage in order to expand it;
– Using the REST APIs, you can obtain the different entities/attributes of the concept map (see Figure 9 below).
We are interested in the second point, as you can create a data model on top of Apache Atlas to add any entities needed for the cost allocation tool:
– Cost: You can include the price plans managed outside Azure in this entity, such as a Bloomberg data asset that you subscribed to. If it is an Azure asset, you can access the fees using the Azure Cost Management API.
– Security group: You can obtain this information from the API of each destination data asset: for example, access control lists (ACLs) for a data lake folder, database users, and Power BI report users.
– Recipient end-user: Once you have the security groups, you can use the Microsoft Graph API to obtain the Active Directory user list.
– Cost strategy: Manually input the cost strategy to define how the costs will be split between the data asset beneficiaries.
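As a rough illustration, the four entities above could be declared as custom Apache Atlas entity types. The sketch below only builds the JSON payload in the shape expected by the Atlas v2 type-definition endpoint (`/api/atlas/v2/types/typedefs`); it does not call any service. All type names, attribute names, and the `mdw_` prefix are hypothetical, and the host name and authentication for a given Purview account are deliberately left out and should be checked against the official documentation.

```python
import json


def attribute(name, type_name, optional=True):
    """Build one Atlas attribute definition (single-valued, not unique)."""
    return {
        "name": name,
        "typeName": type_name,
        "isOptional": optional,
        "cardinality": "SINGLE",
        "isUnique": False,
        "isIndexable": False,
    }


def entity_def(name, attributes):
    """Build one Atlas entity type definition deriving from Referenceable."""
    return {
        "category": "ENTITY",
        "name": name,
        "superTypes": ["Referenceable"],
        "attributeDefs": attributes,
    }


# Hypothetical data model for the allocation tool; every name is an assumption.
typedefs = {
    "entityDefs": [
        entity_def("mdw_cost", [
            attribute("assetQualifiedName", "string", optional=False),
            attribute("monthlyAmount", "double", optional=False),
            attribute("currency", "string"),
        ]),
        entity_def("mdw_security_group", [
            attribute("assetQualifiedName", "string", optional=False),
            attribute("groupId", "string", optional=False),
        ]),
        entity_def("mdw_recipient", [
            attribute("groupId", "string", optional=False),
            attribute("userPrincipalName", "string", optional=False),
        ]),
        entity_def("mdw_cost_strategy", [
            attribute("assetQualifiedName", "string", optional=False),
            attribute("strategy", "string", optional=False),
        ]),
    ]
}

# JSON body that would be sent to the type-definition endpoint.
payload = json.dumps(typedefs, indent=2)
```

Linking the entities through a shared `assetQualifiedName` attribute keeps the sketch simple; a fuller model would more likely use Atlas relationship definitions between the custom types and the scanned data assets.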
You then start using your new data model through a PowerApp, which gives you a fully automated cost allocation tool that is ready to use.
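The core calculation behind such an allocation tool can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions, not the tool itself: it supports two hypothetical cost strategies ("equal" and "weighted"), the function and parameter names are invented for the example, and rounding differences are simply assigned to the first recipient.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def allocate_cost(total, recipients, strategy="equal", weights=None):
    """Split a data asset's cost between its beneficiaries.

    strategy is "equal" or "weighted" (both names are assumptions);
    weights maps each recipient to a relative weight for "weighted".
    """
    if not recipients:
        raise ValueError("at least one recipient is required")
    total = Decimal(str(total)).quantize(CENT, rounding=ROUND_HALF_UP)

    if strategy == "equal":
        shares = {r: Decimal(1) for r in recipients}
    elif strategy == "weighted":
        if weights is None:
            raise ValueError("weights are required for the weighted strategy")
        shares = {r: Decimal(str(weights.get(r, 0))) for r in recipients}
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    share_sum = sum(shares.values())
    if share_sum <= 0:
        raise ValueError("weights must sum to a positive value")

    allocation = {
        r: (total * s / share_sum).quantize(CENT, rounding=ROUND_HALF_UP)
        for r, s in shares.items()
    }
    # Give any rounding remainder to the first recipient so parts sum to total.
    allocation[recipients[0]] += total - sum(allocation.values())
    return allocation


# Example: a 1,000.00 subscription split equally between three departments.
equal_split = allocate_cost(1000, ["finance", "risk", "sales"])

# Example: a 1,200.00 asset split 3:1 between two departments.
weighted_split = allocate_cost(
    1200, ["finance", "risk"], strategy="weighted",
    weights={"finance": 3, "risk": 1},
)
```

Using `Decimal` rather than floating-point numbers avoids cent-level drift, and the remainder step guarantees that the allocated amounts always add up to the invoiced total.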
This is the kind of tool that we mentioned above when discussing the challenges of building a modern data warehouse. This tool has a real revenue impact on your organization and is easy to include in the ROI calculation of the whole project.
Did you know?
In the previous section on “Purview as a data monetization tool” we tried to show the minimum number of entities needed to build a data monetization tool. You can still develop a whole semi-automated framework by adding other entities to create a subscription platform and manage all your data subscriptions in the same place if you have several non-Azure subscriptions such as Bloomberg, S&P, and Reuters.
Currently, you can only register a limited number of Azure Purview data sources. All Azure, Power BI, and SQL source types are already present. By the end of January 2021, other non-Azure data sources should appear as summarized in the table below.
Next steps
Throughout this blog post, we discussed the importance of data governance in the implementation of a modern data warehouse and how Azure Purview can help you calculate the ROI of the whole project. To achieve better results, your company should follow these steps:
1. Define a data governance strategy;
2. Design a change management plan;
3. Create roles, responsibilities, and rules;
4. Maximize information availability;
5. Design data governance metrics and reporting requirements.
Regardless of the type of strategy or data governance roles, consider choosing a trusted partner who shares the same data-driven approach. At MDW, as a triple MS Gold Partner, we are a group of data engineers with extensive expertise in cloud technologies. We help our clients define their cloud strategy (including the data governance strategy), design their modern data warehouse architecture, and implement all the necessary bricks. Just send us an email.