Data Quality in Microsoft Purview Data Governance
The version of Purview that Microsoft released in September 2024 marks an important milestone for data governance. Adding Data Quality has been, without a doubt, a major step forward, and with it Microsoft covers one of the needs that organizations' Data teams request most often.
Thanks to data quality monitoring, an organization (or team) can assess the health of its source systems. Those applications often create the data that must later be consumed and enriched to provide valuable insights to the business. With this type of tool, the Data Governance team can therefore put on the spot the teams that built applications that accept almost anything through the ubiquitous free-text boxes and that, of course, ignored the Data CoE's recommendations to apply these good practices when developing data-entry systems.
This may sound like a reproach, and that is exactly what it is. Development teams rarely include data experts, and that usually translates into problems with data sovereignty and regulatory compliance (GDPR, HIPAA, …), integrations (physical deletions, missing primary keys, no way to manage deltas, …), performance (no solutions such as mirroring the operational system to avoid contention between it and the analytical workload, …), quality (as we are seeing now), and so on. For all this and more: organizations, please hire data experts and give them the responsibility to act across the whole company, or you will never truly be a data-driven company. I make the case for the CDO role (but with real authority and its own place in the organization, not hanging from the CTO or similar).
Thanks to the ability to create different types of quality rules and assign them to the attributes of the various data assets, Microsoft Purview lets you monitor how data quality evolves and find the root cause of the problems the business often reports: incorrect addresses, phone numbers, ID / NIE / CIF numbers, and so on. In turn, this gives the Data team the evidence to go to the development team and propose improvements that solve problems downstream. It is never too late to evolve a data solution and align it with Data Governance standards and best practices.
This is an example of what the monitoring view of a data asset looks like. In this case, the asset is a Delta Lake table persisted in Azure Data Lake, which would be the Bronze layer in a Lakehouse approach.
If you click on the active quality rules, you can see what type each rule is and which attributes it applies to. In the example, they are basic rules: detecting empty or blank fields and checking that values are unique. If none of the built-in rule types fits what you need, you can define a custom quality rule, and Microsoft Purview even provides a wizard for defining the rules themselves.
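To make the two basic rules more concrete, here is a minimal Python sketch of what an "empty/blank" check and a "unique values" check measure over a column. It is not Purview's API or its exact scoring formula (Purview evaluates the rules itself); the function names, the sample `customers` rows and the choice to express each result as a percentage are illustrative assumptions.

```python
from collections import Counter

def is_empty_or_blank(value):
    """True for None or for strings that contain only whitespace."""
    return value is None or (isinstance(value, str) and value.strip() == "")

def completeness_score(rows, column):
    """Hypothetical rule score: % of rows where the column is neither empty nor blank."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if not is_empty_or_blank(row.get(column)))
    return 100.0 * filled / len(rows)

def uniqueness_score(rows, column):
    """Hypothetical rule score: % of non-empty values that appear exactly once."""
    values = [row.get(column) for row in rows if not is_empty_or_blank(row.get(column))]
    if not values:
        return 0.0
    counts = Counter(values)
    unique = sum(1 for value in values if counts[value] == 1)
    return 100.0 * unique / len(values)

# Illustrative sample data (not from the article).
customers = [
    {"id": "C001", "phone": "600111222"},
    {"id": "C002", "phone": "   "},        # blank free-text value
    {"id": "C002", "phone": None},         # empty value and duplicated id
    {"id": "C004", "phone": "600333444"},
]

print(completeness_score(customers, "phone"))  # 50.0
print(uniqueness_score(customers, "id"))       # 50.0
```

Whether a duplicated value counts once or not at all against uniqueness is a design choice; this sketch treats every occurrence of a repeated value as a failure.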
As mentioned, to create a new rule, just click the ‘+ New rule’ button and select a rule type or customize your own.
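Custom rules are where domain-specific checks fit, such as validating the ID / NIE numbers mentioned earlier. The Python sketch below shows the logic such a rule could encode, using the standard Spanish control-letter scheme (number modulo 23 mapped to a letter table, with the NIE prefix X/Y/Z replaced by 0/1/2). It is only an illustration of the check itself, not how Purview expresses custom rules; `is_valid_dni_nie` and `validity_rate` are hypothetical names, and CIF validation is left out.

```python
import re

# Control-letter table of the Spanish DNI/NIE check scheme (index = number % 23).
CONTROL_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"
# DNI: 8 digits + letter. NIE: X/Y/Z + 7 digits + letter.
ID_PATTERN = re.compile(r"([XYZ]\d{7}|\d{8})([A-Z])", re.ASCII)
NIE_PREFIX = {"X": "0", "Y": "1", "Z": "2"}

def is_valid_dni_nie(value):
    """Hypothetical rule: True if value is a well-formed DNI/NIE with a correct control letter."""
    if value is None:
        return False
    candidate = value.strip().upper().replace("-", "").replace(" ", "")
    match = ID_PATTERN.fullmatch(candidate)
    if not match:
        return False
    body, letter = match.groups()
    # For an NIE, swap the leading X/Y/Z for 0/1/2 before computing the letter.
    if body[0] in NIE_PREFIX:
        body = NIE_PREFIX[body[0]] + body[1:]
    return CONTROL_LETTERS[int(body) % 23] == letter

def validity_rate(values):
    """Hypothetical rule score: % of values that pass the check (0.0 for no values)."""
    if not values:
        return 0.0
    passed = sum(1 for value in values if is_valid_dni_nie(value))
    return 100.0 * passed / len(values)

print(is_valid_dni_nie("12345678Z"))  # True
print(is_valid_dni_nie("X1234567L"))  # True
print(validity_rate(["12345678Z", "X1234567L", "12345678A", "free text"]))  # 50.0
```

A format check like this catches values that look plausible in a free-text box but can never be a real identifier, which is exactly the kind of finding the Data team can take back to the development team.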
Microsoft Purview is a Data Governance tool that keeps growing and adding new features. It also integrates well with other solutions that extend its capabilities, such as Profisee or CluedIn for Master Data Management and Golden Records. In other words, when you think about governance, identify the main areas of improvement your organization needs, look for the solution that covers the largest number of them, and keep in mind that solutions can be combined in many ways, for example Purview + Unity Catalog.
*Article previously published on the website https://alb3rtoalonso.com/