{"id":36780,"date":"2020-07-07T15:00:41","date_gmt":"2020-07-07T14:00:41","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/?p=36780"},"modified":"2020-07-02T20:43:01","modified_gmt":"2020-07-02T19:43:01","slug":"how-to-operationalise-your-data-lake","status":"publish","type":"post","link":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/","title":{"rendered":"How to Operationalise your Data Lake"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeHeader.jpg\" alt=\"The Data Lake Analytics logo, next to an illustration of Bit the Raccoon.\" width=\"1920\" height=\"700\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeHeader.jpg 1920w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeHeader-300x109.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeHeader-1024x373.jpg 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeHeader-768x280.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeHeader-1536x560.jpg 1536w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeHeader-330x120.jpg 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeHeader-800x292.jpg 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeHeader-400x146.jpg 400w\" 
data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeHeader.jpg\" \/><\/p>\n<p><span data-contrast=\"auto\">Data lake operationalisation is a colossal topic<\/span><span data-contrast=\"auto\">\u00a0with many\u00a0<\/span><span data-contrast=\"auto\">deliberat<\/span><span data-contrast=\"auto\">ions\u00a0<\/span><span data-contrast=\"auto\">on either building the right data lake or defining the right strategy<\/span><span data-contrast=\"auto\">.\u00a0<\/span><span data-contrast=\"auto\">The <\/span><span data-contrast=\"auto\">five important <\/span><span data-contrast=\"auto\">points <\/span><span data-contrast=\"auto\">that everyone stresses on\u00a0<\/span><span data-contrast=\"auto\">prior to starting the process of building a data lake<\/span><span data-contrast=\"auto\">\u00a0are:<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop1.jpg\" alt=\"The five pointers: Sponsor, Prioritise, Strategise, Partner and Subscription\" width=\"1313\" height=\"268\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop1.jpg 1313w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop1-300x61.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop1-1024x209.jpg 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop1-768x157.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop1-330x67.jpg 330w, 
https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop1-800x163.jpg 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop1-400x82.jpg 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop1.jpg\" \/><\/p>\n<p><span data-contrast=\"auto\">T<\/span><span data-contrast=\"auto\">his blog\u00a0<\/span><span data-contrast=\"auto\">provides six mantras\u00a0<\/span><span data-contrast=\"auto\">for organisations <\/span><span data-contrast=\"auto\">to\u00a0<\/span><span data-contrast=\"auto\">ruminate on <\/span><span data-contrast=\"auto\">i<\/span><span data-contrast=\"auto\">n<\/span><span data-contrast=\"auto\">\u00a0order<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">to successfully tame the \u201cOperationalising\u201d of a data lake,<\/span><span data-contrast=\"auto\">\u00a0post production release<\/span><span data-contrast=\"auto\">.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">\u202f<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h2>1. 
ALWAYS have a North star Architecture<\/h2>\n<p><span data-contrast=\"auto\">D<\/span><span data-contrast=\"auto\">ata lakes are not only about pooling data, but also <\/span><span data-contrast=\"auto\">dealing with\u00a0<\/span><span data-contrast=\"auto\">aspects of its consumption<\/span><span data-contrast=\"auto\">.<\/span><span data-contrast=\"auto\">\u00a0The choice of data lake pattern depends on the masterpiece one wants to paint<\/span><span data-contrast=\"auto\">.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:540,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h3><span data-contrast=\"auto\">Central vs Federated<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">vs Hybrid\u00a0<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">Depending on the ask of the organisation, you can choose <\/span><span data-contrast=\"auto\">to store the enterprise data either all in one\u00a0<\/span><span data-contrast=\"auto\">location<\/span><span data-contrast=\"auto\">\u00a0(Central)<\/span><span data-contrast=\"auto\">\u00a0closest to the\u00a0<\/span><span data-contrast=\"auto\">organisation\u2019s<\/span><span data-contrast=\"auto\"> headquarters, or\u00a0<\/span><span data-contrast=\"auto\">due to\u00a0<\/span><span data-contrast=\"auto\">sovereignty<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">requirements, keep the <\/span><span data-contrast=\"auto\">data<\/span><span data-contrast=\"auto\">\u00a0stored<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">in\u00a0<\/span><span data-contrast=\"auto\">their specific subsidiaries (Federated)<\/span><span data-contrast=\"auto\">.<\/span><\/p>\n<p><span data-contrast=\"auto\">If\u00a0<\/span><span data-contrast=\"auto\">a<\/span><span 
data-contrast=\"auto\">n enterprise has a Global footprint<\/span><span data-contrast=\"auto\">, adopting a Hub and Spoke model\u00a0<\/span><span data-contrast=\"auto\">(Hybrid) <\/span><span data-contrast=\"auto\">with a satellite<\/span><span data-contrast=\"auto\"> of<\/span><span data-contrast=\"auto\">\u00a0local data\u00a0<\/span><span data-contrast=\"auto\">closer to the reporting countries<\/span><span data-contrast=\"auto\">\u00a0would do the trick<\/span><span data-contrast=\"auto\">. Even though this model will have alignment issues (<\/span><span data-contrast=\"auto\">data replication etc.)<\/span><span data-contrast=\"auto\">\u00a0it will aid performance, regional governance and development<\/span><span data-contrast=\"auto\">.<\/span><span data-contrast=\"auto\">\u00a0(Fig 1)<\/span><span data-contrast=\"auto\">\u202f<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop2.jpg\" alt=\"Figure\u00a01\u00a0\u2013\u00a0Hybrid\u00a0Architecture\u00a0\" width=\"1049\" height=\"272\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop2.jpg 1049w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop2-300x78.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop2-1024x266.jpg 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop2-768x199.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop2-330x86.jpg 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop2-800x207.jpg 800w, 
https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop2-400x104.jpg 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop2.jpg\" \/><\/p>\n<p style=\"text-align: center\"><em>Figure\u00a01\u00a0\u2013\u00a0Hybrid\u00a0Architecture\u00a0<\/em><\/p>\n<p><span data-contrast=\"auto\">\u202f<\/span><\/p>\n<h3><span data-contrast=\"auto\">Streamed vs. Batch vs. Near Real Time<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:1080,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h3>\n<ul>\n<li data-leveltext=\"\u2022\" data-font=\"Calibri\" data-listid=\"25\" data-aria-posinset=\"0\" data-aria-level=\"1\"><span data-contrast=\"auto\">NRT Streaming \u2013 Every 15 minutes\/one hour and processed immediately, only where needed<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/li>\n<li data-leveltext=\"\u2022\" data-font=\"Calibri\" data-listid=\"25\" data-aria-posinset=\"0\" data-aria-level=\"1\"><span data-contrast=\"auto\">Lambda \u2013 Data fed in both batch layer and speed layer. Speed layer will compute real time views, while the batch layer will compute batch views at regular intervals. 
A combination of both covers all the needs<\/span><span data-contrast=\"auto\">\u00a0of data ingestion<\/span><span data-contrast=\"auto\">\u00a0and<\/span><span data-contrast=\"auto\"> distribution.<\/span><\/li>\n<li data-leveltext=\"\u2022\" data-font=\"Calibri\" data-listid=\"25\" data-aria-posinset=\"0\" data-aria-level=\"1\"><span style=\"font-size: 1.4rem\" data-contrast=\"auto\">Define your Hot and Cold Paths \u2013 <\/span><span style=\"font-size: 1.4rem\" data-contrast=\"auto\">Choose the right\u00a0<\/span><span style=\"font-size: 1.4rem\" data-contrast=\"auto\">storage(s) for your data lake.\u00a0<\/span><span style=\"font-size: 1.4rem\" data-contrast=\"auto\">Leverage Microsoft offerings of Azure Cosmos DB and ADLS Gen2 respectively. \u202f<\/span><span style=\"font-size: 1.4rem\" data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><span data-contrast=\"auto\">S<\/span><span data-contrast=\"auto\">ample architecture patterns for <\/span><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/architecture\/example-scenario\/dataplate2e\/data-platform-end-to-end\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">Data Platform<\/span><\/a><span data-contrast=\"auto\"> or <\/span><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/lambda-architecture\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">Cosmos DB<\/span><\/a><span data-contrast=\"auto\"> Lambda Architecture.<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:1080,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h3><span data-contrast=\"auto\">Build the right\u00a0<\/span><span data-contrast=\"auto\">HA-DR:<\/span><span data-contrast=\"auto\">\u00a0High Availability &amp;\u00a0<\/span><span data-contrast=\"auto\">Disaster Recovery 
Strategy<\/span><\/h3>\n<p><span data-contrast=\"auto\">High availability strategies are intended for handling temporary failure conditions to<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><span data-contrast=\"auto\">allow the system to continue functioning<\/span><span data-contrast=\"auto\">, while disaster recovery is about recovering from a catastrophic loss of application functionality.<\/span><span data-contrast=\"auto\"> For the right <\/span><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/storage\/common\/storage-disaster-recovery-guidance\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">DR and HA<\/span><\/a><span data-contrast=\"auto\">\u00a0framework, keep\u00a0<\/span><span data-contrast=\"auto\">the following scenarios in mind\u00a0<\/span><span data-contrast=\"auto\">along with business c<\/span><span data-contrast=\"auto\">riticalities<\/span><span data-contrast=\"auto\">: d<\/span><span data-contrast=\"auto\">ata corruption, accidental data deletion, regional outage, n<\/span><span data-contrast=\"auto\">etwork\/connectivity issues and component<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">failures<\/span><span data-contrast=\"auto\">.<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">ADLS Gen2 now supports\u00a0<\/span><span data-contrast=\"auto\">replication options\u00a0<\/span><span data-contrast=\"auto\">such as ZRS or GZRS (preview),<\/span><span data-contrast=\"auto\">\u00a0which<\/span><span data-contrast=\"auto\"> improve HA, while GRS and RA-GRS improve DR<\/span><span data-contrast=\"auto\">. 
Azure <\/span><span data-contrast=\"auto\">Cosmos DB is<\/span><span data-contrast=\"auto\">\u00a0known for\u00a0<\/span><span data-contrast=\"auto\">its<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">99.999%\u00a0<\/span><span data-contrast=\"auto\">high availability and\u00a0<\/span><span data-contrast=\"auto\">globally dis<\/span><span data-contrast=\"auto\">tribut<\/span><span data-contrast=\"auto\">ed<\/span><span data-contrast=\"auto\">\u00a0replication.\u00a0<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Each Azure component addresses most of these scenarios, so I e<\/span><span data-contrast=\"auto\">ncourage you to look at the relevant product documentation. <\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">\u202f<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:1080,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h2>2. 
Subscription Model<\/h2>\n<p><span data-contrast=\"auto\">Planning a Data Lake and then\u00a0<\/span><span data-contrast=\"auto\">s<\/span><span data-contrast=\"auto\">caling it\u00a0<\/span><span data-contrast=\"auto\">up requires\u00a0<\/span><span data-contrast=\"auto\">some\u00a0<\/span><span data-contrast=\"auto\">con<\/span><span data-contrast=\"auto\">templation<\/span><span data-contrast=\"auto\">.\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h3>Technical Limitations<\/h3>\n<p><span data-contrast=\"auto\">Each product in Azure has a few boundary considerations and subscription<\/span> <a href=\"https:\/\/docs.microsoft.com\/api\/Redirect\/en-in\/documentation\/articles\/azure-subscription-service-limits\/\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">limits, quotas and<\/span><span data-contrast=\"none\"> c<\/span><span data-contrast=\"none\">onstraints<\/span><\/a><span data-contrast=\"auto\">.<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">Cautious treading will avoid hitting\u00a0<\/span><span data-contrast=\"auto\">t<\/span><span data-contrast=\"auto\">he\u00a0<\/span><span data-contrast=\"auto\">thresholds and limits of the products while scaling<\/span><span data-contrast=\"auto\">.\u00a0<\/span><span data-contrast=\"auto\">While defining the\u00a0<\/span><span data-contrast=\"auto\">lambda architecture you can choose your storage, and<\/span><span data-contrast=\"auto\"> ADLS Gen2\u00a0<\/span><span data-contrast=\"auto\">and Cosmos DB both do an\u00a0<\/span><span data-contrast=\"auto\">exceptional<\/span><span data-contrast=\"auto\">\u00a0job to o<\/span><span data-contrast=\"auto\">vercom<\/span><span data-contrast=\"auto\">e<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">throughput and limit<\/span><span data-contrast=\"auto\">\u00a0challenges<\/span><span 
data-contrast=\"auto\">.\u00a0<\/span><span data-contrast=\"auto\">Environment isolation\u00a0<\/span><span data-contrast=\"auto\">should\u00a0<\/span><span data-contrast=\"auto\">be thought about, especially <\/span><span data-contrast=\"auto\">during resource consumption <\/span><span data-contrast=\"auto\">for a laboratory experiment,<\/span><span data-contrast=\"auto\">\u00a0as well as<\/span><span data-contrast=\"auto\">\u00a0f<\/span><span data-contrast=\"auto\">eatures and functionality testing\u00a0<\/span><span data-contrast=\"auto\">such as\u00a0<\/span><span data-contrast=\"auto\">firewall rules or life-cycle management. \u202f<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:360,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h3>Business Constraints<\/h3>\n<p><span data-contrast=\"auto\">Businesses may want to keep the billing separate<\/span><span data-contrast=\"auto\"> or define a chargeback model<\/span><span data-contrast=\"auto\">\u00a0through different subscriptions<\/span><span data-contrast=\"auto\">\u00a0for\u00a0<\/span><span data-contrast=\"auto\">each\u00a0<\/span><span data-contrast=\"auto\">business layer,<\/span><span data-contrast=\"auto\">\u00a0and also consider other influencing f<\/span><span data-contrast=\"auto\">actors such as r<\/span><span data-contrast=\"auto\">egional legal obligations,\u00a0<\/span><span data-contrast=\"auto\">r<\/span><span data-contrast=\"auto\">egulatory constraints or data sovereignty.<\/span><\/p>\n<p><span data-contrast=\"auto\">Production costs for Dev<\/span><span data-contrast=\"auto\">\/Test environments<\/span><span data-contrast=\"auto\"> can be reduced by seeking out providers like M<\/span><span data-contrast=\"auto\">icrosoft\u00a0<\/span><span data-contrast=\"auto\">who\u00a0<\/span><span data-contrast=\"auto\">offer\u00a0<\/span><a href=\"https:\/\/azure.microsoft.com\/en-in\/offers\/ms-azr-0148p\/\" target=\"_blank\" rel=\"noopener 
<span data-contrast=\"none\">great">
noreferrer\"><span data-contrast=\"none\">great discounts<\/span><\/a><span data-contrast=\"auto\">\u00a0on lower environments<\/span><span data-contrast=\"auto\">. It is always advisable to have separate\u00a0<\/span><span data-contrast=\"auto\">subscriptions for Dev\/Test and Production<\/span><span data-contrast=\"auto\">\u00a0based on b<\/span><span data-contrast=\"auto\">usiness functions<\/span><span data-contrast=\"auto\">. Choose wisely and sa<\/span><span data-contrast=\"auto\">v<\/span><span data-contrast=\"auto\">e profusely<\/span><span data-contrast=\"auto\">.\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:426,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Owing to these constraints, you could rethink your North Star architecture and look at Hub and Spoke models if they&#8217;re suitable<\/span><span data-contrast=\"auto\">. <\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h2>3. 
Ingestion<\/h2>\n<h3>Understand the soul of the\u00a0Data Sources<\/h3>\n<p><span data-contrast=\"auto\">It is\u00a0<\/span><span data-contrast=\"auto\">imperative to feel the pulse and the interaction of\u00a0<\/span><span data-contrast=\"auto\">different source systems, as this <\/span><span data-contrast=\"auto\">can give us a better idea of how to sufficiently <\/span><span data-contrast=\"auto\">hydrate\u00a0<\/span><span data-contrast=\"auto\">the\u00a0<\/span><span data-contrast=\"auto\">data lake.\u00a0<\/span><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/connector-overview\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">ADF<\/span><\/a><span data-contrast=\"auto\">\u00a0does a great job of covering many data sources;\u00a0<\/span><span data-contrast=\"auto\">however,<\/span><span data-contrast=\"auto\">\u00a0for\u00a0<\/span><span data-contrast=\"auto\">the non-<\/span><span data-contrast=\"auto\">native connectors, identify an alternative pattern for pulling the source data.\u00a0<\/span><span data-contrast=\"auto\">This can be achieved with\u00a0<\/span><span data-contrast=\"auto\">an API pull,\u00a0<\/span><a href=\"https:\/\/databricks.com\/blog\/2020\/03\/06\/connect-90-data-sources-to-your-data-lake-with-azure-databricks-and-azure-data-factory.html\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">Databricks<\/span><\/a><span data-contrast=\"auto\"> or by provisioning b<\/span><span data-contrast=\"auto\">lob storage <\/span><span data-contrast=\"auto\">as a landing zone for external files. 
If the external source system is also on a data lake, you can use ADLS Share <\/span><span data-contrast=\"auto\">etc.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h3>Choose\u00a0the right architecture for Ingestion patterns and Refreshes<\/h3>\n<ol>\n<li data-leveltext=\"%1.\" data-font=\"Calibri,Times New Roman\" data-listid=\"23\" data-aria-posinset=\"1\" data-aria-level=\"1\"><span data-contrast=\"auto\">Separate the\u00a0<\/span><span data-contrast=\"auto\">Batch and Strea<\/span><span data-contrast=\"auto\">m<\/span><span data-contrast=\"auto\">:<\/span><span data-contrast=\"auto\">\u00a0Our aim is to mitigate any throttling issues when spinning up individual job clusters<\/span><span data-contrast=\"auto\">; we should h<\/span><span data-contrast=\"auto\">ave a central way to spin up a limited number of shared clusters that are easy to monitor<\/span><span data-contrast=\"auto\">\u00a0and\u00a0<\/span><span data-contrast=\"auto\">e<\/span><span data-contrast=\"auto\">ventually use cluster pools so that smaller jobs run faster on large clusters, with more control over the execution. 
<\/span><span data-contrast=\"auto\">Individualised c<\/span><span data-contrast=\"auto\">lusters avoid scaling<\/span><span data-contrast=\"auto\">,\u00a0<\/span><span data-contrast=\"auto\">throughput issues and limitations.<\/span><span data-contrast=\"auto\">\u00a0<\/span><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/control-flow-for-each-activity#iterate-over-multiple-activities\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">Tip<\/span><\/a><span data-contrast=\"auto\">:<\/span><span data-contrast=\"auto\">\u00a0Use For each and Iterate in ADF for calling existing notebooks.\u00a0<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/li>\n<li data-leveltext=\"%1.\" data-font=\"Calibri,Times New Roman\" data-listid=\"23\" data-aria-posinset=\"1\" data-aria-level=\"1\"><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/control-flow-for-each-activity#iterate-over-multiple-activities\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">ADF<\/span><\/a> <span data-contrast=\"auto\">Jobs<\/span><span data-contrast=\"auto\">\u00a0should be\u00a0<\/span><span data-contrast=\"auto\">run<\/span><span data-contrast=\"auto\">\u00a0in<\/span><span data-contrast=\"auto\">\u00a0parallel\u00a0<\/span><span data-contrast=\"auto\">for attaining\u00a0<\/span><span data-contrast=\"auto\">optimum\u00a0<\/span><span data-contrast=\"auto\">performance and\u00a0<\/span><span data-contrast=\"auto\">to\u00a0<\/span><span data-contrast=\"auto\">leverage Central\u00a0<\/span><a href=\"https:\/\/docs.databricks.com\/administration-guide\/capacity-planning\/cmbp.html#scenario-3-scheduled-batch-workloads-data-engineers-running-etl-jobs\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">DataBricks<\/span><span data-contrast=\"none\">\u00a0Cluster<\/span><\/a><span 
data-contrast=\"none\">.<\/span><span data-contrast=\"none\">\u00a0<\/span><span data-contrast=\"auto\">Choose your\u00a0<\/span><a href=\"https:\/\/github.com\/Azure\/AzureDatabricksBestPractices\/blob\/master\/toc.md#Deploying-Applications-on-ADB-Guidelines-for-Selecting-Sizing-and-Optimizing-Clusters-Performance\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">clusters<\/span><\/a><span data-contrast=\"auto\">\u00a0wisely\u00a0<\/span><span data-contrast=\"auto\">(and remember limits!)<\/span><span data-contrast=\"auto\">.<\/span><span data-contrast=\"auto\">\u00a0<\/span><\/li>\n<li data-leveltext=\"%1.\" data-font=\"Calibri,Times New Roman\" data-listid=\"23\" data-aria-posinset=\"1\" data-aria-level=\"1\"><span data-contrast=\"auto\">Data Refresh:<\/span><span data-contrast=\"auto\">\u00a0Each source has a different\u00a0<\/span><span data-contrast=\"auto\">way for handling the\u00a0<\/span><span data-contrast=\"auto\">Delta Refreshes. For the Raw Layer\u00a0<\/span><span data-contrast=\"auto\">keep<\/span><span data-contrast=\"auto\"> the pattern of data sources. For subsequent layers, metadata or mapping tables<\/span><span data-contrast=\"auto\">\/files in SQL DW for reference could be used post-strategising. 
A mapping file contain<\/span><span data-contrast=\"auto\">ing<\/span><span data-contrast=\"auto\">\u00a0the primary key column of data and the processing timestamp<\/span><span data-contrast=\"auto\"> will aid with delta loads<\/span><span data-contrast=\"auto\">.\u00a0<\/span><span data-contrast=\"auto\">\u00a0<\/span><a href=\"https:\/\/docs.databricks.com\/delta\/index.html\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">Databricks Delta<\/span><\/a><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">lets organisations remove complexity by getting the benefits of multiple storage systems in<\/span><span data-contrast=\"auto\">\u00a0one.<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/li>\n<\/ol>\n<h3><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559685&quot;:786,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">CI\/CD<\/span><\/h3>\n<p><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/continuous-integration-deployment\" target=\"_blank\" rel=\"noopener noreferrer\">CI\/CD<\/a>\u00a0<span data-contrast=\"auto\">should be well planned with the\u00a0<\/span><span data-contrast=\"auto\">right governance body to drive central guidelines.<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559685&quot;:709,&quot;335559739&quot;:160,&quot;335559740&quot;:240,&quot;335559991&quot;:283}\">\u00a0<\/span><span data-contrast=\"auto\">Build a one click deployment framework<\/span><span data-contrast=\"auto\">,\u00a0<\/span><span data-contrast=\"auto\">parameterise templates,<\/span><span data-contrast=\"auto\">\u00a0and b<\/span><span data-contrast=\"auto\">uild\u00a0<\/span><a 
href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-resource-manager\/templates\/quickstart-create-templates-use-the-portal\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">ARM templates<\/span><\/a> and<span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">DataOps<\/span><span data-contrast=\"auto\">\u00a0wherever necessary<\/span><span data-contrast=\"auto\">.<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<h2>4. Access to your Data Lake<\/h2>\n<p><span data-contrast=\"auto\">Data lake can be accessed at three time points; while <\/span><span data-contrast=\"auto\">hydrating\u00a0<\/span><span data-contrast=\"auto\">the data l<\/span><span data-contrast=\"auto\">ake, access between layers of the data\u00a0<\/span><span data-contrast=\"auto\">lake, and<\/span><span data-contrast=\"auto\"> while exposing the data lake for downstream systems.<\/span><\/p>\n<h3><span data-contrast=\"auto\">RBAC and ACLS<\/span><\/h3>\n<p><span data-contrast=\"auto\">Az<\/span><span data-contrast=\"auto\">ure role-based access control (RBAC)\u00a0<\/span><span data-contrast=\"auto\">lets you assign roles to security principals<\/span><span data-contrast=\"auto\">\u00a0and\u00a0<\/span><span data-contrast=\"auto\">helps control a higher level of resource<\/span><span data-contrast=\"auto\">s, <\/span><span data-contrast=\"auto\">whereas<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">POSIX-like access control lists (ACLs)<\/span><span data-contrast=\"auto\">\u00a0help us defining access to individual files or directory.\u00a0<\/span><a 
href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/storage\/blobs\/data-lake-storage-access-control#access-control-lists-on-files-and-directories\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">ADLS Gen 2<\/span><\/a><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">offers security by supporting\u00a0<\/span><span data-contrast=\"auto\">both RBAC and ACLs based<\/span><span data-contrast=\"auto\">\u00a0access\u00a0<\/span><span data-contrast=\"auto\">controls.<\/span><span data-contrast=\"auto\">\u00a0<\/span><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/role-based-access-control\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">Cosmos&#8217;s RBAC<\/span><\/a><span data-contrast=\"none\">\u00a0<\/span><span data-contrast=\"auto\">controls can be leveraged f<\/span><span data-contrast=\"auto\">or<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">streamed data<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">or hot path access<\/span><span data-contrast=\"auto\">.<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559685&quot;:850,&quot;335559739&quot;:160,&quot;335559740&quot;:259,&quot;335559991&quot;:425}\">\u00a0<\/span><\/p>\n<h3><span data-contrast=\"auto\">Pub<\/span><span data-contrast=\"auto\">lish<\/span><span data-contrast=\"auto\">\/<\/span><span data-contrast=\"auto\">Subscribe<\/span><\/h3>\n<p><span data-contrast=\"auto\">This is very critical for a <\/span><span data-contrast=\"auto\">Multi-Layer<\/span><span data-contrast=\"auto\"> Environment (Raw, Curated etc). 
<\/span>ADF job dependencies need to be managed, which can be achieved b<span style=\"font-size: 1.4rem\" data-contrast=\"auto\">y implementing a service bus approach: on completion of each job, an entry is made in<\/span><span style=\"font-size: 1.4rem\" data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559685&quot;:850,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\"> the s<\/span>ervice bus, which publishes the status to all subscribers.\u00a0<span style=\"font-size: 1.4rem\" data-contrast=\"auto\">The downstream<\/span><span style=\"font-size: 1.4rem\" data-contrast=\"auto\"> ADF needs to subscribe to the<\/span><span style=\"font-size: 1.4rem\" data-contrast=\"auto\">\u00a0<\/span><span style=\"font-size: 1.4rem\" data-contrast=\"auto\">service bus to capture the completion of the job.<\/span><span style=\"font-size: 1.4rem\" data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559685&quot;:850,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The sample pathway <\/span><span data-contrast=\"auto\">below depicts automation\u00a0<\/span><span data-contrast=\"auto\">of ADF for<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">downstream<\/span><span data-contrast=\"auto\"> systems us<\/span><span data-contrast=\"auto\">ing<\/span><span data-contrast=\"auto\"> a pub\/sub technique to alert on<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">delta updates or new inserts:<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:50,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:50,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\"><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format\" 
src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop3.jpg\" alt=\"A Sample pathway depicting automation of ADF for downstream system using a pub\/ sub technique to alert delta updates or new inserts\" width=\"1119\" height=\"392\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop3.jpg 1119w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop3-300x105.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop3-1024x359.jpg 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop3-768x269.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop3-330x116.jpg 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop3-800x280.jpg 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop3-400x140.jpg 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop3.jpg\" \/><\/span><\/p>\n<p style=\"text-align: center\"><em>Figure\u00a02\u00a0&#8211; Pub\/Sub\u00a0Pattern\u00a0<\/em><\/p>\n<p>&nbsp;<\/p>\n<h3>Databricks: Working Group Concept<\/h3>\n<p><span data-contrast=\"auto\">We should aim at\u00a0<\/span><span data-contrast=\"auto\">build<\/span><span data-contrast=\"auto\">ing<\/span><span data-contrast=\"auto\">\u00a0an end-to-end data pipeline comprised of functional components<\/span><span data-contrast=\"auto\">\u00a0of\u00a0<\/span><a href=\"https:\/\/docs.databricks.com\/getting-started\/concepts.html\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">Databricks workspace<\/span><\/a><span data-contrast=\"auto\">, per 
\u201cworking group<\/span><span data-contrast=\"auto\">\u201d, to cater to the consumers of the data lake, namely <\/span><span data-contrast=\"auto\">data engineering<\/span><span data-contrast=\"auto\">,\u00a0<\/span><span data-contrast=\"auto\">data analysis<\/span><span data-contrast=\"auto\">\u00a0and\u00a0<\/span><span data-contrast=\"auto\">machine learning<\/span><span data-contrast=\"auto\">.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:709,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Each \u201cworking group\u201d\u00a0<\/span><span data-contrast=\"auto\">may provide a<\/span><span data-contrast=\"auto\">\u00a0Unified Analytics Platform that brings together\u00a0<\/span><span data-contrast=\"auto\">Big D<\/span><span data-contrast=\"auto\">ata and AI, and allows the different users and analysts of <\/span><span data-contrast=\"auto\">the\u00a0<\/span><span data-contrast=\"auto\">organisation to collaborate in a common and <\/span><span data-contrast=\"none\">secure space.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop4.jpg\" alt=\"\u202fFigure\u00a03\u00a0\u2013 Working Group\u00a0\" width=\"1148\" height=\"366\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop4.jpg 1148w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop4-300x96.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop4-1024x326.jpg 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop4-768x245.jpg 768w, 
https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop4-330x105.jpg 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop4-800x255.jpg 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop4-400x128.jpg 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop4.jpg\" \/><\/p>\n<p style=\"text-align: center\"><em>\u202fFigure\u00a03 &#8211; Working Group\u00a0<\/em><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h3><span data-contrast=\"auto\">External Share within Azure<\/span><\/h3>\n<p><span data-contrast=\"auto\">There is a need to share the data withi<\/span><span data-contrast=\"auto\">n<\/span><span data-contrast=\"auto\"> and across organisations.\u00a0<\/span><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-share\/overview\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">Azure Data Share<\/span><\/a><span data-contrast=\"auto\"> enables organisations to simply and securely share data with multiple customers and partners<\/span><span data-contrast=\"auto\">.\u00a0<\/span><span data-contrast=\"auto\">\u00a0A comprehensive list of ways to access ADLS is shared\u00a0<\/span><a href=\"https:\/\/www.jamesserra.com\/archive\/2019\/09\/ways-to-access-data-in-adls-gen2\/\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">here<\/span><\/a><span data-contrast=\"auto\">. 
A similar thought process could go into the other storage accounts.\u00a0<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559685&quot;:1134,&quot;335559739&quot;:160,&quot;335559740&quot;:259,&quot;335559991&quot;:425}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">\u202f<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h2>5. Operations and Logging Framework<\/h2>\n<p><span data-contrast=\"none\">Building a data lake is a continuous and interactive process. Large organisations usually have built-in processes for <\/span><span data-contrast=\"none\">application maintenance<\/span><span data-contrast=\"none\">\u00a0activities with which the data lake integrates.\u00a0<\/span><span data-contrast=\"none\">There\u00a0<\/span><span data-contrast=\"none\">are al<\/span><span data-contrast=\"none\">erts<\/span><span data-contrast=\"none\"> and monitoring frameworks that are set up to look at things like the health and performance of a system<\/span><span data-contrast=\"none\">, <\/span><span data-contrast=\"none\">and it is\u00a0<\/span><span data-contrast=\"none\">essential to monitor<\/span><span data-contrast=\"none\">\u00a0the\u00a0<\/span><span data-contrast=\"none\">data flow to alter functioning during operational snags<\/span><span data-contrast=\"none\"> and i<\/span><span data-contrast=\"none\">ntegrate<\/span><span data-contrast=\"none\"> it into the<\/span><span data-contrast=\"none\">\u00a0<\/span><span data-contrast=\"none\">ITSM\u00a0<\/span><span data-contrast=\"none\">systems of the organisation<\/span><span data-contrast=\"none\">.<\/span><span 
data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><a href=\"https:\/\/www.cloudtp.com\/doppler\/demystifying-native-logging-and-monitoring-in-azure\/\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">Azure Monitor<\/span><\/a><span data-contrast=\"none\">\u202fand\u202f<\/span><a href=\"https:\/\/docs.microsoft.com\/en-us\/previous-versions\/azure\/security\/fundamentals\/azure-log-integration-overview\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">Log Analytics<\/span><\/a><span data-contrast=\"none\">\u202fprovide fine-grained details of the functioning of A<\/span><span data-contrast=\"none\">zure components, pipelines and notebooks, and can serve as a central operations and logging framework. However, capturing detailed information in Log Analytics can become expensive. 
Building a custom monitoring and logging framework with a central database, which enables logging throughout the ADF orchestration pathway, could help establish a healthy pipeline framework for ADF and event-based data flow within the system<\/span><span data-contrast=\"none\">.<\/span><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">\u202f<img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop5.jpg\" alt=\"Figure\u00a04- Operations and Logging Framework\u00a0\" width=\"1136\" height=\"156\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop5.jpg 1136w, 
https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop5-300x41.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop5-1024x141.jpg 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop5-768x105.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop5-330x45.jpg 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop5-800x110.jpg 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop5-400x55.jpg 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/dataop5.jpg\" \/><\/span><\/p>\n<p style=\"text-align: center\"><i><span data-contrast=\"none\">Figure\u00a0<\/span><\/i><i><span data-contrast=\"none\">4 &#8211; Operations and Logging Framework<\/span><\/i><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:2,&quot;335551620&quot;:2,&quot;335559739&quot;:200,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">\u202f<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h2>6. Cataloguing<\/h2>\n<p><span data-contrast=\"none\">Trust your data. One single version of the truth. Solo copy. Redundancy. True definition. Data owner.<\/span><\/p>\n<p><span data-contrast=\"none\">These are the key challenges companies and data handlers face, fuelling the need of a data lake. 
It traces the lineage and origin of data and sheds light on its transformations, which is the norm even in machine learning operations \u2013 it&#8217;s a stepping stone for\u202f<\/span><i><span data-contrast=\"none\">Responsible AI<\/span><\/i><span data-contrast=\"none\">.\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:360,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">Data cataloguing provides a comprehensive view of every detail across databases and data sources. A good data catalogue should output a Data Dictionary, Data Lineage and Entity <\/span><span data-contrast=\"none\">Relationships<\/span><span data-contrast=\"none\">.<\/span><span data-contrast=\"none\">\u00a0Third<\/span><span data-contrast=\"none\">-party solutions like Informatica serve this purpose as per organisational needs. Economical solutions, such as a combination of SharePoint and Excel documents with a manually or automatically maintained SQL <\/span><span data-contrast=\"none\">database<\/span><span data-contrast=\"none\"> instance for search, can provide end-to-end cataloguing.\u202f<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559685&quot;:360,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h2>Additional Reading<\/h2>\n<p><span data-contrast=\"auto\">DataOps<\/span><\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/davedoesdemos\/DataDevOps\/blob\/master\/Databricks\/DatabricksDevOps.md\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">https:\/\/github.com\/davedoesdemos\/DataDevOps\/blob\/master\/Databricks\/DatabricksDevOps.md<\/span><\/a><span 
data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/li>\n<li><a style=\"background-color: #ffffff;font-size: 1.4rem\" href=\"https:\/\/github.com\/davedoesdemos\/DataDevOps\/blob\/master\/Data_Factory\/ADFDevOps.md\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">https:\/\/github.com\/davedoesdemos\/DataDevOps\/blob\/master\/Data_Factory\/ADFDevOps.md<\/span><\/a><span style=\"font-size: 1.4rem\" data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><span data-contrast=\"auto\">Choosing the right storage for your needs:<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/architecture\/data-guide\/technology-choices\/data-storage\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">https:\/\/docs.microsoft.com\/en-us\/azure\/architecture\/data-guide\/technology-choices\/data-storage<\/span><\/a><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><span data-contrast=\"auto\">ADLS Gen2:<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/rukmani-msft\/adlsguidancedoc\/blob\/master\/Hitchhikers_Guide_to_the_Datalake.md#what-data-format-do-i-choose\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">https:\/\/github.com\/rukmani-msft\/adlsguidancedoc\/blob\/master\/Hitchhikers_Guide_to_the_Datalake.md#what-data-format-do-i-choose<\/span><\/a><\/li>\n<li><a style=\"background-color: #ffffff;font-size: 1.4rem\" 
href=\"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/04\/09\/building-your-data-lake-on-azure-data-lake-storage-gen2-part-1\/\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-contrast=\"none\">https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/04\/09\/building-your-data-lake-on-azure-data-lake-storage-gen2-part-1\/<\/span><\/a><span style=\"font-size: 1.4rem\" data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:240}\">\u00a0<\/span><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Data lake operationalisation is a colossal topic\u00a0with many\u00a0deliberations\u00a0on either building a right data lake or defining the right strategy.\u00a0This blog\u00a0provides six mantras\u00a0for organisations to\u00a0ruminate\u00a0in\u00a0order\u00a0to successfully tame the \u201cOperationalising\u201d of a data lake\u00a0post production release.<\/p>\n","protected":false},"author":430,"featured_media":31965,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","footnotes":""},"categories":[594],"post_tag":[519],"content-type":[],"coauthors":[1413],"class_list":["post-36780","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technetuk","tag-technet-uk"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>How to Operationalise your Data Lake - Microsoft Industry Blogs - United Kingdom<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" 
\/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to Operationalise your Data Lake - Microsoft Industry Blogs - United Kingdom\" \/>\n<meta property=\"og:description\" content=\"Data lake operationalisation is a colossal topic\u00a0with many\u00a0deliberations\u00a0on either building a right data lake or defining the right strategy.\u00a0This blog\u00a0provides six mantras\u00a0for organisations to\u00a0ruminate\u00a0in\u00a0order\u00a0to successfully tame the \u201cOperationalising\u201d of a data lake\u00a0post production release.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Industry Blogs - United Kingdom\" \/>\n<meta property=\"article:published_time\" content=\"2020-07-07T14:00:41+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeThumb.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"800\" \/>\n\t<meta property=\"og:image:height\" content=\"450\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Nidhi Sinha\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Nidhi Sinha\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 min read\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/07\\\/07\\\/how-to-operationalise-your-data-lake\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/07\\\/07\\\/how-to-operationalise-your-data-lake\\\/\"},\"author\":[{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/author\\\/nidhi-sinha\\\/\",\"@type\":\"Person\",\"@name\":\"Nidhi Sinha\"}],\"headline\":\"How to Operationalise your Data Lake\",\"datePublished\":\"2020-07-07T14:00:41+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/07\\\/07\\\/how-to-operationalise-your-data-lake\\\/\"},\"wordCount\":1669,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/07\\\/07\\\/how-to-operationalise-your-data-lake\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/22\\\/2020\\\/04\\\/DataLakeThumb.jpg\",\"keywords\":[\"TechNet UK\"],\"articleSection\":[\"TechNet 
UK\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/07\\\/07\\\/how-to-operationalise-your-data-lake\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/07\\\/07\\\/how-to-operationalise-your-data-lake\\\/\",\"url\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/07\\\/07\\\/how-to-operationalise-your-data-lake\\\/\",\"name\":\"How to Operationalise your Data Lake - Microsoft Industry Blogs - United Kingdom\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/07\\\/07\\\/how-to-operationalise-your-data-lake\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/07\\\/07\\\/how-to-operationalise-your-data-lake\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/22\\\/2020\\\/04\\\/DataLakeThumb.jpg\",\"datePublished\":\"2020-07-07T14:00:41+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/07\\\/07\\\/how-to-operationalise-your-data-lake\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/07\\\/07\\\/how-to-operationalise-your-data-lake\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/07\\\/07\\\/how-to-operationalise-your-data-lake\\\/#primaryim
age\",\"url\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/22\\\/2020\\\/04\\\/DataLakeThumb.jpg\",\"contentUrl\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/22\\\/2020\\\/04\\\/DataLakeThumb.jpg\",\"width\":800,\"height\":450,\"caption\":\"The Data Lake Analytics logo, next to an illustration of Bit the Raccoon.\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/07\\\/07\\\/how-to-operationalise-your-data-lake\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to Operationalise your Data Lake\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/\",\"name\":\"Microsoft Industry Blogs - United Kingdom\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/#organization\",\"name\":\"Microsoft Industry Blogs - United 
Kingdom\",\"url\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/22\\\/2019\\\/08\\\/Microsoft-Logo.png\",\"contentUrl\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/22\\\/2019\\\/08\\\/Microsoft-Logo.png\",\"width\":259,\"height\":194,\"caption\":\"Microsoft Industry Blogs - United Kingdom\"},\"image\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How to Operationalise your Data Lake - Microsoft Industry Blogs - United Kingdom","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/","og_locale":"en_US","og_type":"article","og_title":"How to Operationalise your Data Lake - Microsoft Industry Blogs - United Kingdom","og_description":"Data lake operationalisation is a colossal topic\u00a0with many\u00a0deliberations\u00a0on either building a right data lake or defining the right strategy.\u00a0This blog\u00a0provides six mantras\u00a0for organisations to\u00a0ruminate\u00a0in\u00a0order\u00a0to successfully tame the \u201cOperationalising\u201d of a data lake\u00a0post production release.","og_url":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/","og_site_name":"Microsoft Industry Blogs - United 
Kingdom","article_published_time":"2020-07-07T14:00:41+00:00","og_image":[{"width":800,"height":450,"url":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeThumb.jpg","type":"image\/jpeg"}],"author":"Nidhi Sinha","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Nidhi Sinha","Est. reading time":"6 min read"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/#article","isPartOf":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/"},"author":[{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/author\/nidhi-sinha\/","@type":"Person","@name":"Nidhi Sinha"}],"headline":"How to Operationalise your Data Lake","datePublished":"2020-07-07T14:00:41+00:00","mainEntityOfPage":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/"},"wordCount":1669,"commentCount":0,"publisher":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/#organization"},"image":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/#primaryimage"},"thumbnailUrl":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeThumb.jpg","keywords":["TechNet UK"],"articleSection":["TechNet 
UK"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/","url":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/","name":"How to Operationalise your Data Lake - Microsoft Industry Blogs - United Kingdom","isPartOf":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/#primaryimage"},"image":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/#primaryimage"},"thumbnailUrl":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeThumb.jpg","datePublished":"2020-07-07T14:00:41+00:00","breadcrumb":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/#primaryimage","url":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeThumb.jpg","contentUrl":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/04\/DataLakeThumb.jpg","width":800,"height":450,"caption":"The Data 
Lake Analytics logo, next to an illustration of Bit the Raccoon."},{"@type":"BreadcrumbList","@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/07\/how-to-operationalise-your-data-lake\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/"},{"@type":"ListItem","position":2,"name":"How to Operationalise your Data Lake"}]},{"@type":"WebSite","@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/#website","url":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/","name":"Microsoft Industry Blogs - United Kingdom","description":"","publisher":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/#organization","name":"Microsoft Industry Blogs - United Kingdom","url":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2019\/08\/Microsoft-Logo.png","contentUrl":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2019\/08\/Microsoft-Logo.png","width":259,"height":194,"caption":"Microsoft Industry Blogs - United 
Kingdom"},"image":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/#\/schema\/logo\/image\/"}}]}},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/posts\/36780","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/users\/430"}],"replies":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/comments?post=36780"}],"version-history":[{"count":0,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/posts\/36780\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/media\/31965"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/media?parent=36780"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/categories?post=36780"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/post_tag?post=36780"},{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/content-type?post=36780"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/coauthors?post=36780"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}