{"id":38976,"date":"2020-08-18T15:40:52","date_gmt":"2020-08-18T14:40:52","guid":{"rendered":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/?p=38976"},"modified":"2020-09-01T15:06:05","modified_gmt":"2020-09-01T14:06:05","slug":"just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation","status":"publish","type":"post","link":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/","title":{"rendered":"Just-in-time Azure Databricks access tokens and instance pools for Azure Data Factory pipelines using workspace automation"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full size-full webp-format\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/02\/databricksheader.jpg\" alt=\"An image representing Data Bricks, next to an illustration of Bit the Raccoon.\" width=\"1920\" height=\"700\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/02\/databricksheader.jpg 1920w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/02\/databricksheader-300x109.jpg 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/02\/databricksheader-1024x373.jpg 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/02\/databricksheader-768x280.jpg 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/02\/databricksheader-1536x560.jpg 1536w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/02\/databricksheader-330x120.jpg 330w, 
https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/02\/databricksheader-800x292.jpg 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/02\/databricksheader-400x146.jpg 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/02\/databricksheader.jpg\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD1.png\" alt=\"An illustration of an example Azure Data Factory setup\" width=\"568\" height=\"366\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD1.png 1260w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD1-300x193.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD1-1024x659.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD1-768x494.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD1-330x212.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD1-800x515.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD1-400x257.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD1.png\" \/><\/p>\n<h2>Introduction<\/h2>\n<p>Using AAD tokens it is now possible to generate an Azure Databricks personal access token programmatically, and provision an instance pool using the Instance Pools API. 
The token can be generated and utilised at run-time to provide \u201cjust-in-time\u201d access to the Databricks workspace. Using the same AAD token, an instance pool can also be provisioned and used to run a series of Databricks activities in the same ADF pipeline.<\/p>\n<p>For those orchestrating Databricks activities via Azure Data Factory, this can offer a number of potential advantages:<\/p>\n<ul>\n<li><strong>Increases agility<\/strong>, reduces <a href=\"https:\/\/medium.com\/@carsonyeung\/my-development-notebook-got-executed-on-production-databricks-cluster-how-did-that-happen-940a21b90b3a\" target=\"_blank\" rel=\"noopener noreferrer\">potential human error<\/a> and decreases dependency on platform teams.<\/li>\n<li><span style=\"font-size: 1.4rem\"><strong>Reduces spin-up time<\/strong> in scenarios where a series of Databricks activities are run in a pipeline or set of chained pipelines.<\/span><\/li>\n<li><span style=\"font-size: 1.4rem\">Enables <strong>ADF activity-based workflows<\/strong> as an alternative to <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/databricks\/notebooks\/notebook-workflows\" target=\"_blank\" rel=\"noopener noreferrer\">notebook workflows<\/a>.<\/span><\/li>\n<li><span style=\"font-size: 1.4rem\"><strong>Establishes guard rails<\/strong>, business logic and validation during the provisioning process.<\/span><\/li>\n<li><span style=\"font-size: 1.4rem\"><strong>Increases governance<\/strong> of tokens and instance pools. 
Implements best practice by <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/security\/fundamentals\/identity-management-best-practices#lower-exposure-of-privileged-accounts\" target=\"_blank\" rel=\"noopener noreferrer\">reducing exposure time<\/a> of privileged tokens.<\/span><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>The Just-in-time Solution<\/h2>\n<p>The following diagram depicts the architecture and flow of events:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD2.png\" alt=\"An illustration of an example Azure Data Factory setup\" width=\"598\" height=\"361\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD2.png 1380w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD2-300x181.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD2-1024x618.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD2-768x464.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD2-330x199.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD2-800x483.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD2-400x241.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD2.png\" \/><\/p>\n<ol>\n<li>A pipeline invokes an Azure Function<\/li>\n<li><span style=\"font-size: 1.4rem\">The Function App uses <a 
href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/databricks\/dev-tools\/api\/latest\/aad\/service-prin-aad-token#--get-an-azure-active-directory-access-token\" target=\"_blank\" rel=\"noopener noreferrer\">the client credentials flow<\/a> to obtain an AAD access token, with the Azure Databricks login application as the resource.<\/span><\/li>\n<li><span style=\"font-size: 1.4rem\">Using this AAD access token, the Function App generates a Databricks personal access token (PAT) via the Token API and creates an instance pool via the Instance Pools API.<\/span><\/li>\n<li><span style=\"font-size: 1.4rem\">The Function App stores the Databricks access token and pool ID in Azure Key Vault.<\/span><\/li>\n<li><span style=\"font-size: 1.4rem\">The Databricks activities then run on the newly created instance pool, authenticating with the new access token; both values are retrieved from Key Vault at run time.<\/span><\/li>\n<\/ol>\n<p>Extending this approach a little further can provide excellent <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/architecture\/modern-web-apps-azure\/architectural-principles#separation-of-concerns\" target=\"_blank\" rel=\"noopener noreferrer\">separation of concerns<\/a> between the platform team, which is responsible for provisioning the infrastructure (i.e. the Databricks runtime environment), and the data team, which depends on this environment to run its data pipelines. Using the technique described <a href=\"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/03\/19\/enterprise-wide-orchestration-using-multiple-data-factories\/\" target=\"_blank\" rel=\"noopener noreferrer\">in this blog<\/a>, the platform team could manage an \u201cinitialisation\u201d pipeline that takes care of provisioning the environment as well as any validation and repeatable business logic. 
This \u201cenvironment initialisation\u201d pipeline may run in the same Data Factory or in a different one, and either invokes, or is invoked by, the engineering pipeline that the data team manages to run the Databricks workloads.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD3.png\" alt=\"An illustration of an example Azure Data Factory setup\" width=\"590\" height=\"346\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD3.png 1380w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD3-300x176.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD3-1024x600.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD3-768x450.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD3-330x193.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD3-800x469.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD3-400x234.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD3.png\" \/><\/p>\n<p>&nbsp;<\/p>\n<h2>Demonstration<\/h2>\n<p>The following demo provides a step-by-step tutorial to set up a Function App that generates the token and provisions the pool, and an ADF pipeline that is granted just-in-time access to the workspace at run time and uses the instance pool to run a series of Databricks activities.<\/p>\n<p><strong>Note: Any code provided should not be regarded as production-ready; it is simply functional for demonstration 
purposes.<\/strong><\/p>\n<p>&nbsp;<\/p>\n<h2>Prerequisites<\/h2>\n<p>If you wish to complete this demonstration, you will need to provision the following services:<\/p>\n<ul>\n<li>Azure Data Factory<\/li>\n<li><span style=\"font-size: 1.4rem\">Azure Key Vault<\/span><\/li>\n<li><span style=\"font-size: 1.4rem\">Azure Databricks<\/span><\/li>\n<li><span style=\"font-size: 1.4rem\">Azure Function App (see additional steps)<\/span><\/li>\n<\/ul>\n<p>Additional steps:<\/p>\n<ul>\n<li>Review the readme in the <a href=\"https:\/\/github.com\/khowling\/fn-databricks-automate\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub repo<\/a>, which includes steps to create the service principal and to provision and deploy the Function App. Whilst the code referenced in this repo is written in JavaScript, an example Python script can be found <a href=\"https:\/\/github.com\/hurtn\/databricks\/tree\/master\/workspace-automation\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.<\/li>\n<li><span style=\"font-size: 1.4rem\">As a one-off activity, the service principal will need to be added to the admin group of the workspace using the <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/databricks\/dev-tools\/api\/latest\/aad\/service-prin-aad-token#admin-user-login\" target=\"_blank\" rel=\"noopener noreferrer\">admin login<\/a>, as shown in <a href=\"https:\/\/github.com\/hurtn\/databricks\/blob\/master\/Add%20Service%20Principal%20to%20Workspace.ipynb\" target=\"_blank\" rel=\"noopener noreferrer\">this sample code<\/a>. The service principal must also be granted the Contributor role on the workspace.<\/span><\/li>\n<li><span style=\"font-size: 1.4rem\">The Databricks workspace can be premium or standard tier. 
To simulate some workload, create at least one Python notebook in the workspace that runs a simple command such as:<\/span><\/li>\n<\/ul>\n<pre>print(\"Workload goes here\")<\/pre>\n<p>&nbsp;<\/p>\n<h2>Key Vault Configuration<\/h2>\n<p>As per the steps in the repo, ensure that the service principal has been granted access to the secrets in Key Vault via an access policy.<\/p>\n<p>In a production scenario, one would need at least two Key Vaults: one for the platform team to store the secrets used by the Function App, and another for the data team to store Databricks tokens and instance pool IDs. For the purposes of this demo, only one Key Vault is required.<\/p>\n<p>&nbsp;<\/p>\n<h2>Function App Configuration<\/h2>\n<p>After running the setup scripts and publishing the code to the Function App, no further configuration is required. The setup script creates the app configuration settings that store the Key Vault name and Databricks workspace ID, and the app is also <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/app-service\/configure-authentication-provider-aad\" target=\"_blank\" rel=\"noopener noreferrer\">configured<\/a> to use the service principal for authentication \/ authorisation. In the <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-functions\/functions-bindings-microsoft-graph#token-input\" target=\"_blank\" rel=\"noopener noreferrer\">auth token input binding<\/a>, found in the function.json file, the relevant resources are specified, namely Key Vault and Databricks, and the identity is set to clientcredentials, which means that the Function App\u2019s own identity is used, i.e. 
the service principal.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD4.png\" alt=\"Azure Active Directory settings\" width=\"563\" height=\"342\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD4.png 1014w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD4-300x182.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD4-768x467.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD4-330x200.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD4-800x486.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD4-400x243.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD4.png\" \/><\/p>\n<p>This means that when the AAD token is generated or when the resource API is invoked, the service principal\u2019s credentials will be used.<\/p>\n<p>Once the app has been published, use the portal development experience to \u201cCode+Test\u201d. 
Starting with the function that generates the Databricks access token, use the Test functionality: add a query parameter named \u201cpatsecretname\u201d, set its value to the name of the secret to create, and click Run.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD5.png\" alt=\"Editing the input on createDBPAT\" width=\"621\" height=\"352\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD5.png 2470w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD5-300x170.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD5-1024x581.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD5-768x436.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD5-1536x871.png 1536w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD5-2048x1162.png 2048w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD5-330x187.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD5-800x454.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD5-400x227.png 400w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD5-235x132.png 235w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD5.png\" \/><\/p>\n<p>You should receive a 200 OK response and find that a new secret with the specified name has been stored in Key Vault.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" 
class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD6.png\" alt=\"Looking at the output on createDBPAT\" width=\"740\" height=\"487\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD6.png 2138w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD6-300x197.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD6-1024x673.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD6-768x505.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD6-1536x1010.png 1536w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD6-2048x1347.png 2048w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD6-330x217.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD6-800x526.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD6-400x263.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD6.png\" \/><\/p>\n<p>Next, test the function that creates the Databricks instance pool: enter a poolsecretname query parameter and check that a new pool has been created, named after the value of that query parameter.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD7.png\" alt=\"The Clusters section on Azure Databricks\" width=\"715\" height=\"225\" 
data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD7.png 1620w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD7-300x94.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD7-1024x322.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD7-768x242.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD7-1536x484.png 1536w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD7-330x104.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD7-800x252.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD7-400x126.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD7.png\" \/><\/p>\n<p>This pool will not incur any cost until it is used by the ADF pipeline, provided that the min_idle_instances parameter in the Instance Pools API request payload was set to 0. 
With any other value, the pool will provision and keep that minimum number of idle VMs, incurring standard VM costs.<\/p>\n<p>There will also be an associated Key Vault secret that stores the pool ID, to be used by ADF in the linked service configuration.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD8.png\" alt=\"Showing the PAT test is enabled\" width=\"773\" height=\"248\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD8.png 1681w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD8-300x96.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD8-1024x328.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD8-768x246.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD8-1536x493.png 1536w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD8-330x106.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD8-800x257.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD8-400x128.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD8.png\" \/><\/p>\n<p>Note: Whilst a function key is required to invoke the Function App, the app is still accessible over the public internet. 
If security is a concern, consider <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/app-service\/app-service-ip-restrictions\" target=\"_blank\" rel=\"noopener noreferrer\">access restrictions<\/a> that whitelist the IP address of the ADF <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/create-self-hosted-integration-runtime\" target=\"_blank\" rel=\"noopener noreferrer\">self-hosted integration runtime<\/a>, and\/or <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-functions\/functions-networking-options#private-site-access\" target=\"_blank\" rel=\"noopener noreferrer\">private site access<\/a> combined with deploying the integration runtime into a <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/managed-virtual-network-private-endpoint\" target=\"_blank\" rel=\"noopener noreferrer\">managed virtual network<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<h2>Data Factory Configuration<\/h2>\n<p>Using a combination of Key Vault, parameters and the dynamic contents setting (in the Advanced section of the linked service), it is possible to create a parameterised linked service into which the configuration details are \u201cinjected\u201d at run time.<\/p>\n<p>1. To begin, <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/store-credentials-in-key-vault#steps\" target=\"_blank\" rel=\"noopener noreferrer\">grant the managed identity of ADF access to your Azure Key Vault<\/a>.<\/p>\n<p><span style=\"font-size: 1.4rem\">2. Then configure a Key Vault linked service as described in <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/store-credentials-in-key-vault#azure-key-vault-linked-service\" target=\"_blank\" rel=\"noopener noreferrer\">this tutorial<\/a>.<\/span><\/p>\n<p><span style=\"font-size: 1.4rem\">3. Next, create a new linked service for Azure Databricks, define a name, then scroll down to the Advanced section and tick the box to specify dynamic contents in JSON format. 
Enter the following JSON, substituting the capitalised placeholders with your own values for the <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/databricks\/workspace\/workspace-details#--workspace-instance-names-urls-and-ids\" target=\"_blank\" rel=\"noopener noreferrer\">Databricks workspace URL<\/a> and the Key Vault linked service created above. Note the new per-workspace URL format, adb-..azuredatabricks.net, which is unique to each workspace. Storing this workspace URL in Key Vault as well would be better practice.<\/span><\/p>\n<pre>{ \r\n    \"properties\": { \r\n        \"type\": \"AzureDatabricks\", \r\n        \"parameters\": { \r\n            \"myadbpatsecretname\": { \r\n                \"type\": \"string\" \r\n            }, \r\n            \"myadbpoolsecretname\": { \r\n                \"type\": \"string\" \r\n            }\r\n        },\r\n        \"annotations\": [], \r\n        \"typeProperties\": { \r\n            \"domain\": \"<a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/databricks\/workspace\/workspace-details#--workspace-instance-names-urls-and-ids\" target=\"_blank\" rel=\"noopener noreferrer\">WORKSPACE URL<\/a>\", \r\n            \"accessToken\": { \r\n                \"type\": \"AzureKeyVaultSecret\", \r\n                \"store\": { \r\n                    \"referenceName\": \"KEY VAULT LINKED SERVICE NAME\", \r\n                    \"type\": \"LinkedServiceReference\" \r\n                }, \r\n                \"secretName\": \"@{linkedService().myadbpatsecretname}\" \r\n            }, \r\n            \"instancePoolId\": { \r\n                \"type\": \"AzureKeyVaultSecret\", \r\n                \"store\": { \r\n                    \"referenceName\": \"KEY VAULT LINKED SERVICE NAME\", \r\n                    \"type\": \"LinkedServiceReference\" \r\n                }, \r\n                \"secretName\": \"@{linkedService().myadbpoolsecretname}\" \r\n            }, \r\n            \"newClusterNodeType\": 
\"Standard_DS3_v2\", \r\n            \"newClusterNumOfWorker\": \"2\", \r\n            \"newClusterVersion\": \"6.4.x-scala2.11\" \r\n        } \r\n    } \r\n}<\/pre>\n<p>Note that two parameters are defined to hold the names of the Key Vault secrets that contain the PAT and the pool ID. Their values will be supplied by the pipeline parameters passed in at trigger time.<\/p>\n<p>After the linked service is created, it should look as follows:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD9.png\" alt=\"The linked service after being created\" width=\"699\" height=\"710\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD9.png 1223w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD9-296x300.png 296w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD9-1009x1024.png 1009w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD9-768x779.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD9-246x250.png 246w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD9-330x335.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD9-800x812.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD9-400x406.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD9.png\" \/><\/p>\n<p>Note: Personal access tokens created by a service principal via the API are not displayed when you are logged into the workspace UI, because they have been 
generated by a different security principal. They are, however, visible via the Token API list endpoint when it is called with an AAD token for the service principal created above.<\/p>\n<p>4. Create another linked service to authenticate to the Azure Function App, as shown in <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/control-flow-azure-function-activity\" target=\"_blank\" rel=\"noopener noreferrer\">the documentation<\/a>.<\/p>\n<p>5. Next, create a pipeline and add two parameters to hold the names of the Key Vault secrets that will contain the access token and pool ID.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD10.png\" alt=\"The parameters section in the new pipeline\" width=\"686\" height=\"294\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD10.png 1345w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD10-300x128.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD10-1024x439.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD10-768x329.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD10-330x141.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD10-800x343.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD10-400x171.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD10.png\" \/><\/p>\n<p>6. 
Drop two Azure Function activities onto the canvas.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD11.png\" alt=\"Creating two new Azure Functions\" width=\"693\" height=\"518\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD11.png 1326w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD11-300x224.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD11-1024x765.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD11-768x574.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD11-330x247.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD11-800x598.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD11-400x299.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD11.png\" \/><\/p>\n<p>7. Specify the Azure Function linked service and, in the Function name setting, specify each function to invoke together with its query parameter. For generating the access token, use the following expression, substituting the function name if necessary:<\/p>\n<pre>@concat('createADBPAT?patsecretname=',pipeline().parameters.patsecretname)<\/pre>\n<p>8. In the second Azure Function activity, specify the function that creates the instance pool, for example:<\/p>\n<pre>@concat('createADBPool?poolsecretname=',pipeline().parameters.poolsecretname)<\/pre>\n<p>9. Connect these two activities and publish the changes.<\/p>\n<p>10. 
Next, create another pipeline and add two parameters, which will be passed to the pipeline that executes the functions.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD12.png\" alt=\"Editing the parameters of a new pipeline\" width=\"706\" height=\"312\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD12.png 1401w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD12-300x133.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD12-1024x452.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD12-768x339.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD12-330x146.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD12-800x353.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD12-400x177.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD12.png\" \/><\/p>\n<p>11. On the canvas, add an Execute Pipeline activity and set the pipeline created above as its invoked pipeline. 
In the Parameters section, click the Value field and add the corresponding pipeline parameters to pass to the invoked pipeline.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD13.png\" alt=\"Creating an Execute Pipeline activity\" width=\"673\" height=\"416\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD13.png 1617w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD13-300x186.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD13-1024x633.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD13-768x475.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD13-1536x950.png 1536w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD13-330x204.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD13-800x495.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD13-400x247.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD13.png\" \/><\/p>\n<p>12. Add a Databricks Notebook activity and specify the Databricks linked service, which reads the access token and pool ID from the Key Vault secrets at run time.<\/p>\n<p>13. 
Add these pipeline parameters to the linked service properties so that they are passed through to the linked service at trigger time.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD14.png\" alt=\"Adding the pipeline parameters to the linked service\" width=\"709\" height=\"500\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD14.png 1428w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD14-300x212.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD14-1024x722.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD14-768x542.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD14-330x233.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD14-800x564.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD14-400x282.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD14.png\" \/><\/p>\n<p>14. Under the Settings tab, enter the path of the notebook created in the prerequisites. The path will be similar to the following:<\/p>\n<pre>\/Users\/[USERNAME]\/Workload1<\/pre>\n<p>15. 
Copy and paste the Databricks activity three times and connect all the activities.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD15.png\" alt=\"The chained Databricks activities in the pipeline\" width=\"772\" height=\"95\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD15.png 1935w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD15-300x37.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD15-1024x126.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD15-768x95.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD15-1536x190.png 1536w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD15-330x41.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD15-800x99.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD15-400x49.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD15.png\" \/><\/p>\n<p>16. Optionally, add another function and Function activity at the end of the pipeline to revoke the access token and delete the instance pool.<\/p>\n<p>17. 
Publish the changes, then trigger the pipeline and monitor the results.<\/p>\n<p>&nbsp;<\/p>\n<h2>Results<\/h2>\n<h3>With job clusters<\/h3>\n<p>Using only job clusters, which spin up with each Databricks activity, the total time for the same workload is around 18 and a half minutes.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD16.png\" alt=\"The estimated duration of the Databricks workload\" width=\"762\" height=\"174\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD16.png 950w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD16-300x69.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD16-768x175.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD16-330x75.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD16-800x183.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD16-400x91.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD16.png\" \/><\/p>\n<p>Note that each activity takes between 4 and 5 minutes, because a new job cluster is spun up every time.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD17.png\" alt=\"A breakdown of the time per activity\" width=\"771\" height=\"434\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD17.png 1088w, 
https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD17-300x169.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD17-1024x576.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD17-768x432.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD17-330x186.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD17-800x450.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD17-400x225.png 400w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD17-235x132.png 235w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD17-960x540.png 960w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD17.png\" \/><\/p>\n<p>&nbsp;<\/p>\n<h3>With instance pools<\/h3>\n<p>With instance pools, the total time drops dramatically to under 10 minutes.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD18.png\" alt=\"The estimated duration with instance pools\" width=\"760\" height=\"201\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD18.png 1080w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD18-300x79.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD18-1024x270.png 1024w, 
https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD18-768x203.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD18-330x87.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD18-800x211.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD18-400x106.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD18.png\" \/><\/p>\n<p>Note that although the first Databricks activity took the usual 4\u20135 minutes, each of the remaining activities took only around a minute and a half, most of which is the time taken to initialise the Spark cluster.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD19.png\" alt=\"The ADBWorkloads dialogue window\" width=\"822\" height=\"483\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD19.png 1080w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD19-300x176.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD19-1024x601.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD19-768x451.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD19-330x194.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD19-800x470.png 800w, 
https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD19-400x235.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD19.png\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"attachment-full webp-format aligncenter\" src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD20.png\" alt=\"A breakdown of the duration each workload takes to run\" width=\"769\" height=\"256\" data-orig-srcset=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD20.png 1063w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD20-300x100.png 300w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD20-1024x341.png 1024w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD20-768x256.png 768w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD20-330x110.png 330w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD20-800x266.png 800w, https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD20-400x133.png 400w\" data-orig-src=\"https:\/\/cm-edgetun.pages.dev\/en-us\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/08\/JiTAD20.png\" \/><\/p>\n<h2>Conclusion<\/h2>\n<p>Automation and security improvements to the Databricks workspace, such as AAD tokens, the Permissions API, the Token Management API and cluster policies, are constantly being released. 
Instance pools are another example of these improvements and can dramatically reduce the completion times of your ADF-based Databricks workloads, particularly when running a series of chained Databricks activities. When you use AAD tokens for workspace automation, managing access to the workspace and provisioning instance pools no longer require manual intervention. Granting just-in-time access to these resources reduces the chance of manual error, promotes better governance and reduces the risk of improper access token practices.<\/p>\n<p>&nbsp;<\/p>\n<h2>Useful Links<\/h2>\n<ul>\n<li><a href=\"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/07\/01\/securing-access-to-azure-data-lake-gen-2-from-azure-databricks\/\" target=\"_blank\" rel=\"noopener noreferrer\">Configure access to Azure Data Lake Gen 2 from Azure Databricks<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/learn\/modules\/describe-azure-databricks-best-practices\/\" target=\"_blank\" rel=\"noopener noreferrer\">Describe Azure Databricks best practices<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/learn\/paths\/perform-data-science-azure-databricks\/\" target=\"_blank\" rel=\"noopener noreferrer\">Perform data science with Azure Databricks<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/learn\/modules\/integrate-azure-databricks-other-azure-services\/\" target=\"_blank\" rel=\"noopener noreferrer\">Integrate Azure Databricks with other Azure services<\/a><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>Resources for your decision-making team<\/h2>\n<ul>\n<li><a href=\"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/cross-industry\/2020\/06\/12\/5-steps-to-create-a-strong-data-strategy-for-your-business\/\" target=\"_blank\" rel=\"noopener noreferrer\">5 steps to create a strong data strategy for your business<\/a><\/li>\n<li><a href=\"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/cross-industry\/2020\/08\/18\/4-ways-data-and-analytics-can-help-the-utilities-sector-in-the-new-normal\/\" target=\"_blank\" rel=\"noopener noreferrer\">4 ways data and analytics can help the utilities sector in the new normal<\/a><\/li>\n<li><a href=\"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/retail\/2020\/07\/24\/how-a-modern-data-strategy-can-position-your-business-for-success-in-the-new-normal\/\" target=\"_blank\" rel=\"noopener noreferrer\">How a modern data strategy can position retail businesses for success<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Using AAD tokens it is now possible to generate an Azure Databricks personal access token programmatically, and provision an instance pool using the Instance Pools API. The token can be generated and utilised at run-time to provide \u201cjust-in-time\u201d access to the Databricks workspace. 
Using the same AAD token, an instance pool can also be provisioned and used to run a series of Databricks activities in the same ADF pipeline.<\/p>\n","protected":false},"author":430,"featured_media":36909,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","footnotes":""},"categories":[594],"post_tag":[519],"content-type":[],"coauthors":[1197,1476],"class_list":["post-38976","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technetuk","tag-technet-uk"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Just-in-time Azure Databricks access tokens and instance pools for Azure Data Factory pipelines using workspace automation - Microsoft Industry Blogs - United Kingdom<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Just-in-time Azure Databricks access tokens and instance pools for Azure Data Factory pipelines using workspace automation - Microsoft Industry Blogs - United Kingdom\" \/>\n<meta property=\"og:description\" content=\"Using AAD tokens it is now possible to generate an Azure Databricks personal access token programmatically, and provision an instance pool using the Instance Pools API. The token can be generated and utilised at run-time to provide \u201cjust-in-time\u201d access to the Databricks workspace. 
Using the same AAD token, an instance pool can also be provisioned and used to run a series of Databricks activities in the same ADF pipeline.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Industry Blogs - United Kingdom\" \/>\n<meta property=\"article:published_time\" content=\"2020-08-18T14:40:52+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-09-01T14:06:05+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/DatabricksThumb2.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"800\" \/>\n\t<meta property=\"og:image:height\" content=\"450\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Nicholas Hurt, Keith Howling\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Nicholas Hurt, Keith Howling\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 min read\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/08\\\/18\\\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/08\\\/18\\\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\\\/\"},\"author\":[{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/author\\\/nicholas-hurt\\\/\",\"@type\":\"Person\",\"@name\":\"Nicholas Hurt\"},{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/author\\\/keith-howling\\\/\",\"@type\":\"Person\",\"@name\":\"Keith Howling\"}],\"headline\":\"Just-in-time Azure Databricks access tokens and instance pools for Azure Data Factory pipelines using workspace 
automation\",\"datePublished\":\"2020-08-18T14:40:52+00:00\",\"dateModified\":\"2020-09-01T14:06:05+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/08\\\/18\\\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\\\/\"},\"wordCount\":1834,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/08\\\/18\\\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/22\\\/2020\\\/07\\\/DatabricksThumb2.jpg\",\"keywords\":[\"TechNet UK\"],\"articleSection\":[\"TechNet UK\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/08\\\/18\\\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/08\\\/18\\\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\\\/\",\"url\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/08\\\/18\\\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\\\/\",\"name\":\"Just-in-time Azure Databricks access tokens and instance pools for Azure Data Factory pipelines using workspace automation - 
Microsoft Industry Blogs - United Kingdom\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/08\\\/18\\\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/08\\\/18\\\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/22\\\/2020\\\/07\\\/DatabricksThumb2.jpg\",\"datePublished\":\"2020-08-18T14:40:52+00:00\",\"dateModified\":\"2020-09-01T14:06:05+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/08\\\/18\\\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/08\\\/18\\\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/08\\\/18\\\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\\\/#primaryimage\",\"url\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/22\\\/2020\\\/07\\\/DatabricksThumb2.jpg\",\"contentU
rl\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/22\\\/2020\\\/07\\\/DatabricksThumb2.jpg\",\"width\":800,\"height\":450,\"caption\":\"An image representing Data Bricks, next to an illustration of Bit the Raccoon.\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/technetuk\\\/2020\\\/08\\\/18\\\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Just-in-time Azure Databricks access tokens and instance pools for Azure Data Factory pipelines using workspace automation\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/\",\"name\":\"Microsoft Industry Blogs - United Kingdom\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/#organization\",\"name\":\"Microsoft Industry Blogs - United 
Kingdom\",\"url\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/22\\\/2019\\\/08\\\/Microsoft-Logo.png\",\"contentUrl\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/22\\\/2019\\\/08\\\/Microsoft-Logo.png\",\"width\":259,\"height\":194,\"caption\":\"Microsoft Industry Blogs - United Kingdom\"},\"image\":{\"@id\":\"https:\\\/\\\/cm-edgetun.pages.dev\\\/en-gb\\\/industry\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Just-in-time Azure Databricks access tokens and instance pools for Azure Data Factory pipelines using workspace automation - Microsoft Industry Blogs - United Kingdom","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/","og_locale":"en_US","og_type":"article","og_title":"Just-in-time Azure Databricks access tokens and instance pools for Azure Data Factory pipelines using workspace automation - Microsoft Industry Blogs - United Kingdom","og_description":"Using AAD tokens it is now possible to generate an Azure Databricks personal access token programmatically, and provision an instance pool using the Instance Pools API. The token can be generated and utilised at run-time to provide \u201cjust-in-time\u201d access to the Databricks workspace. 
Using the same AAD token, an instance pool can also be provisioned and used to run a series of Databricks activities in the same ADF pipeline.","og_url":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/","og_site_name":"Microsoft Industry Blogs - United Kingdom","article_published_time":"2020-08-18T14:40:52+00:00","article_modified_time":"2020-09-01T14:06:05+00:00","og_image":[{"width":800,"height":450,"url":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/DatabricksThumb2.jpg","type":"image\/jpeg"}],"author":"Nicholas Hurt, Keith Howling","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Nicholas Hurt, Keith Howling","Est. reading time":"8 min read"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/#article","isPartOf":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/"},"author":[{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/author\/nicholas-hurt\/","@type":"Person","@name":"Nicholas Hurt"},{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/author\/keith-howling\/","@type":"Person","@name":"Keith Howling"}],"headline":"Just-in-time Azure Databricks access tokens and instance pools for Azure Data Factory pipelines using workspace 
automation","datePublished":"2020-08-18T14:40:52+00:00","dateModified":"2020-09-01T14:06:05+00:00","mainEntityOfPage":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/"},"wordCount":1834,"commentCount":0,"publisher":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/#organization"},"image":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/#primaryimage"},"thumbnailUrl":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/DatabricksThumb2.jpg","keywords":["TechNet UK"],"articleSection":["TechNet UK"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/","url":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/","name":"Just-in-time Azure Databricks access tokens and instance pools for Azure Data Factory pipelines using workspace automation - Microsoft Industry Blogs - United 
Kingdom","isPartOf":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/#primaryimage"},"image":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/#primaryimage"},"thumbnailUrl":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/DatabricksThumb2.jpg","datePublished":"2020-08-18T14:40:52+00:00","dateModified":"2020-09-01T14:06:05+00:00","breadcrumb":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/#primaryimage","url":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/DatabricksThumb2.jpg","contentUrl":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2020\/07\/DatabricksThumb2.jpg","width":800,"height":450,"caption":"An image representing Data Bricks, next to an illustration of Bit the 
Raccoon."},{"@type":"BreadcrumbList","@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/technetuk\/2020\/08\/18\/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-using-workspace-automation\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/"},{"@type":"ListItem","position":2,"name":"Just-in-time Azure Databricks access tokens and instance pools for Azure Data Factory pipelines using workspace automation"}]},{"@type":"WebSite","@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/#website","url":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/","name":"Microsoft Industry Blogs - United Kingdom","description":"","publisher":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/#organization","name":"Microsoft Industry Blogs - United Kingdom","url":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2019\/08\/Microsoft-Logo.png","contentUrl":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-content\/uploads\/sites\/22\/2019\/08\/Microsoft-Logo.png","width":259,"height":194,"caption":"Microsoft Industry Blogs - United 
Kingdom"},"image":{"@id":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/#\/schema\/logo\/image\/"}}]}},"_links":{"self":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/posts\/38976","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/users\/430"}],"replies":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/comments?post=38976"}],"version-history":[{"count":0,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/posts\/38976\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/media\/36909"}],"wp:attachment":[{"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/media?parent=38976"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/categories?post=38976"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/post_tag?post=38976"},{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/content-type?post=38976"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/cm-edgetun.pages.dev\/en-gb\/industry\/blog\/wp-json\/wp\/v2\/coauthors?post=38976"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}