Skip to main content Microsoft Intelligent Data Platform Azure Arc Azure databases Power BI SQL Server 2025 SQL Server BI SQL Server 2022 SQL Server 2019 SQL Server 2017 SQL Server 2016 SQL Server 2005 - 2014 Downloads Community SQL End of Support Data Security - SQL Server Encryption SQL Server blog SQL Server and Azure SQL workshops Browse Microsoft Solutions Hub SQL Server Tech Community Azure Databases Tech Community Azure Synapse Analytics Tech Community Developer Find a partner Become a partner Partner resources Try SQL Server 2025 Microsoft Security Azure Dynamics 365 Microsoft 365 Microsoft Teams Windows 365 Microsoft AI Azure Space Mixed reality Microsoft HoloLens Microsoft Viva Quantum computing Sustainability Education Automotive Financial services Government Healthcare Manufacturing Retail Find a partner Become a partner Partner Network Microsoft Marketplace Software companies Blog Microsoft Advertising Developer Center Documentation Events Licensing Microsoft Learn Microsoft Research View Sitemap
·
1 min read

Azure HDInsight Clusters Allows Custom Installation of Spark Using Script Action

Apache Spark is a popular open source framework for distributed cluster computing. Spark has been gaining popularity for its ability to handle both batch and stream processing as well as supporting in-memory and conventional disk processing. Starting today, Azure HDInsight will make it possible to install Spark as well as other Hadoop sub-projects on its clusters. This is delivered through a new customization feature called Script Action. This will allow you to experiment and deploy Hadoop projects to HDInsight clusters that were not possible before. We are making this easier specifically for Spark and R by documenting the process to install these modules.

To do this, you will have to create an HDInsight cluster with Spark Script Action. Script Action allow users to specify PowerShell scripts that will be executed on cluster nodes during cluster setup. One of the sample scripts that are released with the preview is Script Action to install Spark. During preview the feature is available through PowerShell, so you will need to run PowerShell scripts to create your Spark cluster. Below is the snippet of the PowerShell code where “spark-installer-v01.ps1” is the Script Action that installs Spark on HDInsight:

New-AzureHDInsightClusterConfig -ClusterSizeInNodes $clusterNodes

| Set-AzureHDInsightDefaultStorage -StorageAccountName $storageAccountName
-StorageAccountKey $storageAccountKey -StorageContainerName $containerName

| Add-AzureHDInsightScriptAction -Name “Install Spark”
-ClusterRoleCollection HeadNode,DataNode
-Uri https://hdiconfigactions.blob.core.windows.net/sparkconfigactionv01/spark-installer-v01.ps1

| New-AzureHDInsightCluster -Name $clusterName -Location $location

Once the cluster is provisioned it will have the Spark component installed on it. You can RDP into the cluster and use Spark shell:

  • In Hadoop command line window change directory to C:\apps\dist\spark-1.0.2
  • Run the following command to start the Spark shell.

.\bin\spark-shell

  • On the Scala prompt, enter the spark query to count words in a sample file stored in Azure Blob storage account:

val file = sc.textFile(“example/data/gutenberg/davinci.txt”)
val counts = file.flatMap(line => line.split(” “)).map(word => (word, 1)).reduceByKey(_ + _)
counts.toArray().foreach(println)

For more information on Azure HDInsight:

English (United States)
Your Privacy Choices Opt-Out Icon Your Privacy Choices
Consumer Health Privacy Sitemap Contact Microsoft Privacy Manage cookies Terms of use Trademarks Safety & eco Recycling About our ads