Running Naiad programs on Microsoft Azure

[This is preliminary documentation and is subject to change.]

Naiad programs can run on Microsoft Azure using the HDInsight 3.0 platform as a service. This walkthrough describes the helper methods and tools that Naiad includes to help you process data in Azure.

Setting up your environment

  1. The Naiad tools assume that you have installed Azure PowerShell and configured the PowerShell environment with your Azure subscription. Use these instructions to configure Azure PowerShell. Under "How to: Connect your subscription", follow the instructions under "Use the certificate method". To check that your environment is set up correctly, run the Get-AzureSubscription cmdlet and verify that the output for your subscription shows "IsDefault:True" and includes information identifying the management certificate.

  2. Create an HDInsight 3.0 cluster using these instructions. The following instructions assume that you have named your cluster mycluster.

  3. Implement a Naiad program, such as the tutorial Naiad program in this article. Although the following instructions are general, for concreteness they assume that you are using the tutorial program.

The first two steps can take some time. In the meantime, the next section explains how to modify your Naiad program to take advantage of data stored in Azure.

Modifying a Naiad program to run well on Azure

  1. The Naiad distribution includes an Azure Support framework, which provides helper methods for accessing Azure Storage. Add a reference to this framework by installing its NuGet package. To make use of the Azure Support framework, add the following using statement:

    using Microsoft.Research.Naiad.Frameworks.Azure;

    See the Microsoft.Research.Naiad.Frameworks.Azure documentation for more details on the helper methods in this framework.

  2. After creating a new Computation, it can be useful to redirect Console.Out (and Console.Error) to write to Azure blobs. This can be done by invoking the SetConsoleOut(Controller, CloudBlobContainer, String) and SetConsoleError(Controller, CloudBlobContainer, String) methods on a computation's controller. These methods require an Azure container, and a format string which accepts one parameter, the process identifier. For example, the lines:

    var container = computation.DefaultBlobContainer("naiad-output");
    computation.Controller.SetConsoleOut(container, "naiad/out-{0}.txt");

    would redirect Console.Out for each process to a blob in the naiad-output container, where "{0}" in the blob name is replaced by the process identifier.

    To ensure that all writes are flushed to Azure before your program exits, you should also add a Console.Out.Close(); line after calling Join on the computation.

  3. The tutorial Naiad program reads lines from the Console, so it must be changed to read data from elsewhere. Instead of the console-reading loop (step 6 of the tutorial), the following lines ingest constant data:

    source.OnNext("hi there".Split());
    source.OnNext("hello there".Split());
    source.OnNext("good bye".Split());

    Alternatively, the ReadTextFromAzureBlobs(Computation, CloudBlobContainer, String) method provides a mechanism to read textual data one line at a time from an Azure blob.

  4. Naiad's Microsoft.Research.Naiad.Frameworks.Azure framework also contains helper methods to write streams to Azure blobs and read streams from Azure blobs. If your Naiad program previously took inputs from and wrote outputs to your local file system, modify it to read from and write to Azure blob storage instead. Note that your Naiad programs can read from and write to Azure storage without running on Azure; the same calls will work on your local computer or cluster, but the data movement is more efficient (and cheaper) when the program runs in Azure.
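
The modifications in steps 1 to 3 can be combined roughly as in the following sketch. It is an illustration under stated assumptions rather than a definitive program: the surrounding structure (NewComputation.FromArgs, BatchedDataSource, NewInput, Subscribe, Activate, OnCompleted, Join) is assumed to match the tutorial program, the class name TutorialOnAzure and the blob name "naiad/err-{0}.txt" are made up for this example, and closing Console.Error is assumed by analogy with Console.Out. Check each call against the version of Naiad you have installed.

```csharp
using System;
using Microsoft.Research.Naiad;
using Microsoft.Research.Naiad.Input;
using Microsoft.Research.Naiad.Frameworks.Azure;

// Hypothetical class name; the body follows the tutorial program's structure (assumed).
public static class TutorialOnAzure
{
    public static void Main(string[] args)
    {
        // Assumed from the tutorial: build the computation from command-line arguments.
        using (var computation = NewComputation.FromArgs(ref args))
        {
            // Step 2: redirect each process's console streams to blobs.
            // "{0}" in the blob name is replaced by the process identifier.
            var container = computation.DefaultBlobContainer("naiad-output");
            computation.Controller.SetConsoleOut(container, "naiad/out-{0}.txt");
            // The error blob name is illustrative, not prescribed by Naiad.
            computation.Controller.SetConsoleError(container, "naiad/err-{0}.txt");

            // Assumed from the tutorial: a data source feeding a stream of strings.
            var source = new BatchedDataSource<string>();
            var input = computation.NewInput(source);

            // Write every received word; this now lands in the output blob.
            input.Subscribe(batch =>
            {
                foreach (var word in batch)
                    Console.WriteLine(word);
            });

            computation.Activate();

            // Step 3: constant data replaces the console-reading loop.
            source.OnNext("hi there".Split());
            source.OnNext("hello there".Split());
            source.OnNext("good bye".Split());
            source.OnCompleted();

            computation.Join();

            // Flush redirected output to Azure before the program exits.
            Console.Out.Close();
            Console.Error.Close(); // assumption: mirrors the advice for Console.Out
        }
    }
}
```

The console redirection is placed immediately after the computation is created so that any output produced while the dataflow graph is built and run is captured in the blobs as well.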

Running Naiad programs on Azure HDInsight 3.0

  1. Install the "Naiad - Azure submission tool" package from the NuGet repository.

  2. To see the arguments that Naiad's Azure tools require, run .\NaiadAzureSubmission.exe with no arguments.

  3. Double-check that you have followed the steps above to bind your Azure subscription ID in Azure PowerShell.

  4. To run a Naiad program in Azure, you must provide an HDInsight cluster name to NaiadAzureSubmission.exe. If your cluster name is mycluster and your program compiles to NaiadProgram.exe, you would type:

    .\NaiadAzureSubmission --clustername mycluster NaiadProgram.exe [args]

If all is working well, your Naiad program should be launched on the corresponding HDInsight cluster, using the number of processes you have specified. When it completes, you should be able to find its outputs in Azure storage at the locations you have specified.