Quick Start
DryadLINQ documentation

Instructions for quickly getting started with DryadLINQ:

Get the core DryadLINQ libraries and example code

  1. Create a new Visual Studio Visual C# Console Application project. You need Visual Studio 2012 or later.

  2. Open the Configuration Manager, select your project, and add a new x64 project platform, making sure the "Create new solution platforms" box is checked.

  3. Right-click on the project and select "Manage NuGet Packages...".

  4. Change the "Stable Only" dropdown to "Include Prerelease".

  5. Choose 'Online/nuget.org' in the left pane.

  6. Enter "Dryad" into the search box, then select and install "Sample DryadLINQ Programs".

  7. The sample includes a Word Count example in a separate WordCount.cs file. Add WordCount.WordCountExample(); to the Main method in Program.cs.

You should now be able to build the project.

Running the sample on your local computer

  • The sample is initially configured to run locally. Build and run the program with no arguments; it performs a word count on a single line of text held in the lines variable and writes the results to the console. The output will look something like:

    QueryPlan: C:\Users\JaneDoe\AppData\Local\Temp\DRYADLINQ\DryadSample.vshost_5004\DryadLinqProgram__0.xml
    line: 1
    short: 1
    a: 2
    job: 1
    dummy: 1
    for: 1
    is: 1
    This: 1
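
To make the word count concrete, here is a minimal sketch of the same computation written with ordinary in-memory LINQ. It is not the sample's WordCount.cs and does not use the DryadLINQ API; the class name WordCountSketch, the method CountWords, and the input line are all assumptions for illustration (the input line is inferred from the counts shown above).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the sample's word count, using plain LINQ to Objects.
public static class WordCountSketch
{
    // Splits each line on spaces, groups identical words, and counts each group.
    public static Dictionary<string, int> CountWords(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .GroupBy(word => word)
            .ToDictionary(group => group.Key, group => group.Count());
    }

    public static void Main()
    {
        // Assumed input, inferred from the sample output; the actual contents
        // of the sample's lines variable may differ.
        var lines = new[] { "This is a dummy line for a short job" };

        foreach (var pair in CountWords(lines))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
    }
}
```

The order in which the words are printed is not significant, and the sample's output order may differ from this sketch's.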

Running the sample on an Azure cluster

  1. If you don't already have one, you must first create an HDInsight cluster.

  2. Comment out the #define local statement at the top of WordCount.cs to modify the sample to run on Azure.

  3. Specify your cluster name, default storage account and default container in the constructors of the clusterName, accountName and containerName variables.

  4. Rebuild and run. The program will execute on your cluster and write its output to Azure blob storage. The first time you run a DryadLINQ program on a cluster, you will see many log lines reporting that binaries are being uploaded to 'staging' blobs so that they are available to the running program; subsequent runs need far fewer uploads. Once all the staging files have been uploaded, there is a delay of a minute or so while HDInsight launches the DryadLINQ job. For a short test like this one, almost all the time is taken by job launch, but for serious computation this startup time makes little difference to the overall running time. Eventually the submission program returns, and the Azure storage container you specified above will contain a blob called wc-out.txt/part-00000000-0001-0000 holding the word counts. This simple program produces only a single output part, though in general there may be many.
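
The switch between local and cluster execution in step 2 relies on C# conditional compilation. The sketch below shows the general mechanism only: a #define at the top of the file selects which branch of an #if block is compiled. The class TargetSketch and the method DescribeTarget are hypothetical, and how WordCount.cs actually uses the local symbol is an assumption, not something shown here.

```csharp
#define local   // comment out this line to compile the cluster branch instead

using System;

// Hypothetical illustration of a #define-controlled switch; not the sample's code.
public static class TargetSketch
{
    public static string DescribeTarget()
    {
#if local
        // Compiled only while the local symbol is defined.
        return "local";
#else
        // Compiled only when the #define above is commented out.
        return "cluster";
#endif
    }

    public static void Main()
    {
        Console.WriteLine("Running on: " + DescribeTarget());
    }
}
```

Because the choice is made at compile time, the project has to be rebuilt after the #define line is changed, which is why step 4 begins with a rebuild.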
