Raymond Raymond

Machine Learning with .NET in Jupyter Notebooks

2020-01-02

In this article, I show how to install Jupyter on Windows (via WSL) and then install the .NET kernel for Jupyter notebooks. I also walk through a machine learning example using ML.NET. The target audience is .NET developers who want to expand into data engineering and data science using their existing .NET programming skills.

*The above logos are registered trademarks of Microsoft, Apache and Jupyter.

Why .NET

I have always believed that a programming language or framework is only a tool for a developer, and that there are numerous tools available to achieve the same goal. What matters most is finding the right tool, one you enjoy using, and getting the job done. If you are a .NET developer who wants to expand your skills in the data world, you no longer need to learn a new programming language. Of course, many companies have a preferred tool set, but as their Cloud journey continues, it should become more and more flexible to choose the tool you are comfortable with.

2019 was a great year for the .NET community: .NET Core 3.0 and 3.1 were officially released and ML.NET is gaining momentum. In Spark, you can now choose C# or F# as your programming language instead of Scala, Python, R or Java (find more details here: .NET for Apache Spark Preview with Examples). Microsoft's investment in data frameworks for the .NET ecosystem may have come a little late compared with others, but it is definitely worth exploring or adopting, especially if you have a large .NET developer base in your organisation. In many scenarios, .NET Core based applications run faster than their counterparts.

*Image from ML.NET. Data sourced from Machine Learning at Microsoft with ML.NET paper. Results for sentiment analysis, using ~900 MB of an Amazon review dataset. Higher accuracy and lower runtime are better.

In 2020, adoption may grow significantly as these frameworks are now used in Cloud services such as Azure Synapse Analytics. It is a great time for .NET developers to start their data journey.

Prerequisites

You can install all the tools on Windows directly. For this article, I am going to show you how to install them in Windows Subsystem for Linux (WSL) so that the steps work on both Windows (via WSL) and Linux.

  • Windows 10 with WSL enabled:

Refer to this article to install WSL.

Alternatively, follow this page to install it: Install Windows Subsystem for Linux on a Non-System Drive

Ubuntu version in my WSL distro:

$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="..."
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

  • Python 3.6.9 (or later version):

Python version in my WSL distro:

$ python --version
Python 3.6.9

  • pip

$ pip --version
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)
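
Before moving on, it can help to confirm that the tools used in the following steps are on your PATH. The snippet below is a minimal sketch and not part of my original setup: missing_tools is a hypothetical helper name, and the list of tools (python, pip, wget) is simply what the commands in this article call.

```shell
# Hypothetical helper: print one line for each command that is not on PATH.
# Prints nothing when every command is found.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
        fi
    done
}

# Commands used in this article; adjust the names if your distro
# only provides python3/pip3.
missing_tools python pip wget
```

If nothing is printed, all of the listed commands were found.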

Step 1 - Install .NET SDK 3.1

Install .NET SDK 3.1 from https://dotnet.microsoft.com/download.

For the WSL Ubuntu, follow this official page to install it:

Ubuntu 19.04 Package Manager - Install .NET Core

The main commands I used for my installation are:

  • Download package

wget -q https://packages.microsoft.com/config/ubuntu/19.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb

  • Install the Microsoft package and set up the Microsoft feed

sudo dpkg -i packages-microsoft-prod.deb

  • Install .NET SDK

sudo apt-get update
sudo apt-get install apt-transport-https
sudo apt-get update
sudo apt-get install dotnet-sdk-3.1

It may take a few minutes to complete the installation. Once it is done, you should be able to verify the installation:

~$ dotnet --version
3.1.100
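
Note that the download URL used in this step targets Ubuntu 19.04, while the distro shown in the prerequisites is 18.04. Microsoft publishes the same feed package under one path per Ubuntu release, so one option is to derive the URL from /etc/os-release. This is a sketch under that assumption: ms_feed_url is a hypothetical helper name, and you should check that the printed URL exists before downloading it.

```shell
# Hypothetical helper: build the Microsoft feed package URL for an Ubuntu release.
ms_feed_url() {
    echo "https://packages.microsoft.com/config/ubuntu/$1/packages-microsoft-prod.deb"
}

# On Ubuntu, /etc/os-release defines ID and VERSION_ID (for example 18.04).
if [ -r /etc/os-release ]; then
    . /etc/os-release
    if [ "$ID" = "ubuntu" ]; then
        echo "Feed package for this distro: $(ms_feed_url "$VERSION_ID")"
    fi
fi
```

The result can then be passed to the same wget command, for example: wget -q "$(ms_feed_url 18.04)" -O packages-microsoft-prod.deb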

Step 2 - Install JupyterLab

Run the following command to install Jupyter on WSL Ubuntu distro on Windows 10:

sudo pip install jupyterlab

Verify the installation using the following command:

~$ jupyter kernelspec list
Available kernels:
  python3    /usr/local/share/jupyter/kernels/python3

As you can see from the above command output, only the python3 kernel is currently available in Jupyter.

Step 3 - Install dotnet try global tool

dotnet tool install -g dotnet-try

The output looks like the following:

$ dotnet tool install -g dotnet-try
Since you just installed the .NET Core SDK, you will need to logout or restart your session before running the tool you installed.
You can invoke the tool using the following command: dotnet-try
Tool 'dotnet-try' (version '1.0.19553.4') was successfully installed.

As the output suggests, log out of the WSL bash window and then open it again to verify the installation:

$ dotnet try

Welcome to Try .NET!
---------------------
Telemetry
---------
The.NET Core tools collect usage data in order to help us improve your experience.The data is anonymous and doesn't include command-line arguments. The data is collected by Microsoft and shared with the community. You can opt-out of telemetry by setting the DOTNET_TRY_CLI_TELEMETRY_OPTOUT environment variable to '1' or 'true' using your favorite shell.

Hosting environment: Production
Content root path: /home/tangr/.dotnet/tools/.store/dotnet-try/1.0.19553.4/dotnet-try/1.0.19553.4/tools/netcoreapp3.0/any
Now listening on: https://localhost:3621
Application started. Press Ctrl+C to shut down.
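
If dotnet try is still not found after reopening the shell, a likely cause is that the global tools folder is not on your PATH. By default, .NET global tools are installed under $HOME/.dotnet/tools, which matches the content root path in the output above. The following sketch adds that folder to PATH for the current session and, optionally, sets the telemetry opt-out variable mentioned in the welcome message. DOTNET_TOOLS_DIR is just a variable name I chose for illustration.

```shell
# Add the default .NET global tools directory to PATH for the current session,
# but only if it is not already there.
DOTNET_TOOLS_DIR="$HOME/.dotnet/tools"
case ":$PATH:" in
    *":$DOTNET_TOOLS_DIR:"*) ;;                        # already on PATH, nothing to do
    *) PATH="$PATH:$DOTNET_TOOLS_DIR"; export PATH ;;
esac

# Optional: opt out of the telemetry described in the welcome message.
export DOTNET_TRY_CLI_TELEMETRY_OPTOUT=1
```

To keep these settings across sessions, the same lines can be appended to your shell profile (for example ~/.bashrc).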

Step 4 - Install .NET kernel for Jupyter

Now we can install .NET kernel for Jupyter using the following command:

$ dotnet try jupyter install

[InstallKernelSpec] Installed kernelspec .net-csharp in /home/tangr/.local/share/jupyter/kernels/.net-csharp
.NET kernel installation succeeded

[InstallKernelSpec] Installed kernelspec .net-fsharp in /home/tangr/.local/share/jupyter/kernels/.net-fsharp
.NET kernel installation succeeded

Verify the result by using the previous kernel list command:

$ jupyter kernelspec list
Available kernels:
  .net-csharp    /home/tangr/.local/share/jupyter/kernels/.net-csharp
  .net-fsharp    /home/tangr/.local/share/jupyter/kernels/.net-fsharp
  python3        /usr/local/share/jupyter/kernels/python3

As you can see from the output, we now have both the C# and F# kernels available.
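
If you want to check for a kernel in a script rather than by eye, you can filter the output of the list command. The sketch below assumes the output format shown above (kernel name in the first column); has_kernel is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given kernel name appears as the first
# column of `jupyter kernelspec list`-style text read from stdin.
has_kernel() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage against a live install (requires jupyter on PATH):
#   jupyter kernelspec list | has_kernel .net-csharp && echo "C# kernel is installed"
```

Comparing the whole first column avoids false matches on kernel names that merely appear inside a path.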

Play with .NET kernel in Jupyter

Now we have everything ready and we can play with .NET kernel in Jupyter.

First, let's start JupyterLab using this command:

$ jupyter lab
[I 18:07:28.278 LabApp] Writing notebook server cookie secret to /home/tangr/.local/share/jupyter/runtime/notebook_cookie_secret
[I 18:07:28.579 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.6/dist-packages/jupyterlab
[I 18:07:28.580 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
[I 18:07:28.597 LabApp] Serving notebooks from local directory: /home/tangr
[I 18:07:28.597 LabApp] The Jupyter Notebook is running at:
[I 18:07:28.598 LabApp] http://localhost:8888/?token=69634aea67757618e17df111b45a85e1ab8cfbda96efd80a
[I 18:07:28.599 LabApp]  or http://127.0.0.1:8888/?token=69634aea67757618e17df111b45a85e1ab8cfbda96efd80a
[I 18:07:28.600 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 18:07:28.628 LabApp] No web browser found: could not locate runnable browser.
[C 18:07:28.629 LabApp]

    To access the notebook, open this file in a browser:
        file:///home/tangr/.local/share/jupyter/runtime/nbserver-1538-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=69634aea67757618e17df111b45a85e1ab8cfbda96efd80a
     or http://127.0.0.1:8888/?token=69634aea67757618e17df111b45a85e1ab8cfbda96efd80a

You can access Jupyter lab web interface via the URLs printed out. Please keep the WSL bash window open.
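
Because no browser can be launched from inside WSL (hence the "No web browser found" warning above), you need to copy one of the URLs into a browser on Windows. As a small convenience, the sketch below pulls the first tokenised localhost URL out of log text. It assumes the log format shown above; first_jupyter_url is a hypothetical helper name.

```shell
# Hypothetical helper: print the first tokenised localhost URL found in
# Jupyter log text read from stdin.
first_jupyter_url() {
    grep -o 'http://localhost:[0-9]*/?token=[0-9a-f]*' | head -n 1
}

# Usage (Jupyter writes its log to stderr, so redirect it to a file first):
#   jupyter lab --no-browser > jupyter.log 2>&1 &
#   sleep 5    # give the server a moment to start
#   first_jupyter_url < jupyter.log
```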

The launcher looks like the following on my computer. You can click either .NET (C#) or .NET (F#) to create a new notebook.

[Image: 2020010271016-image.png]

For this article, I am going to use C# for all the demos.

Display data

The display helper method can be used to display data. The code snippet looks like the following:

class Customer{
    public string FirstName{get;set;}
    public string LastName{get;set;}
}
var oneCustomer = new Customer{FirstName="Raymond",LastName="Tang"};
display(oneCustomer);
var anotherCustomer = new Customer{FirstName="John",LastName="Citizen"};
var allCustomers = new []{oneCustomer,anotherCustomer};
display(allCustomers);

The output looks like the following:

[Image: 2020010271843-image.png]

Output HTML

You can output HTML easily by using the HTML helper method.

var html=HTML("<a href='https://kontext.tech/'>Kontext - Cloud, Data & AI Community</a>");
display(html);

The output looks like the following:

[Image: 2020010272258-image.png]

Import packages from NuGet

You can easily import NuGet packages using the following syntax:

#r "nuget:<package name>,<package version>"

For example, the following command imports the Entity Framework Core SQLite package from NuGet, which can be used to work with SQLite databases.

#r "nuget:Microsoft.EntityFrameworkCore.Sqlite,2.1.1"

The output looks like the following:

[Image: 2020010272837-image.png]

Plot

XPlot.Plotly can be used to plot charts, and the official documentation has many examples. The following is one example (converted from this F# example):

using XPlot.Plotly; 

var title = "Basic Bar Chart";
var series1 = new Graph.Bar{
        x = new[]{"giraffes", "orangutans", "monkeys"},
        y = new[]{20,14,23},
        name= "SF Zoo"
    };
var series2 = new Graph.Bar{
        x = new[]{"giraffes", "orangutans", "monkeys"},
        y = new[]{12,18,29},
        name= "LA Zoo"
    };

var chart = Chart.Plot(new []{series1, series2});
chart.WithTitle(title);
display(chart);

The output looks like the following:

[Image: 2020010280521-image.png]

You can find more API references for this library here: https://fslab.org/XPlot/reference/xplot-plotly-graph.html.

Machine learning example

Now let's create a machine learning example using .NET (C#) kernel in Jupyter with ML.NET packages.

To save time, we will just use this example Sentiment Analysis for User Reviews.

The complete code base looks like the following (each code block is a cell in Jupyter notebook):

// Import dependent packages
#r "nuget:Microsoft.ML,1.4.0"
#r "nuget:Microsoft.ML.AutoML,0.16.0"
#r "nuget:Microsoft.Data.Analysis,0.1.0"
// models
using Microsoft.ML.Data;
public class SentimentIssue
    {
        [LoadColumn(0)]
        public bool Label { get; set; }
        [LoadColumn(2)]
        public string Text { get; set; }
    }
public class SentimentPrediction
    {
        // ColumnName attribute is used to change the column name from
        // its default value, which is the name of the field.
        [ColumnName("PredictedLabel")]
        public bool Prediction { get; set; }

        // No need to specify ColumnName attribute, because the field
        // name "Probability" is the column name we want.
        public float Probability { get; set; }

        public float Score { get; set; }
    }
using System;
using System.IO;
using System.Net;
using Microsoft.ML;
using static Microsoft.ML.DataOperationsCatalog;

static string GetAbsolutePath(string relativePath)
{
    var currentDir = Directory.GetCurrentDirectory();
    string fullPath = Path.Combine(currentDir, relativePath);
    return fullPath;
}

static readonly string DataRelativePath = $"./wikiDetoxAnnotated40kRows.tsv";
var dataUrl ="https://github.com/dotnet/machinelearning-samples/raw/master/samples/csharp/getting-started/BinaryClassification_SentimentAnalysis/SentimentAnalysis/Data/wikiDetoxAnnotated40kRows.tsv";
// download the data
using (var client = new WebClient())
{
    client.DownloadFile(dataUrl, DataRelativePath);
}
static readonly string DataPath = GetAbsolutePath(DataRelativePath);
static readonly string ModelRelativePath = $"./SentimentModel.zip";
static readonly string ModelPath = GetAbsolutePath(ModelRelativePath);
// Create MLContext to be shared across the model creation workflow objects 
// Set a random seed for repeatable/deterministic results across multiple trainings.
var mlContext = new MLContext(seed: 1);
// STEP 1: Common data loading configuration
IDataView dataView = mlContext.Data.LoadFromTextFile<SentimentIssue>(DataPath, hasHeader: true);

TrainTestData trainTestSplit = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);
IDataView trainingData = trainTestSplit.TrainSet;
IDataView testData = trainTestSplit.TestSet;
// STEP 2: Common data process configuration with pipeline data transformations          
var dataProcessPipeline = mlContext.Transforms.Text.FeaturizeText(outputColumnName: "Features", inputColumnName: nameof(SentimentIssue.Text));
// STEP 3: Set the training algorithm, then create and config the modelBuilder                            
var trainer = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: "Label", featureColumnName: "Features");
var trainingPipeline = dataProcessPipeline.Append(trainer);
// STEP 4: Train the model fitting to the DataSet
ITransformer trainedModel = trainingPipeline.Fit(trainingData);
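// STEP 5: Evaluate the model on the test set and show accuracy stats
var predictions = trainedModel.Transform(testData);
var metrics = mlContext.BinaryClassification.Evaluate(data: predictions, labelColumnName: "Label", scoreColumnName: "Score");
display($"Accuracy: {metrics.Accuracy:P2} | AUC: {metrics.AreaUnderRocCurve:P2} | F1 score: {metrics.F1Score:P2}");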
// STEP 6: Save/persist the trained model to a .ZIP file
mlContext.Model.Save(trainedModel, trainingData.Schema, ModelPath);
display($"The model is saved to {ModelPath}");
// TRY IT: Make a single test prediction with the trained model
SentimentIssue sampleStatement = new SentimentIssue { Text = "I love this movie!" };

// Create a prediction engine from the in-memory trained model
// (the model saved to the .ZIP file above is not reloaded here)
var predEngine = mlContext.Model.CreatePredictionEngine<SentimentIssue, SentimentPrediction>(trainedModel);
// Score
var resultprediction = predEngine.Predict(sampleStatement);
Console.WriteLine($"=============== Single Prediction  ===============");
Console.WriteLine($"Text: {sampleStatement.Text} | Prediction: {(resultprediction.Prediction ? "Toxic" : "Non Toxic")} sentiment | Probability of being toxic: {resultprediction.Probability} ");
Console.WriteLine($"=============== End of Process ===============");

Final outcome:

The screenshot of my Jupyter notebook:

[Image: 2020010285143-_E__Downloads_csharp-ml-example.html.png]

Integrate with Spark

You can also integrate the Jupyter .NET kernel with Spark.

Refer to this article about how to install Spark .NET. I will publish one more example later on. Please stay tuned.

If you have any questions, please let me know.
