Machine Learning with .NET in Jupyter Notebooks
- Why .NET
- Prerequisites
- Step 1 - Install .NET SDK 3.1
- Step 2 - Install JupyterLab
- Step 3 - Install dotnet try global tool
- Step 4 - Install .NET kernel for Jupyter
- Play with .NET kernel in Jupyter
- Display data
- Output HTML
- Import packages from NuGet
- Plot
- Machine learning example
- Integrates with Spark
- References
In this article, I'm going to show you how to install Jupyter on Windows and then install the .NET kernel for Jupyter notebooks. It also shows a machine learning example using ML.NET. The target audience is .NET developers who want to expand into data engineering and data science using their existing .NET programming skills.
*The above logos are registered trademarks of Microsoft, Apache and Jupyter.
Why .NET
I have always believed that a programming language or framework is only a tool for a developer, and that numerous tools can achieve the same goal. What matters most is to find a tool you love and get the job done. If you are a .NET developer and want to expand your skills in the data world, you no longer need to learn a new programming language. Of course, many companies have a preferred tool set, but as their Cloud journey continues, it should become more and more flexible to choose the tool you are comfortable working with.
2019 is a great year for the .NET community, with .NET Core 3.0/3.1 officially released and ML.NET gaining momentum. In Spark, you can now choose C# or F# as your programming language instead of Scala, Python, R or Java (find more details here: .NET for Apache Spark Preview with Examples). Microsoft's investment in data frameworks in the .NET ecosystem may come a little late compared with others, but it is definitely worth exploring or adopting further, especially if you have a large .NET developer base in your organisation. In many scenarios, .NET Core based applications run much quicker than their counterparts.
*Image from ML.NET. Data sourced from Machine Learning at Microsoft with ML.NET paper. Results for sentiment analysis, using ~900 MB of an Amazon review dataset. Higher accuracy and lower runtime are better.
In 2020, the adoption rate may grow significantly as these products are now commonly used in Cloud based products such as Azure Synapse Analytics. It is really a great time for .NET developers to get started in the data journey.
Prerequisites
You can definitely install all the tools in Windows directly. For this article, I am going to show you how to install them in Windows Subsystem for Linux (WSL) so that the steps can work both in Windows (via WSL) and Linux.
- Windows 10 with WSL enabled:
Refer to this article to install WSL.
Alternatively, you can follow this page to install it: Install Windows Subsystem for Linux on a Non-System Drive
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="..."
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
- Python 3.6.9 (or later version):
Python version in my WSL distro:
$ python --version
Python 3.6.9
- pip
$ pip --version
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)
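To confirm the prerequisites above in one go, a small check along these lines may help. It is only a sketch: check_cmd is a hypothetical helper name, and wget is included because Step 1 uses it to download the Microsoft package.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
# Prints "<name>: found" (exit 0) or "<name>: missing" (exit 1).
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
        return 0
    fi
    echo "$1: missing"
    return 1
}

# Tools this article relies on. Adjust the names for your distro,
# for example python3/pip3 instead of python/pip.
for tool in python pip wget; do
    check_cmd "$tool" || true
done
```

Anything reported as missing can be installed through the distro's package manager before moving on to Step 1.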
Step 1 - Install .NET SDK 3.1
Install .NET SDK 3.1 from https://dotnet.microsoft.com/download.
For the WSL Ubuntu, follow this official page to install it:
Ubuntu 19.04 Package Manager - Install .NET Core
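The Microsoft feed path contains the Ubuntu release number, so one way to avoid a mismatch with your distro is to derive it from /etc/os-release rather than typing it. The following is a minimal sketch under that assumption; ms_feed_url is a hypothetical helper name, and the URL pattern simply mirrors the one used in the download command in this step.

```shell
#!/bin/sh
# Build the Microsoft package feed URL for an Ubuntu VERSION_ID such as 18.04.
ms_feed_url() {
    echo "https://packages.microsoft.com/config/ubuntu/$1/packages-microsoft-prod.deb"
}

# /etc/os-release is a shell-compatible file defining ID and VERSION_ID.
# It is sourced in a subshell so its variables do not leak into this session.
if [ -r /etc/os-release ]; then
    (
        . /etc/os-release
        if [ "${ID:-}" = "ubuntu" ]; then
            echo "Feed URL: $(ms_feed_url "${VERSION_ID:-}")"
        else
            echo "ID=${ID:-unknown}: choose the feed for your distro manually"
        fi
    )
else
    echo "/etc/os-release not found"
fi
```

On the WSL distro shown in the prerequisites (VERSION_ID="18.04"), this would point at the 18.04 feed instead of the 19.04 one.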
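The main commands I used for my installation are listed below. The download URL targets the Ubuntu 19.04 feed; substitute your own release number if it differs: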
- Download package
wget -q https://packages.microsoft.com/config/ubuntu/19.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
- Install Microsoft package and setup Microsoft feed
sudo dpkg -i packages-microsoft-prod.deb
- Install .NET SDK
sudo apt-get update
sudo apt-get install apt-transport-https
sudo apt-get update
sudo apt-get install dotnet-sdk-3.1
It may take a few minutes to complete the installation. Once it is done, you should be able to verify the installation:
~$ dotnet --version
3.1.100
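If you want to script this verification, the installed version can be compared against the 3.1 minimum. The sketch below assumes GNU sort (for the -V version ordering), which is what Ubuntu ships; version_ge is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (GNU coreutils) to order version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Only query dotnet when it is actually installed.
if command -v dotnet >/dev/null 2>&1; then
    installed="$(dotnet --version)"
    if version_ge "$installed" 3.1; then
        echo "dotnet $installed is new enough"
    else
        echo "dotnet $installed is older than 3.1"
    fi
else
    echo "dotnet not found on PATH"
fi
```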
Step 2 - Install JupyterLab
Run the following command to install Jupyter on WSL Ubuntu distro on Windows 10:
sudo pip install jupyterlab
Verify the installation using the following command:
~$ jupyter kernelspec list
Available kernels:
  python3    /usr/local/share/jupyter/kernels/python3
As you can see from the output above, only the python3 kernel is currently enabled for Jupyter.
Step 3 - Install dotnet try global tool
dotnet tool install -g dotnet-try
The output looks like the following:
$ dotnet tool install -g dotnet-try
Since you just installed the .NET Core SDK, you will need to logout or restart your session before running the tool you installed.
You can invoke the tool using the following command: dotnet-try
Tool 'dotnet-try' (version '1.0.19553.4') was successfully installed.
As the output suggests, log out of the WSL bash window and then open it again to verify the installation:
$ dotnet try
Welcome to Try .NET!
---------------------
Telemetry
---------
The.NET Core tools collect usage data in order to help us improve your experience.The data is anonymous and doesn't include command-line arguments. The data is collected by Microsoft and shared with the community. You can opt-out of telemetry by setting the DOTNET_TRY_CLI_TELEMETRY_OPTOUT environment variable to '1' or 'true' using your favorite shell.
Hosting environment: Production
Content root path: /home/tangr/.dotnet/tools/.store/dotnet-try/1.0.19553.4/dotnet-try/1.0.19553.4/tools/netcoreapp3.0/any
Now listening on: https://localhost:3621
Application started. Press Ctrl+C to shut down.
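If dotnet try is still not recognised after reopening the shell, one thing worth checking is whether the global tools directory is on PATH. The sketch below assumes the default location $HOME/.dotnet/tools (the same directory that appears in the content root path above); path_contains is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed when directory $1 is one of the colon-separated entries in $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed default install location for .NET global tools on Linux.
tools_dir="$HOME/.dotnet/tools"

if path_contains "$tools_dir" "$PATH"; then
    echo "$tools_dir is on PATH"
else
    echo "$tools_dir is not on PATH"
    echo "For the current session: export PATH=\"\$PATH:$tools_dir\""
fi
```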
Step 4 - Install .NET kernel for Jupyter
Now we can install .NET kernel for Jupyter using the following command:
$ dotnet try jupyter install
[InstallKernelSpec] Installed kernelspec .net-csharp in /home/tangr/.local/share/jupyter/kernels/.net-csharp
.NET kernel installation succeeded
[InstallKernelSpec] Installed kernelspec .net-fsharp in /home/tangr/.local/share/jupyter/kernels/.net-fsharp
.NET kernel installation succeeded
Verify the result by using the previous kernel list command:
$ jupyter kernelspec list
Available kernels:
  .net-csharp    /home/tangr/.local/share/jupyter/kernels/.net-csharp
  .net-fsharp    /home/tangr/.local/share/jupyter/kernels/.net-fsharp
  python3        /usr/local/share/jupyter/kernels/python3
As you can see from the output, we now have both the C# and F# kernels enabled.
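The same check can be automated by scanning the listing for the expected kernel names. The following sketch runs against a sample copied from the output above so that it works anywhere; has_kernel is a hypothetical helper name, and on a real machine you would pipe jupyter kernelspec list into it instead.

```shell
#!/bin/sh
# Read `jupyter kernelspec list` output on stdin and succeed when the
# kernel named in $1 appears as the first field of any line.
has_kernel() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit found ? 0 : 1 }'
}

# Sample listing taken from this article; replace with real output, e.g.
#   jupyter kernelspec list | has_kernel .net-csharp
sample='Available kernels:
  .net-csharp    /home/tangr/.local/share/jupyter/kernels/.net-csharp
  .net-fsharp    /home/tangr/.local/share/jupyter/kernels/.net-fsharp
  python3        /usr/local/share/jupyter/kernels/python3'

for kernel in .net-csharp .net-fsharp python3; do
    if printf '%s\n' "$sample" | has_kernel "$kernel"; then
        echo "$kernel: installed"
    else
        echo "$kernel: not installed"
    fi
done
```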
Play with .NET kernel in Jupyter
Now we have everything ready and we can play with .NET kernel in Jupyter.
First, let's start Jupyter notebooks using this command:
$ jupyter lab
[I 18:07:28.278 LabApp] Writing notebook server cookie secret to /home/tangr/.local/share/jupyter/runtime/notebook_cookie_secret
[I 18:07:28.579 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.6/dist-packages/jupyterlab
[I 18:07:28.580 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
[I 18:07:28.597 LabApp] Serving notebooks from local directory: /home/tangr
[I 18:07:28.597 LabApp] The Jupyter Notebook is running at:
[I 18:07:28.598 LabApp] http://localhost:8888/?token=69634aea67757618e17df111b45a85e1ab8cfbda96efd80a
[I 18:07:28.599 LabApp] or http://127.0.0.1:8888/?token=69634aea67757618e17df111b45a85e1ab8cfbda96efd80a
[I 18:07:28.600 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 18:07:28.628 LabApp] No web browser found: could not locate runnable browser.
[C 18:07:28.629 LabApp]
    To access the notebook, open this file in a browser:
        file:///home/tangr/.local/share/jupyter/runtime/nbserver-1538-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=69634aea67757618e17df111b45a85e1ab8cfbda96efd80a
     or http://127.0.0.1:8888/?token=69634aea67757618e17df111b45a85e1ab8cfbda96efd80a
You can access Jupyter lab web interface via the URLs printed out. Please keep the WSL bash window open.
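Because WSL cannot locate a browser, the URL has to be copied by hand. If the server log is captured to a file, the tokenised URL can be pulled out with a short filter. This is a sketch that assumes the log format shown above; first_token_url is a hypothetical helper name and the sample lines are copied from this article.

```shell
#!/bin/sh
# Print the first http://localhost:<port>/?token=<hex> URL found on stdin.
first_token_url() {
    awk 'match($0, /http:\/\/localhost:[0-9]+\/\?token=[0-9a-f]+/) {
        print substr($0, RSTART, RLENGTH)
        exit
    }'
}

# Two log lines from the output above, used here as sample input.
sample_log='[I 18:07:28.597 LabApp] The Jupyter Notebook is running at:
[I 18:07:28.598 LabApp] http://localhost:8888/?token=69634aea67757618e17df111b45a85e1ab8cfbda96efd80a'

printf '%s\n' "$sample_log" | first_token_url
```

The printed URL can then be pasted into a browser on the Windows side.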
The notebook looks like the following on my computer. You can click either .NET (C#) or .NET (F#) to create a new notebook.
For this article, I am going to use C# for all the demos.
Display data
The display helper method can be used to display data. The code snippet looks like the following:
class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}
var oneCustomer = new Customer { FirstName = "Raymond", LastName = "Tang" };
display(oneCustomer);

var anotherCustomer = new Customer { FirstName = "John", LastName = "Citizen" };
var allCustomers = new[] { oneCustomer, anotherCustomer };
display(allCustomers);
The output looks like the following:
Output HTML
You can output HTML very easily by using the HTML helper method.
var html = HTML("<a href='https://kontext.tech/'>Kontext - Cloud, Data & AI Community</a>");
display(html);
The output looks like the following:
Import packages from NuGet
You can easily import NuGet packages using the following syntax:
#r "nuget:<package name>,<package version>"
For example, the following command imports the Entity Framework Core SQLite package from NuGet, which can be used to work with SQLite databases.
#r "nuget:Microsoft.EntityFrameworkCore.Sqlite,2.1.1"
The output looks like the following:
Plot
XPlot.Plotly can be used for plotting, and the official documentation includes many usage examples. The following is one example (converted from this F# example):
using XPlot.Plotly;
var title = "Basic Bar Chart";
var series1 = new Graph.Bar
{
    x = new[] { "giraffes", "orangutans", "monkeys" },
    y = new[] { 20, 14, 23 },
    name = "SF Zoo"
};
var series2 = new Graph.Bar
{
    x = new[] { "giraffes", "orangutans", "monkeys" },
    y = new[] { 12, 18, 29 },
    name = "LA Zoo"
};
var chart = Chart.Plot(new[] { series1, series2 });
chart.WithTitle(title);
display(chart);
The output looks like the following:
You can find more API references for this library here: https://fslab.org/XPlot/reference/xplot-plotly-graph.html.
Machine learning example
Now let's create a machine learning example using .NET (C#) kernel in Jupyter with ML.NET packages.
To save time, we will just use this example: Sentiment Analysis for User Reviews.
The complete code base looks like the following (each code block is a cell in Jupyter notebook):
// Import dependent packages
#r "nuget:Microsoft.ML,1.4.0"
#r "nuget:Microsoft.ML.AutoML,0.16.0"
#r "nuget:Microsoft.Data.Analysis,0.1.0"

// models
using Microsoft.ML.Data;
public class SentimentIssue
{
    [LoadColumn(0)]
    public bool Label { get; set; }
    [LoadColumn(2)]
    public string Text { get; set; }
}
public class SentimentPrediction
{
    // ColumnName attribute is used to change the column name from
    // its default value, which is the name of the field.
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }
    // No need to specify ColumnName attribute, because the field
    // name "Probability" is the column name we want.
    public float Probability { get; set; }
    public float Score { get; set; }
}

using System;
using System.IO;
using System.Net;
using Microsoft.ML;
using static Microsoft.ML.DataOperationsCatalog;
static string GetAbsolutePath(string relativePath)
{
    var currentDir = Directory.GetCurrentDirectory();
    string fullPath = Path.Combine(currentDir, relativePath);
    return fullPath;
}
static readonly string DataRelativePath = $"./wikiDetoxAnnotated40kRows.tsv";
var dataUrl = "https://github.com/dotnet/machinelearning-samples/raw/master/samples/csharp/getting-started/BinaryClassification_SentimentAnalysis/SentimentAnalysis/Data/wikiDetoxAnnotated40kRows.tsv";
// download the data
using (var client = new WebClient())
{
    client.DownloadFile(dataUrl, DataRelativePath);
}
static readonly string DataPath = GetAbsolutePath(DataRelativePath);
static readonly string ModelRelativePath = $"./SentimentModel.zip";
static readonly string ModelPath = GetAbsolutePath(ModelRelativePath);

// Create MLContext to be shared across the model creation workflow objects
// Set a random seed for repeatable/deterministic results across multiple trainings.
var mlContext = new MLContext(seed: 1);

// STEP 1: Common data loading configuration
IDataView dataView = mlContext.Data.LoadFromTextFile<SentimentIssue>(DataPath, hasHeader: true);
TrainTestData trainTestSplit = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);
IDataView trainingData = trainTestSplit.TrainSet;
IDataView testData = trainTestSplit.TestSet;

// STEP 2: Common data process configuration with pipeline data transformations
var dataProcessPipeline = mlContext.Transforms.Text.FeaturizeText(outputColumnName: "Features", inputColumnName: nameof(SentimentIssue.Text));

// STEP 3: Set the training algorithm, then create and config the modelBuilder
var trainer = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: "Label", featureColumnName: "Features");
var trainingPipeline = dataProcessPipeline.Append(trainer);

// STEP 4: Train the model fitting to the DataSet
ITransformer trainedModel = trainingPipeline.Fit(trainingData);

// STEP 5: Evaluate the model and show accuracy stats
var predictions = trainedModel.Transform(testData);
var metrics = mlContext.BinaryClassification.Evaluate(data: predictions, labelColumnName: "Label", scoreColumnName: "Score");

// STEP 6: Save/persist the trained model to a .ZIP file
mlContext.Model.Save(trainedModel, trainingData.Schema, ModelPath);
display($"The model is saved to {ModelPath}");

// TRY IT: Make a single test prediction using the trained model held in memory
SentimentIssue sampleStatement = new SentimentIssue { Text = "I love this movie!" };
// Create a prediction engine for the trained model
var predEngine = mlContext.Model.CreatePredictionEngine<SentimentIssue, SentimentPrediction>(trainedModel);
// Score
var resultprediction = predEngine.Predict(sampleStatement);
Console.WriteLine($"=============== Single Prediction ===============");
Console.WriteLine($"Text: {sampleStatement.Text} | Prediction: {(Convert.ToBoolean(resultprediction.Prediction) ? "Toxic" : "Non Toxic")} sentiment | Probability of being toxic: {resultprediction.Probability} ");
Console.WriteLine($"================End of Process.Hit any key to exit==================================");
Console.ReadLine();
Final outcome:
=============== Single Prediction ===============
Text: I love this movie! | Prediction: Non Toxic sentiment | Probability of being toxic: 0.09407813
================End of Process.Hit any key to exit==================================
The screenshot of my Jupyter notebook:
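The SentimentIssue class above reads the label from LoadColumn(0) and the text from LoadColumn(2), which are zero-based column indices. To eyeball those two columns in the downloaded TSV from the WSL shell, something like the following may be handy. It is only a sketch: label_and_text is a hypothetical helper name, and the sample rows (including the middle Id column) are made up for illustration rather than taken from the real dataset.

```shell
#!/bin/sh
# Print the label and text columns of a tab-separated file read from stdin,
# skipping the header row (the C# loader uses hasHeader: true).
# awk fields are 1-based, so LoadColumn(0) maps to $1 and LoadColumn(2) to $3.
label_and_text() {
    awk -F '\t' 'NR > 1 { print $1 "\t" $3 }'
}

# Hypothetical sample rows, not taken from the real dataset.
printf 'Label\tId\tText\n0\t101\tThanks for the helpful edit\n1\t102\tThis is a rude comment\n' | label_and_text
```

Against the real file, the equivalent would be head -n 5 wikiDetoxAnnotated40kRows.tsv | label_and_text, run from the directory where the notebook downloaded the data.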
Integrates with Spark
Of course, you can now easily integrate the Jupyter .NET kernel with Spark.
Refer to this article about how to install Spark .NET. I will publish one more example later on. Please stay tuned.
References
Here are some references for further reading and exploration.
- ML.NET
- dotnet/try
- dotnet/machinelearning-samples (many .NET machine learning examples)
- dotnet/spark
- High performance data pipelines
- More live examples about .NET in Jupyter
If you have any questions, please let me know.