Get Started with Apache Kylin - OLAP for Big Data

2023-09-14
Raymond Raymond The Data Engineering Column

This column covers small topics related to data engineering.

Apache Kylin is an open source analytical data warehouse for Big Data. It supports OLAP workloads with sub-second latency. You can use Kylin to build cubes from identified tables. The official project site is Apache Kylin | Analytical Data Warehouse for Big Data (https://kylin.apache.org/). This tutorial shows how to set up a Kylin environment quickly using Docker.

Apache Kylin architecture

The following diagram shows how Apache Kylin works on big data.

2023091485628-image.png

*Image credit - https://kylin.apache.org/assets/images/kylin_diagram.png

Prerequisites

Apache Kylin can be deployed in your big data cluster, just like Spark or other frameworks. To save time and effort, we will use the official Docker image. Please install the latest Docker Desktop if it is not already installed on your system.

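If Docker Desktop uses the WSL 2 backend, please ensure sufficient memory is configured in the .wslconfig file in your Windows user profile folder (restart WSL, for example with wsl --shutdown, for a change to take effect):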

[wsl2]
memory=8GB   # Limits VM memory in WSL 2 

Pull image

Run the following command to pull the latest image (as at 14/09/2023):

docker pull apachekylin/apache-kylin-standalone:5.0-beta

The above command pulls the latest 5.0.0-beta release.

The image also includes Hadoop, Hive (with MySQL as the metastore database), Spark and ZooKeeper to support Apache Kylin.

Start the container

Run the following command to start the container:

Bash:

docker run -d \
  --name Kylin5-Machine \
  --hostname Kylin5-Machine \
  -m 8G \
  -p 7070:7070 \
  -p 8088:8088 \
  -p 9870:9870 \
  -p 8032:8032 \
  -p 8042:8042 \
  -p 2181:2181 \
  apachekylin/apache-kylin-standalone:5.0-beta

PowerShell:

docker run -d `
  --name Kylin5-Machine `
  --hostname Kylin5-Machine `
  -m 8G `
  -p 7070:7070 `
  -p 8088:8088 `
  -p 9870:9870 `
  -p 8032:8032 `
  -p 8042:8042 `
  -p 2181:2181 `
  apachekylin/apache-kylin-standalone:5.0-beta

If any of these ports is already used by another program on the host machine, you can map the container port to a different host port, for example -p 10088:8088.
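
Before starting the container, it can help to check which of these host ports are already taken. The following is a minimal sketch, assuming a Bash shell (it relies on Bash's /dev/tcp redirection) and checking only the loopback address; the function name port_in_use is hypothetical and not part of Docker or Kylin.

```shell
#!/usr/bin/env bash
# port_in_use PORT: succeeds (exit status 0) if something accepts TCP
# connections on 127.0.0.1:PORT. Relies on Bash's /dev/tcp pseudo-device.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Report the status of every host port used by the docker run command above.
for port in 7070 8088 9870 8032 8042 2181; do
  if port_in_use "$port"; then
    echo "port $port is in use"
  else
    echo "port $port is free"
  fi
done
```

For any port reported as in use, change only the host side of the mapping (the number before the colon) and keep the container side unchanged.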

Then run the following command to follow the logs:

docker logs --follow Kylin5-Machine

Wait until all services are started. This may take several minutes, as the container performs the following actions:

  • Start the MySQL service

  • Initialize the Hive schema for the metastore

  • Format HDFS

  • Start HDFS (NameNode and DataNode)

  • Start the Hive services

  • Start YARN (ResourceManager and NodeManager)

  • Load sample data into HDFS for Kylin and create tables: ssb.customer, ssb.dates, ssb.lineorder, ssb.part, ssb.supplier

  • Create the sample model

  • Start the Kylin instance

2023091494610-image.png

When all services are started, you should see the following log message:

Kylin service is already available for you to preview.
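
Instead of watching the log by hand, you could poll it until that message appears. This is only a sketch under stated assumptions: the helper name wait_for_line and the POLL_SECONDS variable are hypothetical, and the loop runs until the text shows up (press Ctrl+C to abort).

```shell
# wait_for_line TEXT COMMAND [ARGS...]
# Re-runs COMMAND until its combined output contains the fixed string TEXT.
# POLL_SECONDS (hypothetical knob, default 10) sets the delay between attempts.
wait_for_line() {
  text="$1"
  shift
  until "$@" 2>&1 | grep -q -F -- "$text"; do
    sleep "${POLL_SECONDS:-10}"
  done
}

# Example (requires the running container from the previous step):
# wait_for_line "Kylin service is already available" docker logs Kylin5-Machine
```

The example uses docker logs without --follow so that each attempt returns immediately and the loop decides when to retry.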

Services in the container

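With the port mappings used above, the following services are available on their default ports:

  • Kylin web UI: http://localhost:7070/kylin

  • YARN ResourceManager web UI: http://localhost:8088

  • HDFS NameNode web UI: http://localhost:9870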

If you cannot open the Kylin web UI, the service might not have started successfully. You can try running the following command in the container's terminal:

${KYLIN_HOME}/bin/kylin.sh start

Sometimes you may need to wait for a while before the web service is up.
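
One way to reach the container's terminal from the host is docker exec, and the wait for the web service can be scripted. The sketch below assumes that bash exists inside the image and that curl is installed on the host; the helper name wait_for_http and the POLL_SECONDS variable are hypothetical choices.

```shell
# wait_for_http URL: poll URL with curl until it returns a successful
# (non-error) HTTP response. POLL_SECONDS defaults to 10 seconds.
wait_for_http() {
  until curl --silent --fail --location --output /dev/null "$1"; do
    sleep "${POLL_SECONDS:-10}"
  done
}

# Open an interactive shell inside the running container
# (assumes bash exists in the image):
# docker exec -it Kylin5-Machine bash

# Then, from the host, wait for the Kylin web UI to come up:
# wait_for_http http://localhost:7070/kylin
```

If you changed the host port for 7070 when starting the container, use that port in the URL instead.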

About the sample data model

The sample data model is a star schema, as the following diagram shows:

2023091494820-image.png

*Image credit: https://kylin.apache.org/5.0/assets/images/dataset-d22cdf576e3d87e0f1a2b4531b6a5d60.png

The fact table is linked to the dimension tables. For more information about the sample dataset, please refer to Sample dataset | Welcome to Kylin 5 (apache.org).

Explore Kylin UI

Open http://localhost:7070/kylin in a browser to explore the Kylin UI. Log in with the following credentials:

  • username: ADMIN

  • password: KYLIN

20230914111633-image.png

The UI provides pages to create projects, add data sources, design models and indexes, load data (load data from the source, build indexes and run pre-calculation), query data using ANSI SQL, monitor jobs, and so on.


20230914111850-image.png

Dashboard

The following screenshot shows the statistics dashboard.

20230914111738-image.png

Query the data

Run the following sample query in the SQL editor:

SELECT LO_PARTKEY, SUM(LO_REVENUE) AS TOTAL_REVENUE
FROM SSB.P_LINEORDER
WHERE LO_ORDERDATE BETWEEN '1993-06-01' AND '1994-06-01'
GROUP BY LO_PARTKEY
ORDER BY SUM(LO_REVENUE) DESC


The output looks like the following screenshot:

20230914112043-image.png


Stop the container

To stop the container, please run the following command:

docker stop Kylin5-Machine

Remove the container

If you also want to remove the container, please run the following command:

docker rm Kylin5-Machine

Summary

If you are building cubes for your OLAP projects on a traditional relational database and would like to migrate to a horizontally scalable big data platform, Apache Kylin can be a good choice.
