Kontext Data Lake

Learn how to use Kontext Data Lake for secure data storage, analytics, and exploration with project-level isolation.

Kontext Data Lake enables secure ingestion, storage, and querying of structured data files within each project. It provides analytics and data exploration with project-level isolation and access control, allowing users to derive insights from their data while maintaining security.

(Screenshot: Kontext Data Lake query example)

Overview

The Data Lake component supports:

  • File Formats: CSV, TSV, Parquet, and JSON/JSONL, with more formats planned
  • Storage: Cloud object storage with automatic Parquet optimization
  • Query Execution: Run modern SQL queries (see the sketch after this list)
  • Access Control: Project-based roles and permissions, with data access via private endpoints/links only
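
Kontext runs these queries server-side, but because the engine is DuckDB, reads over each supported format can be sketched locally with the duckdb Python package. The file names below are placeholders, not real project resources.

    # Illustrative only: Kontext executes SQL server-side, but DuckDB
    # reads every format listed above, so the queries can be sketched
    # locally. File names are placeholders.
    import duckdb

    con = duckdb.connect()  # in-memory database

    # CSV/TSV: read_csv_auto infers the delimiter and column types
    con.sql("SELECT * FROM read_csv_auto('events.tsv') LIMIT 5").show()

    # Parquet and JSON/JSONL work the same way
    con.sql("SELECT count(*) FROM read_parquet('sales.parquet')").show()
    con.sql("SELECT * FROM read_json_auto('logs.jsonl') LIMIT 5").show()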

Key Features

  • Automatic Data Ingestion & Profiling: Uploaded files are automatically detected, ingested, and profiled using the serverless job framework. Users can query the data lake immediately after upload—no manual setup required.
  • Cloud-Native Storage: Files are stored in cloud object storage.
  • SQL Query Engine: Query data directly from cloud storage, with intelligent execution mode selection
  • Project Isolation & Security: Each project has its own isolated DuckDB instance (sketched after this list) with:
    • Role-based access control (Reader, Contributor, Owner, Administrator)
    • Complete data isolation between projects
    • Audit logging for all operations
  • Data Catalog & Query UI: Rich interface for data exploration:
    • Visual schema browser
    • SQL query editor with syntax highlighting
    • Result visualization and export
    • Query history tracking
  • Zero-Config Data Processing:
    • Format-specific optimizations
    • External table registration
    • Statistics generation
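
The isolation model can be pictured with a minimal sketch, assuming each project maps to its own DuckDB database file and a role check runs before every query. The paths, function names, and enforcement logic below are illustrative, not Kontext's actual implementation.

    # Hypothetical sketch of project-level isolation: one DuckDB database
    # file per project means queries in one project can never see another
    # project's tables. Paths, roles, and checks are illustrative.
    import duckdb

    READ_ROLES = {"Reader", "Contributor", "Owner", "Administrator"}

    def connect_to_project(project_id: str) -> duckdb.DuckDBPyConnection:
        # A separate database file per project = complete data isolation
        return duckdb.connect(f"/data/projects/{project_id}.duckdb")

    def run_query(project_id: str, role: str, sql: str):
        # Role-based access control, simplified to a read check
        if role not in READ_ROLES:
            raise PermissionError(f"role {role!r} cannot query this project")
        con = connect_to_project(project_id)
        try:
            return con.execute(sql).fetchall()
        finally:
            con.close()  # also a natural place to write an audit log entry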

How It Works

Upload Files

Add structured or semi-structured data files to your project through the web interface (Resources) or the API.

(Screenshot: Kontext Data Lake resources example)

The uploaded data files sit alongside your other resource files, such as images, within the same project.
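
For API uploads, a minimal sketch might look like the following. The host, route, and auth header are assumptions for illustration; consult the Kontext API reference for the actual endpoint.

    # Minimal upload sketch. The host, route, and header below are
    # assumptions for illustration, not the documented Kontext API.
    import requests

    API_BASE = "https://kontext.example.com"   # placeholder host
    PROJECT_ID = "my-project"                  # placeholder project id
    TOKEN = "<your-api-token>"

    with open("sales.csv", "rb") as f:
        resp = requests.post(
            f"{API_BASE}/api/projects/{PROJECT_ID}/resources",  # hypothetical route
            headers={"Authorization": f"Bearer {TOKEN}"},
            files={"file": ("sales.csv", f, "text/csv")},
        )
    resp.raise_for_status()
    print(resp.json())  # ingestion and profiling then start automatically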

Automatic Processing

Each data file upload triggers Kontext jobs that automatically profile your files:

  • File detection and format identification
  • Schema analysis and profiling
  • Parquet optimization (when beneficial)
  • Table registration with metadata

The process creates a dataset named raw for each project.
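
The pipeline can be pictured as a rough DuckDB sketch of the four steps above. Only the raw dataset name comes from this page; the function, file handling, and reader mapping are illustrative.

    # A rough reconstruction of the ingestion job, expressed with DuckDB.
    # Only the 'raw' dataset name comes from this page; everything else
    # is illustrative.
    import pathlib
    import duckdb

    def ingest(con: duckdb.DuckDBPyConnection, path: str) -> None:
        name = pathlib.Path(path).stem
        suffix = pathlib.Path(path).suffix.lower()

        # 1. File detection and format identification (reduced to the extension)
        readers = {
            ".csv": "read_csv_auto", ".tsv": "read_csv_auto",
            ".parquet": "read_parquet",
            ".json": "read_json_auto", ".jsonl": "read_json_auto",
        }
        reader = readers[suffix]

        # 2. Parquet optimization: rewrite row-oriented formats as Parquet
        if suffix != ".parquet":
            con.execute(
                f"COPY (SELECT * FROM {reader}('{path}')) "
                f"TO '{name}.parquet' (FORMAT PARQUET)"
            )
            path = f"{name}.parquet"

        # 3. Register an external table in the project's raw dataset
        con.execute("CREATE SCHEMA IF NOT EXISTS raw")
        con.execute(
            f"CREATE OR REPLACE VIEW raw.{name} AS "
            f"SELECT * FROM read_parquet('{path}')"
        )

        # 4. Schema analysis and statistics (DuckDB's SUMMARIZE)
        print(con.execute(f"SUMMARIZE raw.{name}").fetchall())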

Query Your Data

Once data profiling completes, you can analyze your data with SQL.

  • Write SQL queries in the web interface (see the example after this list)
  • Visualize results with built-in charts (coming soon)
  • Export data in various formats (coming soon)
  • Track query history
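
For example, a query like the following could be entered in the editor. The raw.sales table is hypothetical; it is wrapped in a local DuckDB session here only so the sketch runs end to end.

    # The SQL you would type in the query editor; raw.sales is a
    # hypothetical table. A local DuckDB session is used here only so
    # the example runs end to end.
    import duckdb

    con = duckdb.connect()
    con.execute("CREATE SCHEMA raw")
    con.execute("""
        CREATE TABLE raw.sales AS
        SELECT * FROM (VALUES
            ('2024-01-01', 'EMEA', 120.0),
            ('2024-01-01', 'APAC',  80.0),
            ('2024-01-02', 'EMEA',  95.5)
        ) AS t(order_date, region, amount)
    """)

    # The query itself: orders and revenue per region
    con.sql("""
        SELECT region, count(*) AS orders, sum(amount) AS revenue
        FROM raw.sales
        GROUP BY region
        ORDER BY revenue DESC
    """).show()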

Security & Governance

Kontext treats data security and governance as core pillars of the platform, with features designed to support these objectives:

  • Project-level access control
  • Query auditing and monitoring
  • Resource usage tracking