
Data pipelines,
designed for the agent world.

Seeknal is an all-in-one platform for data and AI/ML engineering. Define pipelines in YAML or Python, run a safe draft → dry-run → apply workflow, and materialize to PostgreSQL and Iceberg in one go — then ask questions in natural language and get answers grounded in your own data.

runtime: py 3.11+
targets: postgres · iceberg
engines: duckdb · sql · python
ai providers: gemini · ollama · openai
▲ install
$ pip install seeknal
the platform

Three verbs. One CLI. Everything else is a command away.

01 / organize
organize.

Transform raw data into structured, reusable knowledge. Point-in-time joins, entity consolidation, watermark-based incrementals — built in.

user_id | event_ts   | value
001     | 2024-01-15 | 142.30
002     | 2024-01-16 | 98.70
003     | 2024-01-17 | 211.00
feature_store · point-in-time ready
sources · transforms · joins
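The point-in-time semantics behind the table above can be sketched in plain Python. This is an illustrative toy, not Seeknal's engine code: for each label row, only feature values observed at or before the label's timestamp are eligible, which is what prevents feature leakage in training data.

```python
from datetime import date

# Toy feature rows mirroring the sample table above.
features = [
    ("001", date(2024, 1, 15), 142.30),
    ("002", date(2024, 1, 16), 98.70),
    ("003", date(2024, 1, 17), 211.00),
]

def point_in_time_join(labels, features):
    """For each (entity, as_of) label row, pick the latest feature
    value observed at or before as_of -- never a future value."""
    out = []
    for entity, as_of in labels:
        eligible = [(ts, v) for (e, ts, v) in features
                    if e == entity and ts <= as_of]
        out.append((entity, as_of, max(eligible)[1] if eligible else None))
    return out

labels = [("001", date(2024, 1, 20)), ("003", date(2024, 1, 16))]
print(point_in_time_join(labels, features))
# user 003 has no feature on or before 2024-01-16, so its value is None
```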
02 / expose
expose.

Publish dashboards. Serve features offline and online. Ask anything in natural language — the agent knows your schema, your pipelines, your exposures.

dashboards · agent · gateway
03 / action
action.

Turn answers into outcomes. Trigger downstream jobs, publish reports, serve features via API, or kick off Telegram alerts — all from the same agent that found the insight.

insight → report · serve · alert
report · api · trigger
the flow

From zero to materialized in four commands.

A safe workflow, by default.

Every pipeline moves through the same four gates — draft, dry-run, apply, query. No silent failures. No surprises in production.
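Assuming the gate names map one-to-one onto CLI subcommands (an illustrative sketch; check `seeknal --help` for the actual syntax and arguments), the four commands look something like:

$ seeknal draft my_pipeline      # stage changes; nothing runs yet
$ seeknal dry-run my_pipeline    # preview the plan and diffs
$ seeknal apply my_pipeline      # materialize to postgres / iceberg
$ seeknal query "select ..."     # inspect the result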

everything included

The parts of your stack that kept you up at night — now in one package.

Dual Pipeline Authoring
Write pipelines in YAML, Python decorators, or both — side by side in the same project. Declarative when you want clarity, imperative when you need logic. The compiler treats them as one graph, so refs, types, and lineage work across the boundary.
yaml · python · one graph
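A sketch of what the declarative side could look like. The field names and templating here are assumptions for illustration, not Seeknal's documented schema; the point is that a YAML node can reference a Python-defined node through the same graph.

```yaml
# pipelines/daily_revenue.yml -- illustrative shape, not the real schema
name: daily_revenue
source: ref(clean_events)        # may resolve to a Python-decorated node
sql: |
  SELECT user_id, DATE(event_ts) AS day, SUM(value) AS revenue
  FROM {{ source }}
  GROUP BY 1, 2
target: postgres
```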
Feature store, both modes
Define ML features in YAML or Python. Entity keys, point-in-time joins, automatic versioning. Offline batch and online real-time serving with a TTL flag.
offline · online · versioned
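An illustrative feature definition, with every key name assumed rather than taken from Seeknal's docs, just to show how entity key, point-in-time readiness, and the TTL flag could sit together:

```yaml
# features/user_spend.yml -- hypothetical shape
feature_group: user_spend
entity: user                # entity key used for point-in-time joins
features:
  - name: spend_7d
    sql: sum(value)         # aggregated over a trailing 7-day window
online: true
ttl: 24h                    # TTL flag for online real-time serving
```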
Draft · Dry-run · Apply
A git-like workflow for pipelines. Know exactly what will change before it runs.
safety · preview · plan
Interactive SQL REPL
Auto-registers parquets, PostgreSQL, Iceberg. Iterate on SQL without leaving the terminal.
duckdb · live
Environment isolation
Per-env profiles, namespace prefixing, one codebase from dev to prod.
dev · staging · prod
Lineage
Column-level lineage across YAML, Python, SQL, and warehouse targets. Click any field to see upstream sources and downstream consumers — no extra instrumentation.
column-level · auto · graph
Build · data checks
Declarative assertions on every transform — nulls, uniqueness, row-count deltas, custom SQL. Fail-loud on dry-run, gate on apply, log on run.
assert · gate · alert
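The four assertion kinds listed above might be declared along these lines; the keys are an illustrative guess at a declarative checks block, not Seeknal's exact syntax:

```yaml
# hypothetical checks block attached to a transform
checks:
  - not_null: [user_id, event_ts]
  - unique: [user_id, event_ts]
  - row_count_delta: { max_pct: 10 }   # fail if row count moves >10%
  - sql: SELECT COUNT(*) = 0 FROM {{ this }} WHERE value < 0
```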
Common
Reusable concepts — entities, metrics, dimensions, time grains — defined once and referenced everywhere. One source of truth for your semantic layer.
entity · metric · semantic
Incremental by default
Watermark-based detection on PostgreSQL, snapshot-based on Iceberg. If fingerprints match, execution is skipped entirely — dependents cascade-invalidate automatically.
watermark · snapshot · 0-wasted-runs
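The skip decision reduces to comparing fingerprints of a source's state between runs. A minimal stdlib sketch, where hashing the max watermark plus row count is an assumption about what goes into the fingerprint:

```python
import hashlib
import json

def fingerprint(watermark: str, row_count: int) -> str:
    """Digest of the source state; any change yields a new digest."""
    payload = json.dumps({"watermark": watermark, "rows": row_count})
    return hashlib.sha256(payload.encode()).hexdigest()

last_run = fingerprint("2024-01-17", 3)

# Same state on the next run -> skip, dependents stay valid.
current = fingerprint("2024-01-17", 3)
print("skip" if current == last_run else "execute")

# A new row moves the watermark -> execute, dependents cascade-invalidate.
current = fingerprint("2024-01-18", 4)
print("skip" if current == last_run else "execute")
```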
Report & Gateway servers
Self-host published reports with unique share URLs. Expose seeknal ask as an API — WebSocket, SSE, REST, or a Telegram bot.
share · stream · integrate
⬢ seeknal ask

A thinking partner that actually knows your data — and now ingests it.

Drop an .xlsx, .csv, .json or a bank-transfer screenshot into chat. The agent reads your pipelines, entity schema, and exposures — then uses purpose-built tools and built-in skills to move from raw file to queryable table to published report.

Conversational ingest: record text or image. NEW Drop an .xlsx / .csv / .json, paste a URL, type /record fitra, 1 mie ayam, or attach a bank-transfer screenshot. The data-ingest skill walks you through schema preview → business key → append-or-create, writes a reusable SKILL.md, and surfaces drift and dedup checks before any write lands in the right ingest_* table.
Confirmation-first. The agent proposes a plan and waits for your go-ahead before acting.
Thin tools, fat skills. Report generation, data profiling, pipeline building — loaded on demand.
Private by default. Run fully local with Ollama, or use Gemini. Your data never leaves your machine unless you say so.
Sandboxed execution. Python runs in an isolated subprocess with restricted imports. Every write emits a provenance JSON sidecar with SHA-256, row counts, and drift decisions.
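What such a sidecar could contain, sketched with the standard library. The field names and sidecar naming here are assumptions, not Seeknal's actual schema; the idea is one JSON file next to each written artifact recording content hash, row count, and the drift decision.

```python
import hashlib
import json
from pathlib import Path

def write_sidecar(data_path: Path, rows: int, drift_decision: str) -> Path:
    """Write a provenance JSON next to the data file: SHA-256 of its
    bytes, the row count, and the drift decision the agent took."""
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    sidecar = data_path.with_suffix(data_path.suffix + ".provenance.json")
    sidecar.write_text(json.dumps({
        "sha256": digest,
        "row_count": rows,
        "drift": drift_decision,
    }, indent=2))
    return sidecar

# Usage: write the data, then its sidecar.
p = Path("ingest_demo.csv")
p.write_text("user_id,value\n001,142.30\n")
print(write_sidecar(p, rows=1, drift_decision="append"))
```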
apply
ship it

Your next pipeline is
one command away.

Open-source. Apache 2.0. No lock-in. Install in thirty seconds and see for yourself.
