
Data pipelines,
designed for the agent world.

Seeknal is an all-in-one platform for data and AI/ML engineering. Define pipelines in YAML or Python, run a safe draft → dry-run → apply workflow, and materialize to PostgreSQL and Iceberg in one go — then ask questions in natural language and get answers grounded in your own data.

runtime: py 3.11+
targets: postgres · iceberg
engines: duckdb · sql · python
ai providers: gemini · ollama · openai
▲ install
$ pip install seeknal
the platform

Three verbs. One CLI. Everything else is a command away.

01 / organize
organize.

Transform raw data into structured, reusable knowledge. Point-in-time joins, entity consolidation, watermark-based incrementals — built in.

user_id | event_ts   | value
001     | 2024-01-15 | 142.30
002     | 2024-01-16 | 98.70
003     | 2024-01-17 | 211.00
feature_store · point-in-time ready
sources · transforms · joins
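The point-in-time semantics behind the table above can be sketched in plain Python. This is an illustrative toy, not Seeknal's engine code: for each label row, only feature values observed at or before the label's timestamp are eligible, which is what prevents feature leakage in training data.

```python
from datetime import date

# Toy feature rows mirroring the sample table above.
features = [
    ("001", date(2024, 1, 15), 142.30),
    ("002", date(2024, 1, 16), 98.70),
    ("003", date(2024, 1, 17), 211.00),
]

def point_in_time_join(labels, features):
    """For each (entity, as_of) label row, pick the latest feature
    value observed at or before as_of -- never a future value."""
    out = []
    for entity, as_of in labels:
        eligible = [(ts, v) for (e, ts, v) in features
                    if e == entity and ts <= as_of]
        out.append((entity, as_of, max(eligible)[1] if eligible else None))
    return out

labels = [("001", date(2024, 1, 20)), ("003", date(2024, 1, 16))]
print(point_in_time_join(labels, features))
# user 003 has no feature on or before 2024-01-16, so its value is None
```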
02 / expose
expose.

Publish dashboards. Serve features offline and online. Ask anything in natural language — the agent knows your schema, your pipelines, your exposures.

dashboards · agent · gateway
03 / action
action.

Turn answers into outcomes. Trigger downstream jobs, publish reports, serve features via API, or kick off Telegram alerts — all from the same agent that found the insight.

insight → report · serve · alert
report · api · trigger
the flow

From zero to materialized in four commands.

A safe workflow, by default.

Every pipeline moves through the same four gates — draft, dry-run, apply, query. No silent failures. No surprises in production.
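Assuming the gate names map one-to-one onto CLI subcommands (an illustrative sketch; check `seeknal --help` for the actual syntax and arguments), the four commands look something like:

$ seeknal draft my_pipeline      # stage changes; nothing runs yet
$ seeknal dry-run my_pipeline    # preview the plan and diffs
$ seeknal apply my_pipeline      # materialize to postgres / iceberg
$ seeknal query "select ..."     # inspect the result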

everything included

The parts of your stack that kept you up at night — now in one package.

Dual Pipeline Authoring
Write pipelines in YAML, Python decorators, or both — side by side in the same project. Declarative when you want clarity, imperative when you need logic. The compiler treats them as one graph, so refs, types, and lineage work across the boundary.
yaml · python · one graph
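A sketch of what the declarative side could look like. The field names and templating here are assumptions for illustration, not Seeknal's documented schema; the point is that a YAML node can reference a Python-defined node through the same graph.

```yaml
# pipelines/daily_revenue.yml -- illustrative shape, not the real schema
name: daily_revenue
source: ref(clean_events)        # may resolve to a Python-decorated node
sql: |
  SELECT user_id, DATE(event_ts) AS day, SUM(value) AS revenue
  FROM {{ source }}
  GROUP BY 1, 2
target: postgres
```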
Feature store, both modes
Define ML features in YAML or Python. Entity keys, point-in-time joins, automatic versioning. Offline batch and online real-time serving with a TTL flag.
offline · online · versioned
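An illustrative feature definition, with every key name assumed rather than taken from Seeknal's docs, just to show how entity key, point-in-time readiness, and the TTL flag could sit together:

```yaml
# features/user_spend.yml -- hypothetical shape
feature_group: user_spend
entity: user                # entity key used for point-in-time joins
features:
  - name: spend_7d
    sql: sum(value)         # aggregated over a trailing 7-day window
online: true
ttl: 24h                    # TTL flag for online real-time serving
```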
Draft · Dry-run · Apply
A git-like workflow for pipelines. Know exactly what will change before it runs.
safety · preview · plan
Interactive SQL REPL
Auto-registers parquets, PostgreSQL, Iceberg. Iterate on SQL without leaving the terminal.
duckdb · live
Environment isolation
Per-env profiles, namespace prefixing, one codebase from dev to prod.
dev · staging · prod
Lineage
Column-level lineage across YAML, Python, SQL, and warehouse targets. Click any field to see upstream sources and downstream consumers — no extra instrumentation.
column-level · auto · graph
Build · data checks
Declarative assertions on every transform — nulls, uniqueness, row-count deltas, custom SQL. Fail-loud on dry-run, gate on apply, log on run.
assert · gate · alert
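The four assertion kinds listed above might be declared along these lines; the keys are an illustrative guess at a declarative checks block, not Seeknal's exact syntax:

```yaml
# hypothetical checks block attached to a transform
checks:
  - not_null: [user_id, event_ts]
  - unique: [user_id, event_ts]
  - row_count_delta: { max_pct: 10 }   # fail if row count moves >10%
  - sql: SELECT COUNT(*) = 0 FROM {{ this }} WHERE value < 0
```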
Common
Reusable concepts — entities, metrics, dimensions, time grains — defined once and referenced everywhere. One source of truth for your semantic layer.
entity · metric · semantic
Incremental by default
Watermark-based detection on PostgreSQL, snapshot-based on Iceberg. If fingerprints match, execution is skipped entirely — dependents cascade-invalidate automatically.
watermark · snapshot · 0-wasted-runs
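The skip decision reduces to comparing fingerprints of a source's state between runs. A minimal stdlib sketch, where hashing the max watermark plus row count is an assumption about what goes into the fingerprint:

```python
import hashlib
import json

def fingerprint(watermark: str, row_count: int) -> str:
    """Digest of the source state; any change yields a new digest."""
    payload = json.dumps({"watermark": watermark, "rows": row_count})
    return hashlib.sha256(payload.encode()).hexdigest()

last_run = fingerprint("2024-01-17", 3)

# Same state on the next run -> skip, dependents stay valid.
current = fingerprint("2024-01-17", 3)
print("skip" if current == last_run else "execute")

# A new row moves the watermark -> execute, dependents cascade-invalidate.
current = fingerprint("2024-01-18", 4)
print("skip" if current == last_run else "execute")
```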
Report & Gateway servers
Self-host published reports with unique share URLs. Expose seeknal ask as an API — WebSocket, SSE, REST, or a Telegram bot.
share · stream · integrate
⬢ seeknal ask

A thinking partner that actually knows your data — and now ingests it.

Drop an .xlsx, .csv, .json or a bank-transfer screenshot into chat. The agent reads your pipelines, entity schema, and exposures — then uses purpose-built tools and built-in skills to move from raw file to queryable table to published report.

Conversational ingest: record text or image. NEW Drop an .xlsx / .csv / .json, paste a URL, type /record fitra, 1 mie ayam, or attach a bank-transfer screenshot. The data-ingest skill walks you through schema preview → business key → append-or-create, writes a reusable SKILL.md, and surfaces drift and dedup checks before any write lands in the right ingest_* table.
Confirmation-first. The agent proposes a plan and waits for your go-ahead before acting.
Thin tools, fat skills. Report generation, data profiling, pipeline building — loaded on demand.
Private by default. Run fully local with Ollama, or use Gemini. Your data never leaves your machine unless you say so.
Sandboxed execution. Python runs in an isolated subprocess with restricted imports. Every write emits a provenance JSON sidecar with SHA-256, row counts, and drift decisions.
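What such a sidecar could contain, sketched with the standard library. The field names and sidecar naming here are assumptions, not Seeknal's actual schema; the idea is one JSON file next to each written artifact recording content hash, row count, and the drift decision.

```python
import hashlib
import json
from pathlib import Path

def write_sidecar(data_path: Path, rows: int, drift_decision: str) -> Path:
    """Write a provenance JSON next to the data file: SHA-256 of its
    bytes, the row count, and the drift decision the agent took."""
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    sidecar = data_path.with_suffix(data_path.suffix + ".provenance.json")
    sidecar.write_text(json.dumps({
        "sha256": digest,
        "row_count": rows,
        "drift": drift_decision,
    }, indent=2))
    return sidecar

# Usage: write the data, then its sidecar.
p = Path("ingest_demo.csv")
p.write_text("user_id,value\n001,142.30\n")
print(write_sidecar(p, rows=1, drift_decision="append"))
```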
apply
ship it

Your next pipeline is
one command away.

Open-source. Apache 2.0. No lock-in. Install in thirty seconds and see for yourself.
