Overview

A short, high-level tour of openKMS. For the full system design see Architecture; for individual features see Functionalities.

What problems does it solve?

  • A single place to collect, parse, and search mixed content (PDF/HTML/ZIP/images, articles, wiki notes).
  • A RAG layer (knowledge bases + QA Agent) that grounds answers in those documents.
  • An ontology and knowledge map so domain terms map to actual channels and pages.
  • Org-friendly access control: OIDC or local users, role-based permissions, group-scoped data.

The three content surfaces

Documents

  • Upload to a document channel (a folder in a tree).
  • A worker picks up the job and runs openkms-cli with PaddleOCR-VL (via the separate mlx-vlm server) to produce Markdown plus per-page layout / block images.
  • Originals live in S3/MinIO under {file_hash}/. Markdown is editable in the UI; explicit version snapshots are stored in document_versions.
  • Lifecycle: series_id, effective_from, effective_to, lifecycle_status, plus document_relationships (supersedes, amends, implements, see_also).
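To make the lifecycle fields concrete, here is a minimal sketch of how an effective-date window could be evaluated. The field names and relationship types come from the list above; the dataclass itself, the inclusive-window rule, and the "active" status value are assumptions for illustration, not the actual openKMS schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Relationship types named in the overview.
RELATIONSHIP_TYPES = {"supersedes", "amends", "implements", "see_also"}


@dataclass
class DocumentLifecycle:
    series_id: str
    effective_from: Optional[date]
    effective_to: Optional[date]
    lifecycle_status: str  # the allowed values are an assumption, e.g. "active"


def is_effective_on(doc: DocumentLifecycle, day: date) -> bool:
    """Assumed rule: bounds are inclusive, and a missing bound is open-ended."""
    if doc.effective_from is not None and day < doc.effective_from:
        return False
    if doc.effective_to is not None and day > doc.effective_to:
        return False
    return True
```

For example, a document with effective_from 2024-01-01 and effective_to 2024-12-31 would count as effective on 2024-06-01 but not on 2025-01-01 under this assumed rule.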

Articles

  • Markdown-first CMS organised in article channels (a separate tree from document channels, with no parsing pipeline).
  • Inline images and arbitrary attachments are uploaded to MinIO under articles/{article_id}/.
  • A POST /api/articles/import multipart endpoint lets external tools push a fully-formed article (markdown + images + attachments) in one call, with an origin_article_id (Source) for provenance and idempotent upserts.
  • Article-to-article Relationships mirror document lineage (supersedes, amends, see_also, …).
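As a rough illustration of the import endpoint described above, the sketch below builds (but does not send) a multipart request with the standard library only. The path POST /api/articles/import and the origin_article_id field are from this page; the part names ("metadata", "markdown", "files"), the JSON metadata shape, and the base URL are hypothetical, so check the API reference before relying on them.

```python
import json
import uuid
import urllib.request


def build_import_request(base_url, token, markdown, origin_article_id, files):
    """Build a multipart POST to /api/articles/import without sending it.

    Part names ("metadata", "markdown", "files") are hypothetical.
    `files` is a list of (filename, content_type, data_bytes) tuples.
    """
    boundary = uuid.uuid4().hex
    parts = []

    def add_part(name, payload, content_type, filename=None):
        disposition = f'form-data; name="{name}"'
        if filename:
            disposition += f'; filename="{filename}"'
        header = (
            f"--{boundary}\r\n"
            f"Content-Disposition: {disposition}\r\n"
            f"Content-Type: {content_type}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + payload + b"\r\n")

    # Provenance id; per the overview it also makes re-imports idempotent.
    metadata = json.dumps({"origin_article_id": origin_article_id}).encode("utf-8")
    add_part("metadata", metadata, "application/json")
    add_part("markdown", markdown.encode("utf-8"), "text/markdown", "article.md")
    for filename, content_type, data in files:
        add_part("files", data, content_type, filename)

    body = b"".join(parts) + f"--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/articles/import",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The returned object can be passed to urllib.request.urlopen to actually send it. Building the request separately keeps the sketch easy to inspect and test without a running backend.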

Knowledge bases

  • A KB indexes documents from one or more channels. FAQs can be hand-written or LLM-generated; chunks are stored with embeddings in pgvector.
  • The QA Agent is a separate FastAPI + LangGraph service that retrieves through the backend search API and generates answers.
  • Hybrid search supports metadata filters and an opt-in include_historical_documents flag. By default the flag is off and search respects each document's is_current_for_rag value.
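The effect of the include_historical_documents flag can be sketched as a post-retrieval filter. This is an assumption-laden illustration, not the backend's implementation: the Chunk class is hypothetical, is_current_for_rag is copied onto each chunk for simplicity, and the real service may apply the filter inside the database query instead.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    document_id: str
    text: str
    score: float
    is_current_for_rag: bool  # denormalised from the parent document (assumption)


def filter_hits(hits, include_historical_documents=False):
    """Drop chunks from non-current documents unless the caller opts in,
    then return the survivors ordered by descending score."""
    if include_historical_documents:
        kept = list(hits)
    else:
        kept = [h for h in hits if h.is_current_for_rag]
    return sorted(kept, key=lambda h: h.score, reverse=True)
```

With the default, a superseded policy would be excluded from retrieval even if it scores highly; passing include_historical_documents=True brings it back for audit-style questions.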

Supporting surfaces

  • Wiki spaces — free-form notes with vault import, page graph view, and a Wiki Copilot that can read pages and (with wikis:write) upsert them.
  • Knowledge Map — taxonomy of terms with links to channels / wiki spaces / article channels; rendered as a force graph on the home page.
  • Glossaries — bilingual (EN/CN) term definitions with AI-suggested translations.
  • Ontology (objects & links) — typed object instances and link types stored in the same Postgres database.
  • Pipelines, Jobs, Models, Data sources, Datasets, Evaluations — operator-facing surfaces under the Console and the Ontology sidebar.

Auth in one paragraph

OPENKMS_AUTH_MODE=oidc (default) uses an external OpenID Connect IdP with PKCE in the SPA. OPENKMS_AUTH_MODE=local keeps users and bcrypt hashes in PostgreSQL and issues HS256 JWTs (plus optional HTTP Basic for openkms-cli). In both modes the backend accepts either an Authorization: Bearer token or a session cookie. Permissions are catalog-based (security_permissions rows with route/API patterns); roles map to permission keys; group data scopes can additionally narrow what a user sees per resource. See Security.
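The role-to-permission-to-pattern chain can be sketched roughly as follows. Only the wikis:write key appears on this page; the other key, the role names, the routes, and the glob-style pattern syntax are all hypothetical, and the real catalog lives in security_permissions rows rather than in Python dictionaries.

```python
from fnmatch import fnmatchcase

# Hypothetical catalog rows: permission key -> API patterns it grants.
SECURITY_PERMISSIONS = {
    "wikis:read": ["GET /api/wikis/*"],
    "wikis:write": ["POST /api/wikis/*", "PUT /api/wikis/*"],
}

# Hypothetical role -> permission key mapping.
ROLE_PERMISSIONS = {
    "viewer": {"wikis:read"},
    "editor": {"wikis:read", "wikis:write"},
}


def is_allowed(roles, method, path):
    """Return True if any permission held through `roles` matches the request."""
    request = f"{method.upper()} {path}"
    keys = set()
    for role in roles:
        keys |= ROLE_PERMISSIONS.get(role, set())
    return any(
        fnmatchcase(request, pattern)
        for key in keys
        for pattern in SECURITY_PERMISSIONS.get(key, [])
    )
```

Group data scopes are not modelled here; they would act as a further filter on which rows a permitted request may see.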

Where things live (one-liners)

  • PostgreSQL + pgvector — relational truth, embeddings, procrastinate job queue.
  • S3 / MinIO — originals ({file_hash}/), article bundles (articles/{id}/), wiki vaults (wiki/{space_id}/vault/), graph cache JSON.
  • Worker — runs openkms-cli jobs, calls the VLM server, indexes KBs.
  • mlx-vlm server — runs PaddleOCR-VL; deliberately separate from the main stack so you can put it on Apple Silicon / a GPU box.
  • QA Agent — separate process; never touches the DB; only reads via backend APIs.
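The S3 / MinIO prefixes listed above can be captured in a few small helpers. This is only a sketch: the prefix shapes are taken from this page, but the function names are hypothetical and the use of SHA-256 for file_hash is an assumption, since the hashing algorithm is not stated here.

```python
import hashlib


def original_prefix(content: bytes) -> str:
    # Assumption: file_hash is the SHA-256 hex digest of the file bytes.
    return f"{hashlib.sha256(content).hexdigest()}/"


def article_prefix(article_id) -> str:
    # Article bundles (markdown images and attachments).
    return f"articles/{article_id}/"


def wiki_vault_prefix(space_id) -> str:
    # Imported wiki vault files for one wiki space.
    return f"wiki/{space_id}/vault/"
```

Keying originals by content hash means that uploading the same bytes twice resolves to the same prefix, which is one plausible reason for the layout.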