Building a Local MiFID Regulatory Copilot

June 13, 2026

I built this project for fun as a local-first regulatory copilot for MiFID II, MiFIR, UK MiFIR, transaction reporting, transparency, ARM/APA workflows, and source-backed analysis. The goal was not to build a generic chatbot. The goal was to build a reviewable regulatory workbench: a tool that starts from official documents, shows the evidence it used, and fails conservatively when the corpus is incomplete.

The app now runs as a FastAPI service with a browser UI, local Ollama generation, a manifest-driven regulatory corpus, hybrid retrieval, evidence scoring, citations, and a corpus explorer. It is designed to be deployed on a private server and exposed through Cloudflare Tunnel, while keeping the source documents and indexes under local control.

Why Local First

Regulatory analysis has two constraints that are easy to underestimate.

First, the answer must be traceable. If an assistant says a transaction is reportable, or that a transparency deferral may apply, a reviewer needs to see the exact document, page, article, and passage behind that claim.

Second, the source set must be controlled. MiFID and MiFIR materials exist across Level 1 legislation, RTS, ESMA Q&A, reporting instructions, XML schemas, FCA pages, FCA Handbook material, discussion papers, consultation papers, and policy statements. A useful assistant needs to know which sources are current, which are historical, which are proposals, and which jurisdiction they belong to.

Those constraints pushed the project toward retrieval-augmented generation instead of fine-tuning. Fine-tuning can help with style or workflow patterns, but it should not be the source of truth for current law. The law and guidance should live in an auditable corpus.

Architecture

The architecture has five layers:

Source manifest
Ingestion and chunking
Keyword and vector indexing
Route-aware retrieval and answer generation
UI, evidence quality, and corpus inspection

The source manifest is the control plane. Each entry records the source ID, title, publisher, jurisdiction, regime, document type, domain tags, priority, status, official URL, and expected local path. The app does not silently scrape the internet at query time. Documents are downloaded into data/raw, then parsed, chunked, and indexed.
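As an illustration only, a manifest entry and its validation step could look roughly like the Python sketch below. The field names follow the attributes listed above, but the class name SourceEntry, the function validate_manifest, and the allowed values for jurisdiction and status are assumptions made for the example, not the project's actual schema.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SourceEntry:
    # Hypothetical field names mirroring the attributes listed above.
    source_id: str
    title: str
    publisher: str
    jurisdiction: str
    regime: str
    document_type: str
    domain_tags: tuple
    priority: int
    status: str
    official_url: str
    local_path: str


# Assumed vocabularies; the real manifest may use different values.
JURISDICTIONS = {"EU", "UK"}
STATUSES = {"active", "historical", "proposal"}


def validate_manifest(entries, raw_root):
    """Return a list of problems; an empty list means every entry passed."""
    problems = []
    seen_ids = set()
    for entry in entries:
        if entry.source_id in seen_ids:
            problems.append(f"{entry.source_id}: duplicate source_id")
        seen_ids.add(entry.source_id)
        if entry.jurisdiction not in JURISDICTIONS:
            problems.append(f"{entry.source_id}: unknown jurisdiction {entry.jurisdiction!r}")
        if entry.status not in STATUSES:
            problems.append(f"{entry.source_id}: unknown status {entry.status!r}")
        if not entry.official_url.startswith("https://"):
            problems.append(f"{entry.source_id}: official_url is not https")
        # The document must already be on disk; nothing is fetched here.
        if not (Path(raw_root) / entry.local_path).is_file():
            problems.append(f"{entry.source_id}: missing local file {entry.local_path}")
    return problems
```

Returning a list of problems instead of raising on the first one means a single validation run can report every missing file and malformed entry at once.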

FastAPI serves the query endpoints and the static UI. Ollama provides local generation. The deployment model binds the API to 127.0.0.1; Cloudflare Tunnel can expose it through HTTPS without opening an inbound firewall port. HTTP Basic authentication is loaded from an ignored local config/auth.yaml, and a public showcase should additionally use Cloudflare Access.
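The credential check behind HTTP Basic authentication can be sketched with the standard library alone. This is not the app's code: in a FastAPI service the header parsing would more likely be delegated to the framework's own security helpers, and the function name check_basic_auth is hypothetical. The point of the sketch is the constant-time comparison, which avoids leaking how much of a password matched through response timing.

```python
import base64
import secrets
from typing import Optional


def check_basic_auth(header: Optional[str], username: str, password: str) -> bool:
    """Validate an 'Authorization: Basic ...' header value against one user."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes.
        return False
    user, sep, supplied_password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest runs in constant time for equal-length inputs.
    user_ok = secrets.compare_digest(user.encode(), username.encode())
    password_ok = secrets.compare_digest(supplied_password.encode(), password.encode())
    return user_ok and password_ok
```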

Corpus Chunk Methodology

The most important implementation detail is chunking. Early versions of the corpus produced low-value chunks such as cover pages, titles, and fragments that looked meaningful in the UI but did not carry legal substance. That created bad retrieval: if the model receives weak context, it will either answer weakly or over-interpret the wrong passage.

The current chunker is legal-boundary aware. It looks for headings such as:

Article 26
Recital (12)
CHAPTER IV
SECTION 2
ANNEX I
Question 3
Table 2

When a boundary is found, the chunker keeps the surrounding legal unit together where possible. It then enforces a size window from config/retrieval.yaml: chunks should be large enough to carry meaning, but small enough to fit into retrieval and prompt context. Oversized blocks are split by paragraph. Undersized blocks are merged with nearby legal blocks until they become useful.
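A simplified sketch of that idea is shown below. The heading patterns come from the examples listed above; the function names and the exact merge and split rules are assumptions for illustration, and min_chars and max_chars stand in for whatever the size window in config/retrieval.yaml actually defines.

```python
import re

# Heading patterns taken from the examples above. A real chunker likely
# needs more variants, and should guard against sentences that merely
# start with "Article 26 ...".
BOUNDARY = re.compile(
    r"^(Article\s+\d+[a-z]?|Recital\s+\(\d+\)|CHAPTER\s+[IVXLC]+|SECTION\s+\d+"
    r"|ANNEX\s+[IVXLC]+|Question\s+\d+|Table\s+\d+)(?=\W|$)"
)


def split_blocks(text):
    """Split text into (heading, body) blocks, starting a block at each heading."""
    blocks, heading, lines = [], "", []
    for line in text.splitlines():
        match = BOUNDARY.match(line.strip())
        if match:
            if any(l.strip() for l in lines):
                blocks.append((heading, "\n".join(lines).strip()))
            heading, lines = match.group(1), [line]
        else:
            lines.append(line)
    if any(l.strip() for l in lines):
        blocks.append((heading, "\n".join(lines).strip()))
    return blocks


def split_by_paragraph(text, max_chars):
    """Greedily pack paragraphs into pieces of at most max_chars where possible.

    A single paragraph longer than max_chars is left oversized.
    """
    pieces, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            pieces.append(current)
            current = para
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def apply_size_window(blocks, min_chars, max_chars):
    """Merge undersized blocks forward and split oversized ones by paragraph."""
    chunks, pending = [], ""
    for _heading, body in blocks:
        pending = f"{pending}\n\n{body}" if pending else body
        if len(pending) < min_chars:
            continue  # undersized: merge with the next legal block
        if len(pending) <= max_chars:
            chunks.append(pending)
        else:
            chunks.extend(split_by_paragraph(pending, max_chars))
        pending = ""
    if pending:
        chunks.append(pending)  # a trailing undersized block is kept, not dropped
    return chunks
```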

Each chunk carries metadata:

source_id
document title and publisher
jurisdiction, regime, and domain tags
document type, priority, and status
page number
detected article, section, annex, question, or table
original extracted text

The chunker also filters low-value front matter. Short policy-statement covers, tables of contents, and similar non-substantive fragments are excluded unless they contain strong legal signals. This matters because the corpus explorer is not just a debug tool; it shows exactly what the answer engine can see.
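To make the metadata and the front-matter filter concrete, here is a hedged sketch. The Chunk fields follow the list above, while the field name legal_ref, the 200-character threshold, and both heuristics (normative verbs as the "strong legal signal", dotted leaders as the sign of a table of contents) are assumptions chosen for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    source_id: str
    title: str
    publisher: str
    jurisdiction: str
    regime: str
    domain_tags: tuple
    document_type: str
    priority: int
    status: str
    page: int
    legal_ref: Optional[str]  # detected article, section, annex, question, or table
    text: str                 # original extracted text


# Assumed heuristics: normative wording counts as a strong legal signal,
# and a dotted leader followed by a page number marks a contents line.
NORMATIVE = re.compile(r"\b(shall|must|is required to|means)\b", re.IGNORECASE)
TOC_LINE = re.compile(r"\.{4,}\s*\d+\s*$")


def is_low_value(text, min_chars=200):
    """Return True for short or contents-like text with no legal signal."""
    if NORMATIVE.search(text):
        return False  # strong legal signal: always keep
    lines = [line for line in text.splitlines() if line.strip()]
    toc_lines = sum(1 for line in lines if TOC_LINE.search(line))
    looks_like_toc = bool(lines) and toc_lines * 2 >= len(lines)
    return looks_like_toc or len(text.strip()) < min_chars
```

With a list of chunks in hand, the filter would be applied as `kept = [c for c in chunks if not is_low_value(c.text)]`.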

Retrieval And Accuracy Controls

Retrieval is hybrid. The app builds:

a SQLite FTS5 keyword index for exact legal terms such as Article 26, RTS 22, FIRDS, deferral, or APA
a local vector index for semantic matching

The hybrid retriever merges both result sets, applies route filters, and adds ranking boosts for source priority, active status, consolidated source material, and domain-specific intent.
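One common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion, shown here as a sketch. Whether the app uses this exact method is not stated, so treat it as a stand-in; the boost weights, the consolidated flag, and the assumption that priority 1 is the most authoritative tier are all illustrative.

```python
def metadata_boost(meta):
    """Small additive boost from chunk metadata (weights are illustrative)."""
    boost = 0.0
    if meta.get("status") == "active":
        boost += 0.005
    if meta.get("consolidated"):
        boost += 0.005
    # Assumes priority 1 is the most authoritative source tier.
    boost += 0.01 / max(int(meta.get("priority", 5)), 1)
    return boost


def fuse(keyword_ids, vector_ids, metadata=None, k=60):
    """Merge two ranked lists of chunk IDs with reciprocal rank fusion."""
    metadata = metadata or {}
    scores = {}
    for ranking in (keyword_ids, vector_ids):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    for chunk_id in scores:
        if chunk_id in metadata:
            scores[chunk_id] += metadata_boost(metadata[chunk_id])
    # Highest score first; ties broken by chunk ID for stable output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

Rank fusion has the practical advantage that FTS5 scores and vector similarities never have to be put on a common scale; only the positions in each list matter.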

The router classifies each question by:

jurisdiction: EU or UK
regime: MiFIR, MiFID II, or UK MiFIR
domain: transaction reporting, transparency, reference data, algorithmic trading, order records, or overview

That route is important. A UK transparency question should not accidentally retrieve EU RTS material unless the user asks for a comparison. A MiFID overview should prefer Level 1 Directive scope material instead of a narrow RTS. A transaction reportability question should route to MiFIR Article 26 and RTS 22, even if a user casually says "under MiFID II".
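A rule-based router along these lines can be sketched in a few lines of Python. The keyword patterns, the Route class, and the classify function are hypothetical, and the defaults (EU unless the question mentions the UK or the FCA, MiFID II unless something points to MiFIR) are simplifying assumptions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    jurisdiction: str
    regime: str
    domain: str


# Illustrative patterns, checked in order; the first match wins.
DOMAIN_PATTERNS = [
    ("transaction reporting", r"transaction report|reportab|\brts 22\b|\barticle 26\b"),
    ("transparency", r"transparen|deferral|\bapa\b"),
    ("reference data", r"reference data|\bfirds\b"),
    ("algorithmic trading", r"algorithmic"),
    ("order records", r"order record"),
]


def classify(question):
    """Classify a question into jurisdiction, regime, and domain."""
    q = question.lower()
    jurisdiction = "UK" if re.search(r"\b(uk|fca)\b", q) else "EU"
    domain = next(
        (name for name, pattern in DOMAIN_PATTERNS if re.search(pattern, q)),
        "overview",
    )
    if jurisdiction == "UK":
        regime = "UK MiFIR"  # simplification: every UK question maps here
    elif domain == "transaction reporting" or "mifir" in q:
        # Reportability lives in MiFIR even when the user says "MiFID II".
        regime = "MiFIR"
    else:
        regime = "MiFID II"
    return Route(jurisdiction, regime, domain)
```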

That last example became a useful design test. A question like "What information is needed to determine transaction reportability under MiFID II?" was initially pulling in RTS 25 clock synchronisation because RTS 25 is MiFID II material and includes transaction-adjacent wording. The fix was not to hard-code one answer. The fix was to improve routing and retrieval:

transaction reportability now routes to MiFIR Article 26 / RTS 22
reportability retrieval seeds canonical chunks from MiFIR Article 26, RTS 22 Article 1, RTS 22 Article 2, and RTS 22 Table 2
RTS 25 clock synchronisation is penalised for reportability questions unless the user asks about clocks, timestamps, or RTS 25
a regression test locks this behaviour
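The seeding and penalty rules above can be illustrated as follows. The source IDs (mifir, rts_22, rts_25), the penalty multiplier, and the function name are placeholders invented for the sketch; only the behaviour is taken from the description.

```python
# Hypothetical source IDs; the real manifest IDs are not shown in this post.
CANONICAL_SEEDS = [
    ("mifir", "Article 26"),
    ("rts_22", "Article 1"),
    ("rts_22", "Article 2"),
    ("rts_22", "Table 2"),
]
CLOCK_TERMS = ("clock", "timestamp", "rts 25")
RTS25_PENALTY = 0.5  # illustrative multiplier


def rank_for_reportability(question, scored):
    """Order evidence for a reportability question.

    scored is a list of (source_id, legal_ref, score) tuples from retrieval.
    Canonical seeds come first, followed by the remaining hits by score.
    """
    q = question.lower()
    asks_about_clocks = any(term in q for term in CLOCK_TERMS)
    adjusted = []
    for source_id, legal_ref, score in scored:
        if source_id == "rts_25" and not asks_about_clocks:
            score *= RTS25_PENALTY
        adjusted.append(((source_id, legal_ref), score))
    adjusted.sort(key=lambda item: -item[1])
    ordered = list(CANONICAL_SEEDS)
    for key, _score in adjusted:
        if key not in ordered:
            ordered.append(key)
    return ordered


def test_reportability_prefers_canonical_sources():
    # Regression check in the spirit of the one described above.
    question = "What information is needed to determine transaction reportability under MiFID II?"
    scored = [("rts_25", "Article 1", 0.9), ("mifid_ii", "Article 4", 0.5)]
    ranked = rank_for_reportability(question, scored)
    assert ranked[:4] == CANONICAL_SEEDS
    assert ranked.index(("mifid_ii", "Article 4")) < ranked.index(("rts_25", "Article 1"))
```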

This is the general pattern for accuracy: make the route explicit, prefer canonical sources, show citations, and add tests for retrieval failures.

How An Answer Is Derived

When a user asks a question, the app does the following:

Classifies the question into jurisdiction, regime, and domain.
Expands the retrieval query with domain-specific terms.
Retrieves source chunks using keyword search, vector search, metadata filters, and seeded canonical evidence.
Builds a prompt containing only the retrieved context and route assumptions.
Calls the local Ollama model (using gemma3n).
Converts the generated answer into a deterministic report format.
Builds citations directly from the retrieved chunks.
Scores the evidence quality and shows warnings.
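The prompt-building and generation steps might look like the sketch below. The prompt wording and the dictionary keys on each chunk are assumptions; the generate function posts to Ollama's /api/generate endpoint on its default local port, which requires a running Ollama instance with the model already pulled.

```python
import json
import urllib.request


def build_prompt(question, route, chunks):
    """Build a prompt that contains only retrieved context and the route."""
    context = "\n\n".join(
        f"[{i}] {chunk['source_id']} p.{chunk['page']}\n{chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered context passages below. "
        "If the context is insufficient, say so explicitly.\n\n"
        f"Route: {route}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


def generate(prompt, model="gemma3n", url="http://127.0.0.1:11434/api/generate"):
    """Send the prompt to a local Ollama server and return the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

Numbering the passages in the prompt lets the citation step map the model's references back to the retrieved chunks instead of trusting any source names the model writes on its own.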

The deterministic formatter is deliberate. LLMs are useful for summarising and connecting evidence, but the outer structure should be stable. The app controls the headings, assessment status, citation list, required data fields, missing facts, systems impact, validation rules, and confidence statement. This avoids UI-breaking formatting drift and keeps reviewer-facing output predictable.
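A minimal sketch of such a formatter is shown below. The section names are taken from the list in the paragraph above; their order, the placeholder sentence, and the render_report function are assumptions. The key property is that the application, not the model, decides which headings appear and in what order.

```python
# Fixed outer structure; the order here is an assumption.
SECTION_ORDER = (
    "Assessment status",
    "Required data fields",
    "Missing facts",
    "Systems impact",
    "Validation rules",
    "Citations",
    "Confidence statement",
)
PLACEHOLDER = "Not established from the retrieved evidence."


def render_report(draft, citations):
    """Render a report with fixed headings.

    draft maps section names to model-written text; unknown keys are ignored.
    citations is a list of strings built from the retrieved chunks.
    """
    lines = []
    for heading in SECTION_ORDER:
        lines.append(heading)
        if heading == "Citations":
            body = [f"[{i}] {c}" for i, c in enumerate(citations, start=1)]
        else:
            value = draft.get(heading, "")
            body = [value.strip()] if isinstance(value, str) and value.strip() else []
        lines.extend(body or [PLACEHOLDER])
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```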

Evidence Quality

Every answer gets an evidence-quality report. The score considers:

number of retrieved sources
number of retrieved chunks
source priority
document type
source status
jurisdiction match
regime match
domain match

The UI treats this as a gate, not decoration. Low means the answer should not be relied on. Medium means draft use only. High means the evidence base is strong enough for analyst review, still subject to compliance sign-off.
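To show how those factors might combine into a Low, Medium, or High rating, here is a sketch with made-up weights and thresholds. None of the numbers come from the app; the document-type weights, the assumption that priority 1 is the top tier, and the 0.5 and 0.75 cut-offs are illustrative only.

```python
# Illustrative weights; unknown document types fall back to 0.5.
DOC_TYPE_WEIGHT = {"legislation": 1.0, "rts": 1.0, "guidance": 0.7, "consultation": 0.3}


def score_evidence(chunks, route):
    """Return (score, level) for a list of retrieved chunk dicts."""
    if not chunks:
        return 0.0, "Low"
    n = len(chunks)
    sources = {c["source_id"] for c in chunks}
    # Coverage saturates at 3 sources and 6 chunks.
    coverage = 0.5 * min(len(sources), 3) / 3 + 0.5 * min(n, 6) / 6
    # Share of jurisdiction, regime, and domain matches across all chunks.
    route_match = sum(
        (c["jurisdiction"] == route["jurisdiction"])
        + (c["regime"] == route["regime"])
        + (route["domain"] in c["domain_tags"])
        for c in chunks
    ) / (3 * n)
    # Authority blends document type, status, and priority (1 = top tier).
    authority = sum(
        DOC_TYPE_WEIGHT.get(c["document_type"], 0.5)
        * (1.0 if c["status"] == "active" else 0.5)
        / max(c["priority"], 1)
        for c in chunks
    ) / n
    score = 0.3 * coverage + 0.4 * route_match + 0.3 * authority
    level = "High" if score >= 0.75 else "Medium" if score >= 0.5 else "Low"
    return round(score, 3), level
```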

Citations show the source ID, title, document type, page, article or section when available, and a short source preview. Retrieved chunks are also listed so a reviewer can inspect what the model saw.

The Corpus Explorer

The corpus explorer makes the knowledge base inspectable. It answers questions like:

Which official files are loaded?
Which sources are missing?
How many chunks were produced per source?
What exact text is available to retrieval?
Which article, page, or table did a chunk come from?

This is essential for trust. When an answer looks wrong, the first question should not be "why did the model say that?" It should be "what evidence did retrieval give the model?" The corpus explorer gives that answer directly.

Deployment Model

The deployment shape is intentionally small:

Clone the repository on the server. (The source code is still private; reach out if you want to see it.)
Create config/auth.yaml locally and keep it out of git.
Add the official source files under the paths listed in config/source_manifest.yaml.
Run manifest validation.
Build the corpus.
Build the indexes.
Start FastAPI bound to 127.0.0.1.
Expose it through Cloudflare Tunnel.

The current helper script can start the API and a quick Cloudflare tunnel for demos. For a stable external setup, use a named Cloudflare Tunnel, a stable hostname, Cloudflare Access, log rotation, dependency scanning, rate limits, and a regular corpus refresh process.

Testing Strategy

The test suite covers the mechanics that matter for this kind of app:

source manifest validation
chunk metadata extraction
retrieval filtering
route classification
overview retrieval for MiFID and MiFIR
transaction reportability routing to MiFIR / RTS 22
evidence scoring
acceptance questions with expected source IDs and expected terms

The acceptance file is intentionally simple: each question defines the expected jurisdiction, regime, domain, minimum evidence level, expected sources, and expected terms. This provides a repeatable sanity check without requiring the LLM to produce byte-identical prose.
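An acceptance check of that shape could be sketched as below. The AcceptanceCase fields follow the attributes named above, while the keys of the result dictionary (evidence_level, cited_source_ids, answer) and the check_case function are hypothetical names for the example.

```python
from dataclasses import dataclass

LEVEL_RANK = {"Low": 0, "Medium": 1, "High": 2}


@dataclass(frozen=True)
class AcceptanceCase:
    question: str
    jurisdiction: str
    regime: str
    domain: str
    min_evidence: str
    expected_sources: tuple
    expected_terms: tuple


def check_case(case, result):
    """Compare one query result with its acceptance case; return failures."""
    failures = []
    for key in ("jurisdiction", "regime", "domain"):
        expected = getattr(case, key)
        if result[key] != expected:
            failures.append(f"{key}: expected {expected!r}, got {result[key]!r}")
    if LEVEL_RANK[result["evidence_level"]] < LEVEL_RANK[case.min_evidence]:
        failures.append(f"evidence below {case.min_evidence}: {result['evidence_level']}")
    missing_sources = set(case.expected_sources) - set(result["cited_source_ids"])
    if missing_sources:
        failures.append(f"missing sources: {sorted(missing_sources)}")
    # Terms are matched case-insensitively, so wording can vary freely.
    answer = result["answer"].lower()
    missing_terms = [t for t in case.expected_terms if t.lower() not in answer]
    if missing_terms:
        failures.append(f"missing terms: {missing_terms}")
    return failures
```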

For a regulatory copilot, this is the right kind of test. The generated wording can vary, but the evidence path should be stable.

What This Enables

The result is a local regulatory copilot that can:

answer MiFID / MiFIR reporting questions from local official sources
separate EU and UK material
cite source chunks
flag weak evidence
expose the corpus for review
support controlled external demos through Cloudflare
grow by adding new manifest sources and domain-specific retrieval seeds

The app is usable today as a showcase and analyst draft tool. It is not a replacement for legal or compliance sign-off. The value is that it makes the evidence path visible and testable, which is exactly what a regulatory assistant needs before it can become operationally useful.

Next Extensions

The strongest next improvements would be:

replace fallback hash embeddings with a production embedding model
add a reranker for better long-document retrieval
improve table extraction for RTS field tables and FCA policy statements
add source freshness checks and supersession metadata
add Cloudflare Access and request rate limiting for public demos
add structured audit logs for questions, routes, retrieved chunks, and evidence scores
build workflow-specific dashboards for transaction reporting, transparency, ARMs, APAs, and exception management

The architecture is intentionally extensible: new source types enter through the manifest, new documents become chunks, chunks become searchable evidence, and route-specific retrieval rules can be added without changing the UI contract.
