Building a Local MiFID Regulatory Copilot
I built this project for fun as a local-first regulatory copilot for MiFID II, MiFIR, UK MiFIR, transaction reporting, transparency, ARM/APA workflows, and source-backed analysis. The goal was not to build a generic chatbot. The goal was to build a reviewable regulatory workbench: a tool that starts from official documents, shows the evidence it used, and fails conservatively when the corpus is incomplete.
The app now runs as a FastAPI service with a browser UI, local Ollama generation, a manifest-driven regulatory corpus, hybrid retrieval, evidence scoring, citations, and a corpus explorer. It is designed to be deployed on a private server and exposed through Cloudflare Tunnel, while keeping the source documents and indexes under local control.
Why Local First
Regulatory analysis has two constraints that are easy to underestimate.
First, the answer must be traceable. If an assistant says a transaction is reportable, or that a transparency deferral may apply, a reviewer needs to see the exact document, page, article, and passage behind that claim.
Second, the source set must be controlled. MiFID and MiFIR materials exist across Level 1 legislation, RTS, ESMA Q&A, reporting instructions, XML schemas, FCA pages, FCA Handbook material, discussion papers, consultation papers, and policy statements. A useful assistant needs to know which sources are current, which are historical, which are proposals, and which jurisdiction they belong to.
Those constraints pushed the project toward retrieval-augmented generation instead of fine-tuning. Fine-tuning can help with style or workflow patterns, but it should not be the source of truth for current law. The law and guidance should live in an auditable corpus.
Architecture
The architecture has five layers:
- Source manifest
- Ingestion and chunking
- Keyword and vector indexing
- Route-aware retrieval and answer generation
- UI, evidence quality, and corpus inspection
Source files are placed under data/raw, then parsed, chunked, and indexed.
FastAPI serves the query endpoints and the static UI. Ollama provides local generation. The deployment model binds the API to 127.0.0.1; Cloudflare Tunnel can expose it through HTTPS without opening an inbound firewall port. HTTP Basic authentication is loaded from a git-ignored local config/auth.yaml, and a public showcase should additionally use Cloudflare Access.
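As a rough illustration of the HTTP Basic check mentioned above, here is a minimal standard-library sketch. It assumes the credentials have already been read from config/auth.yaml into plain strings; the function name check_basic_auth and the sample credentials are hypothetical and not taken from the project.

```python
import base64
import binascii
import secrets


def check_basic_auth(header_value: str, expected_user: str, expected_password: str) -> bool:
    """Return True if an HTTP Basic Authorization header matches the expected credentials."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through comparison timing.
    user_ok = secrets.compare_digest(user.encode(), expected_user.encode())
    password_ok = secrets.compare_digest(password.encode(), expected_password.encode())
    return user_ok and password_ok
```

Both comparisons run even when the username is wrong, so a failed login takes roughly the same time whichever field was incorrect.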
Corpus Chunk Methodology
The most important implementation detail is chunking. Early versions of the corpus produced low-value chunks such as cover pages, titles, and fragments that looked meaningful in the UI but did not carry legal substance. That created bad retrieval: if the model receives weak context, it will either answer weakly or over-interpret the wrong passage.
The current chunker is legal-boundary aware. It looks for headings such as:
- Article 26
- Recital (12)
- CHAPTER IV
- SECTION 2
- ANNEX I
- Question 3
- Table 2
When a boundary is found, the chunker keeps the surrounding legal unit together where possible. It then enforces a size window from config/retrieval.yaml: chunks should be large enough to carry meaning, but small enough to fit into retrieval and prompt context. Oversized blocks are split by paragraph. Undersized blocks are merged with nearby legal blocks until they become useful.
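The boundary-then-window idea can be sketched as follows. This is a simplified illustration, not the project's chunker: the regular expression only covers the example headings, the function names are hypothetical, and the size limits are passed in as plain arguments rather than read from config/retrieval.yaml.

```python
import re

# Heading patterns based on the example headings; a real chunker would likely need more.
BOUNDARY = re.compile(
    r"(Article \d+|Recital \(\d+\)|CHAPTER [IVXLC]+|SECTION \d+|ANNEX [IVXLC]+"
    r"|Question \d+|Table \d+)"
)


def split_on_boundaries(text: str) -> list[str]:
    """Start a new block at every line that begins with a legal heading."""
    blocks: list[list[str]] = []
    for line in text.splitlines():
        if not blocks or BOUNDARY.match(line.strip()):
            blocks.append([])
        blocks[-1].append(line)
    joined = ("\n".join(lines).strip() for lines in blocks)
    return [block for block in joined if block]


def enforce_window(blocks: list[str], min_chars: int, max_chars: int) -> list[str]:
    """Split oversized blocks by paragraph, then merge undersized neighbours."""
    pieces: list[str] = []
    for block in blocks:
        if len(block) <= max_chars:
            pieces.append(block)
        else:
            # A single paragraph longer than max_chars is kept whole in this sketch.
            pieces.extend(p.strip() for p in block.split("\n\n") if p.strip())
    merged: list[str] = []
    for piece in pieces:
        # Grow the previous chunk while it is too small and the result still fits.
        fits = merged and len(merged[-1]) + len(piece) + 2 <= max_chars
        if fits and len(merged[-1]) < min_chars:
            merged[-1] = merged[-1] + "\n\n" + piece
        else:
            merged.append(piece)
    return merged
```

Merging only looks backwards, so a short final block can remain undersized; a fuller implementation would probably handle that case and track page numbers as well.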
Each chunk carries metadata:
- source_id
- document title and publisher
- jurisdiction, regime, and domain tags
- document type, priority, and status
- page number
- detected article, section, annex, question, or table
- original extracted text
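One way to picture that metadata is as a small immutable record. The field names below mirror the list above, but the exact names, types, and the sample values are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk record; field names follow the metadata list, types are assumed."""
    source_id: str
    title: str
    publisher: str
    jurisdiction: str
    regime: str
    domains: tuple[str, ...]
    document_type: str
    priority: int
    status: str
    page: int
    legal_unit: Optional[str]  # detected article, section, annex, question, or table
    text: str


# Illustrative values only; they are not taken from the real corpus.
example = Chunk(
    source_id="example_source",
    title="Example regulation",
    publisher="Example publisher",
    jurisdiction="EU",
    regime="MiFIR",
    domains=("transaction_reporting",),
    document_type="legislation",
    priority=1,
    status="active",
    page=1,
    legal_unit="Article 26",
    text="Sample passage.",
)
record = asdict(example)  # plain dict, convenient for JSON or an index row
```

Freezing the dataclass keeps a chunk from being modified after ingestion, which helps when the same record backs both retrieval and the corpus explorer.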
The chunker also filters low-value front matter. Short policy-statement covers, tables of contents, and similar non-substantive fragments are excluded unless they contain strong legal signals. This matters because the corpus explorer is not just a debug tool; it shows exactly what the answer engine can see.
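A front-matter filter along these lines could look like the sketch below. The length threshold and the list of "strong legal signals" are guesses chosen for illustration; the project's actual rules are not shown in this post.

```python
import re

# Assumed examples of strong legal signals; the real list is likely longer.
LEGAL_SIGNAL = re.compile(
    r"\b(Article \d+|RTS \d+|Annex [IVXLC]+|shall|must)\b", re.IGNORECASE
)


def is_low_value(text: str, min_chars: int = 200) -> bool:
    """Treat short fragments without any legal signal as front matter."""
    stripped = text.strip()
    if LEGAL_SIGNAL.search(stripped):
        return False
    return len(stripped) < min_chars
```

The check is deliberately conservative: anything long enough, or anything that mentions an article, an RTS, an annex, or an obligation word, is kept.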
Retrieval And Accuracy Controls
Retrieval is hybrid. The app builds:
- a SQLite FTS5 keyword index for exact legal terms such as Article 26, RTS 22, FIRDS, deferral, or APA
- a local vector index for semantic matching
The hybrid retriever merges both result sets, applies route filters, and adds ranking boosts for source priority, active status, consolidated source material, and domain-specific intent.
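The post does not spell out how the two result sets are combined. One common option is reciprocal rank fusion followed by multiplicative boosts, sketched here as an assumption rather than a description of the app; the function name, the constant k, and the boost values are all hypothetical.

```python
from collections import defaultdict
from typing import Optional


def merge_results(
    keyword_ids: list[str],
    vector_ids: list[str],
    boosts: Optional[dict[str, float]] = None,
    k: int = 60,
) -> list[str]:
    """Fuse two ranked lists of chunk ids with reciprocal rank fusion, then apply boosts."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in (keyword_ids, vector_ids):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Boosts stand in for source priority, active status, and similar ranking signals.
    for chunk_id, boost in (boosts or {}).items():
        if chunk_id in scores:
            scores[chunk_id] *= boost
    # Sort by score, using the id as a tie-breaker so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

A chunk that appears in both lists accumulates two contributions, so agreement between keyword and vector search naturally rises to the top before any boost is applied.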
The router classifies each question by:
- jurisdiction: EU or UK
- regime: MiFIR, MiFID II, or UK MiFIR
- domain: transaction reporting, transparency, reference data, algorithmic trading, order records, or overview
That route is important. A UK transparency question should not accidentally retrieve EU RTS material unless the user asks for a comparison. A MiFID overview should prefer Level 1 Directive scope material instead of a narrow RTS. A transaction reportability question should route to MiFIR Article 26 and RTS 22, even if a user casually says "under MiFID II".
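A keyword-based router in this spirit might look like the sketch below. The patterns, the domain labels, and the rule that reporting, transparency, and reference data map to MiFIR are assumptions for illustration; the real router may use different signals.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    jurisdiction: str
    regime: str
    domain: str


# Assumed patterns, checked in order; the first match wins.
DOMAIN_PATTERNS = [
    ("transaction_reporting", r"\btransaction report|\breportab|\brts 22\b|\barticle 26\b"),
    ("transparency", r"\btransparency\b|\bdeferral|\bapa\b"),
    ("reference_data", r"\breference data\b|\bfirds\b"),
    ("algorithmic_trading", r"\balgorithmic\b"),
    ("order_records", r"\border record"),
]
MIFIR_DOMAINS = {"transaction_reporting", "transparency", "reference_data"}


def classify(question: str) -> Route:
    q = question.lower()
    jurisdiction = "UK" if re.search(r"\b(uk|fca)\b", q) else "EU"
    domain = next((name for name, pat in DOMAIN_PATTERNS if re.search(pat, q)), "overview")
    if jurisdiction == "UK":
        regime = "UK MiFIR"
    elif domain in MIFIR_DOMAINS or "mifir" in q:
        # A casual "under MiFID II" does not override the domain-based regime.
        regime = "MiFIR"
    else:
        regime = "MiFID II"
    return Route(jurisdiction, regime, domain)
```

The regime is derived from the domain first and the wording second, which is what lets a reportability question land on MiFIR even when the user names MiFID II.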
That last example became a useful design test. A question like "What information is needed to determine transaction reportability under MiFID II?" was initially pulling in RTS 25 clock synchronisation because RTS 25 is MiFID II material and includes transaction-adjacent wording. The fix was not to hard-code one answer. The fix was to improve routing and retrieval:
- transaction reportability now routes to MiFIR Article 26 / RTS 22
- reportability retrieval seeds canonical chunks from MiFIR Article 26, RTS 22 Article 1, RTS 22 Article 2, and RTS 22 Table 2
- RTS 25 clock synchronisation is penalised for reportability questions unless the user asks about clocks, timestamps, or RTS 25
- a regression test locks this behaviour
This is the general pattern for accuracy: make the route explicit, prefer canonical sources, show citations, and add tests for retrieval failures.
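The seeding and penalty behaviour, together with the kind of regression check that locks it in, can be sketched like this. The source ids (mifir_art_26, rts_22, rts_25), the penalty factor, and the function name are hypothetical placeholders.

```python
import re

CLOCK_TERMS = re.compile(r"\b(clocks?|timestamps?|rts 25)\b", re.IGNORECASE)
RTS25_PENALTY = 0.2  # assumed factor; the real value is not given in the post


def rerank(
    candidates: list[tuple[str, float]],
    domain: str,
    question: str,
    seeds: tuple[str, ...] = (),
) -> list[str]:
    """Order source ids by score, penalising RTS 25 for reportability questions."""
    wants_clocks = bool(CLOCK_TERMS.search(question))
    scored = []
    for source_id, score in candidates:
        if domain == "transaction_reporting" and source_id == "rts_25" and not wants_clocks:
            score *= RTS25_PENALTY
        scored.append((source_id, score))
    scored.sort(key=lambda item: -item[1])
    ordered = [source_id for source_id, _ in scored]
    # Seeded canonical sources come first, followed by the rest in score order.
    return list(seeds) + [sid for sid in ordered if sid not in seeds]


def test_reportability_does_not_lead_with_rts_25() -> None:
    question = "What information is needed to determine transaction reportability under MiFID II?"
    candidates = [("rts_25", 0.9), ("rts_22", 0.6), ("mifir_art_26", 0.5)]
    result = rerank(candidates, "transaction_reporting", question)
    assert result[0] == "rts_22"
    assert result[-1] == "rts_25"
```

The penalty is conditional on the question text, so a user who explicitly asks about clocks or timestamps still gets RTS 25 at full weight.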
How An Answer Is Derived
When a user asks a question, the app does the following:
- Classifies the question into jurisdiction, regime, and domain.
- Expands the retrieval query with domain-specific terms.
- Retrieves source chunks using keyword search, vector search, metadata filters, and seeded canonical evidence.
- Builds a prompt containing only the retrieved context and route assumptions.
- Calls the local Ollama model (using gemma3n).
- Converts the generated answer into a deterministic report format.
- Builds citations directly from the retrieved chunks.
- Scores the evidence quality and shows warnings.
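The prompt-building and generation steps can be sketched as below. The prompt wording and function names are hypothetical. The payload follows the shape accepted by Ollama's /api/generate endpoint (model, prompt, stream); the sketch only builds the request and does not send it.

```python
def build_prompt(question: str, route: dict, chunks: list[dict]) -> str:
    """Assemble a prompt that contains only the route assumptions and retrieved context."""
    context = "\n\n".join(
        f"[{i}] ({c['source_id']}, page {c['page']})\n{c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered context below. "
        "If the context is insufficient, say so.\n\n"
        f"Jurisdiction: {route['jurisdiction']}\n"
        f"Regime: {route['regime']}\n"
        f"Domain: {route['domain']}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


def build_ollama_request(prompt: str, model: str = "gemma3n") -> dict:
    """Build the JSON body for a non-streaming Ollama generate call."""
    # stream=False asks Ollama for a single JSON object instead of a chunked stream.
    return {"model": model, "prompt": prompt, "stream": False}
```

In a running setup this body would be POSTed to the local Ollama server (by default http://localhost:11434/api/generate), and the generated text read from the "response" field of the reply.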
The deterministic formatter is deliberate. LLMs are useful for summarising and connecting evidence, but the outer structure should be stable. The app controls the headings, assessment status, citation list, required data fields, missing facts, systems impact, validation rules, and confidence statement. This avoids UI-breaking formatting drift and keeps reviewer-facing output predictable.
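A deterministic formatter of this kind can be as simple as a fixed list of headings filled from whatever the model produced. The heading names below follow the sections mentioned above, while the fallback sentence and the function name are assumptions.

```python
SECTIONS = (
    "Assessment",
    "Required data fields",
    "Missing facts",
    "Systems impact",
    "Validation rules",
    "Confidence",
)
FALLBACK = "Not determined from the retrieved evidence."


def render_report(fields: dict[str, str], citations: list[str]) -> str:
    """Render a report with a fixed heading order, whatever the model returned."""
    lines: list[str] = []
    for heading in SECTIONS:
        lines.append(heading)
        # Missing or empty sections get an explicit fallback instead of vanishing.
        lines.append(fields.get(heading, "").strip() or FALLBACK)
        lines.append("")
    lines.append("Citations")
    lines.extend(f"[{i}] {citation}" for i, citation in enumerate(citations, start=1))
    return "\n".join(lines)
```

Because the headings come from the application rather than the model, a malformed generation can at worst leave a section with the fallback text; it cannot change the layout the UI depends on.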
Evidence Quality
Every answer gets an evidence-quality report. The score considers:
- number of retrieved sources
- number of retrieved chunks
- source priority
- document type
- source status
- jurisdiction match
- regime match
- domain match
The UI treats this as a gate, not decoration. Low means the answer should not be relied on. Medium means draft use only. High means the evidence base is strong enough for analyst review, still subject to compliance sign-off.
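To make the scoring concrete, here is a minimal weighted-sum sketch over the factors listed above. The weights, the thresholds for Low, Medium, and High, and the chunk field names are invented for illustration and are not the app's actual values.

```python
def score_evidence(chunks: list[dict], route: dict) -> tuple[float, str]:
    """Combine coverage and match signals into a score in [0, 1] and a level."""
    if not chunks:
        return 0.0, "Low"
    n = len(chunks)
    sources = {c["source_id"] for c in chunks}

    def share(predicate) -> float:
        # Fraction of retrieved chunks that satisfy the predicate.
        return sum(1 for c in chunks if predicate(c)) / n

    score = (
        0.15 * min(len(sources), 3) / 3
        + 0.10 * min(n, 5) / 5
        + 0.15 * share(lambda c: c["priority"] == 1)
        + 0.10 * share(lambda c: c["document_type"] in {"legislation", "rts"})
        + 0.10 * share(lambda c: c["status"] == "active")
        + 0.15 * share(lambda c: c["jurisdiction"] == route["jurisdiction"])
        + 0.15 * share(lambda c: c["regime"] == route["regime"])
        + 0.10 * share(lambda c: route["domain"] in c["domains"])
    )
    level = "High" if score >= 0.75 else "Medium" if score >= 0.5 else "Low"
    return round(score, 3), level
```

The weights sum to 1.0, so the score stays between 0 and 1 and the thresholds can be tuned independently of the number of factors.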
Citations show the source ID, title, document type, page, article or section when available, and a short source preview. Retrieved chunks are also listed so a reviewer can inspect what the model saw.
The Corpus Explorer
The corpus explorer makes the knowledge base inspectable. It answers questions like:
- Which official files are loaded?
- Which sources are missing?
- How many chunks were produced per source?
- What exact text is available to retrieval?
- Which article, page, or table did a chunk come from?
This is essential for trust. When an answer looks wrong, the first question should not be "why did the model say that?" It should be "what evidence did retrieval give the model?" The corpus explorer gives that answer directly.
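The per-source questions the explorer answers reduce to a small aggregation over the chunk records. This sketch assumes "missing" means a manifest source that produced no chunks; the function name and the dictionary layout are hypothetical.

```python
from collections import Counter


def corpus_summary(manifest_ids: list[str], chunks: list[dict]) -> dict:
    """Count chunks per manifest source and list sources that produced none."""
    counts = Counter(c["source_id"] for c in chunks)
    return {
        "chunks_per_source": {sid: counts.get(sid, 0) for sid in manifest_ids},
        "missing_sources": [sid for sid in manifest_ids if counts.get(sid, 0) == 0],
    }
```

Driving the summary from the manifest rather than from the chunks is what makes gaps visible: a source with zero chunks still shows up in the output.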
Deployment Model
The deployment shape is intentionally small:
- Clone the repository on the server. (The source code is still private; reach out if you want to see it.)
- Create config/auth.yaml locally and keep it out of git.
- Add the official source files under the paths listed in config/source_manifest.yaml.
- Run manifest validation.
- Build the corpus.
- Build the indexes.
- Start FastAPI bound to 127.0.0.1.
- Expose it through Cloudflare Tunnel.
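The manifest validation step could be approximated as follows. This sketch works on manifest entries that have already been parsed into dictionaries; the required field names are assumptions, since the layout of config/source_manifest.yaml is not shown here.

```python
from pathlib import Path

# Assumed required fields; the real manifest schema may differ.
REQUIRED_FIELDS = ("source_id", "path", "jurisdiction", "status")


def validate_manifest(entries: list[dict], root: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the manifest is valid."""
    problems: list[str] = []
    seen: set[str] = set()
    for index, entry in enumerate(entries):
        label = entry.get("source_id") or f"entry {index}"
        for field in REQUIRED_FIELDS:
            if not entry.get(field):
                problems.append(f"{label}: missing field '{field}'")
        source_id = entry.get("source_id")
        if source_id:
            if source_id in seen:
                problems.append(f"{label}: duplicate source_id")
            seen.add(source_id)
        path = entry.get("path")
        if path and not (Path(root) / path).is_file():
            problems.append(f"{label}: file not found at {path}")
    return problems
```

Returning every problem at once, instead of stopping at the first, makes it easier to fix a freshly cloned server in a single pass.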
The current helper script can start the API and a quick Cloudflare tunnel for demos. For a stable external setup, use a named Cloudflare Tunnel, a stable hostname, Cloudflare Access, log rotation, dependency scanning, rate limits, and a regular corpus refresh process.
Testing Strategy
The test suite covers the mechanics that matter for this kind of app:
- source manifest validation
- chunk metadata extraction
- retrieval filtering
- route classification
- overview retrieval for MiFID and MiFIR
- transaction reportability routing to MiFIR / RTS 22
- evidence scoring
- acceptance questions with expected source IDs and expected terms
The acceptance file is intentionally simple: each question defines the expected jurisdiction, regime, domain, minimum evidence level, expected sources, and expected terms. This provides a repeatable sanity check without requiring the LLM to produce byte-identical prose.
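An acceptance case and its check might be modelled like this. The field names follow the description above; the result dictionary layout, the choice to look for expected terms in the answer text, and the sample source ids are assumptions.

```python
from dataclasses import dataclass

LEVELS = {"Low": 0, "Medium": 1, "High": 2}


@dataclass(frozen=True)
class AcceptanceCase:
    question: str
    jurisdiction: str
    regime: str
    domain: str
    min_evidence: str
    expected_sources: tuple[str, ...]
    expected_terms: tuple[str, ...]


def check_case(case: AcceptanceCase, result: dict) -> list[str]:
    """Compare a pipeline result with a case and return a list of failures."""
    failures: list[str] = []
    for key in ("jurisdiction", "regime", "domain"):
        if result[key] != getattr(case, key):
            failures.append(f"{key}: expected {getattr(case, key)}, got {result[key]}")
    if LEVELS[result["evidence_level"]] < LEVELS[case.min_evidence]:
        failures.append(f"evidence below {case.min_evidence}")
    missing = sorted(set(case.expected_sources) - set(result["source_ids"]))
    if missing:
        failures.append(f"missing sources: {', '.join(missing)}")
    # Terms are matched case-insensitively so wording can vary between runs.
    answer = result["answer"].lower()
    absent = [term for term in case.expected_terms if term.lower() not in answer]
    if absent:
        failures.append(f"missing terms: {', '.join(absent)}")
    return failures
```

The check never compares full answers, only the route, the evidence level, the cited sources, and a few key terms, which is what keeps it stable across model runs.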
For a regulatory copilot, this is the right kind of test. The generated wording can vary, but the evidence path should be stable.
What This Enables
The result is a local regulatory copilot that can:
- answer MiFID / MiFIR reporting questions from local official sources
- separate EU and UK material
- cite source chunks
- flag weak evidence
- expose the corpus for review
- support controlled external demos through Cloudflare
- grow by adding new manifest sources and domain-specific retrieval seeds
The app is usable today as a showcase and analyst draft tool. It is not a replacement for legal or compliance sign-off. The value is that it makes the evidence path visible and testable, which is exactly what a regulatory assistant needs before it can become operationally useful.
Next Extensions
The strongest next improvements would be:
- replace fallback hash embeddings with a production embedding model
- add a reranker for better long-document retrieval
- improve table extraction for RTS field tables and FCA policy statements
- add source freshness checks and supersession metadata
- add Cloudflare Access and request rate limiting for public demos
- add structured audit logs for questions, routes, retrieved chunks, and evidence scores
- build workflow-specific dashboards for transaction reporting, transparency, ARMs, APAs, and exception management
The architecture is intentionally extensible: new source types enter through the manifest, new documents become chunks, chunks become searchable evidence, and route-specific retrieval rules can be added without changing the UI contract.
