Building Fund Overlap Lab: From Surface Labels to Real Exposure
If you have ever compared two multi-asset funds and thought, "These look different, but are they actually different?" this project is for that exact problem.
Fund Overlap Lab is a Python tool for Vanguard UK funds that retrieves holdings, normalizes names, and calculates overlap. It includes a CLI for fast checks and a Streamlit app for interactive analysis.
Source code: JordiCorbilla/fund-overlap-lab
Why It Exists
Wrapper funds such as LifeStrategy and Target Retirement can look distinct at the top level while sharing large portions of the same underlying exposures.
The project aims to make that visible by combining:
- look-through holdings
- transparent overlap math
- risk and cost context
- practical UI workflows for real portfolio conversations
Core Overlap Model
For two funds, overlap is calculated as:
sum(min(weight_a_i, weight_b_i))
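where each i is a normalized underlying holding and the weights are portfolio percentages. A holding missing from one fund has weight 0 there, so only holdings present in both funds contribute to the sum.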
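As a minimal sketch of that formula, the function below takes two dictionaries of weights keyed by normalized holding name. The function name, the dictionary layout, and the sample weights are assumptions for illustration, not the project's actual API or real fund data.

```python
def overlap(weights_a: dict[str, float], weights_b: dict[str, float]) -> float:
    """Return sum(min(weight_a_i, weight_b_i)) over holdings found in both funds."""
    shared = weights_a.keys() & weights_b.keys()
    return sum(min(weights_a[name], weights_b[name]) for name in shared)


# Hypothetical weights in percent, keyed by normalized holding name.
fund_a = {"global equity index": 60.0, "uk gilt index": 30.0, "cash": 10.0}
fund_b = {"global equity index": 45.0, "uk gilt index": 40.0, "emerging markets index": 15.0}

# min(60, 45) + min(30, 40) = 45 + 30
print(overlap(fund_a, fund_b))  # 75.0
```

Holdings that appear in only one fund never enter the sum, which is why the name normalization step matters so much: a mismatched name silently turns a shared holding into two distinct ones.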
The app then reports:
- overlap percentage
- shared and distinct holdings
- only-in-A and only-in-B tables
- bucket-level aggregation
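The reported tables can be derived from the same two weight dictionaries with set operations. The sketch below is one plausible reading of "bucket-level aggregation" (overlap summed per asset bucket); the function names, the bucket mapping, and the sample data are hypothetical and not taken from the project.

```python
from collections import defaultdict


def split_holdings(weights_a, weights_b):
    """Return (shared, only_in_a, only_in_b) as sorted lists of holding names."""
    shared = sorted(weights_a.keys() & weights_b.keys())
    only_a = sorted(weights_a.keys() - weights_b.keys())
    only_b = sorted(weights_b.keys() - weights_a.keys())
    return shared, only_a, only_b


def overlap_by_bucket(weights_a, weights_b, bucket_of):
    """Sum min(weight_a, weight_b) per bucket for holdings present in both funds."""
    totals = defaultdict(float)
    for name in weights_a.keys() & weights_b.keys():
        totals[bucket_of[name]] += min(weights_a[name], weights_b[name])
    return dict(totals)


# Hypothetical weights (percent) and bucket labels.
fund_a = {"global equity index": 60.0, "uk gilt index": 30.0, "cash": 10.0}
fund_b = {"global equity index": 45.0, "uk gilt index": 40.0, "emerging markets index": 15.0}
buckets = {
    "global equity index": "equity",
    "emerging markets index": "equity",
    "uk gilt index": "bonds",
    "cash": "cash",
}

shared, only_a, only_b = split_holdings(fund_a, fund_b)
print(shared)   # ['global equity index', 'uk gilt index']
print(only_a)   # ['cash']
print(only_b)   # ['emerging markets index']
print(overlap_by_bucket(fund_a, fund_b, buckets))
```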
Data Pipeline: What Changed and Why
Early versions parsed portfolio HTML tables. That broke when pages shifted to JS-driven rendering. The current pipeline is built to survive those changes.
Current strategy:
- primary source: GraphQL endpoint for detailed holdings
- fallback source: HTML table/text extraction where possible
- last-resort fallback: asset-allocation-level data from the API
- dynamic product lookup from Vanguard product catalog
This layered approach keeps the tool operational when upstream page structures move.
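The layering can be pictured as an ordered list of sources tried in turn. The sketch below uses stub functions in place of real network calls; every name in it (fetch_with_fallback, the source labels, the row layout) is an assumption for illustration and does not reflect the actual code in providers.py.

```python
from typing import Callable, Optional

Holdings = list[dict]
Source = Callable[[str], Optional[Holdings]]


def fetch_with_fallback(port_id: str, sources: list[tuple[str, Source]]) -> tuple[str, Holdings]:
    """Try each source in order; return the first non-empty result with its label."""
    errors = []
    for label, fetch in sources:
        try:
            rows = fetch(port_id)
        except Exception as exc:  # one failing source must not stop the chain
            errors.append(f"{label}: {exc}")
            continue
        if rows:
            return label, rows
        errors.append(f"{label}: no rows")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


# Stub sources standing in for the real retrieval paths (hypothetical behaviour).
def from_graphql(port_id: str) -> Holdings:
    raise ConnectionError("endpoint unavailable")


def from_html(port_id: str) -> Holdings:
    return []  # page rendered by JavaScript, nothing to parse


def from_allocation_api(port_id: str) -> Holdings:
    return [{"name": "Equity", "weight": 80.0}, {"name": "Bond", "weight": 20.0}]


SOURCES = [("graphql", from_graphql), ("html", from_html), ("allocation", from_allocation_api)]
label, rows = fetch_with_fallback("demo-port-id", SOURCES)
print(label, len(rows))  # allocation 2
```

Returning the label alongside the rows lets the caller show which level of detail was actually retrieved, since allocation-level data is much coarser than constituent-level holdings.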
How Data Access Works
Under the hood, the provider follows a staged resolution and retrieval flow:
- Product resolution
  - the app accepts multiple identifiers (fund code, SEDOL, slug)
  - it queries Vanguard UK product metadata and resolves those inputs to canonical product fields (including portId)
- Primary holdings retrieval (GraphQL)
  - with the resolved product, it calls Vanguard UK GraphQL endpoints to fetch holdings details
  - this path returns richer data than static page scraping, including constituent-level rows where available
- Recursive expansion (optional)
  - in Ultimate Look-Through mode, holdings that are themselves funds or ETFs can be expanded recursively
  - each child weight is multiplied through the parent path weight
  - recursion is bounded by max depth and protected against cycles
- Fallback retrieval
  - if detailed holdings cannot be retrieved, the provider falls back to HTML extraction and then allocation-level API data
  - this keeps the app functional even when one upstream path changes
- Normalization and comparison preparation
  - holding names are normalized for robust joins
  - duplicate rows are consolidated
  - percentage fields are standardized before overlap calculations
The user workflow stays simple while the provider absorbs changes in the upstream sources.
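The normalization and preparation step can be sketched as follows. The specific rules here (lowercasing, replacing punctuation with spaces, stripping a trailing percent sign) are assumptions chosen for illustration; the project's real normalization rules may differ.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lowercase and collapse any run of non-alphanumeric characters to one space."""
    # Note: this simple rule also drops non-ASCII letters.
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def parse_percent(value) -> float:
    """Accept 12.5, "12.5" or "12.5%" and return 12.5."""
    if isinstance(value, str):
        value = value.strip().rstrip("%")
    return float(value)


def consolidate(rows):
    """Merge rows that share a normalized name by summing their weights."""
    totals = defaultdict(float)
    for name, weight in rows:
        totals[normalize_name(name)] += parse_percent(weight)
    return dict(totals)


# Hypothetical raw rows with inconsistent names and percentage formats.
rows = [("Apple Inc.", "4.0%"), ("APPLE INC", 1.5), ("Microsoft Corp.", "3.25")]
print(consolidate(rows))  # {'apple inc': 5.5, 'microsoft corp': 3.25}
```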
Ultimate Look-Through Mode (Recursive)
A major upgrade is recursive decomposition of fund-of-funds structures.
You can now choose between:
- Direct Holdings mode: one layer of holdings
- Ultimate Look-Through mode: recursively expands eligible underlying funds and ETFs
Both Two-Fund Compare and Portfolio Analysis now include:
- an Ultimate Look-Through toggle
- a max depth control to limit recursion
- cycle protection so recursive expansion does not loop
This matters for mixed wrappers, where a first-level holding is often itself another basket of funds.
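A minimal sketch of recursive expansion with weight multiplication, a depth limit, and cycle protection is shown below. The in-memory HOLDINGS table, the function name, and the convention that max_depth=1 means direct holdings only are all assumptions for illustration; the real tool fetches holdings from its provider and may count depth differently.

```python
from collections import defaultdict

# Hypothetical holdings: fund id -> list of (holding id, weight as a fraction).
HOLDINGS = {
    "wrapper": [("equity_fund", 0.6), ("bond_fund", 0.4)],
    "equity_fund": [("stock_a", 0.5), ("stock_b", 0.5)],
    "bond_fund": [("gilt_x", 1.0)],
}


def look_through(fund_id, holdings, max_depth, path_weight=1.0, path=(), totals=None):
    """Expand fund_id recursively, multiplying each child weight by its path weight."""
    totals = defaultdict(float) if totals is None else totals
    ancestors = path + (fund_id,)
    for child, weight in holdings[fund_id]:
        effective = path_weight * weight
        expandable = (
            child in holdings                # the child is itself a fund we can open
            and len(ancestors) < max_depth   # depth limit
            and child not in ancestors       # cycle protection
        )
        if expandable:
            look_through(child, holdings, max_depth, effective, ancestors, totals)
        else:
            totals[child] += effective       # treat as a final holding
    return totals


direct = dict(look_through("wrapper", HOLDINGS, max_depth=1))
ultimate = dict(look_through("wrapper", HOLDINGS, max_depth=5))
print(direct)    # equity_fund and bond_fund at their direct weights
print(ultimate)  # stock_a, stock_b and gilt_x at path-multiplied weights
```

When the depth limit or the cycle check stops expansion, the child is kept as a holding at its accumulated weight, so the totals still sum to the parent's weight.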
Example Questions It Answers Quickly
- How much overlap exists between a LifeStrategy fund and a Target Retirement fund right now?
- Is a portfolio diversified across managers and wrappers, or concentrated in repeated underlyings?
- Do risk and OCF differences align with actual exposure differences?
- Does recursive look-through materially change the diversification story?
Project Structure
- fund_overlap_lab/providers.py: data access, product resolution, holdings retrieval
- fund_overlap_lab/compare.py: overlap and portfolio analytics
- fund_overlap_lab/models.py: data models
- fund_overlap_lab/buckets.py: coarse asset bucketing
- fund_overlap_lab/cli.py: command-line workflows
- app.py: Streamlit experience
Run It
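Install (the activation line below is for Windows; on macOS or Linux use source .venv/bin/activate instead):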
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
CLI compare:
python -m fund_overlap_lab.cli compare VGL100A VAR45GA
Streamlit app:
streamlit run app.py
Final Thought
Fund Overlap Lab is intentionally practical. It is built to help answer a real question before making allocation changes:
"Am I truly diversifying, or buying the same exposure through different wrappers?"
