Building Fund Overlap Lab: From Surface Labels to Real Exposure

April 08, 2026

If you have ever compared two multi-asset funds and thought, "These look different, but are they actually different?" this project is for that exact problem.

Fund Overlap Lab is a Python tool for Vanguard UK funds that retrieves holdings, normalizes names, and calculates overlap. It includes a CLI for fast checks and a Streamlit app for interactive analysis.

Repository: JordiCorbilla/fund-overlap-lab

Why It Exists

Wrapper funds such as LifeStrategy and Target Retirement can look distinct at the top level while sharing large portions of the same underlying exposures.

The project aims to make that visible by combining:

look-through holdings
transparent overlap math
risk and cost context
practical UI workflows for real portfolio conversations

Core Overlap Model

For two funds, overlap is calculated as:

sum(min(weight_a_i, weight_b_i))

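where each i is a normalized underlying holding. A holding that appears in only one fund contributes nothing to the sum, so two funds with no common holdings score 0 and two identical funds score their full combined weight.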
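
As a minimal sketch of that formula, assuming each fund is represented as a plain dict of normalized holding name to percentage weight (the function name and sample funds here are illustrative, not taken from the project's code):

```python
def overlap_pct(weights_a: dict[str, float], weights_b: dict[str, float]) -> float:
    """Sum of min(weight_a_i, weight_b_i) over holdings present in both funds."""
    # Holdings missing from either fund would contribute min(w, 0) = 0,
    # so it is enough to iterate over the shared names.
    shared = weights_a.keys() & weights_b.keys()
    return sum(min(weights_a[name], weights_b[name]) for name in shared)


# Made-up sample data, weights in percent.
fund_a = {"global equity index": 60.0, "uk gilt index": 30.0, "cash": 10.0}
fund_b = {"global equity index": 40.0, "uk gilt index": 45.0, "emerging markets": 15.0}

print(overlap_pct(fund_a, fund_b))  # min(60, 40) + min(30, 45) = 70.0
```

Taking the minimum per holding means the score only counts weight that both funds actually commit to the same underlying, which is why a large position in one fund cannot inflate the overlap on its own.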

The app then reports:

overlap percentage
shared and distinct holdings
only-in-A and only-in-B tables
bucket-level aggregation
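
One plausible way to assemble those outputs is sketched below. The function name, the report keys, and the bucket mapping are assumptions for illustration; the project's own report structure may differ, and "bucket-level aggregation" is read here as overlap summed per asset bucket.

```python
from collections import defaultdict


def overlap_report(weights_a, weights_b, bucket_of):
    """Build a small comparison report from two {holding: percent} dicts."""
    shared = sorted(weights_a.keys() & weights_b.keys())
    only_in_a = sorted(weights_a.keys() - weights_b.keys())
    only_in_b = sorted(weights_b.keys() - weights_a.keys())

    # Aggregate the per-holding overlap into coarse buckets.
    by_bucket = defaultdict(float)
    for name in shared:
        bucket = bucket_of.get(name, "other")
        by_bucket[bucket] += min(weights_a[name], weights_b[name])

    return {
        "overlap_pct": sum(by_bucket.values()),
        "shared": shared,
        "only_in_a": only_in_a,
        "only_in_b": only_in_b,
        "overlap_by_bucket": dict(by_bucket),
    }


# Made-up sample data, weights in percent.
fund_a = {"global equity index": 60.0, "uk gilt index": 30.0, "cash": 10.0}
fund_b = {"global equity index": 40.0, "uk gilt index": 45.0, "emerging markets": 15.0}
buckets = {
    "global equity index": "equity",
    "emerging markets": "equity",
    "uk gilt index": "bonds",
    "cash": "cash",
}

report = overlap_report(fund_a, fund_b, buckets)
print(report)
```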

Data Pipeline: What Changed and Why

Early versions parsed portfolio HTML tables. That broke when pages shifted to JS-driven rendering. The current pipeline is built to survive those changes.

Current strategy:

primary source: GraphQL endpoint for detailed holdings
fallback source: HTML table/text extraction where possible
resilience fallback: API asset-allocation level extraction
dynamic product lookup from Vanguard product catalog

This layered approach keeps the tool operational when upstream page structures move.

How Data Access Works

Under the hood, the provider follows a staged resolution and retrieval flow:

Product resolution

the app accepts multiple identifiers (fund code, SEDOL, slug)
it queries Vanguard UK product metadata and resolves those inputs to canonical product fields (including portId)
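
The resolution step can be pictured as a lookup over several identifier fields. This sketch uses a small in-memory catalog with placeholder values instead of the live Vanguard metadata; the record layout, the helper names, and every value other than the field name portId are assumptions.

```python
# Placeholder catalog; real records would come from the product metadata.
CATALOG = [
    {"code": "FUND_A", "sedol": "SEDOL_A", "slug": "example-fund-a", "portId": "0001"},
    {"code": "FUND_B", "sedol": "SEDOL_B", "slug": "example-fund-b", "portId": "0002"},
]


def build_index(catalog):
    """Map every known identifier (code, SEDOL, slug) to its product record."""
    index = {}
    for product in catalog:
        for field in ("code", "sedol", "slug"):
            index[product[field].strip().lower()] = product
    return index


def resolve_product(identifier, index):
    """Return the canonical product record for any accepted identifier."""
    key = identifier.strip().lower()
    if key not in index:
        raise KeyError(f"Unknown fund identifier: {identifier!r}")
    return index[key]


INDEX = build_index(CATALOG)
print(resolve_product(" fund_a ", INDEX)["portId"])        # 0001
print(resolve_product("example-fund-b", INDEX)["portId"])  # 0002
```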

Primary holdings retrieval (GraphQL)

with the resolved product, it calls Vanguard UK GraphQL endpoints to fetch holdings details
this path returns richer data than static page scraping, including constituent-level rows where available

Recursive expansion (optional)

in Ultimate Look-Through mode, holdings that are themselves funds or ETFs can be expanded recursively
each child weight is multiplied through the parent path weight
recursion is bounded by max depth and protected against cycles
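
A rough sketch of that expansion follows, assuming holdings are available as a dict of fund name to {holding: weight fraction}. The function name, its parameters, and the sample data are hypothetical, and a holding is treated as expandable simply when it appears as a key in that dict.

```python
def look_through(fund_id, holdings_by_fund, max_depth=3, path_weight=1.0, _path=()):
    """Return {holding: effective weight} with nested funds expanded."""
    visiting = _path + (fund_id,)  # funds on the current expansion path
    result = {}
    for name, weight in holdings_by_fund[fund_id].items():
        effective = path_weight * weight  # multiply through the parent path
        expandable = (
            name in holdings_by_fund   # the holding is itself a fund
            and max_depth > 0          # depth budget remains
            and name not in visiting   # cycle protection
        )
        if expandable:
            child = look_through(name, holdings_by_fund, max_depth - 1, effective, visiting)
            for child_name, child_weight in child.items():
                result[child_name] = result.get(child_name, 0.0) + child_weight
        else:
            result[name] = result.get(name, 0.0) + effective
    return result


# Made-up fund-of-funds structure, weights as fractions of the parent.
HOLDINGS = {
    "wrapper": {"equity fund": 0.6, "bond fund": 0.4},
    "equity fund": {"stock x": 0.5, "stock y": 0.5},
    "bond fund": {"gilt z": 1.0},
}

expanded = look_through("wrapper", HOLDINGS)
direct = look_through("wrapper", HOLDINGS, max_depth=0)
print(expanded)  # stock x and stock y near 0.3 each, gilt z near 0.4
print(direct)    # the wrapper's own two holdings, unexpanded
```

When the depth limit or the cycle check stops expansion, the fund is kept as a leaf at its effective weight, so the total weight is preserved instead of silently dropped.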

Fallback retrieval

if detailed holdings cannot be retrieved, the provider falls back to HTML extraction and then allocation-level API data
this keeps the app functional even when one upstream path changes
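
The fallback order can be expressed as a simple chain that tries each source in turn. In this sketch the three fetchers are stand-in stubs, not the project's real retrieval functions, and the helper name fetch_with_fallback is an assumption.

```python
from typing import Callable

Holdings = dict[str, float]


def fetch_with_fallback(
    fund_id: str,
    sources: list[tuple[str, Callable[[str], Holdings]]],
) -> tuple[str, Holdings]:
    """Try each (label, fetcher) in order; return the first non-empty result."""
    errors = []
    for label, fetch in sources:
        try:
            holdings = fetch(fund_id)
        except Exception as exc:  # a failing source should not stop the chain
            errors.append(f"{label}: {exc}")
            continue
        if holdings:
            return label, holdings
        errors.append(f"{label}: no holdings returned")
    raise RuntimeError(f"All sources failed for {fund_id}: " + "; ".join(errors))


# Stub fetchers simulating one failing, one empty, and one working source.
def fetch_graphql(fund_id: str) -> Holdings:
    raise ConnectionError("simulated upstream failure")


def fetch_html(fund_id: str) -> Holdings:
    return {}


def fetch_allocation(fund_id: str) -> Holdings:
    return {"equity": 60.0, "bonds": 40.0}


source_used, data = fetch_with_fallback(
    "EXAMPLE_FUND",
    [("graphql", fetch_graphql), ("html", fetch_html), ("allocation", fetch_allocation)],
)
print(source_used, data)  # allocation {'equity': 60.0, 'bonds': 40.0}
```

Returning the label alongside the data lets the caller show which level of detail the comparison is based on, since allocation-level data is coarser than constituent-level holdings.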

Normalization and comparison preparation

holding names are normalized for robust joins
duplicate rows are consolidated
percentage fields are standardized before overlap calculations
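
A simplified version of those three steps might look like the following. The exact normalization rules in the project are not described here, so the rules below (case folding, stripping punctuation, accepting "2.5%" style strings) are assumptions chosen for illustration, and this ASCII-only cleanup would be too aggressive for names with accented characters.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lowercase the name and collapse punctuation and whitespace to single spaces."""
    name = name.casefold()
    name = re.sub(r"[^a-z0-9]+", " ", name)
    return name.strip()


def parse_percent(value) -> float:
    """Accept numbers or strings such as '2.5%' and return a float percentage."""
    if isinstance(value, str):
        value = value.replace("%", "").replace(",", "").strip()
    return float(value)


def prepare(rows):
    """Consolidate duplicate rows by summing weights under the normalized name."""
    totals = defaultdict(float)
    for name, pct in rows:
        totals[normalize_name(name)] += parse_percent(pct)
    return dict(totals)


# Made-up rows showing two spellings of the same holding.
rows = [("Apple Inc.", "2.5%"), ("APPLE INC", 1.5), ("Microsoft Corp", "3.0")]
prepared = prepare(rows)
print(prepared)  # {'apple inc': 4.0, 'microsoft corp': 3.0}
```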

This design keeps the user workflow simple while handling real-world upstream variability in a resilient way.

Ultimate Look-Through Mode (Recursive)

A major upgrade is recursive decomposition of fund-of-funds structures.

You can now choose between:

Direct Holdings mode: one layer of holdings
Ultimate Look-Through mode: recursively expands eligible underlying funds and ETFs

Both Two-Fund Compare and Portfolio Analysis now include:

an Ultimate Look-Through toggle
a max depth control to limit recursion
cycle protection so recursive expansion does not loop

This matters for mixed wrappers, where the first level of holdings can itself be another basket of funds.

Example Questions It Answers Quickly

How much overlap exists between a LifeStrategy fund and a Target Retirement fund right now?
Is a portfolio diversified across managers and wrappers, or concentrated in repeated underlyings?
Do risk and OCF differences align with actual exposure differences?
Does recursive look-through materially change the diversification story?

Project Structure

fund_overlap_lab/providers.py: data access, product resolution, holdings retrieval
fund_overlap_lab/compare.py: overlap and portfolio analytics
fund_overlap_lab/models.py: data models
fund_overlap_lab/buckets.py: coarse asset bucketing
fund_overlap_lab/cli.py: command-line workflows
app.py: Streamlit experience

Run It

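Install (the activate command below is for Windows; on macOS or Linux use source .venv/bin/activate instead):

python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt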

CLI compare:

python -m fund_overlap_lab.cli compare VGL100A VAR45GA

Streamlit app:

streamlit run app.py

Final Thought

Fund Overlap Lab is intentionally practical. It is built to help answer a real question before making allocation changes:

"Am I truly diversifying, or buying the same exposure through different wrappers?"
