Building Fund Overlap Lab: From Surface Labels to Real Exposure

If you have ever compared two multi-asset funds and wondered, "These look different, but are they actually different?", this project was built for exactly that problem.

Fund Overlap Lab is a Python tool for Vanguard UK funds that retrieves holdings, normalizes names, and calculates overlap. It includes a CLI for fast checks and a Streamlit app for interactive analysis.

Repository: JordiCorbilla/fund-overlap-lab (GitHub)


Why It Exists

Wrapper funds such as LifeStrategy and Target Retirement can look distinct at the top level while sharing large portions of the same underlying exposures.

The project aims to make that visible by combining:

  • look-through holdings
  • transparent overlap math
  • risk and cost context
  • practical UI workflows for real portfolio conversations

Core Overlap Model

For two funds, overlap is calculated as:

sum(min(weight_a_i, weight_b_i))

where i ranges over normalized underlying holdings, and a holding absent from one fund counts as weight 0 in that fund.
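A minimal Python sketch of the formula, assuming each fund's holdings are already a dict mapping normalized name to percentage weight. The function name `overlap` and the sample funds are illustrative, not necessarily what compare.py uses.

```python
def overlap(weights_a: dict[str, float], weights_b: dict[str, float]) -> float:
    """Sum over holdings of min(weight_a_i, weight_b_i); weights are percentages."""
    # A holding missing from either fund contributes min(w, 0) = 0,
    # so summing over the shared names alone gives the full result.
    shared = weights_a.keys() & weights_b.keys()
    return float(sum(min(weights_a[name], weights_b[name]) for name in shared))

# Hypothetical holdings, keyed by normalized name.
fund_a = {"US Equity Index": 40.0, "UK Gilts": 30.0, "Global Bond Index": 30.0}
fund_b = {"US Equity Index": 60.0, "Global Bond Index": 20.0, "Emerging Markets": 20.0}
print(overlap(fund_a, fund_b))  # 60.0
```

If both funds' weights sum to 100, the result runs from 0 (nothing shared) to 100 (identical holdings).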

The app then reports:

  • overlap percentage
  • shared and distinct holdings
  • only-in-A and only-in-B tables
  • bucket-level aggregation
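The remaining outputs can be sketched the same way: set operations on normalized names give the shared and one-sided tables, and summing weights per bucket before applying the min-sum gives a bucket-level view. The keyword rules below are hypothetical placeholders; the project's buckets.py may classify holdings quite differently.

```python
from collections import defaultdict

def split_holdings(a: dict[str, float], b: dict[str, float]):
    """Return (shared, only_in_a, only_in_b) as sorted name lists."""
    return (
        sorted(a.keys() & b.keys()),
        sorted(a.keys() - b.keys()),
        sorted(b.keys() - a.keys()),
    )

# Hypothetical keyword -> bucket rules, checked in order; unmatched names go to "Other".
BUCKET_RULES = [("gilt", "Bonds"), ("bond", "Bonds"), ("equity", "Equity")]

def bucket_of(name: str) -> str:
    lowered = name.lower()
    for keyword, bucket in BUCKET_RULES:
        if keyword in lowered:
            return bucket
    return "Other"

def bucket_weights(weights: dict[str, float]) -> dict[str, float]:
    """Sum holding weights into coarse buckets."""
    totals: dict[str, float] = defaultdict(float)
    for name, weight in weights.items():
        totals[bucket_of(name)] += weight
    return dict(totals)

def bucket_overlap(a: dict[str, float], b: dict[str, float]) -> float:
    """Apply the same min-sum formula to bucket totals instead of holdings."""
    ba, bb = bucket_weights(a), bucket_weights(b)
    return float(sum(min(ba[k], bb[k]) for k in ba.keys() & bb.keys()))

fund_a = {"US Equity Index": 40.0, "UK Gilts": 30.0, "Global Bond Index": 30.0}
fund_b = {"US Equity Index": 60.0, "Global Aggregate Bond": 20.0, "Emerging Markets Equity": 20.0}

shared, only_a, only_b = split_holdings(fund_a, fund_b)
print(shared)   # ['US Equity Index']
print(only_a)   # ['Global Bond Index', 'UK Gilts']
print(only_b)   # ['Emerging Markets Equity', 'Global Aggregate Bond']
print(bucket_overlap(fund_a, fund_b))  # 60.0
```

Here only one holding name is shared (holding-level overlap of 40), yet the bucket-level overlap is 60, a reminder that name-level and bucket-level views can tell different stories.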

Data Pipeline: What Changed and Why

Early versions parsed portfolio HTML tables. That broke when pages shifted to JS-driven rendering. The current pipeline is built to survive those changes.

Current strategy:

  • primary source: GraphQL endpoint for detailed holdings
  • fallback source: HTML table/text extraction where possible
  • resilience fallback: asset-allocation-level data extracted from the API
  • dynamic product lookup from the Vanguard product catalog

This layered approach keeps the tool operational when upstream page structures move.
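The layered strategy amounts to trying sources in priority order and accepting the first one that returns usable data. A minimal sketch, assuming each source is a callable that takes a product ID and returns a holdings dict; `fetch_with_fallbacks` and the three source functions are hypothetical stand-ins for the real GraphQL, HTML and allocation fetchers in providers.py.

```python
from typing import Callable, Optional

Holdings = dict[str, float]
Fetcher = Callable[[str], Optional[Holdings]]

def fetch_with_fallbacks(product_id: str, sources: list[tuple[str, Fetcher]]) -> tuple[str, Holdings]:
    """Try each (name, fetcher) in order; return (name, holdings) from the first that succeeds."""
    failures: list[str] = []
    for source_name, fetch in sources:
        try:
            holdings = fetch(product_id)
        except Exception as exc:  # a broken upstream path should not stop the chain
            failures.append(f"{source_name}: {exc}")
            continue
        if holdings:  # None or an empty dict counts as a miss
            return source_name, holdings
        failures.append(f"{source_name}: no data")
    raise LookupError(f"no source returned data for {product_id}: {failures}")

# Hypothetical sources simulating an upstream change.
def graphql_source(product_id: str) -> Optional[Holdings]:
    raise ConnectionError("schema changed")

def html_source(product_id: str) -> Optional[Holdings]:
    return {}  # page now renders via JavaScript, so no table is found

def allocation_source(product_id: str) -> Optional[Holdings]:
    return {"Equity": 60.0, "Bonds": 40.0}

source, holdings = fetch_with_fallbacks(
    "example-fund",
    [("graphql", graphql_source), ("html", html_source), ("allocation", allocation_source)],
)
print(source, holdings)  # allocation {'Equity': 60.0, 'Bonds': 40.0}
```

Returning the source name alongside the data matters: allocation-level rows are coarser than constituent rows, so an overlap computed from them is not directly comparable with a holdings-level overlap.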

How Data Access Works

Under the hood, the provider follows a staged resolution and retrieval flow:

  1. Product resolution
     • the app accepts multiple identifiers (fund code, SEDOL, slug)
     • it queries Vanguard UK product metadata and resolves those inputs to canonical product fields (including portId)
  2. Primary holdings retrieval (GraphQL)
     • with the resolved product, it calls Vanguard UK GraphQL endpoints to fetch holdings details
     • this path returns richer data than static page scraping, including constituent-level rows where available
  3. Recursive expansion (optional)
     • in Ultimate Look-Through mode, holdings that are themselves funds or ETFs can be expanded recursively
     • each child weight is multiplied through the parent path weight
     • recursion is bounded by max depth and protected against cycles
  4. Fallback retrieval
     • if detailed holdings cannot be retrieved, the provider falls back to HTML extraction and then allocation-level API data
     • this keeps the app functional even when one upstream path changes
  5. Normalization and comparison preparation
     • holding names are normalized for robust joins
     • duplicate rows are consolidated
     • percentage fields are standardized before overlap calculations
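Product resolution (the first step in this flow) can be pictured as an index from every accepted identifier to one canonical record. A minimal sketch, assuming the catalog has already been fetched into memory; the `Product` fields and all sample values are hypothetical placeholders, with `port_id` standing in for the portId field.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Product:
    port_id: str
    fund_code: str
    sedol: str
    slug: str
    name: str

def build_index(catalog: list[Product]) -> dict[str, Product]:
    """Map each accepted identifier (case-insensitive) to its canonical product."""
    index: dict[str, Product] = {}
    for product in catalog:
        for key in (product.fund_code, product.sedol, product.slug):
            index[key.strip().lower()] = product
    return index

def resolve(identifier: str, index: dict[str, Product]) -> Product:
    """Resolve a fund code, SEDOL, or slug to its canonical product record."""
    try:
        return index[identifier.strip().lower()]
    except KeyError:
        raise LookupError(f"unknown product identifier: {identifier!r}") from None

# Placeholder catalog entry; real values would come from the product catalog.
catalog = [Product("PORT-0001", "FUND-A", "SEDOL-A", "example-fund-a", "Example Fund A")]
index = build_index(catalog)
print(resolve("  Example-Fund-A ", index).port_id)  # PORT-0001
```

Normalizing the lookup key lets users paste identifiers in whatever form they have. In this sketch a collision between two products' identifiers would silently overwrite an entry, so a production index might want to detect that.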

This design keeps the user workflow simple while handling real-world upstream variability in a resilient way.
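The normalization and consolidation step can be sketched as follows, assuming rows arrive as (name, weight) pairs where weights may be numbers or strings such as "4.5%". The specific rules here (lowercasing, spelling out "&", stripping punctuation) are illustrative assumptions, not the project's exact rules.

```python
import re
from collections import defaultdict
from typing import Union

def normalize_name(name: str) -> str:
    """Lowercase, spell out '&', replace punctuation with spaces, collapse whitespace."""
    cleaned = name.lower().replace("&", " and ")
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", cleaned)
    return " ".join(cleaned.split())

def parse_percent(value: Union[str, float, int]) -> float:
    """Convert 4.5, '4.5', or ' 4.5 % ' to the float 4.5."""
    if isinstance(value, (int, float)):
        return float(value)
    return float(value.strip().rstrip("%"))

def consolidate(rows: list[tuple[str, Union[str, float, int]]]) -> dict[str, float]:
    """Merge rows whose names normalize to the same key by summing their weights."""
    totals: dict[str, float] = defaultdict(float)
    for name, weight in rows:
        totals[normalize_name(name)] += parse_percent(weight)
    return dict(totals)

rows = [("Apple Inc.", "4.5%"), ("APPLE INC", 1.5), ("Microsoft Corp", " 3.0 % ")]
print(consolidate(rows))  # {'apple inc': 6.0, 'microsoft corp': 3.0}
```

Aggressive normalization improves join rates but can merge lines that are genuinely different (for example, two share classes of the same issuer), so the rules are a trade-off worth checking against real holdings lists.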

Ultimate Look-Through Mode (Recursive)

A major upgrade is recursive decomposition of fund-of-funds structures.

You can now choose between:

  • Direct Holdings mode: one layer of holdings
  • Ultimate Look-Through mode: recursively expands eligible underlying funds and ETFs

Both Two-Fund Compare and Portfolio Analysis now include:

  • an Ultimate Look-Through toggle
  • a max depth control to limit recursion
  • cycle protection so recursive expansion does not loop

This matters for mixed wrappers where one level of holdings is still another basket.
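The mechanics behind these controls (weights multiplied through the parent path, a depth limit, cycle protection) can be sketched as a depth-first walk. This assumes a hypothetical `get_holdings` callable that returns a {name: percentage} dict for expandable funds and ETFs and None for ordinary securities; it is not the provider's actual interface.

```python
from collections import defaultdict
from typing import Callable, Optional

Holdings = dict[str, float]

def ultimate_look_through(
    root: str,
    get_holdings: Callable[[str], Optional[Holdings]],
    max_depth: int = 3,
) -> Holdings:
    """Expand fund-of-funds holdings into end exposures, as % of the root fund."""
    exposures: dict[str, float] = defaultdict(float)

    def expand(fund: str, path_weight: float, depth: int, ancestors: frozenset) -> None:
        for child, weight in (get_holdings(fund) or {}).items():
            # Multiply the child's weight through the parent path weight.
            effective = path_weight * weight / 100.0
            expandable = get_holdings(child) is not None
            if expandable and depth < max_depth and child not in ancestors:
                expand(child, effective, depth + 1, ancestors | {child})
            else:
                # Plain security, depth limit reached, or a cycle: record as-is.
                exposures[child] += effective

    expand(root, 100.0, 1, frozenset({root}))
    return dict(exposures)

# Hypothetical two-level wrapper.
HOLDINGS = {
    "Wrapper Fund": {"Global Equity ETF": 60.0, "Bond Fund": 40.0},
    "Global Equity ETF": {"Stock X": 50.0, "Stock Y": 50.0},
    "Bond Fund": {"Gilt 2030": 100.0},
}

print(ultimate_look_through("Wrapper Fund", HOLDINGS.get, max_depth=1))
# {'Global Equity ETF': 60.0, 'Bond Fund': 40.0}
print(ultimate_look_through("Wrapper Fund", HOLDINGS.get, max_depth=3))
# {'Stock X': 30.0, 'Stock Y': 30.0, 'Gilt 2030': 40.0}
```

With max_depth=1 the walk reduces to Direct Holdings mode. A cycle (fund A holding fund B holding fund A) stops at the repeated fund and records it as a leaf instead of looping. The sketch calls `get_holdings` twice per expandable child; caching it (for example with `functools.lru_cache`) would avoid repeat fetches.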

Example Questions It Answers Quickly

  • How much overlap exists between a LifeStrategy fund and a Target Retirement fund right now?
  • Is a portfolio diversified across managers and wrappers, or concentrated in repeated underlyings?
  • Do risk and OCF differences align with actual exposure differences?
  • Does recursive look-through materially change the diversification story?
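For the portfolio question above, one way to measure concentration in repeated underlyings is to scale each fund's holdings by its portfolio allocation and sum by holding name. A minimal sketch with hypothetical fund names and weights; it assumes holdings are already normalized (and, if desired, expanded via look-through), and the function names are illustrative.

```python
from collections import defaultdict

def portfolio_exposure(
    allocations: dict[str, float],
    holdings_by_fund: dict[str, dict[str, float]],
) -> dict[str, float]:
    """Combine fund holdings into portfolio-level exposure (% of total portfolio)."""
    exposure: dict[str, float] = defaultdict(float)
    for fund, allocation in allocations.items():
        for holding, weight in holdings_by_fund[fund].items():
            # A holding's contribution is the fund's allocation times its weight in the fund.
            exposure[holding] += allocation * weight / 100.0
    return dict(exposure)

def top_exposures(exposure: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Largest exposures first; repeated underlyings surface near the top."""
    return sorted(exposure.items(), key=lambda item: item[1], reverse=True)[:n]

allocations = {"Fund A": 50.0, "Fund B": 50.0}
holdings_by_fund = {
    "Fund A": {"Stock X": 40.0, "Stock Y": 60.0},
    "Fund B": {"Stock X": 70.0, "Stock Z": 30.0},
}
print(top_exposures(portfolio_exposure(allocations, holdings_by_fund)))
# [('Stock X', 55.0), ('Stock Y', 30.0), ('Stock Z', 15.0)]
```

Stock X ends up at 55% of the portfolio even though it is split across two funds, which is the repeated-underlying pattern the question is probing.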

Project Structure

  • fund_overlap_lab/providers.py: data access, product resolution, holdings retrieval
  • fund_overlap_lab/compare.py: overlap and portfolio analytics
  • fund_overlap_lab/models.py: data models
  • fund_overlap_lab/buckets.py: coarse asset bucketing
  • fund_overlap_lab/cli.py: command-line workflows
  • app.py: Streamlit experience

Run It

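Install (the activation line below is for Windows; on macOS or Linux, use source .venv/bin/activate instead):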

python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt

CLI compare:

python -m fund_overlap_lab.cli compare VGL100A VAR45GA

Streamlit app:

streamlit run app.py

Final Thought

Fund Overlap Lab is intentionally practical. It is built to help answer a real question before making allocation changes:

"Am I truly diversifying, or buying the same exposure through different wrappers?"
