Rowlens: finding one exact CSV row in a 17 GB file without blowing up memory

Most CSV tooling assumes the file is small enough to load, sort, or inspect interactively. That breaks down fast once the file is measured in gigabytes. rowlens was built for the opposite case: a CLI you can point at a very large CSV when you already know the clues you are looking for and want the exact matching rows back in a readable format.

The design constraint was simple: never read the full file into memory. rowlens opens the CSV as a stream, reads the header once, and then processes each record one at a time with Python's standard csv reader. That keeps memory usage effectively flat even when the input file is 17 GB or more. The tool does not build an index, cache rows, or attempt an in-memory dataframe workflow. It just walks the file once and stops early if you set --max-results.

The matching model is intentionally narrow for version 1.0. Repeated --keyword arguments are treated as exact cell-value matches. Repeated --filter arguments are treated as substring checks. A row is returned only when it satisfies every supplied condition. That gives a useful two-stage search pattern in practice: use --keyword for the hard identifier, then --filter to narrow the surrounding context.
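The AND semantics of that matching model can be captured in a single predicate. This is a sketch under the rules stated above (exact match for keywords, substring for filters); the function name is illustrative:

```python
def row_matches(row, keywords, filters):
    """Return True only when the row satisfies every supplied condition:
    each --keyword must equal some cell exactly, and each --filter must
    appear as a substring of some cell."""
    keyword_ok = all(any(cell == kw for cell in row) for kw in keywords)
    filter_ok = all(any(term in cell for cell in row) for term in filters)
    return keyword_ok and filter_ok
```

Note the asymmetry: a keyword that is merely a prefix of a cell does not match, while a filter term anywhere inside a cell does.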

Example:

rowlens --file "huge.csv" --keyword "1213131" --filter "AAA" --output "results.txt"

If a row matches, the output is not dumped as raw CSV. Instead, rowlens renders a bordered CLI report with summary metadata at the top and then a per-match table that lists each column name beside its value. That matters more than it sounds. When you are debugging a production extract or validating a record in an enormous export, the important part is not just finding the row. The important part is understanding the row immediately.
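A per-match renderer along those lines might look like this. The layout is illustrative only, not rowlens' exact output format:

```python
def render_match(header, row, index):
    """Render one matched row as a bordered name/value table that stays
    readable on a terminal and when redirected to a plain-text file."""
    width = max(len(name) for name in header)
    border = "+" + "-" * (width + 24) + "+"
    lines = [border, f"| match #{index}".ljust(width + 25) + "|", border]
    for name, value in zip(header, row):
        # one line per column: name beside its value
        lines.append(f"| {name.ljust(width)} : {value}")
    lines.append(border)
    return "\n".join(lines)
```

Listing each column name beside its value is what makes a wide row legible: you never have to count commas to work out which field you are looking at.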

That output style was inspired by terminal-first tooling such as csv-stream-diff: strong borders, obvious sections, and tabular structure that still works in plain text when redirected to a file. The same rendered report can be written with --output, which makes it easy to attach to a ticket, share in chat, or keep as a trace artifact from an investigation.

Under the hood, Python was a good fit here because the problem is mostly streaming I/O plus deterministic row checks. The package is structured with Poetry from the start so it can be shipped cleanly to PyPI. The CLI entry point lives behind the rowlens console script, dependencies are minimal, and the test suite covers the main behaviors: combined keyword and filter matching, case-insensitive search, support for extra cells beyond the header, and output-file generation.
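The Poetry layout implied above corresponds roughly to a pyproject.toml fragment like this one. The module path rowlens.cli:main is an assumption for illustration, not the real entry point:

```toml
[tool.poetry]
name = "rowlens"
version = "1.0.0"
description = "Streaming search for very large CSV files"

# installs a `rowlens` console script on the PATH
[tool.poetry.scripts]
rowlens = "rowlens.cli:main"  # assumed module path

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```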

Version 1.0 is deliberately focused. It solves one job well: find the exact row you care about in a file too large for the usual tools, then present it cleanly. Future iterations could add column-scoped matching, JSON output, compressed input support, or richer summary stats. But the first release already hits the core operational need: streaming search for massive CSVs with output that humans can read immediately.
