Building sqlcsv-exporter: a Python CLI to Stream SQL Server Data into CSV
I recently built a small Python package called sqlcsv-exporter to solve a very practical problem: export the results of a SQL Server query into a CSV file, from the command line, without dragging an entire dataset into memory. The tool is packaged as a Poetry project, designed to be publishable to PyPI, and focused on a workflow that is common in reporting and data operations teams:
- store the SQL in a .sql file
- pass connection details at runtime
- stream rows in chunks
- write the CSV incrementally
- show visible progress while the job runs
This post walks through why I built it, how it works, and what I learned while testing it against a local SQL Server instance.
The Problem
The starting point was familiar: a one-off Python script that connected to SQL Server, ran a query, and wrote a CSV. That works for quick experiments, but it gets messy fast.
Typical issues show up immediately:
- query text is embedded inside Python instead of living in a .sql file
- authentication logic is mixed into export logic
- the script is hard to reuse across environments
- large result sets become memory-heavy
- the UX is poor when a query runs for several seconds or several minutes
The goal was to replace that with a proper CLI package that feels like a real tool rather than a disposable script.
The Goal
I wanted a command like this:
sqlcsv-exporter `
--sql .\queries\report.sql `
--output .\exports\report.csv `
--server "DESKTOP-TTUSQLJ\SQLEXPRESS" `
--database QuantDevTest
The requirements were straightforward:
- read SQL from file
- connect to SQL Server with trusted auth or SQL auth
- stream the result set with fetchmany()
- write CSV rows incrementally
- avoid pandas for the write path
- provide useful progress output
- keep the package testable and publishable
The Package Structure
I split the project into small modules so each concern is isolated:
- cli.py: argument parsing and process exit behavior
- config.py: config model and validation
- connection.py: ODBC connection string and connection creation
- sql_rewriter.py: SQL file loading and optional variable rewriting
- exporter.py: query execution, chunked fetch, CSV writing, and progress rendering
That separation matters because it makes the code easier to test. The CSV writer can be tested without a database. The SQL rewriting logic can be tested with simple strings. The export workflow can be tested with mocked connections and cursors.
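To make the testability claim concrete, here is a minimal sketch of testing a CSV-writing helper entirely in memory. The names (`write_chunk`, the header columns) are illustrative, not the package's actual API:

```python
import csv
import io

def write_chunk(writer, rows):
    """Write one chunk of database rows to an open csv.writer."""
    writer.writerows(rows)

# The writer can be exercised against an in-memory buffer; no database needed.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Id", "Stock"])                  # header, e.g. from cursor.description
write_chunk(writer, [(1, "AAPL"), (2, "MSFT")])

print(buffer.getvalue())
# Id,Stock
# 1,AAPL
# 2,MSFT
```

The same pattern covers edge cases (empty chunks, odd characters in values) without ever opening a connection.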
Why Streaming Matters
The core design choice was to stream results instead of loading everything into memory.
The exporter:
- opens the SQL file
- optionally rewrites a declared date variable such as @InAsOfDate
- executes the query with pyodbc
- fetches rows in chunks
- writes each chunk directly to a CSV file
That keeps the memory profile stable even when the result set gets large.
Instead of doing something like this:
rows = cursor.fetchall()
writer.writerows(rows)
the tool does this conceptually:
while True:
    rows = cursor.fetchmany(chunk_size)
    if not rows:
        break
    writer.writerows(rows)
That small change is the difference between a toy exporter and something you can trust on real datasets.
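Putting the whole streaming loop together, a runnable sketch looks like the following. It uses sqlite3 as a stand-in so the example runs anywhere; the real tool uses pyodbc, and the function name and arguments here are illustrative:

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

def export_query(conn, sql, output_path, chunk_size=1000):
    """Stream query results into a CSV file without holding them all in memory."""
    cursor = conn.cursor()
    cursor.execute(sql)
    header = [col[0] for col in cursor.description]
    total = 0
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        while True:
            rows = cursor.fetchmany(chunk_size)
            if not rows:
                break
            writer.writerows(rows)
            total += len(rows)
    return total

# Demo against an in-memory SQLite database standing in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StockMetrics (Id INTEGER, Stock TEXT)")
conn.executemany("INSERT INTO StockMetrics VALUES (?, ?)",
                 [(i, f"SYM{i}") for i in range(2500)])
out = Path(tempfile.mkdtemp()) / "report.csv"
rows_written = export_query(conn, "SELECT Id, Stock FROM StockMetrics", out,
                            chunk_size=500)
print(rows_written)  # 2500
```

At any moment, only `chunk_size` rows live in memory, regardless of how large the result set grows.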
A Better CLI Experience
Command-line tools that do database work have one recurring UX problem: silence. If a query takes ten seconds, users start wondering whether the command is stuck, blocked, or dead.
So the exporter prints:
- a short run summary before execution
- detected column count after the query starts returning metadata
- a live progress display while rows are written
- a final completion summary with rows, columns, file size, and elapsed time
I used rich for the terminal output instead of hand-rolled print() calls. It gives the tool a more deliberate interface without turning it into a full TUI.
Handling SQL as a File
One design decision I liked immediately was keeping SQL in a separate file. That brings a few benefits:
- the query is easier to inspect and review
- analysts or DBAs can edit SQL without touching Python code
- the same CLI can be reused for many exports
- long or multi-step SQL scripts remain readable
For example, this simple test query lives in its own file:
SELECT TOP (1000)
    [Id],
    [Stock],
    [LowPrice],
    [MaxPrice],
    [AvgPrice]
FROM [QuantDevTest].[dbo].[StockMetrics];
That file can be passed directly to the CLI without changing the Python package at all.
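Loading the file is a one-liner plus a little defensiveness. Something like this hypothetical helper works; reading with utf-8-sig tolerates the BOM that SSMS often writes when saving .sql files:

```python
from pathlib import Path

def load_sql(path):
    """Read a .sql file, tolerating a UTF-8 BOM and rejecting empty files."""
    text = Path(path).read_text(encoding="utf-8-sig").strip()
    if not text:
        raise ValueError(f"SQL file is empty: {path}")
    return text

# Example: write a BOM-prefixed query to a temp file and load it back.
import os
import tempfile
fd, tmp = tempfile.mkstemp(suffix=".sql")
os.close(fd)
Path(tmp).write_text("\ufeffSELECT 1;", encoding="utf-8")
print(load_sql(tmp))  # SELECT 1;
```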
Optional SQL Rewriting
Some teams keep parameter values in SQL scripts using a declared variable like:
DECLARE @InAsOfDate DATE = '2026-03-01';
To support that style, the tool includes a lightweight SQL rewriting step. If the variable exists, the CLI can replace its value based on --date. If the variable does not exist, the SQL runs unchanged.
This keeps the interface flexible:
- SQL files can remain mostly static
- operators can adjust date-driven exports without manually editing the file
It is intentionally narrow in scope. It is not trying to become a generic SQL templating engine.
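A regex-based rewrite in that narrow spirit might look like this sketch (not the package's exact implementation):

```python
import re

def rewrite_date_variable(sql, new_date, variable="@InAsOfDate"):
    """Replace the literal in `DECLARE @InAsOfDate DATE = '...'` if present.

    If the declaration is absent, the SQL is returned unchanged.
    """
    pattern = re.compile(
        r"(DECLARE\s+" + re.escape(variable) + r"\s+DATE\s*=\s*')[^']*(')",
        re.IGNORECASE,
    )
    # A lambda replacement avoids backreference ambiguity with digit-heavy dates.
    return pattern.sub(lambda m: m.group(1) + new_date + m.group(2), sql)

sql = "DECLARE @InAsOfDate DATE = '2026-03-01';\nSELECT * FROM StockMetrics;"
print(rewrite_date_variable(sql, "2026-06-30"))
# DECLARE @InAsOfDate DATE = '2026-06-30';
# SELECT * FROM StockMetrics;
```

SQL without the declaration passes through untouched, which keeps the feature opt-in per file.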
A Real-World Bug Caught by Live Testing
The first end-to-end run against my local SQL Server instance did not succeed.
The CLI connected successfully, but then failed with:
AttributeError: 'pyodbc.Cursor' object has no attribute 'timeout'
That was useful. The mocked tests passed, but the live run exposed a compatibility assumption in the code. Some pyodbc environments expose timeout behavior differently.
The fix was simple:
- try setting the timeout on the cursor if that attribute exists
- otherwise fall back to the connection object
That is exactly why live testing matters. A tool can look clean in unit tests and still fail in a real environment for reasons that only surface with the actual driver and actual database connection.
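The fix amounts to a few lines of duck typing; roughly this, with illustrative names and tiny stand-in classes simulating the two driver environments:

```python
def apply_timeout(connection, cursor, seconds):
    """Set the query timeout on the cursor if the driver exposes it,
    otherwise fall back to the connection-level timeout."""
    try:
        cursor.timeout = seconds        # not settable in every pyodbc environment
    except AttributeError:
        connection.timeout = seconds    # connection-level fallback

# Simulate an environment where the cursor rejects the attribute.
class Cursor:
    __slots__ = ()                      # assigning .timeout raises AttributeError

class Connection:
    timeout = 0

conn, cur = Connection(), Cursor()
apply_timeout(conn, cur, 30)
print(conn.timeout)  # 30
```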
Stress Testing with a Slower Query
A fast export is not a great demo for progress reporting, so I added a dedicated slow test query.
It:
- copies base rows into a temp table
- creates two multiplier temp tables
- cross joins them to expand the result set
- adds a short WAITFOR DELAY
That makes it easy to simulate a longer-running export without needing a large production-sized table.
Conceptually, the query looks like this:
SELECT TOP (1000) ... INTO #BaseStockMetrics ...
SELECT TOP (200) ... INTO #MultiplierA ...
SELECT TOP (200) ... INTO #MultiplierB ...
WAITFOR DELAY '00:00:03';
SELECT ...
FROM #BaseStockMetrics AS b
CROSS JOIN #MultiplierA AS a
CROSS JOIN #MultiplierB AS c;
With a small base table, this is enough to create a large CSV and make the progress output visible. With a large base table, it can scale very aggressively, so it is the kind of test query you use carefully.
Testing Strategy
The package includes pytest coverage around the parts that are most likely to break:
- config validation
- date resolution
- SQL file loading
- SQL variable rewriting
- connection string generation
- CSV writing
- mocked end-to-end export behavior
That balance is important. Not every code path needs a deep integration test, but the components that define correctness should have coverage.
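Connection string generation is a good example: it is pure string logic, so it can be covered without any driver installed. A hypothetical builder and the kind of assertion the suite makes:

```python
def build_connection_string(server, database,
                            driver="ODBC Driver 17 for SQL Server",
                            username=None, password=None):
    """Build an ODBC connection string for trusted or SQL authentication."""
    parts = [f"DRIVER={{{driver}}}", f"SERVER={server}", f"DATABASE={database}"]
    if username is not None:
        parts += [f"UID={username}", f"PWD={password}"]
    else:
        parts.append("Trusted_Connection=yes")
    return ";".join(parts)

trusted = build_connection_string(r"DESKTOP-TTUSQLJ\SQLEXPRESS", "QuantDevTest")
print(trusted)
# DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-TTUSQLJ\SQLEXPRESS;DATABASE=QuantDevTest;Trusted_Connection=yes
```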
The live SQL Server run complements those tests by exercising:
- the actual ODBC driver
- the actual SQL Server instance
- actual filesystem writes
- real CLI behavior
Packaging for PyPI
The project is set up as a Poetry package and versioned as 1.0.1.
Build artifacts are created with:
poetry build
That produces:
- a source distribution
- a wheel
Those are the exact artifacts needed for publishing to PyPI.
Example Usage
Trusted connection:
poetry run sqlcsv-exporter `
--sql .\queries\stock_metrics_top_1000.sql `
--output .\exports\stock_metrics_top_1000.csv `
--server "DESKTOP-TTUSQLJ\SQLEXPRESS" `
--database QuantDevTest
SQL authentication:
poetry run sqlcsv-exporter `
--sql .\queries\stock_metrics_top_1000.sql `
--output .\exports\stock_metrics_top_1000.csv `
--server "my-server" `
--database "Reporting" `
--sql-auth `
--username "report_user" `
--password "secret"
Final Thoughts
This is not a huge project, but it is a useful example of turning a script into a tool.
The interesting part was not the CSV writing itself. It was the engineering around it:
- separating concerns into modules
- choosing streaming over eager loading
- designing a usable CLI
- writing tests around the core behavior
- validating the tool against a real SQL Server environment
That is usually the difference between “something that works on my machine” and “something I would actually hand to another team.”
If I extend it further, the next areas I would look at are:
- environment-variable support for connection defaults
- optional gzip output
- row count estimates before execution when possible
- better handling for very wide result sets
- structured logging for scheduled runs
For now, sqlcsv-exporter does the job it was meant to do: run a SQL Server query from a file and stream the results into CSV in a way that is practical, testable, and ready to package.