Data reports for practical questions
Data Reports are structured projects that sit somewhere between a blog post and a full research project. They are for questions where the answer needs more than an opinion, but where the work is still small enough to explain clearly on a static page.
The pattern is simple: use public or local data, process it with a repeatable script, save generated JSON beside the report page, and turn that JSON into summary cards, filters, and accessible tables in the browser.
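A minimal sketch of the "save generated JSON beside the report page" step. It assumes a flat list of result rows and a report directory that holds the page. The `write_report_json` name and the `source`/`row_count`/`rows` layout are hypothetical, not taken from the actual reports:

```python
from __future__ import annotations

import json
from pathlib import Path


def write_report_json(report_dir: Path, rows: list[dict], source: str) -> Path:
    """Write rows to data.json beside the report page (hypothetical schema)."""
    report_dir.mkdir(parents=True, exist_ok=True)
    payload = {
        "source": source,        # where the input data came from
        "row_count": len(rows),  # lets the page show a count without recounting
        "rows": rows,
    }
    out_path = report_dir / "data.json"
    # sort_keys plus a fixed indent keeps reruns byte-identical when the inputs
    # do not change, so diffs of the generated file stay easy to review.
    out_path.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return out_path
```

Writing deterministic output is one way to make "repeatable script" checkable. Running the script twice should produce the same file.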
The goal is not to build a dashboard maze. The goal is to make the data, assumptions, and conclusions readable.
Featured data report project
NumPy versus PyArrow with Lahman Baseball Data
A four-part benchmark covering pandas default CSV reads versus pandas read_csv with the PyArrow engine, results grouped by file size, practical many-file baseball workloads, and repeated reads of CSV versus Parquet.
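One way to structure a benchmark like this is a small harness. It runs each reader several times on the same file and keeps the median time. This is a sketch under assumptions: the helper names, the repeat count, and the choice of median are illustrative, not necessarily what the report used. `engine="pyarrow"` is a documented `pandas.read_csv` option in pandas 1.4 and later, and it requires pyarrow to be installed:

```python
from __future__ import annotations

import statistics
import time
from pathlib import Path
from typing import Any, Callable

Reader = Callable[[Path], Any]


def median_seconds(read: Reader, path: Path, repeats: int = 5) -> float:
    """Call read(path) several times and return the median wall-clock time."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        read(path)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare_readers(readers: dict[str, Reader], path: Path, repeats: int = 5) -> dict[str, float]:
    """Time every named reader against the same file."""
    return {name: median_seconds(read, path, repeats) for name, read in readers.items()}


def pandas_csv_readers() -> dict[str, Reader]:
    """The two CSV strategies being compared; needs pandas, and pyarrow for the second."""
    import pandas as pd  # imported lazily so the harness itself has no hard dependency

    return {
        "pandas default": lambda p: pd.read_csv(p),
        "pandas pyarrow engine": lambda p: pd.read_csv(p, engine="pyarrow"),
    }
```

The median keeps one slow run, such as a cold disk cache, from dominating the result. A Parquet reader such as `pd.read_parquet` could be added to the same dictionary for the CSV versus Parquet part.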
The project uses Lahman baseball CSV files because they provide a real public dataset with varied file sizes, row counts, column counts, and practical analysis groupings.
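Grouping results by file size means assigning each CSV to a size group first. Below is a minimal sketch that assumes a directory of CSV files. The byte thresholds and labels are chosen purely for illustration; the report's real cut-offs are not stated here:

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical upper bounds in bytes; anything at or above the last bound is "large".
SIZE_BUCKETS = [
    (100_000, "small"),
    (5_000_000, "medium"),
]


def size_bucket(num_bytes: int) -> str:
    """Return the label of the first bucket whose upper bound exceeds num_bytes."""
    for upper, label in SIZE_BUCKETS:
        if num_bytes < upper:
            return label
    return "large"


def group_by_size(csv_dir: Path) -> dict[str, list[str]]:
    """Group every .csv file in csv_dir by size bucket, keeping file names sorted."""
    groups: dict[str, list[str]] = {}
    for path in sorted(csv_dir.glob("*.csv")):
        groups.setdefault(size_bucket(path.stat().st_size), []).append(path.name)
    return groups
```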
Built from generated JSON
Each report page can load a local data.json file from the same directory as the page. That keeps the page static, inspectable, and easy to host.
This also makes the report easier to check. The page explains the method, the JSON stores the result, and the browser handles the searchable tables.
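Because the result lives in a plain file, it can be checked before it is published. Below is a sketch of such a check. It assumes a hypothetical top-level `rows` list whose entries carry `file`, `method`, and `seconds` fields; the real reports' schema may differ:

```python
from __future__ import annotations

import json
from pathlib import Path

# Hypothetical required fields for each result row; adjust to the real schema.
REQUIRED_FIELDS = {"file", "method", "seconds"}


def check_report_data(path: Path) -> list[str]:
    """Return a list of problems found in the JSON file; an empty list means it passed."""
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return [f"could not read {path.name}: {exc}"]
    rows = payload.get("rows") if isinstance(payload, dict) else None
    if not isinstance(rows, list):
        return ["top-level 'rows' list is missing"]
    problems = []
    for i, row in enumerate(rows):
        missing = REQUIRED_FIELDS - set(row) if isinstance(row, dict) else REQUIRED_FIELDS
        if missing:
            problems.append(f"row {i} is missing {sorted(missing)}")
        elif not isinstance(row["seconds"], (int, float)) or row["seconds"] < 0:
            problems.append(f"row {i} has an invalid 'seconds' value")
    return problems
```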
Accessible by default
Data reports should not depend on chart-only meaning or visual-only scanning. The report pages use plain explanations, summary cards, searchable controls, result counts, captions, headings, and tables.
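On the report pages, searching and result counts run in the browser. The logic itself is small, though, so here it is sketched in Python. The sketch assumes case-insensitive substring matching across every field and a hypothetical "Showing X of Y" wording:

```python
from __future__ import annotations


def filter_rows(rows: list[dict], query: str) -> list[dict]:
    """Keep rows where any field contains the query, ignoring case and outer spaces."""
    needle = query.strip().casefold()
    if not needle:
        return list(rows)
    return [row for row in rows if any(needle in str(value).casefold() for value in row.values())]


def result_count_text(shown: int, total: int) -> str:
    """Plain-text count shown next to the table, e.g. inside an aria-live region."""
    noun = "result" if total == 1 else "results"
    return f"Showing {shown} of {total} {noun}"
```

A written count tells every reader, including screen reader users, how many rows survived a search without needing to scan the table.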
If a report includes a chart later, the table still needs to carry the meaning. The data should not vanish behind a graphic.
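One way to keep the table carrying the meaning is to build it from the same rows a chart would use, with a `<caption>` and `scope="col"` header cells. The report pages build their tables in the browser, so this Python sketch only illustrates the markup. It assumes the rows are flat dictionaries:

```python
from __future__ import annotations

from html import escape


def render_table(caption: str, columns: list[str], rows: list[dict]) -> str:
    """Render rows as an HTML table with a caption and column header cells."""
    head = "".join(f'<th scope="col">{escape(col)}</th>' for col in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(row.get(col, '')))}</td>" for col in columns) + "</tr>"
        for row in rows
    )
    return (
        f"<table><caption>{escape(caption)}</caption>"
        f"<thead><tr>{head}</tr></thead>"
        f"<tbody>{body}</tbody></table>"
    )
```

Escaping every cell keeps file names or notes that contain `&` or `<` from breaking the markup.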
Current data reports
NumPy versus PyArrow with Lahman Baseball Data
A practical CSV and Parquet benchmark using Lahman baseball data. The project compares pandas default CSV loading, pandas CSV loading with the PyArrow engine, many-file workloads, and repeated CSV versus Parquet reads.
Includes four report pages with generated JSON, summary cards, filters, sorting, and accessible tables.
How these reports are built
The reports are static pages on hluska.ca. When a report needs generated data, that data is saved as JSON beside the report and loaded in the browser. This keeps the reports simple to host and easy to inspect.
Most reports follow the same workflow: a local script prepares data.json, the report page loads that file, and the browser turns it into summary cards, filters, and tables.
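A summary card is usually one number derived from the rows in data.json. Below is a sketch under several assumptions: hypothetical `file`, `method`, and `seconds` fields, two method names that default to "default" and "pyarrow", and illustrative card labels:

```python
from __future__ import annotations

import json
import statistics


def summary_cards(data_json: str, baseline: str = "default", candidate: str = "pyarrow") -> dict[str, str]:
    """Derive a few headline numbers from data.json text (hypothetical schema)."""
    rows = json.loads(data_json)["rows"]
    # Index timings as {file: {method: seconds}} so each file's methods can be paired.
    by_file: dict[str, dict[str, float]] = {}
    for row in rows:
        by_file.setdefault(row["file"], {})[row["method"]] = row["seconds"]
    # Only files timed with both methods (and a non-zero candidate time) are compared.
    paired = [t for t in by_file.values() if baseline in t and candidate in t and t[candidate] > 0]
    if not paired:
        return {"Files compared": "0"}
    # A speedup above 1 means the candidate method was faster than the baseline.
    speedups = [t[baseline] / t[candidate] for t in paired]
    faster = sum(1 for s in speedups if s > 1)
    return {
        "Files compared": str(len(paired)),
        f"Files where {candidate} was faster": str(faster),
        "Median speedup": f"{statistics.median(speedups):.2f}x",
    }
```

Skipping files that lack one of the two timings avoids reporting a speedup with no pair behind it.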
This structure keeps the reporting honest. The page can explain its sources, the generated data can be inspected, and the final result can focus on what the numbers actually say.
All data report projects
- NumPy versus PyArrow with Lahman Baseball Data - a four-part benchmark comparing pandas default CSV reads, pandas PyArrow CSV reads, many-file Lahman workloads, and CSV versus Parquet repeated reads.