Performance Testing - NumPy versus PyArrow, One Year Later
A one-year follow-up to my NumPy versus PyArrow performance testing, using the Lahman baseball database to compare pandas' default CSV parser with pandas read_csv using the PyArrow engine.
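Here is a minimal Python timing sketch for trying this comparison yourself. It is not the code from the post, and it rests on a few assumptions:

- It writes a synthetic, Lahman-style CSV. The column names and row counts are hypothetical, not real Lahman data.
- It times pandas.read_csv with its default parser against engine="pyarrow".
- It skips the PyArrow run if the pyarrow package is not installed.

Results will vary by machine, file size, and pandas version.

```python
from __future__ import annotations

import importlib.util
import tempfile
import time
from pathlib import Path

import numpy as np
import pandas as pd


def make_sample_csv(path: Path, n_rows: int = 100_000, seed: int = 0) -> None:
    """Write a synthetic, batting-style CSV (hypothetical columns, not real Lahman data)."""
    rng = np.random.default_rng(seed)
    frame = pd.DataFrame({
        "playerID": [f"player{i % 5000:04d}" for i in range(n_rows)],
        "yearID": rng.integers(1871, 2024, size=n_rows),
        "AB": rng.integers(0, 700, size=n_rows),
        "H": rng.integers(0, 250, size=n_rows),
        "HR": rng.integers(0, 70, size=n_rows),
    })
    frame.to_csv(path, index=False)


def time_read(path: Path, engine: str | None, repeats: int = 3) -> tuple[float, tuple[int, int]]:
    """Return the best wall-clock time over `repeats` reads and the shape of the frame read."""
    best = float("inf")
    shape = (0, 0)
    for _ in range(repeats):
        start = time.perf_counter()
        frame = pd.read_csv(path, engine=engine)  # engine=None means pandas' default parser
        best = min(best, time.perf_counter() - start)
        shape = frame.shape
    return best, shape


def compare_engines(path: Path, repeats: int = 3) -> dict[str, tuple[float, tuple[int, int]]]:
    """Time the default parser and, if pyarrow is installed, the PyArrow engine."""
    engines: dict[str, str | None] = {"default": None}
    if importlib.util.find_spec("pyarrow") is not None:
        engines["pyarrow"] = "pyarrow"
    return {label: time_read(path, engine, repeats) for label, engine in engines.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = Path(tmp) / "batting_sample.csv"
        make_sample_csv(csv_path)
        for label, (seconds, shape) in compare_engines(csv_path).items():
            print(f"{label:>8}: {seconds:.4f} s for {shape[0]:,} rows x {shape[1]} columns")
```

Keeping the best of several runs reduces noise from disk caching and other processes. Raising n_rows is one way to check whether the PyArrow advantage grows with file size.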
Baseball Reports - MLB Attendance and the 1994 Strike
A quick blog post introducing a new baseball report on MLB attendance before and after the 1994-1995 strike. The post explains why attendance per opening is a useful way to compare strike-era attendance, why 1995 shows the clearest damage, and why the 1998 recovery needs expansion context.
Baseball Reports - Simulating MLB’s salary cap and floor proposal
I launched Baseball Reports with a salary cap and floor simulation because baseball data should be readable, accessible, and useful. The first report tests MLB’s proposed cap and floor against current standings and payroll estimates.
Performance Testing - NumPy versus PyArrow
This post, the first in a series on Fantasy Baseball, starts with two tasks. First, it implements a scraper that collects stats for an entire year of Major League Baseball. Then it runs performance tests to see whether NumPy or PyArrow is faster at reading the generated CSV files. PyArrow was faster in every test, and its advantage grew as the data sets got larger.