Performance Testing - NumPy versus PyArrow, One Year Later
A one-year follow-up to my NumPy versus PyArrow performance testing, using the Lahman baseball database to compare pandas default CSV reads with pandas read_csv using the PyArrow engine.
Performance Testing - NumPy versus PyArrow
This post, the first in a series on Fantasy Baseball, does two things. First, it implements a scraper that collects stats for an entire year of Major League Baseball. Then it runs performance tests to see whether NumPy or PyArrow is faster at reading the generated CSV files. PyArrow is faster in every case, but it particularly shines as data sets get larger.
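As a rough illustration of the kind of comparison described here, the sketch below times pandas' default CSV reader against `read_csv` with `engine="pyarrow"`. The helper names `time_reader` and `compare_engines`, the repeat count, and the use of the median are assumptions made for this example, not the posts' actual benchmark code.

```python
import time
from statistics import median


def time_reader(reader, path, repeats=5):
    """Return the median wall-clock seconds for reader(path) over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        reader(path)
        timings.append(time.perf_counter() - start)
    return median(timings)


def compare_engines(path, repeats=5):
    """Time pandas' default CSV engine against the PyArrow engine on one file."""
    # Imported here so time_reader stays usable without pandas installed.
    # engine="pyarrow" requires the pyarrow package and pandas 1.4 or later.
    import pandas as pd

    return {
        "default": time_reader(lambda p: pd.read_csv(p), path, repeats),
        "pyarrow": time_reader(
            lambda p: pd.read_csv(p, engine="pyarrow"), path, repeats
        ),
    }
```

Calling `compare_engines("People.csv")` (a hypothetical path to one of the CSV files) returns a dict of median read times in seconds. Repeating each read and taking the median is one way to dampen disk-cache and warm-up noise; a single timed read can be misleading on small files.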