A short guide to using generative AI in fantasy baseball

I will be writing a more complete guide to using generative AI in fantasy baseball, but for now I wanted to make a few quick points based on things I have noticed while helping manage a league. Our league is old and very established; save for a two-year period after our original manager died, it has been running for almost thirty years.

Our players tend to be older, and accessibility is something we have pushed over the last few years, which brings up the subject of today's post. In many ways, generative AI services from companies like OpenAI, Anthropic and Microsoft have democratized fantasy baseball. These chat interfaces are frankly more accessible to visually impaired users, and they level the playing field between people like me, who can write our own scripts to analyze lots of players, and people who don't have those technical skills.

However, we have run into some issues over the last few years.

Generative AI will deliver overengineered solutions

This creates two major problems. First, overengineered solutions are more difficult to adapt for future uses. Second, and this is the bigger one for us, our drafts have per-manager time limits, so if a solution is overengineered and runs slowly, a manager can waste a lot of their clock waiting for their script to run.

Prompting can solve a lot of these issues. For example, adding the following to prompts can help a lot:

"I am in a fantasy draft and have to complete five rounds of selections in a total of 25 minutes so it is vital that you deliver very fast solutions. Performance is more important than engineering perfection."
"I will be editing this script throughout the seasons. Please write extensive comments within the script you write so that I can follow along and make my own changes."

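As a rough illustration of the kind of short, heavily commented script those two prompts are asking for, here is a minimal Python sketch. The column names (player, hr, sb) and the sample rows are made-up placeholders, not real statistics, and the function name top_players is my own. Swap in whatever CSV export you actually use.

```python
import csv
import io

# Made-up placeholder rows standing in for a real CSV export.
# The column names (player, hr, sb) are assumptions for this sketch.
SAMPLE_CSV = """player,hr,sb
Player A,30,5
Player B,12,40
Player C,25,20
"""


def top_players(csv_text, stat, limit=10):
    """Return the top `limit` (player, value) pairs for one stat, highest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Pull out just the player name and the one stat we care about.
    pairs = [(row["player"], int(row[stat])) for row in rows]
    # One read and one sort: nothing clever, so it stays fast on draft day.
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:limit]


if __name__ == "__main__":
    # Change "hr" to any other column name to rank by a different stat.
    for name, value in top_players(SAMPLE_CSV, "hr", limit=2):
        print(name, value)
```

The whole thing is one function and one sort, which is about as much engineering as a five-round draft clock leaves room for, and the comments mark the one line a manager is most likely to change.
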
Just because you can doesn't mean that you should

This is a very difficult subject to cover because it gets into all sorts of ethical and legal issues. There is a tremendous amount of baseball data out there. Some of it, like the Lahman Baseball Database and Retrosheet, is available in CSV format and is easy to download. Other data requires a scraper to access at any real scale. And here is where this all throws me for an ethical loop: a lot of this data is not very accessible, so for people who rely on screen readers, a massive amount of baseball data sits in a format they cannot possibly learn from. So what do you do when a condition you certainly did not ask for keeps you from fully enjoying a sport you love? Everything gets more nuanced.

All of these websites have their own terms of use, so you have to read each site's terms and conditions carefully before you start scraping. The generative AI services that I have tested will not raise these issues unless you specifically ask about them and link directly to the site's terms.

If you decide you can, make sure to include politeness in your prompts

If, after reviewing a site's terms, you conclude that you are in a position where you can ethically scrape its data, you still can't just hit the site over and over. Instead, when you are building your scraper, you have to ask for politeness delays in your prompt. These delays make your scraper gentler on server resources that other people pay for, and if you're gentle with those resources, the owners are more likely to keep hosting the data through the age of greedy generative AI scrapers.

An example prompt:

"Please build a highly ethical scraper to collect (what you are looking for). Make sure it includes a mandatory politeness delay between each request plus a random amount of time to wait in addition to the mandatory politeness delay."

It's easy and more gentle on other people's resources.
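
For readers who want to see what that prompt is asking for, here is a minimal Python sketch of a fixed politeness delay plus a random extra wait. The delay values and the function names (polite_delay, fetch_all) are my own assumptions for illustration, not a standard. Pick delays that fit the site's terms, and only run something like this on a site whose terms allow it.

```python
import random
import time
import urllib.request

# Assumed values for this sketch; adjust them to what the site's terms allow.
BASE_DELAY_SECONDS = 5.0   # mandatory wait between requests
MAX_EXTRA_SECONDS = 3.0    # upper bound on the random extra wait


def polite_delay(base=BASE_DELAY_SECONDS, max_extra=MAX_EXTRA_SECONDS):
    """Return the mandatory delay plus a random amount of extra time."""
    return base + random.uniform(0.0, max_extra)


def _download(url):
    """Fetch one page with the standard library and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_all(urls, fetch=_download, sleep=time.sleep):
    """Fetch each URL in order, pausing politely between requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(polite_delay())
        pages.append(fetch(url))
    return pages
```

The fetch and sleep arguments are there so you can try the loop with stand-ins before pointing it at a real site.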

Chat > Agentic (in my opinion)

This is where I am sure my background in software development colors my point of view, but I am not fully convinced by agentic tools, especially when it comes to baseball analysis. While an agent is good at generating a lot of code quickly, I find that it follows my lead a little too carefully.

With a chat interface, I can ask it to argue with me and poke holes in my thesis before I start writing code and analyzing data. A good chat, constantly prompted to be critical, can help me refine my ideas, suggest different data sources that I could use in my analysis and generally strengthen the quality of my research. It's still my analysis, only stronger, because prompting the chat to be highly critical forces me to think through my heavy bias as a fan.

Conclusion

As I mentioned, I will be turning this into a more complete guide, but I wanted to publish some basic thoughts that I have picked up over the last couple of years of helping run an active league whose members are adults with a wide range of technical skills and physical abilities.