Does AI Help or Homogenize? Thoughts on Knowledge Production in Open-Source
A running thread of ideas behind my paper on AI assistance tools and open-source communities — what the data is starting to tell us, and what still puzzles me.
(With Abhishek Nagaraj and Eunae Yoo)
There is a simple, optimistic story about AI coding assistants: they help people write better code, faster. Tools like GitHub Copilot suggest completions, catch errors, and let developers focus on the hard parts rather than the boilerplate. Productivity goes up. More things get built.
But there is a second story, a little harder to see, that I keep coming back to. What if these tools don’t just help — what if they also homogenize?
The Productivity Story
The productivity gains are real and measurable. We see them in our data on open-source contributors. Developers who adopt AI assistance tools close issues faster, write more lines per day, and contribute to more projects. The productivity effect is especially pronounced among mid-skill contributors — people who know enough to ask the right question but used to get stuck on implementation details.
The Homogenization Worry
Here is where it gets more interesting. AI assistants are trained on existing code. They suggest solutions that look like the modal solution in their training data. That is fine, even good, when the modal solution is the right one.
But in research — in science, in novel software — the point is often to not do the modal thing. Progress comes from people who approach a problem from an unusual angle, who bring a background no one else has, who make a connection that the training data does not contain.
If AI tools nudge everyone toward similar approaches, the aggregate effect on knowledge production could be negative even if individual productivity is higher. More output, less variance — and in innovation, variance is often where the value lives.
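A toy simulation can make this worry concrete. The sketch below is purely illustrative: the function name `simulate`, the normal distribution of "solution quality", and every parameter value are assumptions invented for the example, not estimates from the paper's data. It compares a world with fewer, more varied attempts against one with more attempts that are better on average but more alike, and tracks both the average attempt and the best one.

```python
import random
import statistics

def simulate(n_attempts, mean_quality, spread, trials=2000, seed=0):
    """Return (average quality, average best quality) across simulated communities.

    Each trial draws n_attempts solution qualities from a normal
    distribution. All parameters are hypothetical illustration values.
    """
    rng = random.Random(seed)
    averages, bests = [], []
    for _ in range(trials):
        draws = [rng.gauss(mean_quality, spread) for _ in range(n_attempts)]
        averages.append(statistics.fmean(draws))
        bests.append(max(draws))
    return statistics.fmean(averages), statistics.fmean(bests)

# Assumed "before" world: fewer attempts, wide range of approaches.
before = simulate(n_attempts=100, mean_quality=0.0, spread=1.0)

# Assumed "after" world: 50% more attempts, better on average, but more alike.
after = simulate(n_attempts=150, mean_quality=0.2, spread=0.5)

print("average attempt: before=%.2f after=%.2f" % (before[0], after[0]))
print("best attempt:    before=%.2f after=%.2f" % (before[1], after[1]))
```

Under these assumed numbers the average attempt improves while the best attempt gets worse, which is the sense in which output can rise and the frontier can still fall. Whether real communities look anything like this depends entirely on how much the spread actually shrinks, which is an empirical question.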
What I’m Trying to Find Out
The paper tries to get traction on this with a difference-in-differences design exploiting variation in the timing and intensity of AI tool adoption across open-source projects. Early results suggest both effects are present: productivity up, diversity of approaches down. Disentangling these — and figuring out which dominates in different contexts — is the hard part.
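For readers unfamiliar with the method, here is a minimal sketch of the basic two-group, two-period difference-in-differences calculation. It is a deliberate simplification: the design described above uses variation in both timing and intensity of adoption, which this toy version ignores. The function name `did_estimate`, the row layout, and all numbers are hypothetical and invented for illustration, not taken from the paper.

```python
from statistics import fmean

def did_estimate(rows):
    """Two-by-two difference-in-differences on (adopted, post, outcome) rows.

    adopted: True if the project ever adopts the tool (hypothetical flag).
    post:    True if the observation falls after the adoption date.
    outcome: any numeric measure, e.g. issues closed per month.
    Every one of the four cells needs at least one observation.
    """
    cells = {(a, p): [] for a in (False, True) for p in (False, True)}
    for adopted, post, outcome in rows:
        cells[(bool(adopted), bool(post))].append(outcome)
    change_adopters = fmean(cells[(True, True)]) - fmean(cells[(True, False)])
    change_others = fmean(cells[(False, True)]) - fmean(cells[(False, False)])
    # The comparison group's change stands in for what adopters
    # would have experienced without the tool.
    return change_adopters - change_others

# Invented numbers, for illustration only.
toy_rows = [
    (True, False, 10.0), (True, False, 12.0),   # adopters, before
    (True, True, 15.0), (True, True, 17.0),     # adopters, after
    (False, False, 9.0), (False, False, 11.0),  # non-adopters, before
    (False, True, 10.0), (False, True, 12.0),   # non-adopters, after
]

effect = did_estimate(toy_rows)
print(effect)  # (16 - 11) - (11 - 10) = 4.0 for these invented numbers
```

The same calculation can be run with a diversity measure as the outcome instead of a productivity measure. The estimate is only credible if adopting and non-adopting projects would have followed parallel trends absent adoption, which is an assumption that has to be argued for rather than read off the data.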
More soon.
