1. Introduction
The software development landscape is rapidly evolving, leading to new tools designed for the scale and speed of future AI-built systems. One of the biggest challenges and opportunities is software evolution at scale.
Imagine a world where massive software systems, not just individual components, are seamlessly updated and improved without weeks or months of manual effort. Big tech companies like Meta have been doing this for years by employing dedicated teams to build and use codemods—automated code transformation bots—for large-scale software changes.
Recently, progressive software companies like Netlify have leveraged codemods to automate tasks such as adding type safety, removing feature flags, and upgrading frameworks like React Router. At large enterprises like T. Rowe Price, developers have saved weeks of engineering time by automating migrations, such as their MSW v1 to v2 transition. Since the release of React 19 RC, thousands of early adopters have used codemods to update their codebases automatically. These examples underscore the vital role of codemods in efficient codebase evolution.
At Codemod, we're on a mission to democratize access to this powerful technology. Our goal is to make code migrations of any size easier with advanced AI and compiler technologies.
Today, we're thrilled to introduce Codemod2.0—a new category that complements traditional codemods. Codemod2.0 enables more intelligent transformations and unlocks new possibilities for codebase migrations.
2. The Limitations of Traditional Codemods
Traditional codemods are scripts that identify patterns in code and transform them. They operate on the abstract syntax tree (AST) of the code, making them reliable but rigid. Because they are rule-based, they lack the flexibility and intuition of human intelligence. For example, traditional codemods struggle to grasp the context of the codebase they modify, such as code style, inline comments, business logic, and the semantics of different code elements.
To overcome these limitations, companies often need to hire experts in codemods, invest significant time in developing highly sophisticated codemods, and create additional tools to compensate for the shortcomings of deterministic engines.
Is there a better solution for achieving intelligent and sophisticated code transformation?
3. The Rise of Foundational Models
Large Language Models (LLMs) and other foundational models optimized for coding are becoming increasingly proficient at generating code. While code transformation differs from code generation, the transformation problem can often be reframed as a generation problem. Given a code block and its context, an LLM can regenerate a modified version of that code block.
The advantage? LLMs excel where traditional rule-based transformations fall short. They can take semantics, inline comments, and coding style into account, and they draw on the vast amount of publicly available code they were trained on.
However, while LLMs are great at generating code, they are not designed for detecting patterns at scale. Also, once experts curate effective prompts, sharing them easily with colleagues or the community remains a challenge.
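To make the reframing above concrete, here is a minimal sketch of how a transformation request could be turned into a generation prompt. The `TransformRequest` shape and the `buildGenerationPrompt` function are hypothetical names chosen for illustration; they are not part of Codemod's API, and the prompt wording is only one plausible layout.

```typescript
// Hypothetical request shape: none of these names come from Codemod's API.
interface TransformRequest {
  instruction: string; // natural-language description of the change
  context: string; // surrounding code the model may need (imports, types)
  codeBlock: string; // the exact snippet to regenerate
}

// Turns a transformation task into a generation task: the model is asked to
// regenerate one code block, given its context and an instruction.
function buildGenerationPrompt(req: TransformRequest): string {
  return [
    req.instruction.trim(),
    "",
    "Surrounding context (do not modify):",
    req.context.trim(),
    "",
    "Code block to rewrite:",
    req.codeBlock.trim(),
    "",
    "Return only the rewritten code block.",
  ].join("\n");
}

const examplePrompt = buildGenerationPrompt({
  instruction: "Replace axios with fetch.",
  context: "import axios from 'axios'",
  codeBlock: "const res = await axios.get(url)",
});
```

The resulting string can be sent to any model. Keeping the context separate from the block to rewrite is a design choice that makes it easier to splice the model's answer back into the file.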
4. Introducing Codemod2.0
Codemod2.0 is a new type of codemod that combines the strengths of deterministic engines for detection and LLMs for transformation, using the right technology for each task. This integration is managed by Codemod’s open-source workflow engine, a modular TypeScript framework designed to handle code migration tasks at every level, from entire repositories down to individual code blocks.
By sitting between deterministic engines and pure foundational models, Codemod2.0 is easier to build than a purely deterministic codemod, and more reliable and scalable than using LLMs alone. This hybrid approach opens up possibilities for code transformation that were previously not feasible.
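As a rough illustration of this division of labor, the sketch below wires a detector and a transformer together. It is not Codemod's workflow engine: `runHybridCodemod`, `Detector`, and `Transformer` are hypothetical names, the detector is a regular expression standing in for an AST-based engine, and the transformer is any async function standing in for an LLM call.

```typescript
// A detected span of source text, identified by character offsets.
interface Match {
  start: number;
  end: number;
  text: string;
}

type Detector = (source: string) => Match[]; // deterministic step
type Transformer = (snippet: string) => Promise<string>; // stand-in for an LLM call

async function runHybridCodemod(
  source: string,
  detect: Detector,
  transform: Transformer,
): Promise<string> {
  // Rewrite from the last match to the first so earlier offsets stay valid.
  const matches = detect(source).slice().sort((a, b) => b.start - a.start);
  let output = source;
  for (const m of matches) {
    const replacement = await transform(m.text);
    output = output.slice(0, m.start) + replacement + output.slice(m.end);
  }
  return output;
}

// Toy detector (assumption): a regex instead of a real AST engine such as ast-grep.
const detectAxiosGet: Detector = (source) => {
  const pattern = /axios\.get\([^)]*\)/g;
  const found: Match[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(source)) !== null) {
    found.push({ start: m.index, end: m.index + m[0].length, text: m[0] });
  }
  return found;
};
```

Only the matched snippets are handed to the transformer, so the model never sees or rewrites code the detector did not flag.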
5. How Codemod2.0 Works
Let’s take a look at a real example of Codemod2.0 to better understand how it works.
Imagine we are using the Axios library for our HTTP requests and want to migrate to Fetch. This requires detecting all Axios usages and transforming them based on the table below:
To build a Codemod2.0, we start by developing a deterministic codemod using ast-grep as our engine. Here are some common Axios patterns that need detection:
import axios from 'axios'

const response1 = await axios.get(url, options)
const response2 = await axios.post(url, options).then(...)
const response3 = await axios(options).then(...).catch(...)
Here are ast-grep patterns that reliably and quickly detect the above patterns, even in very large codebases.
const axiosPatterns = [
  { pattern: "axios($$$_)" }, // axios()
  { pattern: "axios.$_($$$)" }, // axios.get(...)
  { pattern: "axios.$_($$$).$_($$$)" }, // axios.get(...).then(...)
  { pattern: "axios.$_($$$).$_($$$).$_($$$)" }, // axios.get(...).then(...).catch(...)
  { pattern: "axios.$_($$$).$_($$$).$_($$$).$_($$$)" }, // axios.get(...).then(...).catch(...).finally(...)
];
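The method-call patterns above follow a regular shape: a base `axios.$_($$$)` call followed by zero to three chained calls. As a small sketch, a helper could generate them instead of listing each one by hand. The `buildAxiosChainPatterns` function and the `generatedAxiosPatterns` constant are hypothetical and not part of the published codemod.

```typescript
// Builds pattern strings for `axios.<method>(...)` followed by
// zero to `maxChain` chained calls such as `.then(...)` or `.catch(...)`.
function buildAxiosChainPatterns(maxChain: number): { pattern: string }[] {
  const patterns: { pattern: string }[] = [];
  let current = "axios.$_($$$)";
  for (let depth = 0; depth <= maxChain; depth++) {
    patterns.push({ pattern: current });
    current += ".$_($$$)"; // one more chained call per iteration
  }
  return patterns;
}

// The direct call `axios(...)` has a different shape, so it is listed separately.
const generatedAxiosPatterns = [
  { pattern: "axios($$$_)" },
  ...buildAxiosChainPatterns(3),
];
```

With `maxChain` set to 3, this produces the same five patterns as the hand-written list.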
Once specific Axios patterns are detected, we need to transform them. Below is a description of the transformation logic. As you can see from the variety of detected patterns and the complexity of the transformation logic, building this with a deterministic engine is no easy task.
const prompt = `
You are migrating from axios to fetch.
Ignore axios.create() and axios.all() as fetch doesn’t have these APIs.

Here is a general pattern to replace axios with fetch:
1. Replace axios.anyFunction(url) with fetch(url, options) and await it.
2. If response.ok is false, throw an error.
3. Get the response by calling response.json() or response.text() or response.blob() or response.arrayBuffer().
4. To be compatible with axios, you need to set the result variable to { data: await response.json() }.
5. Infer the type of the result variable from context and apply the type to the result variable: { data: await response.json() as SomeType }.

Use AbortSignal to replace the axios timeout option.
For example,
axios.get(url, { timeout: 5000 })
can be replaced with
fetch(url, { signal: AbortSignal.timeout(5000) })
`;
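To give a feel for what the prompt asks the model to produce, here is a hand-written sketch of code that follows steps 1 to 5 and the timeout rule for a simple GET request. It is not output from the codemod. The `getJson` helper, the `FetchFn` type, and the injectable `fetchFn` parameter are assumptions made for illustration.

```typescript
// Minimal structural type for the parts of fetch this sketch relies on.
type FetchFn = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: roughly what `axios.get(url, { timeout })` could become.
async function getJson<T>(
  url: string,
  timeoutMs: number,
  fetchFn: FetchFn = fetch, // injectable so the sketch can run without a network
): Promise<{ data: T }> {
  // Step 1 plus the timeout rule: AbortSignal.timeout replaces the axios `timeout` option.
  const response = await fetchFn(url, { signal: AbortSignal.timeout(timeoutMs) });

  // Step 2: unlike axios with its default settings, fetch does not reject on HTTP error statuses.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  // Steps 3 to 5: parse the body, apply the inferred type, and wrap it as { data }.
  return { data: (await response.json()) as T };
}
```

Callers that previously read `response.data` from Axios keep working, because the result has the same `{ data }` shape.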
To see the complete source of this codemod, check out this link in Codemod’s GitHub repo. You can learn more about it in the Codemod Registry.
Now that the codemod is ready, it can be published to the Codemod Registry for immediate use. Codemod2.0 can work with any LLM, including locally deployed open-source models, though this specific codemod currently supports only OpenAI models. Users need to provide the OPENAI_API_KEY argument to run the codemod easily via CLI.
npx codemod axios-to-fetch --OPENAI_API_KEY=XXX
6. Vision for the Future
While Codemod2.0 has its own strengths and weaknesses, which we will discuss in more detail in a future blog post, we are continuously working to enhance our AI systems. Our efforts focus on several key areas:
- Auto-generating deterministic codemods to detect patterns using Codemod AI.
- Recursively improving human language descriptions for transformation logic with Codemod’s iterative AI system, leveraging tests and compiler checks.
By integrating AI, compiler technologies, and specialized infrastructure, we automatically capture knowledge about the evolution of individual system components in the form of codemods. This knowledge will be proactively distributed across the ecosystem, enabling the entire system to evolve autonomously.
7. Conclusion
Codemod2.0 offers a balanced solution that pairs scalable deterministic engines for detection with foundational models for intelligent transformation. At Codemod, we are committed to helping developers transform their codebases with the best tools and practices available.