
Contrary to common belief, using state-of-the-art artificial intelligence tools did not speed up experienced software developers when they were working in codebases they knew well.

The in-depth study was carried out early this year by the AI research nonprofit METR, which observed a group of experienced coders as they worked on open-source projects they were familiar with, using Cursor, a popular AI coding assistant.

Before the study, the open-source developers expected AI to speed them up, predicting a 24% reduction in task completion time. Even after completing the tasks with AI, the developers believed they had cut task times by 20%. According to the study, however, using AI had the opposite effect, increasing task completion time by 19%.

Joel Becker and Nate Rush, the study’s principal authors, expressed their surprise at the findings; Rush noted that he had anticipated “a 2x speed up, somewhat obviously.”

The results cast doubt on the notion that AI invariably boosts the productivity of costly human engineers, a notion that has drawn significant investment into businesses that offer AI tools to support software development.

AI is also widely expected to replace entry-level coding jobs. Dario Amodei, CEO of Anthropic, recently told Axios that AI could eliminate half of all entry-level white-collar jobs within the next one to five years.

Previous research on productivity enhancements has reported significant gains: one study found that using AI sped up programmers by 56%, while another found that developers could complete 26% more work in a given amount of time.

The recent METR study, however, shows that such gains do not extend to every software development scenario. In particular, it found a slowdown among seasoned developers who were already well-versed in the quirks and requirements of large, well-established open-source codebases.

According to the study’s authors, other research often relies on AI software development benchmarks, which can be inaccurate representations of real-world work.

The slowdown stemmed from developers having to spend time reviewing and fixing the suggestions made by the AI models.

“When we watched the videos, we found that the AIs made some suggestions about their work, and the suggestions were often directionally correct but not exactly what’s needed,” Becker said.

The slowdown is not expected to occur in other situations, such as for junior engineers or engineers working in codebases they are unfamiliar with, the authors said.

Nonetheless, the vast majority of the study’s participants, as well as the authors, continue to use Cursor today. The reason, according to the authors, is that AI makes the development process simpler and, consequently, more enjoyable, akin to editing an article rather than staring at a blank page.

“Developers have goals other than completing the task as soon as possible,” Becker said. “So they’re choosing this less demanding path.”