Please write your comments; I would like to see your views.
Here is a very reasonable and sober take on the whole Devin noise:
One of the best moments is when Theo discovers that Devin is basically a wrapper around already existing LLMs and could be copied by competitors with ease. That could be the main reason why Cognition isn’t as open with its code as other AI companies.
It’s a gimmick. There is a gold rush, and they are selling shovels.
The Theo video and the Primeagen videos amongst others do a better job of explaining this, but:
- it seems to be tuned to a very specific scenario that makes it look good
- what they’re showing isn’t particularly surprising or interesting
- it has a pretty UI, looks flashy
- it seems to be very squarely aimed at executives in big companies
AI is and will more than likely continue to be far less effective than a human approaching the same task with the proper tools and knowledge.
The hype around AI is starting to resemble the hype around Web3/NFTs a few years back: CEOs are afraid to miss the bus, resulting in false promises and marketing noise to attract investors.
The biggest flaw: real, autonomous AIs are still science fiction. All we have found so far are some impressive, talkative search engines and automated programs. That doesn’t mean management teams around the world won’t fall for the hype, so we might see an impact from this and the next Devins, just like NFTs resulted in millions of dollars lost.
It’s worth keeping an eye on how things develop, and hoping we can cut our losses.
Devin correctly resolves 13.86%* of the issues end-to-end, far exceeding the previous state-of-the-art of 1.96%. Even when given the exact files to edit, the best previous models can only resolve 4.80% of issues.
Another way to view this is that roughly 86% of the time Devin is unable to resolve the issue. Sure, this could be improved, but only so much, because Devin, like all AIs, lacks one thing: context.
A bare-bones example would be something like the following:
A user sent in feedback that automatically created a GitHub issue. The user is asking for a feature within their video creation tool where they can clip things together easily.
Even if you are the greatest programmer in the world, you can’t solve this problem. It isn’t clear what the user means by “clip”. You could also fall into the trap of assuming they want to “clip” videos together, but what if they are actually asking for something else entirely? What if the app already has this feature, but the user can’t find it or understand it? What if there isn’t even a video creation tool?
There’s a lot of context missing here, context that a professional programmer might already have or would need to find out. Devin might not be able to work any of this out and will fail at its task.
Ultimately, Devin should be a useful “code monkey” that can target specific tasks that are easy to accomplish with minimal or no external context. If not, and you try to “unleash it” on more generic tasks, you will end up with a backlog of super low-quality PRs overwhelming engineers who are already overloaded with reviews.
Generating code is easy; generating it correctly is hard, to the point that it’s nearly impossible.
But what if they advance really well, say in 5 or 10 years? Asking because I am a student who wants to pursue CS.
I’m not sure how aware you are of the state of the art at the minute: it’s at the level of a slightly better autocomplete. That’s really good, and really useful! When it works well it saves a few hours a week.
If someone invents C-3PO in the next 5-10 years, sure. If you’re really worried about that, don’t go into programming. I would say there’s a bit of Henny Penny in that fear. It’s also almost impossible to say at this point whether what we have at the minute is close to the best we’ll get from this current AI hype cycle or not. Also: companies will look for any way to automate tasks. LLMs, which is what I think you’re worried about, are not particularly good at consistently undertaking mundane tasks.