AI has a greater chance of failure the longer it works on a task
I was reading through this research paper by Toby Ord that compares an LLM working on a lengthy task with a human working on the same task. The longer the task, the more likely the LLM is to fail; in fact, its chance of success falls off exponentially with the length of the task. Naturally, larger LLMs with more computing power can go for longer before failing, but the exponential fall-off with task length holds no matter what.
So, for example, you might be able to get an 8-billion parameter model to work on a task for 1 hour and expect it to succeed 50% of the time. Maybe you can get 2 hours out of a 16-billion parameter model at the same 50% success rate (I am guessing these parameter sizes). But past that “half-life” the odds keep halving: roughly 25% at twice the half-life, 12.5% at three times, tapering off toward zero.
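To make that concrete, here is a minimal sketch of the half-life arithmetic. The 60-minute half-life is a number I made up for illustration; the point is just that the success probability halves with every additional half-life of work, which is the same thing as a constant failure rate per minute.

```python
import math

def success_probability(task_minutes: float, half_life_minutes: float) -> float:
    """Chance the agent finishes a task of the given length, assuming a
    constant per-minute failure rate (the 'half-life' model)."""
    # Each half-life of work halves the remaining chance of success:
    # P(success) = 0.5 ** (t / half_life) = exp(-lambda * t)
    return 0.5 ** (task_minutes / half_life_minutes)

# Hypothetical numbers: a model with a 60-minute half-life
for t in (30, 60, 120, 240):
    print(f"{t:>3} min task: {success_probability(t, 60):.1%} chance of success")
# -> 70.7%, 50.0%, 25.0%, 6.2%
```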
I haven’t read the finer details yet, like how you judge when the task is done (I presume when the LLM claims it has finished the job), or whether these tests allow multiple prompts (I presume the LLM is fed an input once and allowed to churn on it until it believes it is finished). So that makes me wonder: could you solve this problem, and increase the AI’s rate of success, by combining it with classical computing methods? For example, perhaps you could ask the LLM to list the steps it would perform to complete a task, parse that list into a list of new prompts, then feed each of those prompts back to the LLM, each one producing another list of subtasks. Could you keep breaking down the tasks and feeding them back into the AI to increase its odds of success?
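Here is a rough sketch of that decompose-and-re-prompt idea, just to pin down what I mean. The ask_llm function is a stand-in for whatever model call you would actually use, and the depth limit and prompt wording are my own guesses.

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (API client, local model, etc.)."""
    raise NotImplementedError

def decompose(task: str, depth: int = 0, max_depth: int = 2) -> list[str]:
    """Recursively break a task into smaller prompts until each one is
    (hopefully) short enough to fall well inside the model's half-life."""
    if depth >= max_depth:
        return [task]
    reply = ask_llm(f"List the steps needed to complete this task, one per line:\n{task}")
    steps = [line.strip("-* ").strip() for line in reply.splitlines() if line.strip()]
    if len(steps) <= 1:  # can't be broken down any further
        return [task]
    subtasks = []
    for step in steps:
        subtasks.extend(decompose(step, depth + 1, max_depth))
    return subtasks

def run(task: str) -> list[str]:
    """Execute each leaf subtask with its own short LLM call."""
    return [ask_llm(f"Do this step and report the result:\n{sub}") for sub in decompose(task)]
```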
It is an interesting research question. I am also interested to see how much energy and water this takes compared with a human working on the same task, including the human’s caloric intake and perhaps the energy used to harvest, process, and deliver that food.