What if scientific research no longer required teams of experts, years of work, or even human supervision?
That question just became very real.
In a quiet GitHub release, Andrej Karpathy — former AI lead at Tesla and co-founder of OpenAI — published a minimalist project that could fundamentally change how research is done. Just 630 lines of code, running on a single GPU, with zero human input once launched.
And then… he went to sleep.
An AI That Experiments All Night Long
Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes,… pic.twitter.com/j34dSt4oht
— Andrej Karpathy (@karpathy) March 9, 2026
The project, called AutoResearch, operates with a brutally simple loop:
- Read the code
- Form a hypothesis
- Modify the training process
- Run an experiment
- Measure the result
- Keep it — or discard it
No supervision. No intervention. Just iteration.
Each experiment runs on a fixed budget of five minutes. That’s enough for the agent to test an idea, evaluate it using a single metric, and decide whether it improves performance.
Then it moves on.
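The loop above can be sketched as a simple hill-climbing search. This is a toy illustration, not Karpathy's actual code: the real system has an LLM propose code changes and trains a model for five minutes per experiment, while here a hypothetical `propose_change` just perturbs hyperparameters and `evaluate` is a stand-in loss function.

```python
import random

random.seed(0)

def propose_change(params):
    """Hypothetical stand-in for the 'form a hypothesis / modify the
    training process' step: nudge one hyperparameter at random."""
    key = random.choice(sorted(params))
    candidate = dict(params)
    candidate[key] += random.uniform(-0.1, 0.1)
    return candidate

def evaluate(params):
    """Stand-in for the 'run an experiment / measure the result' step.
    A real run would train for ~5 minutes and report validation loss;
    here, lower is better and the optimum is lr=0.3, wd=0.1."""
    return (params["lr"] - 0.3) ** 2 + (params["wd"] - 0.1) ** 2

params = {"lr": 0.5, "wd": 0.05}
best = evaluate(params)

for _ in range(100):                  # roughly one night of experiments
    candidate = propose_change(params)
    loss = evaluate(candidate)
    if loss < best:                   # keep the change...
        params, best = candidate, loss
    # ...or discard it and move on to the next idea

print(f"best loss after 100 experiments: {best:.4f}")
```

The key design point the article describes is exactly this: no memory of failed attempts, no planning, just a single scalar metric deciding keep-or-discard, repeated hundreds of times.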
Over the course of a single night, the system can run 100+ experiments.
700 Experiments Later… It Found What Humans Missed
Karpathy let the system run for two days.
The result? Around 700 code modifications tested autonomously — with about 20 meaningful improvements identified.
Stacked together, those changes reduced the time needed to reach GPT-2 level performance by 11%.
But the most striking part wasn’t the speed.
It was the discovery.
The AI identified a missing parameter in the normalization process — a subtle issue that had gone unnoticed, even by experienced engineers.
It fixed it on its own.
A Smaller Model… That Outperformed a Bigger One
The experiment didn’t stop there.
Shopify CEO Tobi Lütke adapted the system internally. After one night of autonomous experimentation:
- A model with 800 million parameters
- Outperformed a manually tuned 1.6 billion parameter model
- With a 19% performance gain
In other words: half the size, better results.
The AI didn’t just optimize — it simplified.
When Constraints Create Smarter AI
In another experiment, a distributed network of 35 autonomous agents ran hundreds of experiments overnight — some even on modest hardware like laptops without GPUs.
Most attempts failed.
But the few that succeeded revealed something unexpected:
The model improved by becoming simpler.
This mirrors a principle seen in nature — where limited environments often produce the most efficient adaptations.
Except here, evolution doesn’t take millions of years.
It takes a night.
From Tool to Research Partner
AutoResearch isn’t a general AI system. It’s not trying to think, reason, or understand the world.
It’s something more focused — and possibly more dangerous.
A closed loop.
A measurable objective.
A system that never gets tired, never loses motivation, and doesn’t stop after its 300th failure.
It just keeps going.
And sometimes, it finds things we don’t.
A Glimpse of What Comes Next
Karpathy has already hinted at what this could become:
Massively distributed research systems. Networks of agents running experiments continuously. Collaborative AI-driven discovery at global scale.
No large teams. No slow cycles.
Just machines iterating, testing, improving — faster than any human system ever could.
This isn’t artificial general intelligence.
But it may be something just as disruptive:
Automated scientific progress.
The Real Shift Is Already Happening
Not long ago, machine learning research required teams of PhDs, months of work, and significant resources.
Today, it can start with a few hundred lines of code.
Run overnight.
And improve itself while you sleep.
The question is no longer whether AI will assist research.
It’s how long before it starts leading it.
FAQ
What is AutoResearch?
AutoResearch is a lightweight AI system that autonomously runs experiments, modifies code, and improves model performance without human intervention.
How does it work?
It follows a loop: generate a hypothesis, modify the code, run a time-boxed experiment, evaluate the result against a single metric, keep or discard the change, and repeat.
Why is this important?
Because it shows that AI can independently discover optimizations and improve systems — potentially transforming how scientific research is conducted.
Does it replace human researchers?
Not directly — but it dramatically accelerates experimentation and can uncover insights humans might miss.