Hacker News: submissions from metr.org
Task-Completion Time Horizons of Frontier AI Models – METR (metr.org)
2 points by rootforce 10 hours ago
METR releases Time Horizon 1.1 with 34% more tasks (metr.org)
1 point by mustaphah 11 days ago
AI Doubling Time Horizon v1.1 (metr.org)
1 point by chriskanan 12 days ago
METR Clarifying limitations of time horizon (metr.org)
1 point by alphabetatango 12 days ago
METR AI Benchmark: Clarifying Limitations of Time Horizon (metr.org)
2 points by mustaphah 19 days ago
Measuring AI Ability to Complete Long Tasks (metr.org)
247 points by spicypete 52 days ago | 193 comments
METR review of OpenAI's GPT-OSS fine-tuning safety methodology (metr.org)
1 point by mustaphah 3 months ago
Measuring AI Ability to Complete Long Tasks (metr.org)
2 points by Gedxx 4 months ago
Measuring AI Ability to Complete Long Tasks (2x every 7 months) (metr.org)
3 points by tmoertel 4 months ago
The Impact of Early-2025 AI on Open-Source Developer Productivity (metr.org)
3 points by jvdvegt 5 months ago | 1 comment
Measuring AI Ability to Complete Long Tasks – METR (metr.org)
2 points by diginova 6 months ago
Measuring the Impact of AI on Experienced OSS Developer Productivity [pdf] (metr.org)
3 points by nreece 7 months ago | 1 comment
Measuring Impact of 2025 AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
1 point by sonabinu 7 months ago
Measuring the Impact of Early-2025 AI on Experienced OpenSource Dev Productivity [pdf] (metr.org)
2 points by davikr 7 months ago
Measuring the Impact of AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
18 points by ColinEberhardt 7 months ago | 2 comments
Measuring the impact of AI on experienced open-source developer productivity (metr.org)
775 points by dheerajvs 7 months ago | 485 comments
Recent Frontier Models Are Reward Hacking (metr.org)
2 points by surprisetalk 8 months ago
AI's Version of Moore's Law (metr.org)
2 points by aazo11 9 months ago | 1 comment
Measuring AI Ability to Complete Long Tasks (metr.org)
2 points by pabo 10 months ago
Measuring AI Ability to Complete Long Tasks – METR (metr.org)
7 points by gk1 10 months ago | 1 comment
Measuring AI Ability to Complete Long Tasks (metr.org)
4 points by stared 10 months ago
Measuring Automated Kernel Engineering (metr.org)
1 point by gsky 11 months ago
Evaluating frontier AI R&D capabilities of LLM agents against human experts (metr.org)
1 point by tedsanders on Nov 22, 2024
When LLM agents can do a task, they can often do so at a fraction of human cost (metr.org)
4 points by cpainter on Aug 6, 2024
METR: Model Evaluation and Threat Research (metr.org)
2 points by Olshansky on July 8, 2024
Bounty: Diverse hard tasks for LLM agents (metr.org)
3 points by RoboTeddy on Jan 20, 2024