Submissions from metr.org

		Task-Completion Time Horizons of Frontier AI Models – METR (metr.org)
		2 points by rootforce 10 hours ago
		METR releases Time Horizon 1.1 with 34% more tasks (metr.org)
		1 point by mustaphah 11 days ago
		AI Doubling Time Horizon v1.1 (metr.org)
		1 point by chriskanan 12 days ago
		METR Clarifying limitations of time horizon (metr.org)
		1 point by alphabetatango 12 days ago
		METR AI Benchmark: Clarifying Limitations of Time Horizon (metr.org)
		2 points by mustaphah 19 days ago
		Measuring AI Ability to Complete Long Tasks (metr.org)
		247 points by spicypete 52 days ago | 193 comments
		METR review of OpenAI's GPT-OSS fine-tuning safety methodology (metr.org)
		1 point by mustaphah 3 months ago
		Measuring AI Ability to Complete Long Tasks (metr.org)
		2 points by Gedxx 4 months ago
		Measuring AI Ability to Complete Long Tasks (2x every 7 months) (metr.org)
		3 points by tmoertel 4 months ago
		The Impact of Early-2025 AI on Open-Source Developer Productivity (metr.org)
		3 points by jvdvegt 5 months ago | 1 comment
		Measuring AI Ability to Complete Long Tasks – METR (metr.org)
		2 points by diginova 6 months ago
		Measuring the Impact of AI on Experienced OSS Developer Productivity [pdf] (metr.org)
		3 points by nreece 7 months ago | 1 comment
		Measuring Impact of 2025 AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
		1 point by sonabinu 7 months ago
		Measuring the Impact of Early-2025 AI on Experienced OpenSource Dev Productivity [pdf] (metr.org)
		2 points by davikr 7 months ago
		Measuring the Impact of AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
		18 points by ColinEberhardt 7 months ago | 2 comments
		Measuring the impact of AI on experienced open-source developer productivity (metr.org)
		775 points by dheerajvs 7 months ago | 485 comments
		Recent Frontier Models Are Reward Hacking (metr.org)
		2 points by surprisetalk 8 months ago
		AI's Version of Moore's Law (metr.org)
		2 points by aazo11 9 months ago | 1 comment
		Measuring AI Ability to Complete Long Tasks (metr.org)
		2 points by pabo 10 months ago
		Measuring AI Ability to Complete Long Tasks – METR (metr.org)
		7 points by gk1 10 months ago | 1 comment
		Measuring AI Ability to Complete Long Tasks (metr.org)
		4 points by stared 10 months ago
		Measuring Automated Kernel Engineering (metr.org)
		1 point by gsky 11 months ago
		Evaluating frontier AI R&D capabilities of LLM agents against human experts (metr.org)
		1 point by tedsanders on Nov 22, 2024
		When LLM agents can do a task, they can often do so at a fraction of human cost (metr.org)
		4 points by cpainter on Aug 6, 2024
		METR: Model Evaluation and Threat Research (metr.org)
		2 points by Olshansky on July 8, 2024
		Bounty: Diverse hard tasks for LLM agents (metr.org)
		3 points by RoboTeddy on Jan 20, 2024