andy99
21 days ago
| on:
Olmo 3: Charting a path through the model flow to ...
It says it's Common Crawl. I interpret that to mean a generic web-scrape dataset; presumably they filter out what they don't want before pretraining. You'd have to do some ablation testing to know what value it adds.
ccgreg
18 days ago
Common Crawl is a particular dataset. commoncrawl.org