Hacker News | mootothemax's comments

Huh, that’s interesting, I’ve been having very similar thoughts lately about what the near-ish term of this tech looks like.

My biggest worry is that the private jet class of people end up with absurdly powerful AI at their fingertips, while the rest of us are left with our BigMac McAIs.


> We had good small language models for decades. (E.g. BERT)

BERT isn’t a SLM, and the original was released in 2018.

The whole new era kicked off with Attention Is All You Need; we haven’t reached even a single decade of work on it.


> BERT isn’t a SLM

Huh? BERT is literally a language model that's small and uses attention.

And we had good language models before BERT too.

They were a royal bitch to train properly, though. Nowadays you can get the same with just 30 minutes of prompt engineering.


> > BERT isn’t a SLM Huh? BERT is literally a language model that's small and uses attention.

Astute readers will note what’s been missed here.

Fascinating, really. Your confidently-stated yet factually void comments I'd have previously put down to one of the classic programmer mindsets. Nowadays, though - where do I see that kind of thing most often? Curious.


After some research, I think I understand what you're getting at here - BERT being a model for encoding text, but not architecturally feasible to generate text with, which "LLMs" (the lack of definition here is resulting in you two talking past each other), maybe more accurately referred to as GPTs, can do.

Also the irony of your comment when it in itself was confidently stated yet void of any content was not missed either - consider dropping the superiority complex next time.


You can actually generate surprisingly coherent text with minimal finetuning of BERT, by reinterpreting it as a diffusion model: https://nathan.rs/posts/roberta-diffusion/

I don’t see a useful definition of LLM that doesn’t include BERT, especially given its historical importance. 340M parameters is only “small” in the sense that a baby whale is small.


For context, BERT is encoder-only, vs SLMs and LLMs which are decoder-only, and BERT is very much not about generating text; it's a completely different tech with a different purpose behind it. I believe some multimodal variants nowadays may muddy the waters slightly, but fundamentally they're very different things, let alone having been around for decades - unless you also include the history of computing in general.
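The encoder/decoder distinction above largely comes down to the attention mask. A toy sketch (plain Python, toy sizes, not real BERT or GPT weights) of the two mask shapes:

```python
def attention_mask(seq_len: int, causal: bool) -> list[list[int]]:
    """Return a seq_len x seq_len mask: 1 where query position i
    may attend to key position j."""
    if causal:
        # Decoder-only (GPT-style): position i attends only to j <= i,
        # which is what makes left-to-right text generation possible.
        return [[1 if j <= i else 0 for j in range(seq_len)]
                for i in range(seq_len)]
    # Encoder-only (BERT-style): every position attends to every other,
    # great for encoding/classification, unsuited to autoregression.
    return [[1] * seq_len for _ in range(seq_len)]

for row in attention_mask(4, causal=True):
    print(row)
```

With `causal=True` you get a lower-triangular mask; with `causal=False`, an all-ones mask - which is why you can't naively sample text left-to-right from a bidirectional encoder.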

While I could've written that better and with less attitude - gotta confess, and thx for pointing out my smugness - the AI stuff of the last few weeks really got under my skin; think I'm feeling all rather fatigued about it.


BERT is one example of a language model that solved specific language tasks very well and that existed before LLMs.

We had very good language models for decades. The problem was they needed to be trained, which LLMs mostly don't. You can solve a language-model problem now with just some system prompt manipulation.

(And honestly, typing in system prompts by hand feels like a task that should definitely be automated. I'm waiting for "soft prompting" to become a thing so we can come full circle and just feed the LLM an example set.)
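For illustration, here's what that manual step looks like today - assembling a few-shot prompt from labelled examples by hand (the function name and prompt format are illustrative, not any particular API; soft prompting would replace this string-munging with learned embeddings):

```python
def build_few_shot_prompt(task: str,
                          examples: list[tuple[str, str]],
                          query: str) -> str:
    """Assemble a few-shot prompt: task description, labelled
    examples, then the query left open for the model to complete."""
    lines = [task, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("I loved it", "positive"), ("Utter rubbish", "negative")],
    "Best purchase I've made all year",
)
print(prompt)
```

The prompt ends at `Output:` so the model's continuation is the answer - exactly the kind of by-hand example feeding the comment is hoping gets automated away.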


> Astute readers will note what’s been missed here.

I’m not astute enough to see what was missed here. Could you explain?


If I'm not mistaken, BERT is a classifier (text in, labels out), so it is not a "language model", as it cannot be used for text generation.

The abstract of the original BERT paper starts with these words: "We introduce a new language representation model called BERT, [...]" The paper itself contains the phrase "language model" 24 times.

It might not be considered a language model today, but it was certainly considered one when it was originally published. Or so it would seem to me. Maybe there is a semantic shift which happened here?


Oh that is beautiful :)

For myself, it’s the feeling of: thank fuck; the grownups have arrived. shoulders lower, everyone takes a deep breath


It’s a delight even to have a regulated source of all fuel station locations in the UK!

This might be a slight missing-the-woods-for-the-trees moment, but that aside - there is precious little open geospatial data in the UK that establishes "see this dot here? That’s a fuel station, that is. That dot there? Oooooh no, that there’s a pub."

The UK governments of the time managed to hand both the address data and the this-is-what-it-is data off to separate commercial enterprises in the name of privatisation, and I genuinely believe it was by accident, as it’s… err… quite a niche topic of knowledge.

So anything - anything! - that brings some of that back and makes it truly open to the public is very much welcomed.


Funnily enough I got the same type of hope from Julia, the 1984-from-Julia’s perspective tome that hints at… well, you’ll have to find out :)


I guess this is what makes marketing so tricky; I myself would’ve bought a $10/mo subscription so much sooner given the chance, which by now - and happily, incidentally - would’ve brought in way more dosh than my one-off payment.


That’s an excellent point, thanks for linking.

My takeaway from this thread is: his theory’s great until you discover that your customers are willing to pay *so* much more.

On a more positive note, I’ve been blown away by the (largely, one conspicuous troll-like annoyance aside) positive thoughts in the comments. Maybe it’s not too late?


Some are willing - many take the code they want and bounce after a month


It is true, I paid the lifetime fee for the premium tailwind offering, and they probably could have gotten double that from me with an annual subscription instead.


While I don’t disagree with you, for historical purposes I think it’s important to highlight why Google started its push for 100% wire encryption everywhere, all the time:

The NSA and GCHQ and basically every TLA with the ability to tap a fibre cable had figured out the gap in Google’s armour: Google’s datacenter backhaul links were unencrypted. Tap into them, and you get _everything_.

I’ve no idea whether Snowden’s leaks were a revelation or a confirmation for Google themselves; either way, it’s arguably a total breach.


When I worked at PayPal back in 2003/4, one of the things we did (and I think we were the first) was encrypt the datacenter backhaul connections. This was on top of encrypting all the traffic between machines. It added a lot of expense and overhead, but security was important enough to justify it.


And yet Venmo, a Paypal company, publishes transaction data publicly by default, no need to decrypt anything ¯\_(ツ)_/¯


Venmo publishes raw unencrypted transaction data? Or are you referring to their social network features?


where?


This is pretty much what is so remarkable about parquet files; not only do you get seekable data, you can fetch only the columns you want too.

I believe there are also indexing opportunities (not necessarily via e.g. Hive partitioning), but frankly - I'm kinda out of my depth on it.
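A toy illustration of the seek-to-one-column idea (this is emphatically not the real Parquet format, just the principle: columns stored contiguously, an index in a footer at the end of the file, so a reader touches only the bytes for the columns it wants):

```python
import io
import json
import struct

def write_columnar(buf, columns: dict[str, list[int]]) -> None:
    """Write each column contiguously as little-endian int64s, then a
    JSON footer mapping name -> (offset, count), then the footer offset."""
    index = {}
    for name, values in columns.items():
        index[name] = (buf.tell(), len(values))
        for v in values:
            buf.write(struct.pack("<q", v))
    footer_at = buf.tell()
    buf.write(json.dumps(index).encode())
    buf.write(struct.pack("<q", footer_at))

def read_column(buf, name: str) -> list[int]:
    """Seek to the footer, look up one column, read only its bytes."""
    total = buf.seek(0, io.SEEK_END)
    buf.seek(total - 8)
    (footer_at,) = struct.unpack("<q", buf.read(8))
    buf.seek(footer_at)
    index = json.loads(buf.read(total - 8 - footer_at))
    offset, count = index[name]
    buf.seek(offset)
    return [struct.unpack("<q", buf.read(8))[0] for _ in range(count)]

buf = io.BytesIO()
write_columnar(buf, {"price": [10, 20, 30], "qty": [1, 2, 3]})
print(read_column(buf, "qty"))  # → [1, 2, 3], without reading "price"
```

Real Parquet adds row groups, encodings, compression, and per-chunk statistics on top of this layout, which is where the indexing opportunities mentioned above come in.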


Hi there - I’m really sorry about your negative experiences. I read the replies to your comment and felt sad that I didn’t read one that recognised how much work you’re putting into what sounds like an indifferent society - and how unfair that is. I also hope I’m not crossing the line of too much/trying too hard. Frankly, it sounds like a shit place to be.

