And in this context it seems to go against The Consumer Protection from Unfair Trading Regulations 2008 and the Digital Markets, Competition and Consumers Act 2024:
I very much don't believe for a second anyone would manage to get a judgement against them on this in the UK.
For starters, the language is highly subjective, and they'd be able to show vast amounts of discourse about software engineering where "from scratch" often does not involve starting with nothing. They'd then go on to argue that the person suing hasn't actually had any reason to believe that they would be able to replicate a setup that was described as a complex large-scale experiment without much more information.
The person suing would have an uphill battle showing that whatever assumptions they made were something that was reasonable to infer based on that statement.
And to have a case, a consumer would also then need to have relied on this as a significant factor in choosing to buy their services.
But even if we assume the court would agree it is fraudulent, the remedy is only "directly consequential losses".
In other words, I doubt anyone would lose sleep over this risk.
That's very interesting, thanks! I had no idea that legionella risk was a thing for data centers. This article mentions that to avoid the risk most data centers treat the water with disinfectants which are sometimes toxic:
They're really nasty bacteria and once in a system they are hard to get rid of because then you have to heat everything to temperatures that the system normally might never reach.
That's why central heating systems that run 'low' every now and then stoke up to 60 degrees or more on the secondary circuit for tap water.
And data centers are the perfect location, endless 35 to 45 degree water. Cooling towers are the main problem for this, another is aerosols of water that has been sitting in the sun for a while, for instance in a garden hose exposed to the sun.
Last year I had to deal with a contractor who sincerely believed that a very popular library had a bug because it was erroring when parsing ChatGPT-generated JSON... I'm still shocked, this is seriously scary.
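To illustrate the likely failure mode, here is a minimal Python sketch. The actual library and payload from the anecdote are unknown, so the `generated` string and the `try_parse` helper are purely hypothetical. It assumes the generated text had a trailing comma, which is a common way for JSON-looking output to be invalid; a strict parser rejecting it is correct behavior, not a library bug.

```python
import json

# Hypothetical LLM output: looks like JSON, but the trailing comma
# inside the array makes it invalid per the JSON grammar.
generated = '{"name": "example", "tags": ["a", "b",]}'


def try_parse(text):
    """Return (parsed_value, None) on success or (None, error_message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        # The exception carries the position of the problem, which points
        # at the input rather than at the parser.
        return None, f"{exc.msg} at line {exc.lineno}, column {exc.colno}"


value, error = try_parse(generated)
print(value, error)
```

The error message includes a line and column, so checking the input at that position is usually the first step before suspecting the library.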
Interesting, I remembered that when trying out Stable Diffusion, once I ventured outside of the realm of anime waifus, the images ended up being so similar to existing sources, that image search could find the references.
Which is also kinda crazy since superficially there was very little similarity between the 2 images, but I guess the AI models used for image search converge on embeddings similar to those used for AI generation.
Just for context, this was the original claim by Cursor's CEO on Twitter:
> We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week.
> It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.
> It kind of works! It still has issues and is of course very far from Webkit/Chromium parity, but we were astonished that simple websites render quickly and largely correctly.
Has anyone tried to rewrite some popular open source project with AI? I imagine modern LLMs can be very effective at license-washing/plagiarizing dependencies; it could be an interesting new benchmark too.
As the author, it's a stretch to say that JustHTML is a port of html5ever. While you're right that this was part of the initial prompt, the code is very different, which is typically not what counts as a "port". Your mileage may vary.
Interesting, IIUC the transformer architecture / attention mechanism were initially designed for use in the language translation domain. Maybe after peeling back a few layers, that's still all they're really doing.
This has long been how I have explained LLMs to non-technical people: text transformation engines. To some extent, many common, tedious activities basically constitute a transformation of text from one well-known form into another (even some kinds of reasoning are this) and so LLMs are very useful. But they just transform text between well-known forms.
And while it appears that lots of problems can be contorted into translation, "if all you have is a hammer, everything looks like a nail". Maybe we do hit a brick wall unless we can come up with a model that more closely aligns with actual human reasoning.
Note that it's not clear that any of the JustHTML ports were actually ports per se, as in the end they all ended up with very different implementations. Instead, it might just be that an LLM generated roughly the same library several different times.
Not me personally, but a GitHub user wrote a replacement for Go's regexp library that was "up to 3-3000x+ faster than stdlib": https://github.com/coregx/coregex ... at first I was impressed, so started testing it and reporting bugs, but as soon as I ran my own benchmarks, it all fell apart (https://github.com/coregx/coregex/issues/29). After some mostly-bot updates, that issue was closed. But someone else opened a very similar one recently (https://github.com/coregx/coregex/issues/79) -- same deal, "actually, it's slower than the stdlib in my tests". Basically AI slop with poor tests, poor benchmarks, and way oversold. How he's positioning these projects is the problematic bit, I reckon, not the use of AI.
Same user did a similar thing by creating an AWK interpreter written in Go using LLMs: https://github.com/kolkov/uawk -- as the creator of (I think?) the only AWK interpreter written in Go (https://github.com/benhoyt/goawk), I was curious. It turns out that if there's only one item in the training data (GoAWK), AI likes to copy and paste freely from the original. But again, it's poorly tested and poorly benchmarked.
I just don't see how one can get quality like this, without being realistic about code review, testing, and benchmarking.
Note that this is semantically exactly equivalent to "up to 3000x faster than stdlib" and doesn't actually claim any particular actual speedup since "up to" denotes an upper bound, not a lower bound or expected value. It’s standard misleading-but-not-technically-false marketing language to create a false impression because people tend to focus on the number and ignore the "up to".
Saying “up to” means that bound is the maximum value of the data set. It may be far from the median value, but it is included (or you’re lying). With any other interpretation the phrase has no meaning whatsoever.
I will concede, proactively, that "up to" could refer to some maximum possible bound, even if the current set doesn't include a value at that bound, though I would argue that's likely deceptive wording. For example, you could say that each carton of eggs on a pallet contains up to 12 eggs, because that's the maximum capacity of the carton, even if none of the actual cartons on this pallet actually have 12 eggs in them.
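A small Python sketch of why the "up to" figure says so little. The numbers below are made up for illustration and are not measurements of any real library; `summarize` is a hypothetical helper. Under the "maximum of the data set" reading, the headline number is just `max`, which can sit very far from the median.

```python
from statistics import median

# Hypothetical speedup ratios (baseline time divided by candidate time),
# so values below 1.0 mean the candidate is slower than the baseline.
# These are illustrative numbers, not real benchmark results.
speedups = [0.6, 0.9, 1.1, 2.5, 3000.0]


def summarize(ratios):
    """Return the headline 'up to' figure next to more representative statistics."""
    return {
        "up_to": max(ratios),
        "median": median(ratios),
        "slower_than_baseline": sum(1 for r in ratios if r < 1.0),
    }


summary = summarize(speedups)
print(summary)
```

With data shaped like this, "up to 3000x faster" is technically true even though the typical case is barely faster and some cases are slower.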
I used one of the assistants to reverse and rewrite a browser-hosted JS game-like app to desktop Rust. It required a lot of steering but it was pretty useful.
Gaaaah, words. Yes thank you ! Coz in another thread I was mentioning both.
The above post, which I can no longer edit, compares The Black Parade / TBP (a mod for Thief I / The Dark Project / TDP) to The Dark Mod (TDM, a mod for the Doom 3 engine). Phew :D
As for the original question of comparing TBP to TDP: I’m personally not fond of Thief I and prefer Thief II, as it focuses on what works: stealth! Thief I is wildly creative, but also full of muddy combat with unconvincing monsters & zombies, and annoying maps / missions. So, to me, TBP (which is pleasingly weird and avoids TDP gameplay pitfalls) kinda beats its parent game TDP at its own game.
I believe in the UK the term for this is actually fraudulent misrepresentation:
https://en.wikipedia.org/wiki/Misrepresentation#English_law
And in this context it seems to go against The Consumer Protection from Unfair Trading Regulations 2008 and the Digital Markets, Competition and Consumers Act 2024:
https://www.legislation.gov.uk/uksi/2008/1277/made
https://www.legislation.gov.uk/ukpga/2024/13/section/226