(a) You're not extracting linked plans in the schema. For anything larger-scale than a single house development, this is really common: you might have 5-20 different planning applications on the same large(r) site for things like initial plans, planning for the entranceways from the road, planning for commercial signage, etc., and many planning applications have a link between them. Even for single-household developments this can happen: planning permission expires after 3 years, so if there's a delay, subsequent planning applications are sometimes made that are pretty much identical to the previous one. These applications + approvals are usually boilerplate and normally approved unless there's been a major change to policy.
(b) You're missing a major caveat, which is that many properties have additional restrictions on development, so two properties on the same street might get totally different planning outcomes even for the same set of changes to the building (e.g. replacing windows would require permission on my house but wouldn't on my neighbour's!). You should therefore determine whether the property is listed or in a conservation area, and whether an Article 4 direction applies to it. These are almost always mentioned in the planning permission documents but not on the forms themselves. Often there's a heritage assessment or similar provided by a planning consultant in the application too. The first example in the Kaggle dataset is one of these - if you look at it, the property is in the Portland Estate conservation area and was refused because of a discrepancy in the plans.
(c) Each property has a unique property reference number (UPRN) - this pins it to a specific property and is more specific than postcode. This might be useful.
(d) Goes without saying, but the reference number is only unique within a single local authority, so you need the local authority to be named in another column. The format is normally YY/counter, so 25/01536 means it's the 1536th application in 2025. Some local authorities prefix the reference with the name of the local authority.
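A minimal sketch of parsing that reference format. The field names and the handling of an optional authority prefix are my own assumptions for illustration, not a standard:

```python
import re

# Hypothetical parser for planning application references in the common
# "YY/counter" format, e.g. "25/01536" = the 1536th application in 2025.
# Some authorities prefix their name, and references are only unique when
# paired with the local authority, so keep that in a separate column.
REF_PATTERN = re.compile(
    r"^(?:(?P<authority>[A-Za-z]+)/)?(?P<year>\d{2})/(?P<counter>\d+)$"
)

def parse_reference(ref):
    m = REF_PATTERN.match(ref.strip())
    if m is None:
        return None
    return {
        "authority_prefix": m.group("authority"),   # may be None
        "year": 2000 + int(m.group("year")),        # assumes post-2000
        "counter": int(m.group("counter")),
    }
```

So `parse_reference("25/01536")` gives year 2025 and counter 1536, while anything that doesn't fit the pattern comes back as None rather than raising.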
I've tried both on similar problems and haven't found such a clear-cut difference. I still find that neither is able to fully and correctly implement a complex algorithm I worked on in the past, given the same inputs. I'm not sharing exactly the benchmark I'm using, but think about something for improving the performance of the N^2 operations that are common in physics and you can probably guess the train of thought.
I've had reasonable success using GPT for both neighbor list and Barnes-Hut implementations (also quad/oct-trees more generally), both of which fit your description, haven't tried Ewald summation or PME / P3M. However, when I say "reasonable success", I don't mean "single shot this algo with a minimal prompt", only that the model can produce working and decently optimized implementations with fairly precise guidance from an experienced user (or a reference paper sometimes) much faster than I would write them by hand. I expect a good PME implementation from scratch would make for a pretty decent benchmark.
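For context on what a neighbor list buys you, here is a minimal cell-list sketch (my own toy code, not any model's output). It assumes a cubic periodic box at least 3x the cutoff, and binning particles into cells makes the search roughly O(N) on average instead of the naive O(N^2) all-pairs scan:

```python
import numpy as np

def neighbor_pairs(positions, cutoff, box):
    """Cell-list neighbor search for a cubic periodic box (box >= 3*cutoff)."""
    n_cells = max(int(box // cutoff), 1)
    cell_size = box / n_cells
    # Bin each particle into its cell
    cells = {}
    idx = (positions // cell_size).astype(int) % n_cells
    for i, c in enumerate(map(tuple, idx)):
        cells.setdefault(c, []).append(i)
    pairs = []
    for (cx, cy, cz), members in cells.items():
        # Only this cell and its 26 neighbors (with periodic wrap) can
        # contain particles within the cutoff.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    nb = ((cx + dx) % n_cells,
                          (cy + dy) % n_cells,
                          (cz + dz) % n_cells)
                    for i in members:
                        for j in cells.get(nb, ()):
                            if i < j:  # count each pair once
                                d = positions[i] - positions[j]
                                d -= box * np.round(d / box)  # minimum image
                                if np.dot(d, d) < cutoff ** 2:
                                    pairs.append((i, j))
    return pairs
```

The hard parts a model has to get right, and where guidance helps, are exactly the fiddly bits here: the periodic wrap, the minimum-image convention, and not double-counting pairs.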
Rust is terrible for pulling in hundreds of dependencies though. Add tokio as a dependency and you'll get well over 100 packages added to your project.
Even side-stepping the fact that tokio no longer pulls in that many packages: it used to be split into multiple packages in the same way that a KDE written in Rust would be hundreds of packages.
Rust projects tend to split into many smaller packages, for ease of development, faster compiles through parallelization, proper separation of concerns, and code reuse by others. But the packages are equivalent to a single big package: the people who write them are the same, and they get developed in tandem and published at the same time. Take a look at the dependency tree for ripgrep - the split of different parts of that app lets me reuse the regex engine without dealing with APIs that only make sense in the context of a CLI app, or pulling in code I won't ever use (which might be hiding an exploit too).
Counting 100 hundred-line crates all by the same authors as inherently more dangerous than one 10,000-line crate makes no sense to me.
It's worth noting that Rust packages (crates) are all single compilation units, and every compilation unit is a package. It's the equivalent of complaining that OpenSSL pulls in hundreds of `.c` files.
pin-project-lite is the only base dependency, which itself has no dependencies. If you enable the "full" feature, i.e. all optional doodads turned on (which you likely don't need), it's 17: bytes, cfg-if, errno, libc, mio, parking_lot+parking_lot_core+lock_api, pin-project-lite, proc_macro2+quote+syn+unicode-ident, scopeguard, signal-hook-registry, smallvec, and socket2. You let me know which of those you think are bloat that tokio should reimplement or replace with a C library binding - and without the blatant fabrication this time.
I worked in an industry for five years and could feasibly build a competitor product that I think would solve a lot of the problems we had, and which it would be difficult to pivot the existing products into. But ultimately, I could have done that before; it just brings the time to build down, and it does nothing for the difficult part, which is convincing customers to take a chance on you, sales and marketing, etc. It takes a certain type of person to go and start a business.
Nobody’s talking about starting businesses. The article is specifically about PyPI packages, which don’t require any sales and marketing. And there’s still no noticeable uptick in package creation or updates.
In my PhD more than a decade ago, I ended up using PNG image file sizes to classify different output states from simulations of a system under different conditions. Because of the compression, homogeneous states led to much smaller file sizes than heterogeneous states. It was super, super reliable.
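A toy reconstruction of the idea (not the original pipeline), using zlib directly rather than PNG files but relying on the same effect: compressed size acts as a cheap complexity measure, and a spatially homogeneous state compresses far better than a disordered one:

```python
import zlib
import numpy as np

def compressed_size(field):
    # Compressed byte count as a proxy for the "complexity" of a state
    return len(zlib.compress(field.astype(np.uint8).tobytes()))

rng = np.random.default_rng(0)
homogeneous = np.full((256, 256), 128)             # uniform state
heterogeneous = rng.integers(0, 256, (256, 256))   # disordered state

assert compressed_size(homogeneous) < compressed_size(heterogeneous)
```

PNG gets you the same thing for free because its DEFLATE stage is essentially this, plus pixel filtering that makes smooth gradients compress well too.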
I'm not sure why this is against 'frameworks' per se. If we were sure that the code LLMs generate was the best possible, we might as well use assembly, no, since that'd lead to the best performance? But we generally don't, because we still need to validate, verify, and read the code. And in that, there is still some value in using a framework, since the generated code is likely, on the whole, to be shorter and simpler than code written without one. On top of that, because it's simpler, I've found there's less scope for LLMs to go off and do something strange.
To a degree, but most enterprise-focused software has differential pricing. Often that pricing isn't public, so different companies get different quotes.