Yeah, I'm not fully opposed to the self-host option; I self-host quite a few services. However, in this case I'm a bit nervous about being in charge of families' photos and their backups. This way, even if they don't back up themselves, Ente has a pretty robust duplication strategy.
Many times these stores have floor cleaning machines - either robotic or driven by a human. An employee could zip-tie their sensor to it, let it do its cleaning trip around the store, and return to collect the data later.
This would allow an employee to do several stores in a town in a single day. There would also potentially be less chance of a workers'-compensation claim being filed if they fall while walking around looking at their device.
It's been said in one of the comments that the initial mapping by a human takes something like 2-3 hours. Knowing the speed of a Roomba, I'd guess it would take much longer. And 'humans' are COTS devices available at any department store (sorry ;P).
Currently working on https://exitfox.com/. After recently switching jobs, I became frustrated with the handover process to the new joiner. This experience inspired me to start building a tool that uses Gemini/Claude to ask relevant questions about the project being handed over and streamlines the exit process in a structured format. Now I am adding more features for HR managers and employees around clearance, FnF, etc.
Interestingly, I found Claude Code to be the only LLM that's good at designing frontends; asking it to make things look better actually helps.
I am sure I have heard strong opinions like this, but in practice joins are never cheap. Tables with billions of rows crossed with millions of rows just to find a single row of data is not something I would call cheap. More often than not it is better to avoid joining large tables if you can live with duplicate data. Another trick I have found works well is to archive the data in your tables that is not accessed frequently, reducing the size of the tables and keeping only the data that is needed.
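For illustration, a minimal sketch of the archiving idea, assuming Postgres and hypothetical orders / orders_archive tables with identical columns and a created_at timestamp:

  -- move rows older than a year into the archive table,
  -- then remove them from the hot table
  BEGIN;
  INSERT INTO orders_archive
  SELECT * FROM orders
  WHERE created_at < NOW() - INTERVAL '1 year';
  DELETE FROM orders
  WHERE created_at < NOW() - INTERVAL '1 year';
  COMMIT;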
DB guy with 25+ years experience. Summary: it depends.
> joins are never cheap
it depends. On table size, on indexes, and on how expensive the alternative is. Always!
> tables with billions of rows crossed with millions of rows just to find a single row with data is not something i would call cheap
indexes
> more often than not it is better to avoid joining large tables if you can live with duplicate data
1E9 x 1E6 = 1E15 (at worst anyway). A join via an index will save you colossal amounts of IO (though as ever, it depends).
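For concreteness, a minimal sketch (hypothetical table and column names, Postgres-flavoured) of the kind of index that turns that worst case into a cheap lookup:

  -- a B-tree index on the join key of the big table
  CREATE INDEX idx_events_user_id ON events (user_id);

  -- the planner can now do an index lookup per matching user
  -- instead of scanning the billion-row table
  SELECT e.*
  FROM users u
  JOIN events e ON e.user_id = u.id
  WHERE u.email = 'someone@example.com';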
Problem here isn't this mostly clueless advice (discarding/archiving unnecessary data is the only good idea here, and it's not used as often as it should be). Problem is strong opinions put forth by someone who doesn't have the necessary experience, or understanding of what's going on under the hood. Denormalising is a useful tool that IME rarely gains you more than it loses you, but this 'advice' is just going to lead people down the wrong alley, and I'm tired of suchlike n00b advice strongly (and incorrectly and arrogantly) expressed on HN.
There's also the possibility of filtering each source table first, then doing an inner join, which can VASTLY cut down on computation. I assume the GP assumed doing an outer join first, then filtering.
But those are details for the database engine to handle. And, as you said, indexes
FYI for others, such filtering is called predicate pushdown (I believe it's also sometimes called predicate hoisting). Example (trivial, but for illustration):
select * from (select * from tbl) as subqry where subqry.col = 25
would be rewritten by any halfway decent optimiser to
select * from (select * from tbl where tbl.col = 25) as subqry
(and FTR the outermost select * would be stripped off as well).
Good DB optimisers do a whole load of that and much more.
Yeah, I had to get quite well acquainted with query execution plans and the like a few years ago (and have forgotten most of it by now) because I was diagnosing a SLOW query.
Joining onto either table a or table b is something that REALLY trips optimizers up.
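Roughly the sort of thing I mean (hypothetical table names): an OR across two possible join keys, which tends to defeat hash/merge joins and push the planner into nested loops:

  SELECT o.id, p.name
  FROM orders o
  JOIN parties p ON p.id = o.customer_id OR p.id = o.supplier_id;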
I thought I was being informative. I can't give hard-and-fast rules because (drumroll)... it depends. So there are tradeoffs to consider, and indexes got mentioned.
How else could I have posted better? Honest question.
Because you didn’t actually refute anything the GP said, and gave bad advice, all while being incredibly negative and arrogant.
> this mostly clueless advice
> strong opinions put forth by someone who doesn't have the necessary experience, or understanding of what's going on under the hood
> I'm tired of suchlike n00b advice strongly (and incorrectly and arrogantly) expressed on HN
You continue to just say it depends without giving any actual scenarios. You make it sound like magic, but it’s not: “under x and y, do z except when u” is better than “it depends, I’m sick of all these noobs”.
Also, your main points are against denormalization and avoiding large table joins which are 100% rational arguments under certain workloads.
I refuted what he said by pointing out that 1E9 x 1E6 = 1E15. A billion row table denormalised with a million row table = 1000 trillion row table. How big's your disk array? How are you going to ensure correctness on update?
His was stupid advice and it should not have been given.
> You continue to just say it depends without giving any actual scenarios
it depends. Use your common sense and then use a stopwatch; that's a good start. There are entire shelves of books on this, I won't repeat them.
> You make it sound like magic, but it’s not:
absolutely true!
> “under x and y, do z except when u” is better than
it's a multidimensional problem, including memory size, disk size, the optimiser, the sizes of the particular tables joined, where the hotspot is, the cost of updates to non-normalised tables, etc. I can't give general advice from here.
> Also, your main points are against denormalization and avoiding large table joins which are 100% rational arguments under certain workloads.
I said "Denormalising is a useful tool that IME rarely gains you more than it loses you,"
True, you normalise/denormalise data, not tables as such; tables pop out of a normalisation process and denormalisation collapses them back together. Perhaps if I'm still wrong you could put me right. And don't just point at the wiki article on it, please be specific.
To your question: probably longer than you, but I've always got more to learn.
If you have billions of rows you need to search through for a single row, it will be costly unless you have the appropriate indexes. Storing data in JSON fields or similar is only going to make it slower.
If you are only selecting a single row, a join will be instantaneous. If you are joining many rows on either side it has a cost, but so does denormalized data, since you just need to scan through that many more rows. Sure, in certain specific cases denormalization can be a valid optimization, but in the general case it will just make queries slower.
Saying joins are cheap or expensive only makes sense when comparing to the alternative.
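To illustrate the single-row case (hypothetical schema): when both sides are reached through an index, the join is effectively two index lookups and no large intermediate result is ever built:

  -- primary-key lookup on orders, then an index/PK lookup on customers
  SELECT o.id, o.total, c.name
  FROM orders o
  JOIN customers c ON c.id = o.customer_id
  WHERE o.id = 42;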
> If you are only selecting a single row, a join will be instantaneous.
Exactly. It's not like the join materializes explicitly in memory.
This has all been known for decades since the beginning of relational databases. That's why taking a DB class is valuable, or at least reading a good DB fundamentals book.
But to steel-man this a little: joins can be cheap if you understand how they work and what the patterns of use are. I've worked on systems where that billions-of-rows-to-millions scenario is a thing. I recall working on a particularly nasty legacy query underpinning a report that took most of a day to finish, which I managed to get down to tens of seconds just by tweaking how it joined.
Joins are cheap when joining on indexed columns, but the trade off is extra time maintaining those indexes when writing. As always, it depends on the use case.
Depends a lot on the size of the tables, but that is true for anything to do with databases. It's probably hard to give one-size-fits-all advice about database management, since you're ultimately balancing conflicting interests (query performance, maintainability, disk space).
With a sufficiently small database, even non-indexed joins may appear fast.
If your database has billions of rows or more, then even indexed joins will need to be used judiciously, especially if you have a lot of indexes. The indexes will probably also become very large on disk (possibly hundreds of GB), and they'll also degrade performance, since more than likely the system will struggle to keep them in RAM.
Joins can be cheap if you keep half an eye on the query optimiser. Just make sure that the filters only grab what is necessary from your huge tables before the join part starts and a join can be blazingly fast on most modern databases.
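A rough sketch of what I mean (hypothetical tables, any dialect with CTEs): filter each big table down first, then join the small intermediate results:

  WITH recent_orders AS (
    SELECT id, customer_id, total
    FROM orders
    WHERE created_at >= DATE '2024-01-01'
  )
  SELECT c.name, r.total
  FROM recent_orders r
  JOIN customers c ON c.id = r.customer_id
  WHERE c.region = 'EU';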
It really depends where you put the cost in the end: do you pay the cost on the query side or the manually managing data integrity side?
Even if you don't denormalize, there's plenty of optimizations for joins: e.g. a bitmap join index can optimize a millions x billions join pretty well!
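For example (Oracle-specific syntax, hypothetical star-schema names), a bitmap join index precomputes the fact-to-dimension mapping so the join can be answered largely from the index:

  CREATE BITMAP INDEX sales_cust_region_bjix
  ON sales (customers.cust_region)
  FROM sales, customers
  WHERE sales.cust_id = customers.cust_id;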
What about the standardisation that comes with using a framework like Kubernetes? Without k8s you end up with ad hoc deployment methods and clunky workarounds for handling networking, policies, secrets, etc. Using Kubernetes, or even ECS, signals that the team or developer wants a fixed set of rules for infrastructure. Also, k8s scales well even for smaller apps.
Seriously, I use k8s for the same reason I use Docker: it's a standard language for deployment. Yeah, I could do the same stuff manually for a small project... but why?
More like it generates a random number from 30 to 90 and shows gibberish data. Cool idea, though, to fool small businesses into improving their visual score and charge them money. Good gimmick.