
An old company I worked for used project management software with a check-in/check-out mechanism for making changes. When you "check out" a project it downloads a copy that you change locally, then "check in" uploads it back to the server. A project is "locked" while in the "checked out" state. We all felt it was an archaic mechanism in a world of live-updating apps.

After 10 years of building SPA "web apps", that data synchronization mechanism feels ahead of its time.


What many people either can't or don't want to acknowledge is that whether you support live updates by multiple users in parallel, instead of locking so only one update can proceed at a time, is ultimately not a technical decision but a business decision: do the business rules that are appropriate for your application enable you to deal with concurrent live updates or not?

Ultimately that comes down to whether you can implement a process to ensure consistent resolution of any incompatibilities between multiple concurrent updates. Sometimes that can be done, and sometimes it can't, and which is the case for your application depends on your business rules, not on any technical capability.

If your business rules don't allow you to implement a resolution mechanism, you need locking so that only one update can happen at a time, whether you have the technical capability to support concurrent updates or not.


Indeed, many of the most painful technical problems are actually three business problems in a trenchcoat.


This is one of those phrases that should turn into a saying, and be passed around for hundreds of years.

Every hard problem I have today in my career involves getting business people to define their business problem properly so it can be solved with technology. Even the hardest code I've ever written was easy compared to some projects, simply due to the business issues lurking around them. Last week I finished a script that downloads a CSV and saves it to a SQL table (literally), and it took 3 weeks because the business folks couldn't get their act together on what they wanted. By contrast, I finished another project, which is currently the core of a previous employer's energy efficiency controls product, in a few days, because the person defining it did so very well and I had no questions, just work to perform.


So true ... and often those business decisions are not yours to make.


Sometimes known as, "two people with firing authority fighting a proxy war through the dev team".


This literally made me lol. :-)


Nah. I've seen plenty of systems where the business rules would handle concurrent updates fine, but since they're using a traditional Web/ORM/RDBMS setup they build a last-write-wins system without thinking about it. It's one of those rare problems where the technical part is actually harder than the business part.


Database systems have been able to deal with concurrent updates for quite some time now, so I don't think doing this is technically difficult with the current state of the art. Individual dev teams might not be well versed in the current state of the art, but the correct business response to that is not to restrict your business rules but to get developers who are well versed in the current state of the art.


> Database systems have been able to deal with concurrent updates for quite some time now, so I don't think doing this is technically difficult with the current state of the art.

Traditional ACID systems can't really handle them nicely - your only choice with an update is to commit it or discard it - so you have to do a lot of handwritten logic on top, and even if the database itself handles that well, the layers above it generally don't. Event-sourcing style systems work well but they're still not really mainstream yet.
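To make the "commit it or discard it" point concrete, here is a minimal sketch of the optimistic-locking logic you typically end up hand-writing on top of a plain ACID store. It uses sqlite3 purely for illustration; the table and column names are made up:

```python
import sqlite3

# Hand-rolled optimistic concurrency: each row carries a version counter,
# and an update only applies if the version the client read is still current.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT, version INTEGER)")
conn.execute("INSERT INTO doc VALUES (1, 'A', 0)")
conn.commit()

def update_doc(conn, doc_id, new_body, version_read):
    """Apply the update only if nobody else committed in between."""
    cur = conn.execute(
        "UPDATE doc SET body = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_body, doc_id, version_read),
    )
    conn.commit()
    return cur.rowcount == 1  # False means a concurrent update won

# Two clients both read version 0; only the first write succeeds.
first = update_doc(conn, 1, "B", 0)
second = update_doc(conn, 1, "C", 0)  # stale version: rejected
```

Note that "rejected" here just means the application is told about the conflict; deciding what to do next (retry, merge, ask the user) is exactly the business-rule resolution logic discussed above.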


It solves so many problems and makes implementation so much easier if you go this way.

But as mentioned, it is hard to convince people that this is what they actually want.

People fall into the grand illusion that everything should always be available, but in reality one person is making changes at a time, and if somehow 2 or more people have to work on something, more often than not they should be talking or communicating with each other anyway to synchronize.

Even with Git and fully distributed development you cannot resolve conflicts automagically. You still have to communicate with others and understand the context to pick the correct changes.


you can only have one person work on the code at a time? that seems very, very obviously dumb


I can change A to B on my own, you can change A to C on your own.

At some point we have to communicate which change is correct.

It does not have to be synchronous and it might be via commit message - but still change alone is not enough for conflict resolution.

If you edit a Word document and someone then changes something, there is no commit message, but there might be a comment on the document, an email, or an IM.


Unison has a neat approach to this problem: References are hashes of the abstract syntax tree, the only way to write a "collision" is to write an identical function--which isn't actually a collision at all.
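As a toy illustration of the content-addressed idea (this is not Unison's actual scheme, which hashes a normalized abstract syntax tree, and the normalization here is a deliberately crude stand-in), deriving a reference from a definition's content makes identical definitions "collide" by construction:

```python
import hashlib

def reference(source: str) -> str:
    # Crude stand-in for AST normalization: collapse whitespace so that
    # superficially different but identical definitions hash the same.
    normalized = " ".join(source.split())
    return hashlib.sha256(normalized.encode()).hexdigest()[:12]

ref_a = reference("def inc(x): return x + 1")
ref_b = reference("def  inc(x):   return x + 1")  # same code, different spacing
ref_c = reference("def dec(x): return x - 1")     # different code, different ref
```

Two writers producing the same definition produce the same reference, so there is nothing to merge; only genuinely different definitions get different references.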


Good point. I do the same in my own system and use Hashes of the source code, so there are no collisions. Slowly this technique will become mainstream I predict.


Multiple people can work on the code simultaneously and asynchronously, but conflict resolution must be done synchronously.


This totally. This is one of the reasons that classical RDBMS paradigms and software like MySQL still survive, however much people talk them down in favor of "NoSQL" or non-relational databases like MongoDB, citing how fast or how cool they are in comparison.

For some things, you need the time tested solutions.


Sounds like RCS [1]. I remember, back when a company I worked for switched from RCS to CVS, one of my coworkers was annoyed that CVS didn't support locking checkouts.

[1] https://en.wikipedia.org/wiki/Revision_Control_System

[2] https://en.wikipedia.org/wiki/Concurrent_Versions_System


And, of course, the default mode of Microsoft Team Foundation Server [0], decades after there were better patterns.

So many forgotten locks from lazy devs...

[0] https://en.m.wikipedia.org/wiki/Azure_DevOps_Server#TFVC


Now I feel old. I remember "Anything But SourceSafe" [0], which was a follow-up to "Visual SourceSafe Version Control: Unsafe at Any Speed" [1], and having my trust evaporate when I found out Microsoft didn't dogfood their own version control system.

So long ago I can't remember exactly which, but I was running a local CVS and/or Subversion repository for my own work just to avoid issues like the above.

[0] https://blog.codinghorror.com/source-control-anything-but-so...

[1] https://developsense.com/visual-sourcesafe-version-control-u...

To get back on topic, the key thing an explicit database gives you is a purpose-built language (and data-integrity enforcement etc., if you do it properly) that everyone knows. (Or used to? SQL is getting hidden behind abstraction layers/ecosystems these days.) I'm old, so I reach for my older, well-understood tools over the new and exciting. Get off my lawn. It may be over-architecting, but I'm also not working in "performance in milli/microseconds is vital" high-load environments, or releasing updated software every other day.

The other issue is tool/eco-system fragmentation.

But when you're young and have the energy and mental capacity to abstract out the wahoo for efficiency/performance, you do, because you can, because it's better at the time. In our day everyone was writing code to write code, which was effectively the precursor to ORMs. It's just part of being young and committed to your craft, and wanting to get better at it - this is a good thing!

It's only as you get older that you start to appreciate "less is more", around the same time that job ads appear with "Must have 3 years of SQL-Sync experience" (no offence intended here). There are both costs and benefits, but which and how much of each you only find out years later.


Are you sure? My experience of using TFVC was that it would warn you if someone else had opened the file for editing but would not actually lock it. Multiple people could edit the same file concurrently with standard automerging/conflict resolution afterwards.


Server workspaces vs local workspaces, maybe? With server, your local copy was marked read-only. Don't recall if you could change that flag to edit anyway. We moved to local workspaces as quickly as we could - that was a more typical offline-edit, resolve-conflicts-at-commit model. Don't remember all the details, been 5+ years since I did anything with TFS.


Yes, “tf edit” would mark on the server that you were opening the file for editing, and cleared the read-only bit, but it didn’t lock the file for others or prevent concurrent edits in any way.


I'm definitely not sure. Could very well be the transition from CVS to Subversion that I'm remembering. It's been a long time :)


Back in the early days of TFS I was briefly at a company that went all in on MS tools. TFS was used, and to avoid the lock each developer had a clone made; after they checked in to their clone, the "TFS Guy" in the office would merge it. He also had to merge things when later check-ins had conflicting changes.

Now, the best part of this shit show was that they had ~30 different customers, and each of these customers had a clone of the main thing that would be customized. So the "TFS Guy" had to determine whether to keep a change in the customer clone only, or to propagate it to the main and then to all the other clones!

Needless to say the “TFS Guy” made a lot of money.


I have to use TFS for a couple of projects where I work. I really wish we had a "TFS Guy"!


That sounds like torture, he deserved that money.


I'm a fan of this approach. SQLSync effectively is doing this continuously - however it would be possible to coordinate it explicitly, thus enabling that kind of check in/out approach. As for single-owner lock strategies, I think you could also simulate that with SQLSync - although you may not need to depending on the app. If the goal is to "work offline" and then merge when you're ready, SQLSync provides this pattern out of the box. If the goal is only one client can make any changes, then some kind of central lock pattern will need to be used (which you could potentially coordinate via SQLSync).


Flashbacks to working on a team where we needed to shout across the room for people to unlock their source files in MS SourceSafe :-p



Looks very similar to JEDI [0], an early Delphi VCS that worked that way. It gave us the peace of mind of knowing that no conflict would appear, as only one developer could work on a locked/checked-out file at a time. There was no merging in those days. On the other hand, files that were frequently changed in every task would constantly block developers.

[0] https://jedivcs.sourceforge.net/


There were loads of VCSs that operated this way. And I don’t miss them one bit.


Sounds like Lotus Notes.


CouchDB is a lineal descendant I guess.


Even the alpine nodejs images have pnpm and yarn nowadays


I can't wait until people find its limits... in production!


Did you know that Postgres has a max table size of 32TB? It's really, really fun to find that out the Wednesday evening before Thanksgiving.

Make sure to prune old data from your tables. This one hit the limit because it eventually got so large that queries to delete old data would time out... so it just kept growing.

https://www.postgresql.org/docs/current/limits.html
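A common way to avoid the delete-timeout trap is to prune in small batches so no single statement runs long enough to time out. Here is a rough sketch using sqlite3 with a made-up schema; on Postgres the same loop shape works with a LIMITed subselect per transaction:

```python
import sqlite3

# Illustrative setup: an events table with a timestamp column to prune by.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(i, i) for i in range(1000)])
conn.commit()

def prune(conn, cutoff_ts, batch_size=100):
    """Delete rows older than cutoff_ts, at most batch_size per transaction."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE id IN "
            "(SELECT id FROM events WHERE ts < ? LIMIT ?)",
            (cutoff_ts, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount

deleted = prune(conn, cutoff_ts=500)
```

Each iteration commits a small transaction, so the pruning can run continuously alongside normal traffic instead of needing one giant delete that never finishes.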


I just put a maximum Postgres table size I’m willing to manage in my employment contract. By the time we’re measuring in Terabytes I’m out of there.


For an interesting presentation on Postgres at different orders of magnitude, all the way up to petabytes, see https://thebuild.com/presentations/2019-fosdem-broken.pdf


It is partitioned underneath, with a limit of 32TB per partition (hello, ctid). There can be many of them.


Wow. That is still pretty large for one table.


Everyone has a testing environment. Some people even have separate production environments.


Production is the best testing environment after all.


For 90%+ of our customers (small-to-mid sized US financial institutions), production is the only environment available to work with.

For the other 10%, we take them aside and politely explain that they almost certainly have an unusable staging environment per the scope of our B2B project.

Testing in production is a wonderful path if you are comfortable talking to business people and making lots of compromises.


I don't always test my code. But when I do, I test it in production. -The Most Interesting Man in the World


My Pixel 5 bricked recently after the latest android update.

Waited weeks to upgrade so that upgrade issues could be resolved by Google.

Lesson learned.


I don't know if this is normal or if anyone else does it, but I usually either download binaries or compile from source, move the executable to /usr/local/bin/, and create a symlink. This lets me easily switch between versions. I avoid using a package manager for anything where I want control over the version and installation.

- curl -fSLJO $RELEASE

- tar xvf $DOWNLOAD.tar.gz && cd $DOWNLOAD

- make

- mv $EXECUTABLE /usr/local/bin/$EXECUTABLE-$VERSION

- ln -s /usr/local/bin/$EXECUTABLE-$VERSION /usr/local/bin/$EXECUTABLE

- # chmod 750, chown root:$appuser, etc

Works great for everything I've tried thus far: Redis, HAProxy, Prometheus exporters, and many more.


We have a 10TB database we switched from Aurora to Postgres and it cut our bill by 80%. However, there are some differences in our schema, such as now using native partitions, so it's hard to tell how much of the saving is due to the switch and how much to our table and query design.

We have a similar story with DynamoDB too.


Curious what you mean by switching from Aurora to Postgres? AWS offers Postgres on Aurora, and Postgres on regular RDS. Do you mean you switched to RDS, or off of AWS altogether, or something else?


I mean that we switched from RDS Aurora Postgres to RDS regular Postgres


Probably means Aurora MySQL. In CloudFormation and other AWS artifacts, "Aurora" is a keyword that regularly comes up meaning MySQL, since that was the original target for Aurora years before the Postgres flavor was released. There are AWS old-timers at my shop that call it Aurora, and it shows up in their YAML.


To whoever downvoted: when specifying the AWS::RDS::DBCluster "Engine" property in CloudFormation, aurora = MySQL 5.6 and below, aurora-mysql = MySQL 5.7 or 8.x, aurora-postgresql = Postgres. Since 5.6 was deprecated, the "aurora" engine type was removed from the CF docs, but it was there until a few years ago. "Aurora" was synonymous with MySQL for a while.

#AWSHistory


People downvoted because you assumed that when someone says "we moved to Postgres" they would mean "we moved to MySQL", as if they didn't know what they were talking about.

Even your history point makes no sense; Aurora Postgres was launched 9 months after the MySQL version, in July 2015.


Aurora MySQL was released October 2014.

https://aws.amazon.com/blogs/aws/highly-scalable-mysql-compa...

Aurora Postgres hit general availability October 2017. (Sorry, betas and early release offerings don't count.)

https://aws.amazon.com/blogs/aws/now-available-amazon-aurora...

Quick math… carry the two… compute the partial differential equation…

Looks like 3 YEARS between the release of Aurora MySQL and Aurora Postgres.


Yes, if you choose other definitions for launch dates you can come up with many different timespans.


You're right. I should use the GA announcement for Aurora MySQL as well.

…which was November 2015.

https://aws.amazon.com/blogs/aws/now-available-amazon-aurora...


So now tell me: 7-9 years have gone by since Aurora became a multi-database product, so what sense does it make to assume it's MySQL? That's like saying that at your company people call AWS "S3" or "SQS" because that's how it started. I don't even know what point you're trying to make, because unless it was MySQL only for the majority of the time and other databases were added only recently, the anecdote still wouldn't make sense.


https://news.ycombinator.com/item?id=37085062

The majority of the time, it was identified as "aurora" in CloudFormation templates. 5.6 hit EOL in February of THIS YEAR, aka 2023.

https://aws.amazon.com/blogs/database/upgrade-amazon-aurora-...

So… that would make it the majority of time, wouldn't it? Why does this upset you so?


Someone disagreeing with you doesn't mean they are upset my friend. Have a good one because we're not going to agree.


> People downvoted because you were assuming that when someone says "we moved to postgres" they would mean "we moved to mysql" as if they wouldn't know what they were talking about.

What? That's not what that comment says at all. They're saying that aurora mysql was a plausible interpretation of what OP moved from, before OP clarified.


Yep, I was wrong about the source DB in retrospect, but that other guy just seems to want to argue with straw men. Takes all kinds, I guess.


> using native partitions

FWIW, I'm actively exploring native partitions on Aurora with Postgres and I'm seeing very little benefit. Two identical tables, each with 500M+ rows, and I'm not seeing any meaningful performance/IO changes. A composite index with the partition key as the first entry has been as effective for reducing IO and query time as partition pruning. I'm sure there are workloads where partitioning makes more sense, but I've been surprised by how little difference there was in this case.


> it's hard to tell how much $ is due to the switch and how much due to our table and query design.

Sounds like good material for a technical blog post.


What's the story with DynamoDB?


I don't understand the appeal of serverless.

>it costs less

That's only true if you have low traffic, in which case why not host from a $50/mo (at most) VPS? If a business can pay your salary then surely it can afford an extra $50/mo in cloud costs.

>you don't have to manage servers

However, now you have to learn to write serverless functions that execute in an environment fundamentally different from your local machine, making development more difficult. So you've reduced time spent on devops and increased time spent on development.


Regarding cost, it can depend a lot on your traffic structure. If traffic is bursty, or substantially different between intraday peaks and troughs, it can be more cost effective. Solving this yourself costs dev time.

>reduced time spent in devops and increased time spent in development

This may be true if you’re trying to figure out how to do something you already know how to do outside of serverless, but IME many developers benefit from serverless eliminating boilerplate and nudging them away from state where it’s not necessary.

Also regarding cost and devops: I see serverless as an insurance policy against “my startup/product got a once-in-a-lifetime lucky break by going viral while I was asleep” causing you to fall over from a flood of traffic. Not only do you get to skip implementing your own scaling to go from 0-1 but you only pay substantially (vs the regular price diff) extra for it when it happens.


I've found some great use cases for serverless. None of those have involved hosting a backend for a website / web application. It's been a useful solution for automating some cloud management tasks though.


I wonder if these are the same systems that "detect fraud" and freeze my bank account, requiring manual intervention, the 2 times a year I send a random family member less than $2,000.

