
From a straight performance perspective:

The rule of thumb is that DBs are bound on IO, so any calculation you can get with the same amount of IO as returning the records is likely to be "free", and any calculation that requires more IO is likely to be "expensive" (unless the extra work lets you avoid IO elsewhere, e.g. by restricting the returned records or avoiding future queries; it gets complicated fast).

So things like column-column calculations, simple aggregates, etc. are likely to be good ideas on the DB; for anything else It Depends.
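A concrete (hypothetical) illustration of the "free" case: the derived column and the aggregate below both touch exactly the rows the query was going to read anyway, so the extra arithmetic rides along with the IO. Table and column names are made up.

    -- Column-column calculation: no IO beyond reading the rows themselves.
    SELECT order_id,
           quantity * unit_price AS line_total
    FROM   order_lines
    WHERE  order_date >= '2024-01-01';

    -- Simple aggregate: same scan, but far fewer bytes shipped back to the client.
    SELECT customer_id,
           SUM(quantity * unit_price) AS total_spent
    FROM   order_lines
    WHERE  order_date >= '2024-01-01'
    GROUP BY customer_id;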



Another fun complication: database stored procedures implemented with non-SQL code. I worked on a project where the client's dedicated SQL ninjas managed to bring the time to process the daily extracts from the data warehouse down from 45 minutes to 2 by switching the calculations over to CLR (.NET) implementations on SQL Server 2005.

IIRC the issue was simply that T-SQL on MSSQL 2005 was just plain slow at doing calculations compared to C# code (the calculations were complex financial models involving large amounts of data and heavy inter-dependencies between intermediate results, so shipping the raw data out to an application was not a viable option).
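For anyone who hasn't seen it, a minimal sketch of what the T-SQL side of that switch looks like; the assembly name, DLL path, class and method are all hypothetical, and on SQL Server 2005 the CLR has to be enabled first:

    -- Hypothetical names throughout: FinanceCalcs.dll is a compiled .NET assembly
    -- exposing a static method CompoundValue on a class named FinanceModels.
    EXEC sp_configure 'clr enabled', 1;
    RECONFIGURE;
    GO

    CREATE ASSEMBLY FinanceCalcs
        FROM 'C:\deploy\FinanceCalcs.dll'
        WITH PERMISSION_SET = SAFE;
    GO

    -- The heavy number-crunching runs as compiled .NET code but is callable from SQL.
    CREATE FUNCTION dbo.CompoundValue(@principal FLOAT, @rate FLOAT, @periods INT)
    RETURNS FLOAT
    AS EXTERNAL NAME FinanceCalcs.FinanceModels.CompoundValue;
    GO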


> The rule of thumb is that DBs are bound on IO

Isn't that just because of the way systems that make use of DBs are architected? I would think you could trivially make them CPU-bound if you started tacking a bushel of FLOPs onto every request.
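To make the point concrete, here's a contrived example (hypothetical table) of turning an otherwise IO-light query CPU-heavy: once the rows are in cache, the per-row math is what you end up waiting on.

    -- Busy-work per row; the scan is cheap, the FLOPs are not.
    SELECT order_id,
           SQRT(ABS(SIN(quantity) * COS(unit_price))) * LOG(quantity + 1) AS junk_flops
    FROM   order_lines;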


It's a rule of thumb, not a rule of fist.

But actually, it's not uncommon to have enough caching (or an in-memory DB) that you appear to be maxed out on CPU, yet because of memory latency, overhead between the DB and the network, etc., you can still get a fair number of operations for "free".

Measure everything, but know where your starting points are and what to try first.



