If you are considering TiDB for OLAP workloads, you should first evaluate an OLAP-only data warehouse like Snowflake, BigQuery, or Redshift. A "hybrid" database sounds like a good idea, until you realize that you probably don't want ad-hoc analytical queries running against your production database, so you're going to end up running two systems anyway.
Furthermore, column-stores are so different than row-stores in their ideal implementation that they are effectively completely different systems with the same outer interface (SQL). Daniel Abadi wrote a fantastic paper [1] in which he showed that fast column stores must use completely different strategies at every layer of query processing.
You should be very skeptical of any database claiming to be good at both transaction processing and analytical workloads. Such as system would effectively be two different databases hiding behind the same SQL interface, and would not be as useful as you might expect, for devops reasons.
I agree that column and row store have very different characteristics, but what I think is worth mentioning is that some hybrid solutions actually store as both row and columnar and have a query optimizer that can pick between them. For example: Oracle DB In-Memory, SQL Server Columnstore index.
At the same event as this announcement, we also announced that we are working on TiFlash which will do similar. Stay tuned for a blog post with more details :-)
As described in that paper, it’s not sufficient to simply store a second, columnar projection of the data to get good performance. You also need a block-oriented execution engine, which means you effectively have two separate databases operating side-by-side. This is a huge challenge and it’s not clear if it’s really worth it, since for logistical reasons you will nearly always operate a separate data warehouse doing mostly OLAP and production database doing mostly OLTP.
The only products that support HTAP (hybrid OLTP/OLAP) well so far are MemSQL and SQL Server. It's definitely possible but youre right, it takes a lot of engineering and requires column stores and advanced execution engines to make it happen.
I think the OLAP scenario is pretty well covered these days while scalable relational OLTP is still new and would likely be the better focus.
Anybody using TiDB and care to share why? Especially if you migrated from MariaDB / MySQL. I'm curious about the use cases compared to just using MariaDB.
Anyone else feel super-conservative about databases? It's such a critical piece of infrastructure, the last thing you want is for it to fail... I'm using Aurora MySQL but am interested in alternatives. I can't bring myself to use them out of bug paranoia.
My stance on clustered databases is that you replace hardware risk with software risk. So they make sense if they get you better throughput or better uptime stats, but you've still got to plan for a software bug bringing down the whole thing.
TiDB (The pronunciation is: /'taɪdiːbi:/ tai-D-B, etymology: titanium) is an open-source distributed scalable Hybrid Transactional and Analytical Processing (HTAP) database. It features infinite horizontal scalability, strong consistency, and high availability. TiDB is MySQL compatible and serves as a one-stop data warehouse for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) workloads.
“Infinite horizontal scalability”. Yeah, no. There’s coordination overhead for adding more nodes. It needs to have an upper bound. It may be a high upper bound, but infinite is straight up impossible.
It may seem trivial, but claims like that are why I prefer CockroachDB when it comes to NewSQL systems. They’re at least very upfront with its compromises and areas needing improvement.
I’ve never see anything from Pingcap explaining the compromises that the DB makes. A “when should you not use this” type overview, and to me those are the most important things when deciding on a technology.
It may be good technology, but it’s marketed like modern day snake oil.
Morgan from the TiDB team here. Thank you for the feedback, and I agree with you. We actually took this line out from the same copy in the docs: https://pingcap.com/docs/
(We must have missed a spot, and I will follow up and make sure it is addressed).
We try to be transparent about the differences from MySQL. On the compatibility page, there are a few cases described such as large transactions, small transactions and single threaded workloads:
Morgan from the TiDB team here. We are working on at rest encryption now - stay tuned.
w.r.t. nested transactions, this is not something that MySQL currently offers (TiDB is MySQL 5.7 compatible). Sometimes this is emulatable via savepoints, which is a feature we plan to add in the future.
Furthermore, column-stores are so different than row-stores in their ideal implementation that they are effectively completely different systems with the same outer interface (SQL). Daniel Abadi wrote a fantastic paper [1] in which he showed that fast column stores must use completely different strategies at every layer of query processing.
You should be very skeptical of any database claiming to be good at both transaction processing and analytical workloads. Such as system would effectively be two different databases hiding behind the same SQL interface, and would not be as useful as you might expect, for devops reasons.
[1] http://db.csail.mit.edu/pubs/abadi-column-stores.pdf