TiDB open sources its MySQL/MariaDB compatible data migration tool

georgewfraser · on Jan 22, 2019

If you are considering TiDB for OLAP workloads, you should first evaluate an OLAP-only data warehouse like Snowflake, BigQuery, or Redshift. A "hybrid" database sounds like a good idea, until you realize that you probably don't want ad-hoc analytical queries running against your production database, so you're going to end up running two systems anyway.

Furthermore, column-stores are so different than row-stores in their ideal implementation that they are effectively completely different systems with the same outer interface (SQL). Daniel Abadi wrote a fantastic paper [1] in which he showed that fast column stores must use completely different strategies at every layer of query processing.

You should be very skeptical of any database claiming to be good at both transaction processing and analytical workloads. Such as system would effectively be two different databases hiding behind the same SQL interface, and would not be as useful as you might expect, for devops reasons.

[1] http://db.csail.mit.edu/pubs/abadi-column-stores.pdf

morgo · on Jan 22, 2019

Morgan from the TiDB team here.

I agree that column and row store have very different characteristics, but what I think is worth mentioning is that some hybrid solutions actually store as both row and columnar and have a query optimizer that can pick between them. For example: Oracle DB In-Memory, SQL Server Columnstore index.

At the same event as this announcement, we also announced that we are working on TiFlash which will do similar. Stay tuned for a blog post with more details :-)

georgewfraser · on Jan 22, 2019

As described in that paper, it’s not sufficient to simply store a second, columnar projection of the data to get good performance. You also need a block-oriented execution engine, which means you effectively have two separate databases operating side-by-side. This is a huge challenge and it’s not clear if it’s really worth it, since for logistical reasons you will nearly always operate a separate data warehouse doing mostly OLAP and production database doing mostly OLTP.

manigandham · on Jan 22, 2019

The only products that support HTAP (hybrid OLTP/OLAP) well so far are MemSQL and SQL Server. It's definitely possible but youre right, it takes a lot of engineering and requires column stores and advanced execution engines to make it happen.

I think the OLAP scenario is pretty well covered these days while scalable relational OLTP is still new and would likely be the better focus.

giancarlostoro · on Jan 22, 2019

Anybody using TiDB and care to share why? Especially if you migrated from MariaDB / MySQL. I'm curious about the use cases compared to just using MariaDB.

jinqueeny · on Jan 22, 2019

Disclaimer: I work for TiDB.

Here are some links that might be helpful:

- the typical use cases: https://pingcap.com/blog/architecture-and-use-cases-of-a-clo...

- the key differences between MySQL and TiDB

https://pingcap.com/blog/5-key-differences-between-mysql-and...

- the current case studies: https://pingcap.com/success-stories/

jinqueeny · on Jan 22, 2019

One more link: https://www.pingcap.com/blog/2017-07-24-tidbbestpractice/

hu3 · on Jan 22, 2019

Some use cases here: https://pingcap.com/docs/adopters/

etaioinshrdlu · on Jan 22, 2019

Anyone else feel super-conservative about databases? It's such a critical piece of infrastructure, the last thing you want is for it to fail... I'm using Aurora MySQL but am interested in alternatives. I can't bring myself to use them out of bug paranoia.

SamReidHughes · on Jan 22, 2019

My stance on clustered databases is that you replace hardware risk with software risk. So they make sense if they get you better throughput or better uptime stats, but you've still got to plan for a software bug bringing down the whole thing.

blakesterz · on Jan 22, 2019

Somehow I've never heard of this before:

TiDB (The pronunciation is: /'taɪdiːbi:/ tai-D-B, etymology: titanium) is an open-source distributed scalable Hybrid Transactional and Analytical Processing (HTAP) database. It features infinite horizontal scalability, strong consistency, and high availability. TiDB is MySQL compatible and serves as a one-stop data warehouse for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) workloads.

https://github.com/pingcap/tidb

thanatos_dem · on Jan 22, 2019

“Infinite horizontal scalability”. Yeah, no. There’s coordination overhead for adding more nodes. It needs to have an upper bound. It may be a high upper bound, but infinite is straight up impossible.

It may seem trivial, but claims like that are why I prefer CockroachDB when it comes to NewSQL systems. They’re at least very upfront with its compromises and areas needing improvement.

I’ve never see anything from Pingcap explaining the compromises that the DB makes. A “when should you not use this” type overview, and to me those are the most important things when deciding on a technology.

It may be good technology, but it’s marketed like modern day snake oil.

morgo · on Jan 22, 2019

Morgan from the TiDB team here. Thank you for the feedback, and I agree with you. We actually took this line out from the same copy in the docs: https://pingcap.com/docs/

(We must have missed a spot, and I will follow up and make sure it is addressed).

We try to be transparent about the differences from MySQL. On the compatibility page, there are a few cases described such as large transactions, small transactions and single threaded workloads:

https://www.pingcap.com/docs/sql/mysql-compatibility/

morgo · on Jan 22, 2019

Update: The documentation has now been updated https://github.com/pingcap/tidb/pull/9144

gigatexal · on Jan 22, 2019

How does tidb deal with at rest encryption and nested transactions?

morgo · on Jan 22, 2019

Morgan from the TiDB team here. We are working on at rest encryption now - stay tuned.

w.r.t. nested transactions, this is not something that MySQL currently offers (TiDB is MySQL 5.7 compatible). Sometimes this is emulatable via savepoints, which is a feature we plan to add in the future.

gigatexal · on Jan 22, 2019

How good or bad is the SqlAlchemy support?

morgo · on Jan 22, 2019

It should work fine, with one item to note:

I believe SqlAlchemy defaults to READ-COMMITTED, whereas TiDB defaults to (and recommends) MySQL's default of REPEATABLE-READ.

gigatexal · on Jan 27, 2019

Any chance you guys run it against the SQLalchemy test suite?

gigatexal · on Jan 23, 2019

I’ll check it out today. Thanks.