No Sharding, No ETL: Use a Scale-Out MySQL Different to Retailer 160+ TB of Information

Jan 27·5 min read

Business: Promoting


    Chunlei Liu (Senior DBA at
  • Kai Xuan (Former Senior DBA at is China’s leading online marketplace for classifieds covering various categories, such as jobs, real estate, automotive, financing, used goods, and local services. Merchants and consumers can publish their advertisements on our online platform so that they can connect, share information, and conduct business. By the end of 2018, we had more than 500 million users, and our total revenue in 2019 was nearly US $2.23 billion.

As our companies grew, giant amounts of knowledge flooded into our databases. However standalone MySQL databases couldn’t retailer a lot information, and sharding was an undesirable answer. To attain MySQL excessive availability, we would have liked an exterior software. To carry out On-line Analytical Processing (OLAP), we had to make use of difficult and tedious extract, remodel, load (ETL) jobs. We regarded for a scalable, easy-to-maintain, extremely out there database to resolve these points.

After an investigation, we adopted TiDB, an open-source, distributed, Hybrid Transactional/Analytical Processing (HTAP) database. Now, our production environment has 52 TiDB clusters that store 160+ TB of data, with 320+ servers running in 15 applications. To maintain such large-scale clusters, we solely want two DBAs.

On this put up, we’ll deep dive into why we migrated from MySQL to TiDB and the way we use TiDB to scale out our databases, carry out real-time analytics, and obtain excessive availability.

See also  17 Best Photo Editing Software for PC Free Download [2021]

Our ache factors

After we used MySQL, we encountered these issues:

    A standalone MySQL database has restricted capability. It could possibly’t retailer sufficient information to fulfill our wants.
  • Sharding was troublesome. At, we didn’t have a middleware staff, and our builders wanted to function and keep sharding situations by themselves. After we sharded our database, it took a number of work to combination the information.
  • To perform OLAP analytics, we must do complex and boring ETL tasks. An ETL process was time-consuming, and this hindered our ability to do real-time data analytics.
  • To attain excessive availability, MySQL depends on an exterior software. We used Grasp Excessive Availability (MHA) to implement MySQL excessive availability, nevertheless it elevated our upkeep price.
  • In the primary-secondary database framework, MySQL has high latency on the secondary database. When we performed data description language (DDL) operations, high latency occurred in the secondary database. This greatly affected real-time reads.
  • A standalone MySQL database couldn’t assist giant quantities of writes. When our writes in a standalone MySQL database reached about 15,000 rows of knowledge, we encountered a database efficiency bottleneck.

Due to this fact, we regarded for a brand new database answer with the next capabilities:

    It has an active community. If its community is not active, when we find issues or bugs, we can’t get solutions.
  • It’s simple to function and keep.
  • It could possibly clear up our present issues, corresponding to points introduced by sharding and plenty of writes and deletes.
  • It’s appropriate for a lot of software situations and gives a number of options.

Why we selected TiDB, a NewSQL database

We adopted TiDB, as a result of it met all our necessities for the database:

    TiDB uses distributed storage, and it can easily scale out. We no longer need to worry that a single machine’s capacity is limited.
  • TiDB is very out there. We don’t want to make use of a further high-availability service. So TiDB helps us remove further operation and upkeep prices.
  • TiDB supports writes from multiple nodes. This prevents a database performance bottleneck when we write about 15,000 rows of data to a single node.
  • TiDB gives information aggregation options for sharding. With TiDB, we not must shard databases.
  • TiDB has an entire monitoring system. We don’t must construct our personal monitoring software program.

How we’re utilizing TiDB as a substitute for MySQL

TiDB’s structure at


TiDB’s structure at

From TiDB 2.0 to TiDB 4.0.2


Our purposes operating on TiDB


TEG’s purposes are our self-developed databases’ administration platforms. They’ve excessive write volumes. In August 2019, their information dimension reached about 6 TB. TEG purposes’ information will increase by about 500 GB per 30 days. Up to now 6 months, 8 flash reminiscence playing cards have been broken in these purposes. However because of TiDB’s excessive availability, they didn’t have an effect on our companies.


Now, at, we use databases including MySQL, Redis, MongoDB, Elasticsearch, and TiDB. We receive about 800 billion visits per day. We have 4,000+ clusters, with 1,500+ physical servers.

In case you have any questions otherwise you’d wish to study extra about our expertise with TiDB, you may be a part of the TiDB group on Slack.

Leave a Reply

Your email address will not be published.