Jan 27·5 min read
- Chunlei Liu (Senior DBA at 58.com)
- Kai Xuan (Former Senior DBA at 58.com)
58.com is China’s leading online marketplace for classifieds covering various categories, such as jobs, real estate, automotive, financing, used goods, and local services. Merchants and consumers can publish their advertisements on our online platform so that they can connect, share information, and conduct business. By the end of 2018, we had more than 500 million users, and our total revenue in 2019 was nearly US $2.23 billion.
As our companies grew, giant amounts of knowledge flooded into our databases. However standalone MySQL databases couldn’t retailer a lot information, and sharding was an undesirable answer. To attain MySQL excessive availability, we would have liked an exterior software. To carry out On-line Analytical Processing (OLAP), we had to make use of difficult and tedious extract, remodel, load (ETL) jobs. We regarded for a scalable, easy-to-maintain, extremely out there database to resolve these points.
After an investigation, we adopted TiDB, an open-source, distributed, Hybrid Transactional/Analytical Processing (HTAP) database. Now, our production environment has 52 TiDB clusters that store 160+ TB of data, with 320+ servers running in 15 applications. To maintain such large-scale clusters, we solely want two DBAs.
On this put up, we’ll deep dive into why we migrated from MySQL to TiDB and the way we use TiDB to scale out our databases, carry out real-time analytics, and obtain excessive availability.
Our ache factors
We saved our personal cloud monitoring information in MySQL. As a result of our information dimension was enormous, we would have liked to recurrently clear our tables. This was a number of work for our DBAs.
After we used MySQL, we encountered these issues:
- A standalone MySQL database has restricted capability. It could possibly’t retailer sufficient information to fulfill our wants.
- Sharding was troublesome. At 58.com, we didn’t have a middleware staff, and our builders wanted to function and keep sharding situations by themselves. After we sharded our database, it took a number of work to combination the information.
To perform OLAP analytics, we must do complex and boring ETL tasks. An ETL process was time-consuming, and this hindered our ability to do real-time data analytics.
- To attain excessive availability, MySQL depends on an exterior software. We used Grasp Excessive Availability (MHA) to implement MySQL excessive availability, nevertheless it elevated our upkeep price.
In the primary-secondary database framework, MySQL has high latency on the secondary database. When we performed data description language (DDL) operations, high latency occurred in the secondary database. This greatly affected real-time reads.
- A standalone MySQL database couldn’t assist giant quantities of writes. When our writes in a standalone MySQL database reached about 15,000 rows of knowledge, we encountered a database efficiency bottleneck.
Due to this fact, we regarded for a brand new database answer with the next capabilities:
- It’s simple to function and keep.
- It could possibly clear up our present issues, corresponding to points introduced by sharding and plenty of writes and deletes.
- It’s appropriate for a lot of software situations and gives a number of options.
Why we selected TiDB, a NewSQL database
TiDB is an open-source, cloud-native, distributed SQL database constructed by PingCAP and its open-source group. It’s MySQL suitable and options horizontal scalability, robust consistency, and excessive availability. It’s a one-stop answer for each On-line Transactional Processing (OLTP) and OLAP workloads. You’ll be able to study extra about TiDB’s structure right here.
We adopted TiDB, as a result of it met all our necessities for the database:
- TiDB is very out there. We don’t want to make use of a further high-availability service. So TiDB helps us remove further operation and upkeep prices.
TiDB supports writes from multiple nodes. This prevents a database performance bottleneck when we write about 15,000 rows of data to a single node.
- TiDB gives information aggregation options for sharding. With TiDB, we not must shard databases.
- TiDB has an entire monitoring system. We don’t must construct our personal monitoring software program.
How we’re utilizing TiDB as a substitute for MySQL
Up to now, we’ve deployed
TiDB’s structure at 58.com
In a TiDB cluster, our application separates read and write domain names, and the backend uses load balancers to balance read and write loads. By default, a single cluster has four TiDB nodes, one for writes and three for reads. If write performance becomes a bottleneck, we can add TiDB nodes.
From TiDB 2.0 to TiDB 4.0.2
The next desk summarizes how we’ve used TiDB over time and the way our reliance on TiDB has grown.
Our purposes operating on TiDB
We’re operating 242 TiDB databases in 15 purposes. To call a number of:
TEG’s purposes are our self-developed databases’ administration platforms. They’ve excessive write volumes. In August 2019, their information dimension reached about 6 TB. TEG purposes’ information will increase by about 500 GB per 30 days. Up to now 6 months, 8 flash reminiscence playing cards have been broken in these purposes. However because of TiDB’s excessive availability, they didn’t have an effect on our companies.
Because of TiDB’s horizontal scalability, excessive availability, and HTAP capabilities, let’s imagine goodbye to troublesome MySQL sharding and time-consuming ETL. It’s really easy to scale out our databases and keep such large-scale clusters.
Now, at 58.com, we use databases including MySQL, Redis, MongoDB, Elasticsearch, and TiDB. We receive about 800 billion visits per day. We have 4,000+ clusters, with 1,500+ physical servers.
In case you have any questions otherwise you’d wish to study extra about our expertise with TiDB, you may be a part of the TiDB group on Slack.