Oracle has published MySQL Reference Architectures for Massively Scalable Web Infrastructure, a whitepaper outlining recommended topologies for different types and sizes of websites using MySQL for data storage.
This whitepaper proposes 4 reference architectures for creating websites with MySQL based on their size and availability requirements for 4 different types of services provided: User and Session Management, eCommerce, Analytics (multi-structured data), and CMS (meta-data), as shown in the following table:
These guidelines are basic recommendations and they need to be adjusted based on read/writes patterns, load balancing and caching mechanisms used, etc.
Small Web Reference Architecture
This reference architecture can be used for all small implementations of the four types of websites mentioned before. MySQL Replication can be used to make copies of the data for backup and analytics purposes.
Medium Web Reference Architecture
In this case it is recommended to use separate infrastructures for different types of activities, considering that each MySQL instance can serve up to 8 application servers, adding more slave instances if the number of application servers is increased for scalability purposes.
Linux Heartbeat along with semi-synchronous replication is used for high availability purposes for session management and eCommerce sites. CMS sites usually have greater needs for scaling out for read operations, the whitepaper recommending adding 20-30 slaves to each MySQL master based on the consideration that each slave can handle up to 3,000 concurrent users. CMS systems could store data in a SAN or on distributed devices attached to each server.
Memcached is recommended to be used both for session management and CMS sites to relieve the application and MySQL servers from much of their burden.
The topology for Analytics purposes is simpler, a master with 3 slaves doing the job.
Large Web Reference Architecture
For this reference architecture the whitepaper recommends database replication across different data centers using MySQL Geographic Replication which provides asynchronous replication across geographically separated clusters.
Session Management and eCommerce sites should use clusters, the paper claiming that with “4 x data nodes, it is possible to support 6,000 sessions (page hits) a second, with each page hit generating 8 – 12 database operations.” Large CMS sites use a configuration similar to medium ones, just having more slaves as necessary. A Data Refinery unit is introduced to clean up and organize the data for analytics purposes.
Extra Large Web Reference Architecture
The whitepaper makes recommendations even for social websites stating that MySQL is “deployed in 9 of the top 10 most trafficked sites on the web including Google, Facebook and YouTube,” without saying what those sites are using it for, but it is known that LinkedIn has used MySQL with success.
Social topology makes use of concepts implemented by medium and large websites including dedicated application servers, memcached, data refinery, but introduces shards to be able to scale out for write operations. MySQL Cluster is used for user authentication and look-up, directing reads and writes to their appropriate shard when “more than one key is used for look-ups.”
The recommended specification for both MySQL master and slave servers is:
- 8 – 16 x86-64 bit CPU cores (MySQL 5.5 and above)
- 4 – 8 x86 -64 bit CPU cores (MySQL 5.1 and earlier).
- 3 – 10x more RAM than active data.
- Linux, Solaris or Windows operating systems.
- Minimum of 4 x hard disk drives. 8 – 16 disks will increase performance for I/O intensive applications.
- Hardware RAID with battery-backed cache.
- RAID 10 recommended. RAID 5 is suitable if the workload is read-intensive.
- 2 x Network Interface Cards and 2 x Power Supply Units for redundancy.
The whitepaper also includes recommendations for MySQL clusters and data storage devices, plus solutions for monitoring, backup and cluster management.