MySQL has recently become the most popular open source database engine. A wide range of applications use it as their default method of handling data. This is particularly true of web applications, from micro-sites based on common content management systems to complex systems requiring a fully transactional, scalable and replicable approach.
As the database engine plays a significant role in the whole system, it's very important to use it properly in order to obtain good performance and a good user experience. Here are some ways to improve the functioning of applications that use MySQL. Some of them apply regardless of the database engine, so even if you don't use MySQL you may still find this post useful.
Pick the right storage engine
MySQL provides a few storage engines. A storage engine is basically the way a table stores and manages its data, and each engine offers certain advantages (and disadvantages). The two most common engines are InnoDB (the default from version 5.5) and MyISAM (the default before version 5.5). As a rule, MyISAM performs better in cases that involve many read operations; for instance, a table that stores news entries on a website. It's very likely that there are far more read operations – associated with ordinary users visiting the news section – than write operations – associated with the management of the news section by the website admin. On the other hand, InnoDB provides better performance for tables where write operations are more frequent. Another important difference is that the InnoDB engine supports transactions. A good example of InnoDB usage is a table that stores auction data – each bid corresponds to a write operation and the whole bidding procedure has to be wrapped in a transaction.
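The storage engine is chosen per table, either at creation time or later with ALTER TABLE. A quick sketch (the table and column names here are made up for illustration):

CREATE TABLE `news` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `title` VARCHAR(255) NOT NULL
) ENGINE=MyISAM;

ALTER TABLE `auctions` ENGINE=InnoDB; -- convert an existing table to InnoDB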
Use indexes wisely
A database index is a data structure that significantly improves the speed of read operations (in particular, search operations based on the column the index covers). In return, each write operation requires additional time to rebuild the index structure (and the index itself needs additional storage space). Generally speaking, it is good practice to add indexes on all foreign keys and on other columns that are heavily used in the WHERE clauses of queries.
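Adding such an index is a one-liner; for example (assuming a `clients` table with a `company_name` column, as used in the queries later in this post):

CREATE INDEX `idx_company_name` ON `clients` (`company_name`);
-- equivalently: ALTER TABLE `clients` ADD INDEX `idx_company_name` (`company_name`);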
Use the proper text search method
Text search operations can be very costly. If used incorrectly, they may cause hold-ups of the whole system. First, try to avoid queries like:
SELECT `Client`.`id` FROM `clients` AS `Client` WHERE `Client`.`company_name` LIKE '%abc%'
Having a wildcard at the beginning of the search pattern forces the database engine to scan the whole string in every row to check for a match. Additionally, such a pattern cannot use standard text indexes, as they always index strings from the first character.
Consider setting up a FULLTEXT index instead. This will basically allow you to build much more complex conditions against text fields. But there is one important restriction – FULLTEXT indexes are supported only by the MyISAM storage engine (InnoDB gained FULLTEXT support in MySQL 5.6).
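A minimal sketch, reusing the `clients` table from above (the index name is illustrative):

ALTER TABLE `clients` ADD FULLTEXT `ft_company_name` (`company_name`);

SELECT `Client`.`id` FROM `clients` AS `Client` WHERE MATCH (`Client`.`company_name`) AGAINST ('abc');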
If none of the above is enough, it may be worthwhile to offload text search to a dedicated service such as Sphinx or Lucene.
Avoid ORDER BY RAND()
Using the RAND() function to sort query results may cause serious delays in execution time. Take, for example, the query below:
SELECT `Client`.`id` FROM `clients` AS `Client` WHERE `Client`.`is_archived` = '0' ORDER BY RAND( )
MySQL first selects all the rows that satisfy the conditions, then assigns a random value to each row, and finally sorts the whole result set by that value. This procedure can be very time-consuming, especially if you query large sets of data.
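One common workaround – a sketch, not part of the original query – is to pick a random row by offset in two steps: count the matching rows, draw a random offset in the application, and fetch a single row at that offset:

SELECT COUNT(*) FROM `clients` WHERE `is_archived` = '0';
-- suppose the application drew offset 42 from the range [0, count)
SELECT `Client`.`id` FROM `clients` AS `Client` WHERE `Client`.`is_archived` = '0' LIMIT 42, 1;

This avoids sorting the whole set, at the cost of an extra round trip to the database.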
Only query data that you truly need
This sounds quite obvious, but it's often disregarded by developers as they simply use the * wildcard in SELECT queries. Another tip is to limit the number of rows you query to those you actually need. That often goes hand in hand with pagination in the application, which not only improves database performance but also the user experience when dealing with big sets of data.
For example if you only need the email addresses of 10 recently added clients, the query would look like:
SELECT `Client`.`id`, `Client`.`email` FROM `clients` AS `Client` ORDER BY `Client`.`id` LIMIT 0, 10
Log slow queries
MySQL allows you to detect which queries generate delays. To enable this feature, simply set the value of the long_query_time variable. Every query which takes longer than long_query_time seconds will be logged to the file specified by log_slow_queries. One way to set these variables on most Unix servers is to edit the my.cnf file and add two lines:
long_query_time = 3 # number of seconds
log_slow_queries = /var/log/mysql/mysql-slow.log # path to log file
When you have detected a slow query, MySQL provides a handy way of showing what really happens when the query is executed. You can view this by prepending the EXPLAIN keyword to the query. So, for example, to illustrate the execution of the previous query, you can simply run:
EXPLAIN SELECT `Client`.`id`, `Client`.`email` FROM `clients` AS `Client` ORDER BY `Client`.`id` LIMIT 0, 10
I imagine you will want to EXPLAIN more complex queries than this one. I also realize that I have only covered a small part of the database query optimization topic and have left a lot out. So please don't hesitate to leave your comments and questions below, or find me on Twitter.