Monday, September 20, 2010

IT person 的生涯

如果你是一个靠近30岁的programmer了,你有没有想过,其实这条路对你来说并不是那么适合的?
每天不断的抱怨,不断的开始讨厌一些在工作上给你难堪的人?
然后你会发现到,别人把你当傻瓜来使,当踏脚板来用,在人前把你批得一无是处 ...
这就是一个programmer 生涯即将结束的征兆?
还是,这个纯粹是职场上的一种正常的现象?
其实有少许灰心...
我发现自己的梦想渐渐离我而去,我不够努力?
不够专心?还是根本没机会发挥?

Friday, September 3, 2010

84 tips for mysql optimization

MySQL is a widely used and fast SQL database server. It is a client/server implementation that consists of a server daemon (mysqld) and many different client programs/libraries.


  1. Kaj (Most Excellent Obvious Facilitator) Index stuff.
  2. Ronald Don't Index Everything
  3. Use benchmarking
  4. Minimize traffic by fetching only what you need.
    1. Paging/chunked data retrieval to limit
    2. Don't use SELECT *
    3. Be wary of lots of small quick queries if a longer query can be more efficient
  5. Use EXPLAIN to profile the query execution plan
  6. Use Slow Query Log (always have it on!)
  7. Don't use DISTINCT when you have or could use GROUP BY
  8. Use proper data partitions
    1. For Cluster. Start thinking about Cluster *before* you need them
  9. Insert performance
    1. Batch INSERT and REPLACE
    2. Use LOAD DATA instead of INSERT
  10. LIMIT m,n may not be as fast as it sounds
  11. Don't use ORDER BY RAND() if you have > ~2K records
  12. Use SQL_NO_CACHE when you are SELECTing frequently updated data or large sets of data
  13. avoid wildcards at the start of LIKE queries
  14. avoid correlated subqueries and in select and where clause (try to avoid in)
  15. config params –
  16. no calculated comparisons — isolate indexed columns
  17. innodb_flush_commit=0 can help slave lag
  18. ORDER BY and LIMIT work best with equalities and covered indexes
  19. isolate workloads don't let administrative work interfere with customer performance. (ie backups)
  20. use optimistic locking, not pessimistic locking. try to use shared lock, not exclusive lock. share mode vs. FOR UPDATE
  21. use row-level instead of table-level locking for OLTP workloads
  22. Know your storage engines and what performs best for your needs, know that different ones exist.
    1. use MERGE tables ARCHIVE tables for logs
  23. Optimize for data types, use consistent data types. Use PROCEDURE ANALYSE() to help determine if you need less
  24. separate text/blobs from metadata, don't put text/blobs in results if you don't need them
  25. if you can, compress text/blobs
  26. compress static data
  27. don't back up static data as often
  28. derived tables (subqueries in the FROM clause) can be useful for retrieving BLOBs w/out sorting them. (self-join can speed up a query if 1st part finds the IDs and use it to fetch the rest)
  29. enable and increase the query and buffer caches if appropriate
  30. ALTER TABLE…ORDER BY can take chronological data and re-order it by a different field
  31. InnoDB ALWAYS keeps the primary key as part of each index, so do not make the primary key very large, be careful of redundant columns in an index, and this can make the query faster
  32. Do not duplicate indexes
  33. Utilize different storage engines on master/slave ie, if you need fulltext indexing on a table.
  34. BLACKHOLE engine and replication is much faster than FEDERATED tables for things like logs.
  35. Design sane query schemas. don't be afraid of table joins, often they are faster than denormalization
  36. Don't use boolean flags
  37. Use a clever key and ORDER BY instead of MAX
  38. Keep the database host as clean as possible. Do you really need a windowing system on that server?
  39. Utilize the strengths of the OS
  40. Hire a MySQL ™ Certified DBA
  41. Know that there are many consulting companies out there that can help, as well as MySQL's Professional Services.
  42. Config variables & tips:
    1. use one of the supplied config files
    2. key_buffer, unix cache (leave some RAM free), per-connection variables, innodb memory variables
    3. be aware of global vs. per-connection variables
    4. check SHOW STATUS and SHOW VARIABLES (GLOBAL|SESSION in 5.0 and up)
    5. be aware of swapping esp. with Linux, "swappiness" (bypass OS filecache for innodb data files, innodb_flush_method=O_DIRECT if possible (this is also OS specific))
    6. defragment tables, rebuild indexes, do table maintenance
    7. If you use innodb_flush_txn_commit=1, use a battery-backed hardware cache write controller
    8. more RAM is good so faster disk speed
    9. use 64-bit architectures
  43. Know when to split a complex query and join smaller ones
  44. Debugging sucks, testing rocks!
  45. Delete small amounts at a time if you can
  46. Archive old data — don't be a pack-rat! 2 common engines for this are ARCHIVE tables and MERGE tables
  47. use INET_ATON and INET_NTOA for IP addresses, not char or varchar
  48. make it a habit to REVERSE() email addresses, so you can easily search domains
  49. –skip-name-resolve
  50. increase myisam_sort_buffer_size to optimize large inserts (this is a per-connection variable)
  51. look up memory tuning parameter for on-insert caching
  52. increase temp table size in a data warehousing environment (default is 32Mb) so it doesn't write to disk (also constrained by max_heap_table_size, default 16Mb)
  53. Normalize first, and denormalize where appropriate.
  54. Databases are not spreadsheets, even though Access really really looks like one. Then again, Access isn't a real database
  55. In 5.1 BOOL/BIT NOT NULL type is 1 bit, in previous versions it's 1 byte.
  56. A NULL data type can take more room to store than NOT NULL
  57. Choose appropriate character sets & collations — UTF16 will store each character in 2 bytes, whether it needs it or not, latin1 is faster than UTF8.
  58. make similar queries consistent so cache is used
  59. Have good SQL query standards
  60. Don't use deprecated features
  61. Use Triggers wisely
  62. Run in SQL_MODE=STRICT to help identify warnings
  63. Turning OR on multiple index fields (<5.0) into UNION may speed things up (with LIMIT), after 5.0 the index_merge should pick stuff up.
  64. /tmp dir on battery-backed write cache
  65. consider battery-backed RAM for innodb logfiles
  66. use min_rows and max_rows to specify approximate data size so space can be pre-allocated and reference points can be calculated.
  67. as your data grows, indexing may change (cardinality and selectivity change). Structuring may want to change. Make your schema as modular as your code. Make your code able to scale. Plan and embrace change, and get developers to do the same.
  68. pare down cron scripts
  69. create a test environment
  70. try out a few schemas and storage engines in your test environment before picking one.
  71. Use HASH indexing for indexing across columns with similar data prefixes
  72. Use myisam_pack_keys for int data
  73. Don't use COUNT * on Innodb tables for every search, do it a few times and/or summary tables, or if you need it for the total # of rows, use SQL_CALC_FOUND_ROWS and SELECT FOUND_ROWS()
  74. use –safe-updates for client
  75. Redundant data is redundant
  76. Use INSERT … ON DUPLICATE KEY update (INSERT IGNORE) to avoid having to SELECT
  77. use groupwise maximum instead of subqueries
  78. be able to change your schema without ruining functionality of your code
  79. source control schema and config files
  80. for LVM innodb backups, restore to a different instance of MySQL so Innodb can roll forward
  81. use multi_query if appropriate to reduce round-trips
  82. partition appropriately
  83. partition your database when you have real data
  84. segregate tables/databases that benefit from different configuration variables

INSERT INTO ... SELECT Performance with Innodb tables

Everyone using Innodb tables probably got use to the fact Innodb tables perform non locking reads, meaning unless you use some modifiers such as LOCK IN SHARE MODE or FOR UPDATE, SELECT statements will not lock any rows while running.

This is generally correct, however there a notable exception - INSERT INTO table1 SELECT * FROM table2. This statement will perform locking read (shared locks) for table2 table. It also applies to similar tables with where clause and joins. It is important for tables which is being read to be Innodb - even if writes are done in MyISAM table.

So why was this done, being pretty bad for MySQL Performance and concurrency ? The reason is - replication. In MySQL before 5.1 replication is statement based which means statements replied on the master should cause the same effect as on the slave. If Innodb would not locking rows in source table other transaction could modify the row and commit before transaction which is running INSERT .. SELECT statement. This would make this transaction to be applied on the slave before INSERT... SELECT statement and possibly result in different data than on master. Locking rows in the source table while reading them protects from this effect as other transaction modifies rows before INSERT ... SELECT had chance to access it it will also be modified in the same order on the slave. If transaction tries to modify the row after it was accessed and so locked by INSERT ... SELECT, transaction will have to wait until statement is completed to make sure it will be executed on the slave in proper order. Gets pretty complicated ? Well all you need to know it had to be done fore replication to work right in MySQL before 5.1.

In MySQL 5.1 this as well as few other problems should be solved by row based replication. I'm however yet to give it real stress tests to see how well it performs :)

One more thing to keep into account - INSERT ... SELECT actually performs read in locking mode and so partially bypasses versioning and retrieves latest committed row. So even if you're operation in REPEATABLE-READ mode, this operation will be performed in READ-COMMITTED mode, potentially giving different result compared to what pure SELECT would give. This by the way applies to SELECT .. LOCK IN SHARE MODE and SELECT ... FOR UPDATE as well.

One my ask what is if I'm not using replication and have my binary log disabled ? If replication is not used you can enable innodb_locks_unsafe_for_binlog option, which will relax locks which Innodb sets on statement execution, which generally gives better concurrency. However as the name says it makes locks unsafe fore replication and point in time recovery, so use innodb_locks_unsafe_for_binlog option with caution.

Note disabling binary logs is not enough to trigger relaxed locks. You have to set innodb_locks_unsafe_for_binlog=1 as well. This is done so enabling binary log does not cause unexpected changes in locking behavior and performance problems. You also can use this option with replication sometimes, if you really know what you're doing. I would not recommend it unless it is really needed as you might not know which other locks will be relaxed in future versions and how it would affect your replication.

So what are safe workarounds if you're using replication ?

The most general one is to use: PLAIN TEXT SQL:

  1. SELECT * FROM tbl1 INFO OUTFILE '/tmp/tbl1.txt';
  2. LOAD DATA INFILE '/tmp/tbl1.txt' INTO TABLE tbl2;

instead of: PLAIN TEXT SQL:

  1. INSERT INTO tbl2 SELECT * FROM tbl1;

INSERT ... INTO OUTFILE does not have to set extra locks.

If you use this aproach make sure to delete file after it is loaded back (it has to be done outside of MySQL Server) as otherwise the script will fail second time.

If you need result to be even closer to one of INSERT ... SELECT you may execute this transaction in READ-COMMITTED isolation mode.

Other workarounds are less general purpose. For example if you're doing batch processing which is well indexed you might chop transactions and process rows by small bulks, which do not cause long enough locks to cause the problems.

To complete this article I should show how wait caused by this statement will look in SHOW INNODB STATUS:

 TRANSACTION 0 42304626, ACTIVE 14 sec, process no 29895, OS thread id 2894768 updating or deleting mysql tables in use 1, locked 1 LOCK WAIT 3 lock struct(s), heap size 320, undo log entries 1 MySQL thread id 1794760, query id 6994946 localhost root Updating update sample set j=0 where i=5 TRX HAS BEEN WAITING 14 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 0 page no 33504 n bits 328 index `j` of table `test/sample` trx id 0 42304626 lock_mode X locks rec but not gap waiting Record lock, heap no 180 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 30; hex 306338386465646233353863643936633930363962373361353736383261; asc 0c88dedb358cd96c9069b73a57682a;...(truncated); 1: len 4; hex 00000005; asc ;;  TRANSACTION 0 42304624, ACTIVE 37 sec, process no 29895, OS thread id 4058032 fetching rows, thread declared inside InnoDB 3 mysql tables in use 1, locked 1 2539 lock struct(s), heap size 224576 MySQL thread id 1794751, query id 6994931 localhost root Sending data insert into test select * from sample 

As you can see INSERT... SELECT has a lot of lock structs, which means it has locked a lot of rows. "fetching rows" of course means it is still going. In this case write is done to MyISAM table so we'll not see any write activity.

Other transaction which happes to be simple primary key update is waiting on sample table for this record to be unlocked.