Monday, 19 August 2013

pt-online-schema-change - non-blocking alter for mysql tables

pt-online-schema-change is the tool we recommend for online schema changes in mysql versions 5.0.2 and later, until built-in functionality is available (possibly in mysql 5.6). It is part of YInst:percona_toolkit. Invoke it as follows (note: always run it in a screen session!):
pt-online-schema-change [OPTIONS] 
    --user [root|USER] --password [`sudo keydbgetkey mysqlroot`|PASSWORD]
    --[dry-run|execute] --chunk-index [PRIMARY|KEYNAME] 
    --alter "[engine=InnoDB|CHANGES]"
    D=[DBNAME],t=[TABLENAME]
Note that this will NOT work on tables with triggers. It can work on tables with foreign keys, with caveats; see below. The most important options are discussed below, and full documentation can be found on Percona's web site. Read the docs and test carefully before using pt-online-schema-change in production!

Restrictions

Row-based Replication

If you are trying to rebuild a table on a server that has row-based slaves, you must let the change replicate (don't use the set-vars sql_log_bin option below). If you don't, the trigger and table creation will not reach the slaves, but the rows the triggers insert -will- be sent to them, causing replication to fail when a slave can't find the new table. We're working on a patch to make the trigger code aware of whether the whole process should replicate, but for now this is a major limitation.
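Before running against a replicated master, it's worth confirming the replication format on every slave first. A minimal sketch (the hostnames and password variable are placeholders for your own environment):

```shell
# Check the replication format on each slave (hostnames are placeholders).
# Any slave reporting ROW or MIXED means the change itself must replicate -
# do not suppress it with sql_log_bin=0.
for slave in slave1.example.com slave2.example.com; do
  echo -n "$slave: "
  mysql -h "$slave" -u root -p"$PASSWORD" -N -e "SELECT @@global.binlog_format"
done
```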

Triggers

There is no support for running this on tables that have triggers, or that have triggers anywhere in their replication stream if the change is to be replicated. I (BYUser:wraymond) am trying to find an easy way to have it use AFTER triggers if you have BEFORE (or vice versa), so it could theoretically work - but this is not ready as of this writing.
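You can check whether a table already has triggers before attempting a change. A sketch using the standard information_schema views (schema and table names are placeholders):

```shell
# List any triggers defined on the target table; any rows returned
# mean pt-online-schema-change cannot be used on this table.
mysql -u root -p"$PASSWORD" -N -e "
  SELECT TRIGGER_NAME, ACTION_TIMING, EVENT_MANIPULATION
  FROM information_schema.TRIGGERS
  WHERE EVENT_OBJECT_SCHEMA='test' AND EVENT_OBJECT_TABLE='table'"
```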

Foreign Keys

Foreign keys are supported atomically, as long as the tables involved are small enough to alter directly. There is a drop/swap option for larger tables, but it can leave the key unenforced for part of the change. See the 'alter-foreign-keys-method' and 'chunk-size-limit' options below for more information.

Gap Locks

Note that if you have gap locking enabled (you do if tx isolation is REPEATABLE READ and innodb_locks_unsafe_for_binlog is 0), the insert...select queries run by this tool will lock the table being altered during execution. If you have a large chunk time, this can have a visible impact on users. Conveniently, gap locking is only ever needed on statement-based replication masters/relays - so if you use row-based replication, or the server you're running on has no slaves, you're safe (but make sure gap locking is actually configured off by setting innodb_locks_unsafe_for_binlog=1; it's on by default for safety). If you must leave it enabled, use a chunk time small enough that users will not notice.
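A quick way to check whether gap locking applies on a given server, assuming a standard mysql client:

```shell
# Gap locking is in effect when tx_isolation is REPEATABLE-READ
# and innodb_locks_unsafe_for_binlog is OFF (0).
mysql -u root -p"$PASSWORD" -e "
  SELECT @@global.tx_isolation, @@global.innodb_locks_unsafe_for_binlog"
```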

mysql versions

pt-online-schema-change does not support any mysql version earlier than 5.0.2.

Bugs

Known client library issue: if you encounter an error like "DBD::mysql::st execute failed: called with 2 bind variables when 3 are needed", make sure you're running the latest version of perl-DBD-mysqlXX. The bug appears to be fixed as of version 4.020 (e.g. perl-DBD-mysql51-4.020_02 - see BUG:6253924). For other issues, or if you think you've found a bug, check Percona's documentation or the launchpad project site for the latest information.
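To see which DBD::mysql version perl is actually loading (a sketch; the module must be installed for this to print anything):

```shell
# Print the installed DBD::mysql driver version; 4.020 or later
# should not exhibit the bind-variable bug described above.
perl -MDBD::mysql -e 'print "$DBD::mysql::VERSION\n"'
```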

Options

--alter "[engine=InnoDB|CHANGES]"

This tells the script what you plan to do. Syntax is the same as a mysql ALTER TABLE statement, without the "ALTER TABLE table_name " prefix. Multiple changes, comma-separated, are accepted. If you're just rebuilding a table, use "engine=InnoDB"; otherwise spell it out fully, e.g. "drop primary key, add primary key (report_id), add column user_id int unsigned not null, drop column created_date". If you are changing the primary key like this, though, see the notes on the chunk-index option below.

--alter-foreign-keys-method [rebuild_constraints|auto]

Set to 'rebuild_constraints', this tells the script to always rebuild the constraints atomically, rather than dropping them and trying to add them back. It is the only truly reliable way to handle tables involved in explicit foreign key relationships; however, it can cause the script to fail if the tables to change are too large to alter safely. If it fails, and you are okay with a non-atomic change, you may try again with 'auto', which falls back to the constraint drop/swap method on failure. Of course, we recommend avoiding reliance on these constraints altogether: besides being awkward here, they can hurt query performance.

--charset utf8

We recommend using this, which tells the script to run "set names utf8" on connect, to avoid possible charset compatibility issues.

--check-interval 1

If this is running on the master of a replicated environment, set this to 1 so the script polls slaves for delay every second. This is not required, but it ensures the run does not cause system-wide replication delays.

--chunk-index [PRIMARY|KEYNAME]

We always recommend using the primary key as the chunk index, for performance reasons. However, if you are changing the primary key, using an index that partially or fully matches the new key will result in a better-optimized new table. And if the table for some reason has NO primary key (and you are adding one, hopefully!), you will need to pick a secondary key, again preferably one matching the new primary.

--chunk-size-limit 255

Set this high to allow the script to work on related tables that are relatively large, rather than fail. If you are very worried about possible table locks, set it lower, but the run may then fail partway and need to be redone.

--chunk-time 0.25

This sets a goal time, in seconds, for each chunk to execute in. Setting it low, like the recommended 0.25, reduces potential impact - whether replication lag on a replicated system or row locking on a system with gap locks enabled. Setting it higher, if you aren't worried about those things, decreases the overall execution time, but most systems here are replicated or have gap locks and prefer the low-impact approach.

--critical-load Threads_running=1000

This sets a load condition at which the script will kill itself rather than pause. As killing is essentially never desirable (pausing is just as effective and will not break your change), set this high.

--no-check-plan

Setting this makes the script skip running EXPLAIN on the queries it will execute. Skipping the plan check helps avoid spurious failures, but it puts the responsibility on the user to ensure a good key is chosen. Since that is something you should verify carefully anyway, there is little point in also risking failures from the plan check.

--no-check-replication-filters

This turns off the failsafe that prevents the script from running on any server whose slaves filter replication. Filter compatibility is something the user should confirm before attempting any sort of cmr. If you want the script to check for filters for you, remove this option and do a dry run.

--no-drop-old-table

This prevents the script from dropping the old table after the new one is swapped in. We recommend setting it, so that the executing user can manually verify the change and preserve the old table for rollback. Just remember to drop it yourself afterwards, or you're wasting a lot of disk space! This is not required, but it is recommended.
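Once you have verified the change, the preserved copy can be dropped. A sketch with placeholder names - check SHOW TABLES for the actual underscore-prefixed name pt-online-schema-change left behind:

```shell
# Verify the swapped-in table first, then drop the preserved original.
# "_table_old" is a placeholder for the actual leftover table name.
mysql -u root -p"$PASSWORD" test -e "DROP TABLE _table_old"
```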

--password [`sudo keydbgetkey mysqlroot`|PASSWORD]

You may use a keydbgetkey call here to pull a password, or specify explicitly. To be prompted on the command line instead, use --ask-pass.

--progress time,15

This defines how the script outputs progress. For example, setting it to 'percent,5' would have it write a progress line every 5%. The recommended 'time,15' writes a line every 15 seconds, keeping the user monitoring the change well aware of the progress and able to estimate completion time.

--recursion-method [processlist|none]

This once again only applies to replicated environments. With 'processlist', it tells the script to find connected replication hosts via the processlist - an effective way to track currently running slaves, so it can poll them for delay. Set it to 'none' if you do not want the script to automatically throttle itself to reduce delay, but that is not recommended in an environment with live slaves.

--retries 255

Set this to a high number, like 255, to tell the script to retry any failed data chunks many times before giving up. Since the per-chunk impact is low and the execution time can be long, retrying is better than failing.

--set-vars [none|wait_timeout=10000,sql_log_bin=0]

This can be useful if you need a session variable set for the run. It defaults to wait_timeout=10000, which is good. It is most useful when you don't want to replicate your alter - for example, when rebuilding a table to reduce fragmentation on a single server, particularly if that server has slaves/relays with triggers that prevent the table being rebuilt there, but you still want to speed up the master. Like mysql's SET, it takes a comma-separated list, so to avoid replication use "wait_timeout=10000,sql_log_bin=0". Note that setting sql_log_bin requires the SUPER privilege.
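You can confirm the connecting user actually has SUPER before relying on sql_log_bin=0. A sketch, assuming a standard mysql client:

```shell
# sql_log_bin can only be set with the SUPER privilege; the grants
# for the current user should include SUPER or ALL PRIVILEGES.
mysql -u root -p"$PASSWORD" -e "SHOW GRANTS FOR CURRENT_USER()"
```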

--user [root|USER]

Set to either root or a permissioned user to run under. For replicated environments, a systemwide permissioned user should be chosen.

--[dry-run|execute]

This determines if the script will actually execute, or just run tests and output any warnings. A dry run is recommended before any prod execution.

Examples

In these first three examples, I'm using "engine=InnoDB" as the alter phrase - a simple rebuild. As discussed above, all forms of mysql ALTER are supported, with multiple changes comma-separated.

Altering an innodb table `test`.`table`

pt-online-schema-change --charset utf8 --no-check-plan \
   --chunk-size-limit 255 --chunk-time 0.25 \
   --no-drop-old-table --progress time,15 --retries 255 \
   --user root --password `sudo keydbgetkey mysqlroot` \
   --execute --chunk-index PRIMARY \
   --alter "engine=innodb" D=test,t=table

Altering a replicated `test`.`table`

This uses auto-scaling to reduce slave lag, with the variables check-interval and recursion-method. It also asserts that the user has checked compatibility with no-check-replication-filters. Note that a systemwide user must be used so the script can also connect to the slaves.
pt-online-schema-change --check-interval 1 --recursion-method processlist \
  --no-check-replication-filters --charset utf8 --no-check-plan \
  --chunk-size-limit 255 --chunk-time 0.25 \
  --no-drop-old-table --progress time,15 --retries 255 \
  --user headless_admin --password `sudo keydbgetkey headless_admin` \
  --execute --chunk-index PRIMARY \
  --alter "engine=innodb" D=test,t=table

Rebuilding a table `test`.`table` without the changes replicated

pt-online-schema-change \
   --set-vars wait_timeout=10000,sql_log_bin=0 --recursion-method none  \
   --no-check-replication-filters --charset utf8 --no-check-plan \
   --chunk-size-limit 255 --chunk-time 0.25 \
   --no-drop-old-table --progress time,15 --retries 255 \
   --user root --password `sudo keydbgetkey mysqlroot` \
   --execute --chunk-index PRIMARY \
   --alter "engine=innodb" D=test,t=table

Altering `test`.`table` with foreign key constraints

See how we set alter-foreign-keys-method for an atomic change.
pt-online-schema-change \
   --alter-foreign-keys-method rebuild_constraints \
   --charset utf8 --no-check-plan \
   --chunk-size-limit 255 --chunk-time 0.25 \
   --no-drop-old-table --progress time,15 --retries 255 \
   --user root --password `sudo keydbgetkey mysqlroot` \
   --execute --chunk-index PRIMARY \
   --alter "engine=innodb" D=test,t=table

Adding a primary key to `test`.`table`

This assumes you're adding a primary key, based on the surrogate 'KEY `should_be_pk` (`pk_col`)' - and dropping the old one while you're at it.
pt-online-schema-change \
   --charset utf8 --no-check-plan \
   --chunk-size-limit 255 --chunk-time 0.25 \
   --no-drop-old-table --progress time,15 --retries 255 \
   --user root --password `sudo keydbgetkey mysqlroot` \
   --execute --chunk-index should_be_pk \
   --alter "add primary key (pk_col), drop key should_be_pk" D=test,t=table

The mysql_online_alter package

Mysql_online_alter is a wrapper package for pt-online-schema-change, with yinst-configurable variables to set up the above conditions easily and handle remote hosts. YInst:mysql_online_alter

Pitfalls

  • Weird results with replication-filtered tables
    • CMR:610472 - largish (10s of GB) innodb table did not have a primary key and was filtered by replicate-ignore on some slaves. Additionally, some slaves were using row-based replication, while most used SBR. Most hosts that ignore the table started with the table definition but no data. The change was to add a primary key.
    • On hosts that were ignoring the original table report_request_status
      • When the change completed, the final rename never happened; the target table still had the old schema and the _new table still existed.
      • On RBR hosts that filter the target table, the _new table is (naturally) large.
      • On SBR hosts that filter the target table, the _new table should be empty if the original table was; however, on one SBR slave where the original table had mistakenly not been truncated, replication hung completely.
    • suggestions:
      • use replicate-wild-ignore-table with patterns that will match the original table and the temporary versions of it.
      • make sure slaves don't have data in tables they're filtering; truncate them or alter them to the blackhole engine before running pt-online-schema-change. (Truncating may be preferred because the blackhole engine is nontransactional.)
      • after pt-online-schema-change has completed, update the slaves that filter this table: make sure the _new table is empty, then do the rename locally (truncate table _report_request_status_new; rename table report_request_status to _report_request_status_old, _report_request_status_new to report_request_status;). This prevents the schema on these slaves from drifting out of sync. If you're sure you don't care, you could just drop the _new table instead of renaming.
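The local slave cleanup suggested above can be run as a single batch on each filtering slave. A sketch using the table names from this CMR; the SET sql_log_bin = 0 line is an addition here (requiring SUPER) so the fix stays out of that slave's own binlog:

```shell
# Run locally on each slave that filters the table. Disabling
# sql_log_bin keeps these statements from replicating further down.
mysql -u root -p"$PASSWORD" test <<'SQL'
SET sql_log_bin = 0;
TRUNCATE TABLE _report_request_status_new;
RENAME TABLE report_request_status TO _report_request_status_old,
             _report_request_status_new TO report_request_status;
SQL
```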
