论坛首页 综合技术版 Database

mysql replication中耗时更新操作阻塞整个库更新的问题如何解决

浏览 3032 次
精华帖 (0) :: 良好帖 (0) :: 新手帖 (0) :: 隐藏帖 (0)
作者 正文
最后更新时间:2008-06-04
theone 写道
引用
这时一旦有个耗时update操作,尽管只涉及一个表,但是整个slave库的更新都会被阻塞住


如果仅仅是update单条或者几条记录,请你告诉我,为什么会耗时?


这是mysql replication slave端的实现模式,我认为可能是避免同时维护多个更新动作队列,降低复杂度,提高稳定性。再说一遍,这和mysql的update动作本身的执行模式无关。
耗时的update操作有很多,譬如alter table add index
   
0 请登录后投票
最后更新时间:2008-06-05
http://junior.vox.com/library/post/talk-9-the-future-of-mysql-replication.html

The Future of MySQL Replication
Jul 27, 2006 at 5:17 PM
2 comments

This talk is by Dr. Lars Thalmann regarding where MySQL replication is going.

First bit is about why and how replication works as it stands.  People do it for high availibility (failover), load-balancing, and having an online backup.  It's done with the standard snapshot and binlog method that everybody uses.  Hurrah, nothing new here.

New features in 5.0... auto-increment works in bi-directional replication (master-master), character set and timezone replication, stored procedure and triggers and views replication, and other advanced things such as FOREIGN_KEY_CHECKS, UNIQUE_KEY_CHECKS, SQL_AUTO_IS_NULL, and SQL_MODE.

New in 5.1... row-based logging and replication, dynamic switching of log format (between statement, default, and row), MySQL Cluster replication, etc.  More detail coming up.

Fairly simple, but now MySQL allows you to change the auto_increment offset and increment on a server, so that one will generate all the odd numbers, one all the event.  For more complex setups, you can make them do every third number, or every hundredth, or whatever.

The bigger change is that MySQL 5.1 allows row-based replication (RBR).  Usually, entire statements are written to the binlog on the master, then the slave reads the statements, and executes them.  Row-based goes into the binlog straight from the storge engine.  Using RBR, it is possible to replicate when you're using such features as LOAD_FILE(), UUID(), USER(), FOUND_ROWS(), and UDFs.  Statement-based replication (SBR) is the default, is proven technology, and lets you audit the queries being executed.  This also allows you to have different table definitions on the master and slave.

Also in 5.1 are four new binlog events.  A table map event, stating, "this table id maps this table definition".  Then there are three row change commands: binwrite, binupdate, and bindelete.  Oh neat!  Using this RBR and if you have primary keys on your tables, the binlogs are idempotent.  You can run the same log against the slave multiple times, and it won't hurt it.  (Useful in some recovery situations - just replay all the logs! It won't hurt! No more calculating where we left off, etc etc.   I like that.)

The format of the binlog doesn't seem to have changed.  Same 19 byte common header.  Then the new binlog event types.  Moving on.  RBR also provides lots of potential optimizations.  If the master had to create a temporary table and do an hour of sorting to get the data to insert, the slave won't have to do that work.  It just replicates the final results of the query from the master.  This also means that if you do something like "INSERT INTO t1 SELECT FROM t2" then the table t2 doesn't have to exist on the slave.

Another advantage of RBR is that now MySQL Cluster (the in-memory HA setup) support replication.  You can now replicate from one cluster to another.  This enables you to have your machines in lock-step, synchronously replicating the data, so every machine is guaranteed up to date.  Then you can asynchronously replicate the data to another setup, such as having one cluster in the US, one in Japan, providing HA at each location, and yet keep them relatively up to date with eachother.

MySQL Cluster is looking very interesting now.  The target for the replication doesn't even have to be Cluster, you can have a HA MySQL Cluster system, and replicate it to a InnoDB/MyISAM system for backup purposes or whatever else you want.

Now a huge slide with no less than 30 arrows describing how MySQL Cluster works.  Yikes.  Pardon me for not getting into detail - the slide's gone now so I don't really have a chance to type it up.

Another new thing, the binlog now has an "injector interface" which enables you to inject events into the binlog.  He didn't provide much information, but this seems interesting for people who are writing plugins for MySQL.

A neat trick you can use is to have the master write SBRs and then set the slave to write RBRs, thereby converting them.  Not sure what you'd want to do that for, but he suggested it.  Heh.

Next steps is multi-source replication, online backup, conflict detection and resolution, and automatic failover.  That is where MySQL is going next.

------------------------------------------

虽然没有我原先期望的以表为单位处理队列,不过有了 row-based logging and replication,slave中不需要再次执行update操作中的where查询,也算侧面的解决了部分问题,只是还不彻底,而且,这已经是06年的blog了...到现在还没看到实现(纠正,刚刚看到5.1.5的文档中已经包含了RBR,我孤陋寡闻了...)

idempotent更新很像是参考了ice的slice,别处还没看到过类似定义
   
0 请登录后投票
最后更新时间:2008-06-05
mysql的replication读binlog的时候是一个一个执行的,当你一条执行没有完成,不会执行另外一条。
   
0 请登录后投票
最后更新时间:2008-06-06
ztka 写道
mysql的replication读binlog的时候是一个一个执行的,当你一条执行没有完成,不会执行另外一条。


是,只是我很奇怪为什么没有实现成多线程更新,结合update锁表机制,多线程更新理应不会有数据歧义性问题。
mysql 5.1的RBR说明中提到“基于RBR,也给replication slave端多线程更新机制提供了基础”,可见他们也是想这么做的,只是出于什么原因没有实现。
这两天正在看5.0的repl代码,基本上只涉及到3个cpp文件,揣摩中。
   
0 请登录后投票
最后更新时间:2008-06-06
这不是多线程那么简单的,如果你的下一句sql依赖这句sql,那么你如果不执行完这句就要出错的。
   
0 请登录后投票
最后更新时间:2008-06-06
ztka 写道
这不是多线程那么简单的,如果你的下一句sql依赖这句sql,那么你如果不执行完这句就要出错的。


如果上一句sql update没有执行完毕,被操作的表自然会处于lock状态,下一句update中如果有依赖被操作表的数据,读取时也会被写操作阻塞直到上一句完成。 怎么会出错?
   
0 请登录后投票
最后更新时间:2008-07-23
并发会有问题,比如有3个update:A B C, 都是更新一个表,并且有依赖,必须按照A B C 顺序。
Slave上,A先执行,此时表被lock,B C 线程运行阻塞。
过了一会,A 执行完毕,然后B C 苏醒,此时只会有一个线程执行B 或 C ,如果C 先执行了,
结果就违反了依赖顺序。
   
0 请登录后投票
最后更新时间:2008-07-24
luxbob 写道
并发会有问题,比如有3个update:A B C, 都是更新一个表,并且有依赖,必须按照A B C 顺序。
Slave上,A先执行,此时表被lock,B C 线程运行阻塞。
过了一会,A 执行完毕,然后B C 苏醒,此时只会有一个线程执行B 或 C ,如果C 先执行了,
结果就违反了依赖顺序。


对于同一个表,当然还是要维持一个fifo的执行队列,我所设想的方案也是这样的。
目的是避免阻塞其他的无关或者可以说是无辜的表的更新。
   
0 请登录后投票
最后更新时间:2008-07-24
pi1ot 写道
luxbob 写道
并发会有问题,比如有3个update:A B C, 都是更新一个表,并且有依赖,必须按照A B C 顺序。
Slave上,A先执行,此时表被lock,B C 线程运行阻塞。
过了一会,A 执行完毕,然后B C 苏醒,此时只会有一个线程执行B 或 C ,如果C 先执行了,
结果就违反了依赖顺序。


对于同一个表,当然还是要维持一个fifo的执行队列,我所设想的方案也是这样的。
目的是避免阻塞其他的无关或者可以说是无辜的表的更新。


如果不同表依赖关系呢?
   
0 请登录后投票
最后更新时间:2008-07-24
ztka 写道
pi1ot 写道
luxbob 写道
并发会有问题,比如有3个update:A B C, 都是更新一个表,并且有依赖,必须按照A B C 顺序。
Slave上,A先执行,此时表被lock,B C 线程运行阻塞。
过了一会,A 执行完毕,然后B C 苏醒,此时只会有一个线程执行B 或 C ,如果C 先执行了,
结果就违反了依赖顺序。


对于同一个表,当然还是要维持一个fifo的执行队列,我所设想的方案也是这样的。
目的是避免阻塞其他的无关或者可以说是无辜的表的更新。


如果不同表依赖关系呢?


不同表的执行顺序依赖,即使在master端跑起来也是无法预料结果的,所以应该不算是replication本身的问题吧
   
0 请登录后投票
论坛首页 综合技术版 Database

跳转论坛:
JavaEye推荐