知数堂公开课-《MySQL Group Replication思想》

activitywubx 发表了文章 • 1 个评论 • 1050 次浏览 • 2016-12-26 18:06 • 来自相关话题

《MySQL Group Replication思想》
  2017.1.5【暂定】
  古雷 CNTV资深MySQL专家
 
    本次分享是知数堂2017年首次分享,感谢古大师支持,知数堂在学习古大师做人的风格,让知数堂能帮助更多的朋友。

1、主题
    《MySQL Group Replication思想》
 
2、嘉宾介绍
        古雷  CNTV资深MySQL专家。曾在搜狐等公司DBA团队工作,积累了丰富的MySQL运维经验。喜欢研究MySQL源码,把研究成果用于日常运维中,并且经常在MySQL的活动中分享出来。
 
3、课程介绍
 
      MySQL的Group Replication(简称GR)一经推出就备受瞩目。可以说Oracle终于在MySQL的高可用技术上出手了。GR的数据同步可以称为虚拟同步,借鉴了paxos协议。本次的公开课旨在讲解GR实现虚拟同步的思路,如何结合异步复制的机制达到各个数据库节点数据一致。人们在接受新的名词(新概念)时,需要不断熟悉其所在的上下文,才能逐渐了解新名词的真正含义,是一个比较缓慢的过程。因此,如果采用打比方的方式,利用人们已有的经验,更容易让人们一下子掌握新概念的核心思想,而不用去管那一堆新的名字(这组件那组件,这协议那协议)。所以本次大家将要听到的,不是《MySQL Group Replication思想》,而是《马农、砖家、运韦合伙放鸽子种地的故事》。
 
4. 分享方式

分享时间:2017.1.5(周四) 晚上20:30 - 21:30

分享方式:YY语音直播,在QQ&微信群发送PPT等图文内容

本次公开课报名: 大咖说直播平台

请提前加入知数堂技术交流QQ群:529671799(加群暗号:知数堂),或扫描识别下面二维码加入。





 
 
  查看全部
古雷.jpeg

《MySQL Group Replication思想》
  2017.1.5【暂定】
  古雷 CNTV资深MySQL专家
 
    本次分享是知数堂2017年首次分享,感谢古大师支持,知数堂在学习古大师做人的风格,让知数堂能帮助更多的朋友。

1、主题
    《MySQL Group Replication思想》
 
2、嘉宾介绍
        古雷  CNTV资深MySQL专家。曾在搜狐等公司DBA团队工作,积累了丰富的MySQL运维经验。喜欢研究MySQL源码,把研究成果用于日常运维中,并且经常在MySQL的活动中分享出来。
 
3、课程介绍
 
      MySQL的Group Replication(简称GR)一经推出就备受瞩目。可以说Oracle终于在MySQL的高可用技术上出手了。GR的数据同步可以称为虚拟同步,借鉴了paxos协议。本次的公开课旨在讲解GR实现虚拟同步的思路,如何结合异步复制的机制达到各个数据库节点数据一致。人们在接受新的名词(新概念)时,需要不断熟悉其所在的上下文,才能逐渐了解新名词的真正含义,是一个比较缓慢的过程。因此,如果采用打比方的方式,利用人们已有的经验,更容易让人们一下子掌握新概念的核心思想,而不用去管那一堆新的名字(这组件那组件,这协议那协议)。所以本次大家将要听到的,不是《MySQL Group Replication思想》,而是《马农、砖家、运韦合伙放鸽子种地的故事》。
 
4. 分享方式

分享时间:2017.1.5(周四) 晚上20:30 - 21:30

分享方式:YY语音直播,在QQ&微信群发送PPT等图文内容

本次公开课报名: 大咖说直播平台

请提前加入知数堂技术交流QQ群:529671799(加群暗号:知数堂),或扫描识别下面二维码加入。

二群.jpeg

 
 
 

来来来,美女DBA教你MySQL高可用架构选型

activitywubx 发表了文章 • 0 个评论 • 954 次浏览 • 2016-12-25 22:40 • 来自相关话题

知数堂培训公开课《MySQL体系结构及常见高可用架构》








2016.12.29

知数堂培训公开课《MySQL体系结构及常见高可用架构》

1、主题

《MySQL体系结构及常见高可用架构》

2、嘉宾介绍

杭星,MySQL OCP,某电信运营商数据库高级主管,IT相关专业硕士,金奖台讲师,多次获得各类奖项和荣誉称号。

负责MySQL架构设计、部署配置、运行维护及自动化提升工作,具备MySQL扎实的理论知识和丰富的实战运维经验。


3、课程大纲

(1)MySQL体系结构

(2)MySQL存储引擎

(3)MySQL常见高可用架构

(4)MySQL自动化运维场景




分享方式

分享时间:2016.12.29(周四) 晚上20:30 - 21:30

分享方式:YY语音直播,在QQ&微信群发送PPT等图文内容

YY频道:53695719(需提前安装YY客户端,支持windows/ios/andriod多平台)


请提前加入知数堂技术交流QQ群:529671799(加群暗号:知数堂),或扫描识别下面二维码加入。 查看全部
知数堂培训公开课《MySQL体系结构及常见高可用架构》


杭星-3.jpg



2016.12.29

知数堂培训公开课《MySQL体系结构及常见高可用架构》

1、主题

《MySQL体系结构及常见高可用架构》

2、嘉宾介绍

杭星,MySQL OCP,某电信运营商数据库高级主管,IT相关专业硕士,金奖台讲师,多次获得各类奖项和荣誉称号。

负责MySQL架构设计、部署配置、运行维护及自动化提升工作,具备MySQL扎实的理论知识和丰富的实战运维经验。


3、课程大纲

(1)MySQL体系结构

(2)MySQL存储引擎

(3)MySQL常见高可用架构

(4)MySQL自动化运维场景




分享方式

分享时间:2016.12.29(周四) 晚上20:30 - 21:30

分享方式:YY语音直播,在QQ&微信群发送PPT等图文内容

YY频道:53695719(需提前安装YY客户端,支持windows/ios/andriod多平台)


请提前加入知数堂技术交流QQ群:529671799(加群暗号:知数堂),或扫描识别下面二维码加入。

二群.jpeg

sql 优化の order by select col 产生的影响

MySQLqidan3500 发表了文章 • 2 个评论 • 1061 次浏览 • 2016-12-21 15:04 • 来自相关话题

先自我 介绍一下 我来自吉林 延边
现在 韩国 从事oracle tibero(kdb) dba
为了 进一步提高自己 跟着两位老师 学了 MySQL

一直想写一篇 相对有内容的 文章 趁着这几天 相对比较闲 写了 这篇

create table t1 (
a1 bigint  not null primary key auto_increment,
c10 char(10),
c100 char(100),
c200 char(200),
c30 char(255),
c300 text
)

insert into t1 (c10,c100,c200,c30,c300)values ('a','a','a','a','a');

insert into t1 (c10,c100,c200,c30,c300) select lpad(c10,10,'1'),lpad(c100,100,'1'),lpad(c200,190,'a'),lpad(c30,250,'1'),lpad(c300,2000,'1') from t1;


root@localhost [test]>select count(a1) from t1;
+-----------+
| count(a1) |
+-----------+
|     65536 |

select SQL_NO_CACHE count(1) from (
       select a1 from t1 IGNORE INDEX (primary)   order by c10 desc  limit  1000 ) a;


select SQL_NO_CACHE count(1) from (
       select a1,c100 from t1 IGNORE INDEX (primary)   order by c10 desc  limit  1000 ) a;

select SQL_NO_CACHE count(1) from (
       select a1,c100,c200 from t1 IGNORE INDEX (primary)   order by c10 desc  limit  1000 ) a;
   
   
select SQL_NO_CACHE count(1) from (
       select a1,c100,c200,c30 from t1 IGNORE INDEX (primary)   order by c10 desc  limit  1000 ) a; 

select SQL_NO_CACHE count(1) from (
       select a1,c100,c200,c30 ,c300 from t1 IGNORE INDEX (primary)   order by c10 desc  limit  1000 ) a;    

select SQL_NO_CACHE count(1) from (
       select * from t1 IGNORE INDEX (primary)   order by c10 desc  limit  1000 ) a;
   
   








从上面的实验 结果都是只执行了 一遍 要是取得相对准确值 可以 多执行几次 

但我们 可以从最上面和 最后一条 可以看出 一个在 3.* 秒 一个在6.* 秒

说明 有排序的时候 影响 性能 的 不仅仅是 order by 后面的列 因为 都使用了 tmp 表 而加入tmp表的内容是select 当中 列 然后根据 order by 当中的 列 进行 排序 所以用的tmp 值是select col 的和 * rows  

大家也可以从 show profile 中 发现  Creating sort index  是上面的 执行性能 重点 进一步论证了 上面的观点
 
不足的地方 望大家 指正 查看全部
先自我 介绍一下 我来自吉林 延边
现在 韩国 从事oracle tibero(kdb) dba
为了 进一步提高自己 跟着两位老师 学了 MySQL

一直想写一篇 相对有内容的 文章 趁着这几天 相对比较闲 写了 这篇

create table t1 (
a1 bigint  not null primary key auto_increment,
c10 char(10),
c100 char(100),
c200 char(200),
c30 char(255),
c300 text
)

insert into t1 (c10,c100,c200,c30,c300)values ('a','a','a','a','a');

insert into t1 (c10,c100,c200,c30,c300) select lpad(c10,10,'1'),lpad(c100,100,'1'),lpad(c200,190,'a'),lpad(c30,250,'1'),lpad(c300,2000,'1') from t1;


root@localhost [test]>select count(a1) from t1;
+-----------+
| count(a1) |
+-----------+
|     65536 |

select SQL_NO_CACHE count(1) from (
       select a1 from t1 IGNORE INDEX (primary)   order by c10 desc  limit  1000 ) a;


select SQL_NO_CACHE count(1) from (
       select a1,c100 from t1 IGNORE INDEX (primary)   order by c10 desc  limit  1000 ) a;

select SQL_NO_CACHE count(1) from (
       select a1,c100,c200 from t1 IGNORE INDEX (primary)   order by c10 desc  limit  1000 ) a;
   
   
select SQL_NO_CACHE count(1) from (
       select a1,c100,c200,c30 from t1 IGNORE INDEX (primary)   order by c10 desc  limit  1000 ) a; 

select SQL_NO_CACHE count(1) from (
       select a1,c100,c200,c30 ,c300 from t1 IGNORE INDEX (primary)   order by c10 desc  limit  1000 ) a;    

select SQL_NO_CACHE count(1) from (
       select * from t1 IGNORE INDEX (primary)   order by c10 desc  limit  1000 ) a;
   
   

QQ




从上面的实验 结果都是只执行了 一遍 要是取得相对准确值 可以 多执行几次 

但我们 可以从最上面和 最后一条 可以看出 一个在 3.* 秒 一个在6.* 秒

说明 有排序的时候 影响 性能 的 不仅仅是 order by 后面的列 因为 都使用了 tmp 表 而加入tmp表的内容是select 当中 列 然后根据 order by 当中的 列 进行 排序 所以用的tmp 值是select col 的和 * rows  

大家也可以从 show profile 中 发现  Creating sort index  是上面的 执行性能 重点 进一步论证了 上面的观点
 
不足的地方 望大家 指正

pt-archiver 不会迁移max(id)那条数据

MySQLHH 发表了文章 • 3 个评论 • 778 次浏览 • 2016-12-21 14:27 • 来自相关话题

工具版本: pt-archiver 2.2.14
源表名: ord_order
目标表名: ord_order_1
表结构相同:root@test 15:09:54>show create table ord_order \G
*************************** 1. row ***************************
Table: ord_order
Create Table: CREATE TABLE `ord_order` (
`order_id` int(11) NOT NULL AUTO_INCREMENT COMMENT '订单ID',
`amount` int(11) NOT NULL DEFAULT '0' COMMENT '订单金额',
`create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
`pay_type` tinyint(4) NOT NULL DEFAULT '1' COMMENT '支付类型',
`serial_num` varchar(16) DEFAULT NULL COMMENT '余额交易流水号',
PRIMARY KEY (`order_id`),
KEY `idx$pay_type` (`pay_type`)
) ENGINE=InnoDB AUTO_INCREMENT=185 DEFAULT CHARSET=utf8mb4 COMMENT='订单表'
1 row in set (0.00 sec)使用如下语句始终会有一条 数据迁移不了pt-archiver \
--source h=127.0.0.1,D=test,t=ord_order,u=root,p=oracle \
--dest h=127.0.0.1,D=test,t=ord_order_1,u=root,p=oracle \
--where '1=1' \
--no-check-charset \
--limit=10000 \
--progress=10000 \
--statistics迁移完后查看数据,还有一条数据存在root@test 15:11:40>SELECT * FROM ord_order;
+----------+--------+---------------------+----------+------------+
| order_id | amount | create_time | pay_type | serial_num |
+----------+--------+---------------------+----------+------------+
| 184 | 0 | 2016-11-17 10:58:33 | 12 | NULL |
+----------+--------+---------------------+----------+------------+
1 row in set (0.00 sec)无奈之下只能打开 general_log 并且再次运行上面 pt-archiver 并查看日志34 Query SELECT MAX(`order_id`) FROM `test`.`ord_order`
34 Query SELECT CONCAT(@@hostname, @@port)
35 Query SELECT CONCAT(@@hostname, @@port)
34 Query SELECT /*!40001 SQL_NO_CACHE */ `order_id`,`amount`,`create_time`,`pay_type`,`serial_num` FROM `test`.`ord_order` FORCE INDEX(`PRIMARY`) WHERE (order_id > 1) AND (`order_id` < '184') ORDER BY `order_id` LIMIT 10000
34 Quit
35 Quit 可以看到主要的插叙语句,这里我们关注的SQL有SELECT MAX(`order_id`) FROM `test`.`ord_order`
SELECT /*!40001 SQL_NO_CACHE */ `order_id`,`amount`,`create_time`,`pay_type`,`serial_num` FROM `test`.`ord_order` FORCE INDEX(`PRIMARY`) WHERE (order_id > 1) AND (`order_id` < '184') ORDER BY `order_id` LIMIT 10000发现第二条语句多加了一个条件 (`order_id` < '184')

很明显这样的条件是查询不到 第 184 条记录的
这是我们只能是自行修改 pt-archiver 文件代码, 相关代码在5813行(不同版本的pt-archiver就不同)

原来:5813 $first_sql .= " AND ($col < " . $q->quote_val($val) . ")";修改后:5813 $first_sql .= " AND ($col <= " . $q->quote_val($val) . ")";修改后再次运行下面代码:pt-archiver \
--source h=127.0.0.1,D=test,t=ord_order,u=root,p=oracle \
--dest h=127.0.0.1,D=test,t=ord_order_1,u=root,p=oracle \
--where '1=1' \
--no-check-charset \
--limit=10000 \
--progress=10000 \
--statistics并查看日志:48 Query SELECT MAX(`order_id`) FROM `test`.`ord_order`
48 Query SELECT CONCAT(@@hostname, @@port)
49 Query SELECT CONCAT(@@hostname, @@port)
48 Query SELECT /*!40001 SQL_NO_CACHE */ `order_id`,`amount`,`create_time`,`pay_type`,`serial_num` FROM `test`.`ord_order` FORCE INDEX(`PRIMARY`) WHERE (1=1) AND (`order_id` <= '184') ORDER BY `order_id` LIMIT 10000
49 Query INSERT INTO `test`.`ord_order_1`(`order_id`,`amount`,`create_time`,`pay_type`,`serial_num`) VALUES ('184','0','2016-11-17 10:58:33','12',NULL)
49 Query commit
48 Query commit
48 Query SELECT /*!40001 SQL_NO_CACHE */ `order_id`,`amount`,`create_time`,`pay_type`,`serial_num` FROM `test`.`ord_order` FORCE INDEX(`PRIMARY`) WHERE (1=1) AND (`order_id` <= '184') AND ((`order_id` > '184')) ORDER BY `order_id` LIMIT 10000由上面就可以很明显的看到 (`order_id` <= '184') 是我们想要的答案了。
  查看全部
工具版本: pt-archiver 2.2.14
源表名: ord_order
目标表名: ord_order_1
表结构相同:
root@test 15:09:54>show create table ord_order \G 
*************************** 1. row ***************************
Table: ord_order
Create Table: CREATE TABLE `ord_order` (
`order_id` int(11) NOT NULL AUTO_INCREMENT COMMENT '订单ID',
`amount` int(11) NOT NULL DEFAULT '0' COMMENT '订单金额',
`create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
`pay_type` tinyint(4) NOT NULL DEFAULT '1' COMMENT '支付类型',
`serial_num` varchar(16) DEFAULT NULL COMMENT '余额交易流水号',
PRIMARY KEY (`order_id`),
KEY `idx$pay_type` (`pay_type`)
) ENGINE=InnoDB AUTO_INCREMENT=185 DEFAULT CHARSET=utf8mb4 COMMENT='订单表'
1 row in set (0.00 sec)
使用如下语句始终会有一条 数据迁移不了
pt-archiver \
--source h=127.0.0.1,D=test,t=ord_order,u=root,p=oracle \
--dest h=127.0.0.1,D=test,t=ord_order_1,u=root,p=oracle \
--where '1=1' \
--no-check-charset \
--limit=10000 \
--progress=10000 \
--statistics
迁移完后查看数据,还有一条数据存在
root@test 15:11:40>SELECT * FROM ord_order;
+----------+--------+---------------------+----------+------------+
| order_id | amount | create_time | pay_type | serial_num |
+----------+--------+---------------------+----------+------------+
| 184 | 0 | 2016-11-17 10:58:33 | 12 | NULL |
+----------+--------+---------------------+----------+------------+
1 row in set (0.00 sec)
无奈之下只能打开 general_log 并且再次运行上面 pt-archiver 并查看日志
34 Query     SELECT MAX(`order_id`) FROM `test`.`ord_order`
34 Query SELECT CONCAT(@@hostname, @@port)
35 Query SELECT CONCAT(@@hostname, @@port)
34 Query SELECT /*!40001 SQL_NO_CACHE */ `order_id`,`amount`,`create_time`,`pay_type`,`serial_num` FROM `test`.`ord_order` FORCE INDEX(`PRIMARY`) WHERE (order_id > 1) AND (`order_id` < '184') ORDER BY `order_id` LIMIT 10000
34 Quit
35 Quit
可以看到主要的插叙语句,这里我们关注的SQL有
SELECT MAX(`order_id`) FROM `test`.`ord_order`
SELECT /*!40001 SQL_NO_CACHE */ `order_id`,`amount`,`create_time`,`pay_type`,`serial_num` FROM `test`.`ord_order` FORCE INDEX(`PRIMARY`) WHERE (order_id > 1) AND (`order_id` < '184') ORDER BY `order_id` LIMIT 10000
发现第二条语句多加了一个条件 (`order_id` < '184')

很明显这样的条件是查询不到 第 184 条记录的
这是我们只能是自行修改 pt-archiver 文件代码, 相关代码在5813行(不同版本的pt-archiver就不同)

原来:
5813       $first_sql .= " AND ($col < " . $q->quote_val($val) . ")";
修改后:
5813       $first_sql .= " AND ($col <= " . $q->quote_val($val) . ")";
修改后再次运行下面代码:
pt-archiver \
--source h=127.0.0.1,D=test,t=ord_order,u=root,p=oracle \
--dest h=127.0.0.1,D=test,t=ord_order_1,u=root,p=oracle \
--where '1=1' \
--no-check-charset \
--limit=10000 \
--progress=10000 \
--statistics
并查看日志:
48 Query     SELECT MAX(`order_id`) FROM `test`.`ord_order`
48 Query SELECT CONCAT(@@hostname, @@port)
49 Query SELECT CONCAT(@@hostname, @@port)
48 Query SELECT /*!40001 SQL_NO_CACHE */ `order_id`,`amount`,`create_time`,`pay_type`,`serial_num` FROM `test`.`ord_order` FORCE INDEX(`PRIMARY`) WHERE (1=1) AND (`order_id` <= '184') ORDER BY `order_id` LIMIT 10000
49 Query INSERT INTO `test`.`ord_order_1`(`order_id`,`amount`,`create_time`,`pay_type`,`serial_num`) VALUES ('184','0','2016-11-17 10:58:33','12',NULL)
49 Query commit
48 Query commit
48 Query SELECT /*!40001 SQL_NO_CACHE */ `order_id`,`amount`,`create_time`,`pay_type`,`serial_num` FROM `test`.`ord_order` FORCE INDEX(`PRIMARY`) WHERE (1=1) AND (`order_id` <= '184') AND ((`order_id` > '184')) ORDER BY `order_id` LIMIT 10000
由上面就可以很明显的看到 (`order_id` <= '184') 是我们想要的答案了。
 

innodb_rollback_on_timeout参数对锁的影响

MySQLhanchangyue 发表了文章 • 3 个评论 • 1560 次浏览 • 2016-12-21 10:51 • 来自相关话题

前提提要:
MySQL版本:5.6.21-log MySQL Community Server (GPL)
innodb_rollback_on_timeout是啥作用?
答:事务B在锁等待超时后是回滚事务内所有的statement还是最后一条语句;
0表示rollback最后一条语句,默认值;有点坑
1表示回滚事务B内所有的statements;
此参数是只读参数,需在my.cnf中配置,并且重启生效;
注意:回滚statements后不自动commit或rollback事务;坑表结构:mysql> show create table t12\G;
*************************** 1. row ***************************
Table: t12
Create Table: CREATE TABLE `t12` (
`a` int(10) unsigned NOT NULL AUTO_INCREMENT,
`b` varchar(766) DEFAULT NULL,
`c` int(11) DEFAULT NULL,
PRIMARY KEY (`a`),
KEY `b` (`b`)
) ENGINE=InnoDB AUTO_INCREMENT=16 DEFAULT CHARSET=latin1
row in set (0.00 sec)实验一:
innodb_rollback_on_timeout开启的情况下,锁等待超时后,该事务所持有锁会一并释放;参数配置:
mysql> show variables like 'innodb_rollback_on_timeout';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| innodb_rollback_on_timeout | ON |
+----------------------------+-------+
row in set (0.00 sec)

mysql> show variables like 'tx_iso%';
+---------------+-----------------+
| Variable_name | Value |
+---------------+-----------------+
| tx_isolation | REPEATABLE-READ |
+---------------+-----------------+
row in set (0.00 sec)

mysql> show variables like 'autocommit';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit | ON |
+---------------+-------+
row in set (0.00 sec)过程:


实验二:innodb_rollback_on_timeout关闭的情况下,锁等待超时后事务持有的锁不会释放;参数配置:
mysql> show variables like 'innodb_rollback_on_timeout';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| innodb_rollback_on_timeout | OFF |
+----------------------------+-------+
row in set (0.00 sec)

mysql> show variables like 'tx_iso%';
+---------------+-----------------+
| Variable_name | Value |
+---------------+-----------------+
| tx_isolation | REPEATABLE-READ |
+---------------+-----------------+
row in set (0.00 sec)

mysql> show variables like 'autocommit';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit | ON |
+---------------+-------+
row in set (0.00 sec)过程:


总结:1、关闭innodb_rollback_on_timeout后,一旦以begin;start transaction;等语句开启一个事务,当锁等待超时后,该事务请求的锁将不释放,直到事务提交或回滚或会话超时;

所以autocommit参数建议设置成ON,只要程序没有显示开启事务,就可以避免上述锁未释放问题。

2、开启innodb_rollback_on_timeout后,一旦锁等待超时,是事务内sql将全部回滚,且释放之前请求的锁。

3、当autocommit=on,只要不显示开启事务,将不存在上面2个问题,即锁的问题和回滚的问题。
  查看全部
前提提要:
MySQL版本:5.6.21-log MySQL Community Server (GPL)
 
innodb_rollback_on_timeout是啥作用?
答:事务B在锁等待超时后是回滚事务内所有的statement还是最后一条语句;
0表示rollback最后一条语句,默认值;有点坑
1表示回滚事务B内所有的statements;
此参数是只读参数,需在my.cnf中配置,并且重启生效;
注意:回滚statements后不自动commit或rollback事务;坑
表结构:
mysql> show create table t12\G;
*************************** 1. row ***************************
Table: t12
Create Table: CREATE TABLE `t12` (
`a` int(10) unsigned NOT NULL AUTO_INCREMENT,
`b` varchar(766) DEFAULT NULL,
`c` int(11) DEFAULT NULL,
PRIMARY KEY (`a`),
KEY `b` (`b`)
) ENGINE=InnoDB AUTO_INCREMENT=16 DEFAULT CHARSET=latin1
row in set (0.00 sec)
实验一:
innodb_rollback_on_timeout开启的情况下,锁等待超时后,该事务所持有锁会一并释放;
参数配置:
mysql> show variables like 'innodb_rollback_on_timeout';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| innodb_rollback_on_timeout | ON |
+----------------------------+-------+
row in set (0.00 sec)

mysql> show variables like 'tx_iso%';
+---------------+-----------------+
| Variable_name | Value |
+---------------+-----------------+
| tx_isolation | REPEATABLE-READ |
+---------------+-----------------+
row in set (0.00 sec)

mysql> show variables like 'autocommit';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit | ON |
+---------------+-------+
row in set (0.00 sec)
过程:


实验二:innodb_rollback_on_timeout关闭的情况下,锁等待超时后事务持有的锁不会释放;
参数配置:
mysql> show variables like 'innodb_rollback_on_timeout';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| innodb_rollback_on_timeout | OFF |
+----------------------------+-------+
row in set (0.00 sec)

mysql> show variables like 'tx_iso%';
+---------------+-----------------+
| Variable_name | Value |
+---------------+-----------------+
| tx_isolation | REPEATABLE-READ |
+---------------+-----------------+
row in set (0.00 sec)

mysql> show variables like 'autocommit';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit | ON |
+---------------+-------+
row in set (0.00 sec)
过程:


总结:1、关闭innodb_rollback_on_timeout后,一旦以begin;start transaction;等语句开启一个事务,当锁等待超时后,该事务请求的锁将不释放,直到事务提交或回滚或会话超时;

所以autocommit参数建议设置成ON,只要程序没有显示开启事务,就可以避免上述锁未释放问题。

2、开启innodb_rollback_on_timeout后,一旦锁等待超时,是事务内sql将全部回滚,且释放之前请求的锁。

3、当autocommit=on,只要不显示开启事务,将不存在上面2个问题,即锁的问题和回滚的问题。
 

苏州招MySQL DBA岗

回复

MySQLhanchangyue 发起了问题 • 1 人关注 • 0 个回复 • 1041 次浏览 • 2016-12-21 10:30 • 来自相关话题

深圳某大型电商招聘高级MySQL DBA

MySQLpengyuanwen163 发表了文章 • 2 个评论 • 665 次浏览 • 2016-12-21 10:03 • 来自相关话题

工作地点:深圳
学历:大专
工作经验:5-10年
待遇:¥20k-35k

职位描述
工作职责:
1. 负责维护数据库的日常变更运营,备份、恢复、扩容,数据迁移和安全管理;
2. 负责数据库的性能分析与系统优化、问题跟踪与管理;
3. 负责数据库运维流程的探索、工作范畴与深度方面的文档建设;
4. 负责分析业务发展需要的资源趋势,并做出相应规划。

工作要求:
1. 5年以后DBA相关互联网企业工作经验。
2. 精通mysql数据库体系架构设计及高可用读写分离与负载均衡解决方案。
3. 精通mysql数据库备份与恢复,以及主从复制原理和实践。
4. 数据数据库的性能优化、sql调优,对锁和事物隔离问题有深入理解。
5. 精通大规模linux环境下数据库运营和维护。
6. 高度的责任心,良好的沟通技巧和团队合作精神。
7. 有阿里、京东等工作经验优先 查看全部
工作地点:深圳
学历:大专
工作经验:5-10年
待遇:¥20k-35k

职位描述
工作职责:
1. 负责维护数据库的日常变更运营,备份、恢复、扩容,数据迁移和安全管理;
2. 负责数据库的性能分析与系统优化、问题跟踪与管理;
3. 负责数据库运维流程的探索、工作范畴与深度方面的文档建设;
4. 负责分析业务发展需要的资源趋势,并做出相应规划。

工作要求:
1. 5年以后DBA相关互联网企业工作经验。
2. 精通mysql数据库体系架构设计及高可用读写分离与负载均衡解决方案。
3. 精通mysql数据库备份与恢复,以及主从复制原理和实践。
4. 数据数据库的性能优化、sql调优,对锁和事物隔离问题有深入理解。
5. 精通大规模linux环境下数据库运营和维护。
6. 高度的责任心,良好的沟通技巧和团队合作精神。
7. 有阿里、京东等工作经验优先

GR 搭建与参数解析

MySQLTottizhu 发表了文章 • 0 个评论 • 894 次浏览 • 2016-12-21 10:01 • 来自相关话题

引入组复制,是为了解决传统复制和半同步复制可能产生数据不一致的问题。组复制依靠分布式一致性协议(Paxos协议的变体),实现了分布式下数据的强一致性,提供了真正的数据高可用方案(是否真正高可用还有待商榷)。其提供的多写方案,给我们实现多活方案带来了希望。一个replication group由若干个节点(数据库实例)组成,组内各个节点维护各自的数据副本(Share Nothing),通过一致性协议实现原子消息和全局有序消息,来实现组内实例数据的强一致。

MySQL 5.7 Group Replication 安装步骤:
一台机器上安装三个实例:

第一步:

mkdir data
mysql-5.7/bin/mysqld --initialize-insecure --basedir=$PWD/mysql-5.7 --datadir=$PWD/data/s1
mysql-5.7/bin/mysqld --initialize-insecure --basedir=$PWD/mysql-5.7 --datadir=$PWD/data/s2
mysql-5.7/bin/mysqld --initialize-insecure --basedir=$PWD/mysql-5.7 --datadir=$PWD/data/s3


第二步:
 

Configuring an Instance for Group Replication
 
 [mysqld]

# server configuration
datadir=<full_path_to_data>/data/s1
basedir=<full_path_to_bin>/mysql-5.7/

port=24801
socket=<full_path_to_sock_dir>/s1.sock


#Replication Framework
#The following settings configure replication according to the MySQL Group Replication requirements.
#以下配置是 GR 要求的基础配置:

server_id=1
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW
 

#Group Replication Settings

#At this point the my.cnf file ensures that the server is configured and is instructed to instantiate the replication infrastructure under a given configuration. The following section configures the Group Replication settings 
# for the server.
#这部分是GR集群第一台机器的要求的配置信息。由于翻译水平有限,直接把这些配置的英文注释信息贴出来了。

#Line 1 instructs the server that for each transaction it has to collect the write set and encode it as a hash using the XXHASH64 hashing algorithm.

transaction_write_set_extraction=XXHASH64
#Line 2 tells the plugin that the group that it is joining, or creating, is named "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa".
loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"   

#Line 3 instructs the plugin to not start operations automatically when the server starts.      loose-group_replication_start_on_boot=off
 
#Line 4 tells the plugin to use the IP address 127.0.0.1, or localhost, and port 24901 for incoming connections from other members in the group.
loose-group_replication_local_address= "127.0.0.1:24901"
 
#Line 5 tells the plugin that the following members on those hosts and ports should be contacted in case it needs to join the group. These are seed members, which are used when this member wants to connect to the group. 
loose-group_replication_group_seeds= "127.0.0.1:24901,127.0.0.1:24902,127.0.0.1:24903"
 
#Line 6 instructs the plugin whether to boostrap the group or not.
loose-group_replication_bootstrap_group= off
 
警告:
GR集群无法同时进行多个成员的加入。
Creating a group and joining multiple members at the same time is not supported. It may work, but chances are that the operations race and then the act of joining the group ends up in an error or a time out.

重要信息:
loose-group_replication_bootstrap_group= off 这个配置必须是在集群只有一台机器的情况下使用。(通常是初始化集群建立或者整个集群宕机,然后重新进行启动恢复的情况)
Important    (loose-group_replication_bootstrap_group= off)
This option must only be used on one server instance at any time, usually the first time you bootstrap the group (or in case the entire group is brought down and back up again). If you bootstrap the group multiple times, for example when multiple server instances have this option set, then they could create an artificial split brain scenario, in which two distinct groups with the same name exist. Disable this option after the first server instance comes online.

要点:
所有成员机器的配置信息是非常类似的。你需要具体定义每台机器的以下信息。(比如服务器id,数据目录,复制成员本机IP)
Configuration for all servers in the group is quite similar. You need to change the specifics about each server (for example server_id, datadir, group_replication_local_address). This is illustrated later in this tutorial.
 
 

 第三步:
 
#Start the server:

mysql-5.7/bin/mysqld --defaults-file=data/s1/s1.cnf

#Disable binary logging, create a user with the correct permissions and save the credentials for the Group Replication recovery channel.

SET SQL_LOG_BIN=0;
CREATE USER rpl_user@'%';
GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%' IDENTIFIED BY 'rpl_pass';
FLUSH PRIVILEGES;
SET SQL_LOG_BIN=1;
CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='rpl_pass' FOR CHANNEL 'group_replication_recovery';
 
重要提示:

#The final line seen above configures this server to use the given credentials next time it needs to recover its state from another member. Distributed recovery is the first step taken by a server that joins the group. If these credentials are not set properly, the server can not run the recovery protocol, and ultimately can not join the group.

第四步:

INSTALL PLUGIN group_replication SONAME 'group_replication.so';   
#To check that the plugin was installed successfully, issue SHOW PLUGINS; and look at the output.
# SHOW PLUGINS; 使用该命令确认复制插件是否安装成功,如下所示,代表已经成功安装。
...  
group_replication          | ACTIVE   | GROUP REPLICATION  | group_replication.so | GPL  

第五步:
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;

# SELECT * FROM performance_schema.replication_group_members;   该命令查询集群成员信息。
To check that the group is now created and that there is one member in it:
#For the purpose of demonstrating that the server is indeed in a group and that it is able to handle load, create a table and add some content to it.

第六步:
server2 的配置与加入:

配置文件如下:
[mysqld]

# server configuration
datadir=<full_path_to_data>/data/s2
basedir=<full_path_to_bin>/mysql-5.7/

port=24802
socket=<full_path_to_sock_dir>/s2.sock

#
# Replication configuration parameters
#
server_id=2
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW

#
# Group Replication configuration
#
transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "127.0.0.1:24902"
loose-group_replication_group_seeds= "127.0.0.1:24901,127.0.0.1:24902,127.0.0.1:24903"
loose-group_replication_bootstrap_group= off

启动 server2
mysql-5.7/bin/mysqld --defaults-file=data/s2/s2.cnf
 
第七步 建立复制同步账户,安装Group Rplication 插件,启动组复制:

#Then configure the recovery credentials as follows. The commands are the same as used when setting up server s1 as the user is shared within the group.

mysql> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0,00 sec)

mysql> CREATE USER rpl_user@'%';
Query OK, 0 rows affected (0,00 sec)

mysql> GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%' IDENTIFIED BY 'rpl_pass';
Query OK, 0 rows affected, 1 warning (0,00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0,00 sec)

mysql> SET SQL_LOG_BIN=1;
Query OK, 0 rows affected (0,00 sec)

mysql> CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='rpl_pass' \\
FOR CHANNEL 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0,01 sec)

#Install the Group Replication plugin and start the process of joining the server to the group. The following example installs the plugin in the same way as used while deploying server s1.

mysql> INSTALL PLUGIN group_replication SONAME 'group_replication.so';
Query OK, 0 rows affected (0,01 sec)
 
mysql> START GROUP_REPLICATION;
 
执行到这一步时,通常报以下错误信息:
2016-12-19T09:26:51.980926Z 0 [ERROR] Plugin group_replication reported: 'This member has more executed transactions than those present in the group. Local transactions: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-2,
f653f65f-c378-11e6-89ed-7427eaf07fdf:1-2 > Group transactions: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-5'
2016-12-19T09:26:51.980976Z 0 [Warning] Plugin group_replication reported: 'The member contains transactions not present in the group. It is only allowed to join due to group_replication_allow_local_disjoint_gtids_join option'
2016-12-19T09:26:51.980997Z 0 [Note] Plugin group_replication reported: 'Starting group replication recovery with view_id 14821349853020005:8'

依据错误提示信息:set global  group_replication_allow_local_disjoint_gtids_join = ON
然后启动组复制(start group_replication;)。
 
重要提示信息:
Unlike the previous steps that were the same as those executed on s1, here there is a difference in that you do not issue SET GLOBAL group_replication_bootstrap_group=ON; before starting Group Replication, because the group has already been created and bootstrapped by server s1. At this point server s2 only needs to be added to the already existing group.

确认第二个基点是否成功加入:
#Checking the performance_schema.replication_group_members table again shows that there are now two ONLINE servers in the group.

mysql> SELECT * FROM performance_schema.replication_group_members;
 
确认server2 的数据和server1是否同步一致:
As server s2 is also marked as ONLINE, it must have already caught up with server s1 automatically. Verify that it has indeed synchronized with server s1 as follows.

节点加入数据同步原理:
As seen above, the second server has been added to the group and it has replicated the changes from server s1 automatically. According to the distributed recovery procedure, this means that just after joining the group and immediately before being declared online, server s2 has connected to server s1 automatically and fetched the missing data from it. In other words, it copied transactions from the binary log of s1 that it was missing, up to the point in time that it joined the group.

第八步:server3 的加入步骤和server2加入步骤一样,这里不再重复描述。
我把server3的重要配置信息贴出来。

[mysqld]

# server configuration
datadir=<full_path_to_data>/data/s3
basedir=<full_path_to_bin>/mysql-5.7/

port=24802
socket=<full_path_to_sock_dir>/s3.sock

#
# Replication configuration parameters
#
server_id=3
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW

#
# Group Replication configuration
#
transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "127.0.0.1:24903"
loose-group_replication_group_seeds= "127.0.0.1:24901,127.0.0.1:24902,127.0.0.1:24903"
loose-group_replication_bootstrap_group= off



=================
Group Rplication 监控参数解析

performance_schema.replication_group_member_stats  表

Channel_name    GR 复制通道名称
Member_id       复制集群中的成员 UUID(每个成员的id是唯一的) 。
Count_Transactions_in_queue        在检测到事务冲突期间的等待执行事务序列。  如果该值比较大说明复制延迟比较大。 需要相应调大GR复制的流控值。
Count_transactions_checked        显示需要进行冲突检测的事务数量
Count_conflicts_detected        显示不需要做冲突检测的事务数量
Count_transactions_validating     显示的当前数据库发生冲突的事务数量(与之相对的是每个被确认执行的事务)
Transactions_committed_all_members  显示在当前复制集群中所有成员成功执行的事务总量。这个值在一个固定的时间间隔进行更新。
Last_conflict_free_transaction        显示最后一次检测到的冲突释放后的事务标识符。


performance_schema.replication_group_members 表

Channel_name        GR 复制通道名称
Member_id            复制集群中的成员 UUID
Member_host            成员网络地址
Member_port            数据库连接监听端口
Member_state        给出集群成员的状态信息。这些状态包括(ONLINE, RECOVERING, OFFLINE or UNREACHABLE)



performance_schema.replication_connection_status 表

Channel_name        GR 复制通道名称
Group_name            显示集群名称的值。通常是一个有效的UUID
Source_UUID            显示集群标识符。该值和集群名称很相似。是作为所有在集群复制中产生的事务的UUID。
Service_state        显示该成员是否是集群中的一员。该值包括{ON, OFF and CONNECTING};
Received_transaction_set        该成员接收到在 GTID set中的事务。


performance_schema.replication_applier_status

Channel_name        GR 复制通道名称
Service_state        申请执行事务的服务的状态  (ON or OFF)
Remaining_delay        申请执行事务的延迟状况
Count_transactions_retries    重复申请执行某一个事物的次数   (The number of retries performed while applying a transaction.)
Received_transaction_set    该成员接收到在 GTID set中的事务。(Transactions in this GTID set have been received by this member of the group.)

Group Replication Server States
ONLINE
RECOVERING
OFFLINE
ERROR
UNREACHABLE

group_replication_recovery - This channel is used for the replication changes that are related to the distributed recovery phase.

group_replication_applier - This channel is used for the incoming changes from the group. This is the channel used to apply transactions coming directly from the group.


===========================================

GR 系统参数解析:

group_replication_group_name        
group_replication_start_on_boot  on/off
group_replication_local_address          ip:port  格式
group_replication_group_seeds          A list of peer addresses, comma separated list such as host1:port1,host2:port2.
group_replication_force_members        
#注释  A list of peer addresses, comma separated list such as host1:port1,host2:port2. This option is used to force a new group membership, in which the excluded members do not receive a new view and are blocked. You need to manually kill the excluded servers.

group_replication_bootstrap_group   
#Configure this server to bootstrap the group. This option must only be set on one server and only when starting the group for the first time or restarting the entire group.


group_replication_flow_control_mode
Specifies the mode used for flow control. This variable can be changed without resetting Group Replication.


很重要的流控参数,无需重启复制,直接生效。
group_replication_flow_control_certifier_threshold
#Specifies the number of waiting transactions in the certifier queue that trigger flow control. This variable can be changed without resetting Group Replication.

group_replication_flow_control_applier_threshold
#Specifies the number of waiting transactions in the applier queue that trigger flow control. This variable can be changed without resetting Group Replication.

group_replication_ip_whitelist  IP 白名单设置。


========================================

设置GR 集群的要求和限制条件:

GR集群要求:
一、基础结构要求
1、InnoDB Storage Engine
2、所有表必须有主键   主键在集群复制中扮演非常重要的角色。冲突的事物改变了具体哪些行数据就是依靠主键来识别的。
3、IPv4 Network   GR 目前只支持 IPv4
4、Network Performance  好的网络性能是必须的,不需要解释。
二、配置方面:
1、开启二进制日志    log-bin=log_file_name
2、log-slave-update=on
3、binlog_format=row
4、gtid-mode=on   用于最终哪些事物发生冲突。
5、 master-info-repository=TABLE   relay-log-info-repository=TABLE 
6、 transaction-write-set-extraction=XXHASH64

GR集群的局限性:
1、Replication Event Checksums          不支持 Event Checksums,因此需要加上配置 binlog-checksum=NONE.
2、Table Locks and Named Locks        The certification process does not take into account table locks or named locks。 貌似named locks生产中很少用到过,感兴趣的同学可以研究下。
3、Savepoints Not Supported.        Transaction savepoints are not supported.   不支持事物保存点。
4、Transaction savepoints are not supported.
5、不支持 SERIALIZABLE Isolation Level 事物级别。
6、Concurrent DDL vs DML/DDL Operations     Concurrent data definition statements and data manipulation statements executing against the same object but on different servers is not supported
7、Foreign Keys with Cascading Constraints  外键约束不完全支持。在多主节点集群中有监测不到事物冲突的风险,
因此建议在各个节点打开配置 group_replication_enforce_update_everywhere_checks=ON, 在single-primary的GR集群中不存在这个问题。
8、在单主GR集群中,group_replication_enforce_update_everywhere_checks 必须设置为OFF。

===========================
GR 集群可以配置为多主集群或者单主集群两种模式。
默认模式是单主集群。同一集群节点不能配置为不同的两种模式,如果要切换模式,整个集群要相应修改配置进行重启。
在单主集群中只要主节点能够进行写入,其他节点为只读状态。主节点一般为第一台 bootstap 的机器。第一台机器启动集群后,其他成员加入后自动变为只读状态。
如果主节点失败,其他多数成员节点会推举出新的主节点。原来旧的主节点被摘除出集群。旧主节点修复再加入集群时自动变为只读状态。
通常情况,在新的主节点接管客户端应用请求之前,需要先应用执行完 relay-log中的事物。

Multi-Primary Mode(多主节点模式)
多主节点不能完全支持外键,有监测不到事物冲突的风险,因此建议在各个节点打开配置 group_replication_enforce_update_everywhere_checks=ON


============================================
一些GR相关配置信息和故障处理方案:

#single-primary  mode  (单主节点集群如何确认哪台机器是写入节点)

SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME= 'group_replication_primary_member';


#Enhanced Automatic Donor Switchover

Purged data scenarios
Duplicated data
Other errors - If any of the recovery threads fail  (receiver or applier threads fail) then an error occurs and recovery switches over to a new donor.


#sets the maximum number of attempts to connect to a donor
SET GLOBAL group_replication_recovery_retry_count= 10;

# The following command sets the recovery donor connection retry interval to 120 seconds.   如果是连接指向同一太机器失败,这个暂停时间间隔才会起作用。
SET GLOBAL group_replication_recovery_reconnect_interval= 120;


如果集群有5个节点,突然发现灾难性故障,其中三个节点停了,则剩下的2个节点也将不能继续工作。应为多数的3个节点同事宕机,无法做出法定人数的决策。这种场景需要认为进行故障处理。关掉剩下的两个节点,然后找出三个故障节点的故障原因。
解决后重启整个集群。


第二处理方案:把 s1,s2 建立一个新的集群。需要注意的是做这个动作前,必须确认其他节点已经宕机,否则会造成整个集群脑裂状态。

Once you know the group communication addresses of s1 (127.0.0.1:10000) and s2 (127.0.0.1:10001), you can use that on one of the two servers to inject a new membership configuration, thus overriding the existing one that has lost quorum. To do that on s1:

mysql> SET GLOBAL group_replication_force_members="127.0.0.1:10000,127.0.0.1:10001";

注意:

Therefore it is important before forcing a new membership configuration to ensure that the servers to be excluded are indeed shutdown and if they are not, shut them down before proceding.Therefore it is important before forcing a new membership configuration to ensure that the servers to be excluded are indeed shutdown and if they are not, shut them down before proceding.


设置IP 白名单。 可以在集群的每个节点上设置相互信任访问的白名单。 这样白名单之外的机器要来访问的话将被拒绝。
mysql> STOP GROUP_REPLICATION;
mysql> SET GLOBAL group_replication_ip_whitelist="10.120.40.237/18,10.178.59.44/22,127.0.0.1/8";
mysql> START GROUP_REPLICATION; 查看全部
引入组复制,是为了解决传统复制和半同步复制可能产生数据不一致的问题。组复制依靠分布式一致性协议(Paxos协议的变体),实现了分布式下数据的强一致性,提供了真正的数据高可用方案(是否真正高可用还有待商榷)。其提供的多写方案,给我们实现多活方案带来了希望。一个replication group由若干个节点(数据库实例)组成,组内各个节点维护各自的数据副本(Share Nothing),通过一致性协议实现原子消息和全局有序消息,来实现组内实例数据的强一致。

MySQL 5.7 Group Replication 安装步骤:
一台机器上安装三个实例:

第一步:

mkdir data
mysql-5.7/bin/mysqld --initialize-insecure --basedir=$PWD/mysql-5.7 --datadir=$PWD/data/s1
mysql-5.7/bin/mysqld --initialize-insecure --basedir=$PWD/mysql-5.7 --datadir=$PWD/data/s2
mysql-5.7/bin/mysqld --initialize-insecure --basedir=$PWD/mysql-5.7 --datadir=$PWD/data/s3


第二步:
 

Configuring an Instance for Group Replication
 
 [mysqld]

# server configuration
datadir=<full_path_to_data>/data/s1
basedir=<full_path_to_bin>/mysql-5.7/

port=24801
socket=<full_path_to_sock_dir>/s1.sock


#Replication Framework
#The following settings configure replication according to the MySQL Group Replication requirements.
#以下配置是 GR 要求的基础配置:

server_id=1
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW
 

#Group Replication Settings

#At this point the my.cnf file ensures that the server is configured and is instructed to instantiate the replication infrastructure under a given configuration. The following section configures the Group Replication settings 
# for the server.
#这部分是GR集群第一台机器的要求的配置信息。由于翻译水平有限,直接把这些配置的英文注释信息贴出来了。

#Line 1 instructs the server that for each transaction it has to collect the write set and encode it as a hash using the XXHASH64 hashing algorithm.

transaction_write_set_extraction=XXHASH64
#Line 2 tells the plugin that the group that it is joining, or creating, is named "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa".
loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"   

#Line 3 instructs the plugin to not start operations automatically when the server starts.      loose-group_replication_start_on_boot=off
 
#Line 4 tells the plugin to use the IP address 127.0.0.1, or localhost, and port 24901 for incoming connections from other members in the group.
loose-group_replication_local_address= "127.0.0.1:24901"
 
#Line 5 tells the plugin that the following members on those hosts and ports should be contacted in case it needs to join the group. These are seed members, which are used when this member wants to connect to the group. 
loose-group_replication_group_seeds= "127.0.0.1:24901,127.0.0.1:24902,127.0.0.1:24903"
 
#Line 6 instructs the plugin whether to boostrap the group or not.
loose-group_replication_bootstrap_group= off
 
警告:
GR集群无法同时进行多个成员的加入。
Creating a group and joining multiple members at the same time is not supported. It may work, but chances are that the operations race and then the act of joining the group ends up in an error or a time out.

重要信息:
loose-group_replication_bootstrap_group= off 这个配置必须是在集群只有一台机器的情况下使用。(通常是初始化集群建立或者整个集群宕机,然后重新进行启动恢复的情况)
Important    (loose-group_replication_bootstrap_group= off)
This option must only be used on one server instance at any time, usually the first time you bootstrap the group (or in case the entire group is brought down and back up again). If you bootstrap the group multiple times, for example when multiple server instances have this option set, then they could create an artificial split brain scenario, in which two distinct groups with the same name exist. Disable this option after the first server instance comes online.

要点:
所有成员机器的配置信息是非常类似的。你需要具体定义每台机器的以下信息。(比如服务器id,数据目录,复制成员本机IP)
Configuration for all servers in the group is quite similar. You need to change the specifics about each server (for example server_id, datadir, group_replication_local_address). This is illustrated later in this tutorial.
 
 

 第三步:
 
#Start the server:

mysql-5.7/bin/mysqld --defaults-file=data/s1/s1.cnf

#Disable binary logging, create a user with the correct permissions and save the credentials for the Group Replication recovery channel.

SET SQL_LOG_BIN=0;
CREATE USER rpl_user@'%';
GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%' IDENTIFIED BY 'rpl_pass';
FLUSH PRIVILEGES;
SET SQL_LOG_BIN=1;
CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='rpl_pass' FOR CHANNEL 'group_replication_recovery';
 
重要提示:

#The final line seen above configures this server to use the given credentials next time it needs to recover its state from another member. Distributed recovery is the first step taken by a server that joins the group. If these credentials are not set properly, the server can not run the recovery protocol, and ultimately can not join the group.

第四步:

INSTALL PLUGIN group_replication SONAME 'group_replication.so';   
#To check that the plugin was installed successfully, issue SHOW PLUGINS; and look at the output.
# SHOW PLUGINS; 使用该命令确认复制插件是否安装成功,如下所示,代表已经成功安装。
...  
group_replication          | ACTIVE   | GROUP REPLICATION  | group_replication.so | GPL  

第五步:
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;

# SELECT * FROM performance_schema.replication_group_members;   该命令查询集群成员信息。
To check that the group is now created and that there is one member in it:
#For the purpose of demonstrating that the server is indeed in a group and that it is able to handle load, create a table and add some content to it.

第六步:
server2 的配置与加入:

配置文件如下:
[mysqld]

# server configuration
datadir=<full_path_to_data>/data/s2
basedir=<full_path_to_bin>/mysql-5.7/

port=24802
socket=<full_path_to_sock_dir>/s2.sock

#
# Replication configuration parameters
#
server_id=2
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW

#
# Group Replication configuration
#
transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "127.0.0.1:24902"
loose-group_replication_group_seeds= "127.0.0.1:24901,127.0.0.1:24902,127.0.0.1:24903"
loose-group_replication_bootstrap_group= off

启动 server2
mysql-5.7/bin/mysqld --defaults-file=data/s2/s2.cnf
 
第七步 建立复制同步账户,安装Group Rplication 插件,启动组复制:

#Then configure the recovery credentials as follows. The commands are the same as used when setting up server s1 as the user is shared within the group.

mysql> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0,00 sec)

mysql> CREATE USER rpl_user@'%';
Query OK, 0 rows affected (0,00 sec)

mysql> GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%' IDENTIFIED BY 'rpl_pass';
Query OK, 0 rows affected, 1 warning (0,00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0,00 sec)

mysql> SET SQL_LOG_BIN=1;
Query OK, 0 rows affected (0,00 sec)

mysql> CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='rpl_pass' \\
FOR CHANNEL 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0,01 sec)

#Install the Group Replication plugin and start the process of joining the server to the group. The following example installs the plugin in the same way as used while deploying server s1.

mysql> INSTALL PLUGIN group_replication SONAME 'group_replication.so';
Query OK, 0 rows affected (0,01 sec)
 
mysql> START GROUP_REPLICATION;
 
执行到这一步时,通常报以下错误信息:
2016-12-19T09:26:51.980926Z 0 [ERROR] Plugin group_replication reported: 'This member has more executed transactions than those present in the group. Local transactions: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-2,
f653f65f-c378-11e6-89ed-7427eaf07fdf:1-2 > Group transactions: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-5'
2016-12-19T09:26:51.980976Z 0 [Warning] Plugin group_replication reported: 'The member contains transactions not present in the group. It is only allowed to join due to group_replication_allow_local_disjoint_gtids_join option'
2016-12-19T09:26:51.980997Z 0 [Note] Plugin group_replication reported: 'Starting group replication recovery with view_id 14821349853020005:8'

依据错误提示信息:set global  group_replication_allow_local_disjoint_gtids_join = ON
然后启动组复制(start group_replication;)。
 
重要提示信息:
Unlike the previous steps that were the same as those executed on s1, here there is a difference in that you do not issue SET GLOBAL group_replication_bootstrap_group=ON; before starting Group Replication, because the group has already been created and bootstrapped by server s1. At this point server s2 only needs to be added to the already existing group.

确认第二个基点是否成功加入:
#Checking the performance_schema.replication_group_members table again shows that there are now two ONLINE servers in the group.

mysql> SELECT * FROM performance_schema.replication_group_members;
 
确认server2 的数据和server1是否同步一致:
As server s2 is also marked as ONLINE, it must have already caught up with server s1 automatically. Verify that it has indeed synchronized with server s1 as follows.

节点加入数据同步原理:
As seen above, the second server has been added to the group and it has replicated the changes from server s1 automatically. According to the distributed recovery procedure, this means that just after joining the group and immediately before being declared online, server s2 has connected to server s1 automatically and fetched the missing data from it. In other words, it copied transactions from the binary log of s1 that it was missing, up to the point in time that it joined the group.

第八步:server3 的加入步骤和server2加入步骤一样,这里不再重复描述。
我把server3的重要配置信息贴出来。

[mysqld]

# server configuration
datadir=<full_path_to_data>/data/s3
basedir=<full_path_to_bin>/mysql-5.7/

port=24802
socket=<full_path_to_sock_dir>/s3.sock

#
# Replication configuration parameters
#
server_id=3
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW

#
# Group Replication configuration
#
transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "127.0.0.1:24903"
loose-group_replication_group_seeds= "127.0.0.1:24901,127.0.0.1:24902,127.0.0.1:24903"
loose-group_replication_bootstrap_group= off



=================
Group Rplication 监控参数解析

performance_schema.replication_group_member_stats  表

Channel_name    GR 复制通道名称
Member_id       复制集群中的成员 UUID(每个成员的id是唯一的) 。
Count_Transactions_in_queue        在检测到事务冲突期间的等待执行事务序列。  如果该值比较大说明复制延迟比较大。 需要相应调大GR复制的流控值。
Count_transactions_checked        显示需要进行冲突检测的事务数量
Count_conflicts_detected        显示不需要做冲突检测的事务数量
Count_transactions_validating     显示的当前数据库发生冲突的事务数量(与之相对的是每个被确认执行的事务)
Transactions_committed_all_members  显示在当前复制集群中所有成员成功执行的事务总量。这个值在一个固定的时间间隔进行更新。
Last_conflict_free_transaction        显示最后一次检测到的冲突释放后的事务标识符。


performance_schema.replication_group_members 表

Channel_name        GR 复制通道名称
Member_id            复制集群中的成员 UUID
Member_host            成员网络地址
Member_port            数据库连接监听端口
Member_state        给出集群成员的状态信息。这些状态包括(ONLINE, RECOVERING, OFFLINE or UNREACHABLE)



performance_schema.replication_connection_status 表

Channel_name        GR 复制通道名称
Group_name            显示集群名称的值。通常是一个有效的UUID
Source_UUID            显示集群标识符。该值和集群名称很相似。是作为所有在集群复制中产生的事务的UUID。
Service_state        显示该成员是否是集群中的一员。该值包括{ON, OFF and CONNECTING};
Received_transaction_set        该成员接收到在 GTID set中的事务。


performance_schema.replication_applier_status

Channel_name        GR 复制通道名称
Service_state        申请执行事务的服务的状态  (ON or OFF)
Remaining_delay        申请执行事务的延迟状况
Count_transactions_retries    重复申请执行某一个事物的次数   (The number of retries performed while applying a transaction.)
Received_transaction_set    该成员接收到在 GTID set中的事务。(Transactions in this GTID set have been received by this member of the group.)

Group Replication Server States
ONLINE
RECOVERING
OFFLINE
ERROR
UNREACHABLE

group_replication_recovery - This channel is used for the replication changes that are related to the distributed recovery phase.

group_replication_applier - This channel is used for the incoming changes from the group. This is the channel used to apply transactions coming directly from the group.


===========================================

GR 系统参数解析:

group_replication_group_name        
group_replication_start_on_boot  on/off
group_replication_local_address          ip:port  格式
group_replication_group_seeds          A list of peer addresses, comma separated list such as host1:port1,host2:port2.
group_replication_force_members        
#注释  A list of peer addresses, comma separated list such as host1:port1,host2:port2. This option is used to force a new group membership, in which the excluded members do not receive a new view and are blocked. You need to manually kill the excluded servers.

group_replication_bootstrap_group   
#Configure this server to bootstrap the group. This option must only be set on one server and only when starting the group for the first time or restarting the entire group.


group_replication_flow_control_mode
Specifies the mode used for flow control. This variable can be changed without resetting Group Replication.


很重要的流控参数,无需重启复制,直接生效。
group_replication_flow_control_certifier_threshold
#Specifies the number of waiting transactions in the certifier queue that trigger flow control. This variable can be changed without resetting Group Replication.

group_replication_flow_control_applier_threshold
#Specifies the number of waiting transactions in the applier queue that trigger flow control. This variable can be changed without resetting Group Replication.

group_replication_ip_whitelist  IP 白名单设置。


========================================

设置GR 集群的要求和限制条件:

GR集群要求:
一、基础结构要求
1、InnoDB Storage Engine
2、所有表必须有主键   主键在集群复制中扮演非常重要的角色。冲突的事物改变了具体哪些行数据就是依靠主键来识别的。
3、IPv4 Network   GR 目前只支持 IPv4
4、Network Performance  好的网络性能是必须的,不需要解释。
二、配置方面:
1、开启二进制日志    log-bin=log_file_name
2、log-slave-update=on
3、binlog_format=row
4、gtid-mode=on   用于最终哪些事物发生冲突。
5、 master-info-repository=TABLE   relay-log-info-repository=TABLE 
6、 transaction-write-set-extraction=XXHASH64

GR集群的局限性:
1、Replication Event Checksums          不支持 Event Checksums,因此需要加上配置 binlog-checksum=NONE.
2、Table Locks and Named Locks        The certification process does not take into account table locks or named locks。 貌似named locks生产中很少用到过,感兴趣的同学可以研究下。
3、Savepoints Not Supported.        Transaction savepoints are not supported.   不支持事物保存点。
4、Transaction savepoints are not supported.
5、不支持 SERIALIZABLE Isolation Level 事物级别。
6、Concurrent DDL vs DML/DDL Operations     Concurrent data definition statements and data manipulation statements executing against the same object but on different servers is not supported
7、Foreign Keys with Cascading Constraints  外键约束不完全支持。在多主节点集群中有监测不到事物冲突的风险,
因此建议在各个节点打开配置 group_replication_enforce_update_everywhere_checks=ON, 在single-primary的GR集群中不存在这个问题。
8、在单主GR集群中,group_replication_enforce_update_everywhere_checks 必须设置为OFF。

===========================
GR 集群可以配置为多主集群或者单主集群两种模式。
默认模式是单主集群。同一集群节点不能配置为不同的两种模式,如果要切换模式,整个集群要相应修改配置进行重启。
在单主集群中只要主节点能够进行写入,其他节点为只读状态。主节点一般为第一台 bootstap 的机器。第一台机器启动集群后,其他成员加入后自动变为只读状态。
如果主节点失败,其他多数成员节点会推举出新的主节点。原来旧的主节点被摘除出集群。旧主节点修复再加入集群时自动变为只读状态。
通常情况,在新的主节点接管客户端应用请求之前,需要先应用执行完 relay-log中的事物。

Multi-Primary Mode(多主节点模式)
多主节点不能完全支持外键,有监测不到事物冲突的风险,因此建议在各个节点打开配置 group_replication_enforce_update_everywhere_checks=ON


============================================
一些GR相关配置信息和故障处理方案:

#single-primary  mode  (单主节点集群如何确认哪台机器是写入节点)

SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME= 'group_replication_primary_member';


#Enhanced Automatic Donor Switchover

Purged data scenarios
Duplicated data
Other errors - If any of the recovery threads fail  (receiver or applier threads fail) then an error occurs and recovery switches over to a new donor.


#sets the maximum number of attempts to connect to a donor
SET GLOBAL group_replication_recovery_retry_count= 10;

# The following command sets the recovery donor connection retry interval to 120 seconds.   如果是连接指向同一太机器失败,这个暂停时间间隔才会起作用。
SET GLOBAL group_replication_recovery_reconnect_interval= 120;


如果集群有5个节点,突然发现灾难性故障,其中三个节点停了,则剩下的2个节点也将不能继续工作。应为多数的3个节点同事宕机,无法做出法定人数的决策。这种场景需要认为进行故障处理。关掉剩下的两个节点,然后找出三个故障节点的故障原因。
解决后重启整个集群。


第二处理方案:把 s1,s2 建立一个新的集群。需要注意的是做这个动作前,必须确认其他节点已经宕机,否则会造成整个集群脑裂状态。

Once you know the group communication addresses of s1 (127.0.0.1:10000) and s2 (127.0.0.1:10001), you can use that on one of the two servers to inject a new membership configuration, thus overriding the existing one that has lost quorum. To do that on s1:

mysql> SET GLOBAL group_replication_force_members="127.0.0.1:10000,127.0.0.1:10001";

注意:

Therefore it is important before forcing a new membership configuration to ensure that the servers to be excluded are indeed shutdown and if they are not, shut them down before proceding.Therefore it is important before forcing a new membership configuration to ensure that the servers to be excluded are indeed shutdown and if they are not, shut them down before proceding.


设置IP 白名单。 可以在集群的每个节点上设置相互信任访问的白名单。 这样白名单之外的机器要来访问的话将被拒绝。
mysql> STOP GROUP_REPLICATION;
mysql> SET GLOBAL group_replication_ip_whitelist="10.120.40.237/18,10.178.59.44/22,127.0.0.1/8";
mysql> START GROUP_REPLICATION;

没有入行新人的迷茫

MySQLtplinux 回复了问题 • 2 人关注 • 2 个回复 • 965 次浏览 • 2016-12-19 17:34 • 来自相关话题

北京某上市公司招聘MySQL

回复

interviewwubx 回复了问题 • 1 人关注 • 1 个回复 • 1058 次浏览 • 2016-12-19 15:49 • 来自相关话题