HA Testing: Common Problems
Author: reposted from the web. Published: 2013/11/26 15:03:34
Common problems:
1. Configure the ha.cf file. The following keys need to be modified:
ucast eth0 10.0.38.33 # must be the IP address of the other machine, not the local one
node dc_13 # add one node entry for every node in this cluster
ping 10.0.38.156 # a reference address, used only to test whether the IP network has failed
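
The three keys above are easy to get wrong when ha.cf is copied from one node to another. As a rough illustration, the following Python sketch parses ha.cf text and flags the mistakes the comments warn about. It assumes comments start with '#'; the helper names, the second node name dc_14, and the local address 10.0.38.34 are hypothetical and used only for the example.

```python
def parse_ha_cf(text):
    """Collect ucast peers, node names, and ping targets from ha.cf text."""
    conf = {"ucast": [], "node": [], "ping": []}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        key, args = fields[0], fields[1:]
        if key == "ucast" and len(args) == 2:
            conf["ucast"].append((args[0], args[1]))  # (interface, peer IP)
        elif key == "node":
            conf["node"].extend(args)
        elif key == "ping":
            conf["ping"].extend(args)
    return conf


def check_ha_cf(conf, local_ip, expected_nodes):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for iface, peer in conf["ucast"]:
        if peer == local_ip:
            problems.append(
                f"ucast on {iface} points at the local address {local_ip}"
            )
    missing = sorted(set(expected_nodes) - set(conf["node"]))
    if missing:
        problems.append("missing node entries: " + ", ".join(missing))
    if not conf["ping"]:
        problems.append("no ping target configured")
    return problems


SAMPLE = """\
ucast eth0 10.0.38.33   # peer address
node dc_13
ping 10.0.38.156
"""

if __name__ == "__main__":
    # dc_14 and 10.0.38.34 are made-up values for this example.
    for problem in check_ha_cf(parse_ha_cf(SAMPLE), "10.0.38.34", ["dc_13", "dc_14"]):
        print(problem)
```

The check is deliberately narrow: it looks only at the three keys discussed here and ignores every other ha.cf directive.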
2. Configure the haresources file.
Each entry has three columns.
The first column is the machine name of the primary node.
The second is an IP address that is not used by any other host on this network.
The third is the application you want Heartbeat to call. It is usually a script
in /etc/init.d (referred to here as "any_server").
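
To make the three-column layout concrete, here is a small Python sketch that splits one haresources entry into its parts and rejects entries that are too short or whose second column is not an IP address. The function name and the address 10.0.38.200 are hypothetical; only the column order comes from the description above.

```python
import ipaddress
from typing import List, NamedTuple


class HaResource(NamedTuple):
    node: str            # column 1: machine name of the primary node
    ip: str              # column 2: service IP, unused elsewhere on the network
    services: List[str]  # column 3 onward: script(s) for Heartbeat to call


def parse_haresources_line(line):
    """Split one whitespace-separated haresources entry into its columns."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected at least three columns: node, IP, service")
    # Accept an optional '/netmask' suffix by validating only the address part.
    ipaddress.ip_address(fields[1].split("/", 1)[0])  # raises ValueError if invalid
    return HaResource(fields[0], fields[1], fields[2:])


if __name__ == "__main__":
    # 10.0.38.200 is a made-up service address for this example.
    print(parse_haresources_line("dc_13 10.0.38.200 any_server"))
```

Validation stops at syntax. Whether the address is really unused on the network still has to be checked by hand.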
3. Configure any_server.
It is a script in /etc/init.d and is called by Heartbeat.
3. Update the cib.xml file.
Whenever you modify the configuration files, run /usr/lib64/heartbeat/haresources2cib.py.
It regenerates the cib.xml file.
4. Fix the master process switching back and forth between the primary and backup machines.
The problem:
When the Heartbeat process on the primary (machine A) is restarted:
1. When Heartbeat on A stops, HA makes machine B the primary server.
2. When Heartbeat on A starts again, HA switches the primary server back to A.
This double switch causes problems such as data not being available and the master process failing to start.
The solution:
Use configuration to stop HA from switching the primary role back to A.
Set the following item in /var/lib/heartbeat/crm/cib.xml:
<nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="INFINITY"/>
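
As an illustration of the change, the Python sketch below sets default-resource-stickiness in a CIB document held as a string. The surrounding XML in SAMPLE_CIB is a simplified, assumed layout and not a complete cib.xml; only the nvpair element is taken from the text above. The function name is hypothetical, and the sketch works on a string rather than the live file on purpose.

```python
import xml.etree.ElementTree as ET

STICKINESS = "default-resource-stickiness"

# Simplified, assumed structure; a real cib.xml contains much more.
SAMPLE_CIB = """\
<cib>
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <attributes>
          <nvpair id="cib-bootstrap-options-default-resource-stickiness"
                  name="default-resource-stickiness" value="0"/>
        </attributes>
      </cluster_property_set>
    </crm_config>
  </configuration>
</cib>
"""


def set_default_stickiness(cib_xml, value="INFINITY"):
    """Return cib_xml with the default-resource-stickiness nvpair set to value."""
    root = ET.fromstring(cib_xml)
    for nvpair in root.iter("nvpair"):
        if nvpair.get("name") == STICKINESS:
            nvpair.set("value", value)
            break
    else:
        # Do not guess where a missing nvpair belongs; report it instead.
        raise LookupError(f"no nvpair named {STICKINESS!r} found")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_default_stickiness(SAMPLE_CIB))
```

With the value set to INFINITY, a resource that has moved to B is meant to stay on B when A comes back, which is the behaviour the fix above relies on.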
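5. Problems when installing HA on 64-bit machines
1. libnet version problem: installing the downloaded 64-bit RPM package directly often reports errors in send.c. Downloading the source package and compiling and installing it resolved this for me.
2. After a default installation, check the /etc/ha.d/shellfunc file and make sure ha_bin points to the /usr/lib64/ directory. If you copied the file
from an earlier 32-bit installation, it will point to the /usr/lib/ directory by default.
3. Create the user and group before installing. Adding them afterwards has no effect: the installer cannot find the user, so it cannot set the directory permissions, and starting Heartbeat will then cause
the system to reboot.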
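
The ha_bin check in item 2 can be scripted. The following Python sketch looks for an ha_bin assignment in the text of the shell functions file and tests whether it points under /usr/lib64/ or /usr/lib/. It assumes the setting is a plain shell assignment and matches the name case-insensitively, since the exact spelling in a given installation is not confirmed here; the function names and sample lines are hypothetical.

```python
import re

# Assumed form: an optional 'export', the name, '=', then an optionally quoted path.
_HA_BIN = re.compile(
    r'^\s*(?:export\s+)?ha_bin\s*=\s*["\']?([^"\'\s;#]+)',
    re.IGNORECASE | re.MULTILINE,
)


def find_ha_bin(shellfunc_text):
    """Return the path assigned to ha_bin, or None if no assignment is found."""
    match = _HA_BIN.search(shellfunc_text)
    return match.group(1) if match else None


def ha_bin_matches_arch(shellfunc_text, is_64bit):
    """True if ha_bin points under the library directory expected for the platform."""
    path = find_ha_bin(shellfunc_text)
    if path is None:
        return False
    expected_prefix = "/usr/lib64/" if is_64bit else "/usr/lib/"
    return path.startswith(expected_prefix)


if __name__ == "__main__":
    copied_from_32bit = "HA_BIN=/usr/lib/heartbeat\n"  # made-up sample line
    print(ha_bin_matches_arch(copied_from_32bit, is_64bit=True))
```

A False result on a 64-bit machine suggests the file was carried over from a 32-bit installation, as described in item 2.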
Notes:
1. The logic of fsimage and fslog synchronization
When the slave master starts, the primary master sends it the fsimage once. After that the primary master does not send
the fsimage again; it sends only fslog entries, so the fslog on the slave master keeps growing.
When the master role switches, the slave master applies the fslog to update its fsimage file.
2. The Heartbeat wait time is 5 s.
3. When /etc/init.d/delcae start fails, the reason may be that a master process already exists. The message it prints is not very clear.
4. About master switching. There are two masters, A and B: A is configured as the primary master and B as the slave master.
4.1 When the master process on A is killed (HA sometimes restarts it, so we kill it again until it really stays down), HA switches the primary master to B.
At this point we can monitor HA (/usr/sbin/crm_mon -i 5): the master process on A is shown in an error state, so HA will not
restart it on A. To make A usable again, restart Heartbeat on A (/etc/init.d/heartbeat restart).
Otherwise A will not take over even if the master on B goes down.
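
The synchronization logic in note 1 (one fsimage transfer, then a growing fslog that is replayed at switch time) can be modeled in a few lines. The Python sketch below is a toy model, not the actual master implementation: the class name, the dictionary used as the image, and the ("put" / "delete", key, value) log entry format are all assumptions made for illustration.

```python
class StandbyMaster:
    """Toy model of a slave master that holds an fsimage plus a growing fslog."""

    def __init__(self):
        self.image = {}  # stands in for the fsimage
        self.log = []    # stands in for the fslog

    def receive_image(self, image):
        """Sent once by the primary master when the slave master starts."""
        self.image = dict(image)
        self.log = []

    def receive_log_entry(self, entry):
        """Sent by the primary master after the image; the log only grows."""
        self.log.append(entry)

    def take_over(self):
        """On master switch, replay the log in order to bring the image up to date."""
        for op, key, value in self.log:
            if op == "put":
                self.image[key] = value
            elif op == "delete":
                self.image.pop(key, None)
        self.log = []
        return self.image


if __name__ == "__main__":
    standby = StandbyMaster()
    standby.receive_image({"/a": 1})
    standby.receive_log_entry(("put", "/b", 2))
    standby.receive_log_entry(("delete", "/a", None))
    print(standby.take_over())
```

The model shows why a switch can take longer after a long uptime: the more fslog entries have accumulated, the more there is to replay before the slave master has a current image.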