Xigmanas (FreeBSD + HAST) synchronization failure

Hi all,

I've run into a baffling problem and can't find anything about it.

I have two Xigmanas systems.

I configured HAST + CARP on them.

When I initialize the master and the slave by hand at the start:

hastctl role init da1
hastctl create da1

hastctl role primary da1

...

everything is fine: the synchronization runs and holds as long as I don't touch the configuration.

However, as soon as I try to switch over manually, there is trouble. The same happens if I take the master away (power it off) and it comes back: the node that was originally the secondary ends up in complete disarray.

It is as if they could not communicate, even though the synchronization does run after the first initialization.

On a manual switchover, they log the following:

Master:

Feb 17 12:17:07 xenstorage1 kernel: carp: 1@bge0: MASTER -> INIT (hardware interface up)
Feb 17 12:17:07 xenstorage1 kernel: bge0: promiscuous mode disabled
Feb 17 12:17:07 xenstorage1 kernel: bge0: promiscuous mode enabled
Feb 17 12:17:08 xenstorage1 kernel: carp: demoted by 240 to 240 (interface down)
Feb 17 12:17:08 xenstorage1 kernel: bge0: link state changed to DOWN
Feb 17 12:17:08 xenstorage1 root: /etc/rc.d/netif: WARNING: $ifconfig_bge0_alias1 needs leading "inet" keyword for an IPv4 address.
Feb 17 12:17:11 xenstorage1 kernel: carp: 1@bge0: INIT -> BACKUP (initialization complete)
Feb 17 12:17:11 xenstorage1 kernel: carp: demoted by -240 to 0 (interface up)
Feb 17 12:17:11 xenstorage1 kernel: bge0: link state changed to UP
Feb 17 12:17:11 xenstorage1 carp-hast: Switching to secondary provider for da1. (carp=backup)
Feb 17 12:17:11 xenstorage1 carp-hast: Stopping services and unmounting disks.
Feb 17 12:17:13 xenstorage1 hastswitch: Unmount /dev/ufsid/6203b9055c61153b (/dev/hast/da1p1) from /mnt/xendrive.
Feb 17 12:17:15 xenstorage1 kernel: carp: 1@bge0: BACKUP -> MASTER (master timed out)
Feb 17 12:17:17 xenstorage1 kernel: carp: 1@bge0: MASTER -> BACKUP (more frequent advertisement received)
Feb 17 12:17:17 xenstorage1 mDNSResponderPosix: Default: mDNSCoreReceiveResponse: Received from 10.0.99.106:5353 19 110.99.0.10.in-addr.arpa. PTR xenstorage2.local.
Feb 17 12:17:17 xenstorage1 mDNSResponderPosix: Default: mDNSCoreReceiveResponse: Unexpected conflict discarding 19 110.99.0.10.in-addr.arpa. PTR xenstorage1.local.
Feb 17 12:17:18 xenstorage1 carp-hast: Role switched to secondary for resource da1.
Feb 17 12:17:18 xenstorage1 kernel: GEOM: da1: corrupt or invalid GPT detected.
Feb 17 12:17:18 xenstorage1 kernel: GEOM: da1: GPT rejected -- may not be recoverable.
Feb 17 12:17:18 xenstorage1 carp-hast: Switching to primary provider for da1. (carp=backup)
Feb 17 12:17:24 xenstorage1 1 2022-02-17T12:17:24.672456+01:00 hastd: da1 (primary) hastd 7099 - - [da1] (primary) Unable to receive handshake header from 192.168.77.2: Socket is not connected.
Feb 17 12:17:24 xenstorage1 carp-hast: Role for HAST resources da1 switched to primary.
Feb 17 12:17:24 xenstorage1 carp-hast: Mounting disks and strting services.
Feb 17 12:17:24 xenstorage1 hastswitch: Mount /dev/ufsid/6203b9055c61153b (/dev/hast/da1p1) on /mnt/xendrive.
Feb 17 12:17:25 xenstorage1 kernel: carp: 1@bge0: BACKUP -> MASTER (user requested via ifconfig)
Feb 17 12:17:25 xenstorage1 mDNSResponderPosix: Default: mDNSCoreReceiveResponse: Received from 10.0.99.106:5353 19 110.99.0.10.in-addr.arpa. PTR xenstorage2.local.
Feb 17 12:17:25 xenstorage1 mDNSResponderPosix: Default: mDNSCoreReceiveResponse: Unexpected conflict discarding 19 110.99.0.10.in-addr.arpa. PTR xenstorage1.local.
Feb 17 12:17:25 xenstorage1 carp-hast: Switching to secondary provider for da1. (carp=master)
Feb 17 12:17:25 xenstorage1 carp-hast: Stopping services and unmounting disks.
Feb 17 12:17:26 xenstorage1 hastd: [da1] (primary) We act as primary for the resource and not as secondary as requested by tcp://192.168.77.2:25781.
Feb 17 12:17:27 xenstorage1 hastswitch: Unmount /dev/ufsid/6203b9055c61153b (/dev/hast/da1p1) from /mnt/xendrive.
Feb 17 12:17:27 xenstorage1 hastd: [da1] (primary) We act as primary for the resource and not as secondary as requested by tcp://192.168.77.2:40935.
Feb 17 12:17:28 xenstorage1 hastd: [da1] (primary) We act as primary for the resource and not as secondary as requested by tcp://192.168.77.2:13992.
Feb 17 12:17:29 xenstorage1 hastd: [da1] (primary) We act as primary for the resource and not as secondary as requested by tcp://192.168.77.2:26842.
Feb 17 12:17:30 xenstorage1 hastd: [da1] (primary) We act as primary for the resource and not as secondary as requested by tcp://192.168.77.2:30531.

 

Secondary:

Feb 17 12:17:14 xenstorage2 kernel: carp: 1@bge0: BACKUP -> MASTER (master timed out)
Feb 17 12:17:14 xenstorage2 carp-hast: Switching to primary provider for da1. (carp=master)
Feb 17 12:17:22 xenstorage2 1 2022-02-17T12:17:22.343336+01:00 hastd: da1 (secondary) hastd 66982 - - [da1] (secondary) Unable to receive request header: Socket is not connected.
Feb 17 12:17:22 xenstorage2 kernel: GEOM: da1: corrupt or invalid GPT detected.
Feb 17 12:17:22 xenstorage2 kernel: GEOM: da1: GPT rejected -- may not be recoverable.
Feb 17 12:17:27 xenstorage2 hastd: [da1] (secondary) Worker process exited ungracefully (pid=66982, exitcode=75).
Feb 17 12:17:29 xenstorage2 mDNSResponderPosix: Default: mDNSCoreReceiveResponse: Received from 10.0.99.105:5353 19 110.99.0.10.in-addr.arpa. PTR xenstorage1.local.
Feb 17 12:17:29 xenstorage2 mDNSResponderPosix: Default: mDNSCoreReceiveResponse: Unexpected conflict discarding 19 110.99.0.10.in-addr.arpa. PTR xenstorage2.local.
Feb 17 12:17:30 xenstorage2 kernel: carp: 1@bge0: MASTER -> BACKUP (more frequent advertisement received)
Feb 17 12:17:30 xenstorage2 1 2022-02-17T12:17:30.663284+01:00 hastd: da1 (primary) hastd 67211 - - [da1] (primary) Remote node acts as primary for the resource and not as secondary.
Feb 17 12:17:30 xenstorage2 1 2022-02-17T12:17:30.663317+01:00 hastd: da1 (primary) hastd 67211 - - [da1] (primary) Waiting for remote node to become secondary for 20s.
Feb 17 12:17:31 xenstorage2 mDNSResponderPosix: Default: mDNSCoreReceiveResponse: Received from 10.0.99.105:5353 19 110.99.0.10.in-addr.arpa. PTR xenstorage1.local.
Feb 17 12:17:31 xenstorage2 mDNSResponderPosix: Default: mDNSCoreReceiveResponse: Unexpected conflict discarding 19 110.99.0.10.in-addr.arpa. PTR xenstorage2.local.
Feb 17 12:17:31 xenstorage2 1 2022-02-17T12:17:31.727691+01:00 hastd: da1 (primary) hastd 67211 - - [da1] (primary) Remote node acts as primary for the resource and not as secondary.
Feb 17 12:17:32 xenstorage2 1 2022-02-17T12:17:32.761985+01:00 hastd: da1 (primary) hastd 67211 - - [da1] (primary) Remote node acts as primary for the resource and not as secondary.
Feb 17 12:17:33 xenstorage2 1 2022-02-17T12:17:33.826060+01:00 hastd: da1 (primary) hastd 67211 - - [da1] (primary) Remote node acts as primary for the resource and not as secondary.
Feb 17 12:17:34 xenstorage2 1 2022-02-17T12:17:34.887185+01:00 hastd: da1 (primary) hastd 67211 - - [da1] (primary) Remote node acts as primary for the resource and not as secondary.

It is as if they could not communicate. Yet after issuing the commands above at the start, they do synchronize.
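
To make the sequence easier to follow, the CARP state changes can be pulled out of the log on each node and compared side by side. This is only a small sketch: the function name is made up, and the log path in the usage note is an assumption about where syslog writes on these boxes.

```shell
#!/bin/sh
# Hypothetical helper: print only the CARP state transitions from a syslog file.
# It matches kernel lines of the form seen above, for example:
#   kernel: carp: 1@bge0: BACKUP -> MASTER (master timed out)
# Lines such as "carp: demoted by 240 ..." are not transitions and are skipped.
carp_transitions() {
    grep -E 'kernel: carp: [0-9]+@[a-z0-9]+: [A-Z]+ -> [A-Z]+' "$1"
}
```

Usage would be something like `carp_transitions /var/log/messages` on both nodes. In the excerpt above, xenstorage1 changes CARP state five times within about 20 seconds, which may be worth looking at before HAST itself.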

The HAST versions match and the two drives are the same size. They communicate over bge2. The CARP address is on bge0, and host.conf is filled in.

Does anyone have any ideas?

Comments

This is more or less known HAST behavior, since it has no quorum functionality. For example, even after a clean shutdown (of both nodes) you have to state by hand at the next boot who is who. But do read the HAST man page.

"It is possible that in the case of a connection outage between the nodes the hastd primary role for the given resource will be configured on both nodes. This in turn leads to incompatible data modifications. Such a condition is called a split-brain and cannot be automatically resolved by the hastd daemon as this will lead most likely to data corruption or loss of important changes. Even though it cannot be fixed by hastd itself, it will be detected and a further connection between independently modified nodes will not be possible. Once this situation is manually resolved by an administrator, the resource on one of the nodes can be initialized (erasing local data), which makes a connection to the remote node possible again. Connection of the freshly initialized component will trigger full resource synchronization."
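
A minimal sketch of the reinitialization step the quoted passage describes, assuming the resource is da1 (the name from the original post) and that it runs on the node whose data is to be thrown away. The function name and the HASTCTL override variable are made up for illustration. The three hastctl subcommands are the same ones used in the post.

```shell
#!/bin/sh
# Sketch: re-initialize one HAST resource on the node whose local data
# will be DISCARDED, then rejoin it as secondary so a full sync can start.
# HASTCTL is a hypothetical override, e.g. HASTCTL=echo for a dry run.
HASTCTL=${HASTCTL:-hastctl}

hast_reinit_secondary() {
    res=$1
    [ -n "$res" ] || { echo "usage: hast_reinit_secondary <resource>" >&2; return 1; }
    # Stop at the first failing step instead of continuing blindly.
    $HASTCTL role init "$res" &&
    $HASTCTL create "$res" &&
    $HASTCTL role secondary "$res"
}
```

With `HASTCTL=echo` set, `hast_reinit_secondary da1` only prints the three commands it would run, which is a safe way to check it before using it for real.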

-- Never argue with idiots! You'll sink to their level and they'll beat you with experience! --

Yes, I've read about that, but there is a shell script for this that does it when the CARP interface switches.

What doesn't add up for me is this line above:

1 2022-02-17T12:17:24.672456+01:00 hastd: da1 (primary) hastd 7099 - - [da1] (primary) Unable to receive handshake header from 192.168.77.2: Socket is not connected.
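
One more thing stands out in the master log: carp-hast reports "Switching to primary provider for da1. (carp=backup)" and later "Switching to secondary provider for da1. (carp=master)", so the requested role and the logged CARP state disagree. A possible guard, sketched below, is to re-read the live CARP state just before changing the HAST role and skip the switch if it no longer matches the event. Both function names are hypothetical. The sketch assumes `ifconfig` prints a line like `carp: MASTER vhid 1 advbase 1 advskew 0`, and it says nothing about how the actual Xigmanas script works.

```shell
#!/bin/sh
# Hypothetical helper: read `ifconfig <iface>` output on stdin and print the
# CARP state (MASTER, BACKUP or INIT) for the given vhid.
# Assumes lines of the form: "carp: MASTER vhid 1 advbase 1 advskew 0".
carp_state() {
    awk -v vhid="$1" '$1 == "carp:" && $3 == "vhid" && $4 == vhid { print $2; exit }'
}

# Hypothetical guard: succeed only if the interface is currently in the
# expected state. $1 = interface, $2 = vhid, $3 = expected state.
carp_state_is() {
    [ "$(ifconfig "$1" | carp_state "$2")" = "$3" ]
}
```

A switch script could then do something like `carp_state_is bge0 1 MASTER || exit 0` before promoting the resource, so that a stale event from a flapping interface is ignored.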