gluster-devel

Re: [Gluster-devel] The return of the all-null pending matrix


From: Emmanuel Dreyfus
Subject: Re: [Gluster-devel] The return of the all-null pending matrix
Date: Tue, 23 Jul 2013 02:19:34 +0200
User-agent: MacSOUP/2.7 (unregistered for 2376 days)

Vijay Bellur <address@hidden> wrote:

> I have not been able to re-create the problem in my setup. I think it 
> would be a good idea to track this bug and address it. For now, can we 
> not use the volume set mechanism to disable eager-locking?

Our exchange went off-list after this message. I am reposting here 
the last 100k lines of the log, captured in debug mode:
http://ftp.espci.fr/shadow/manu/log

Relevant part:

[2013-07-22 15:36:22.923866] D [afr-lk-common.c:447:transaction_lk_op] 
0-gfs34-replicate-0: lk op is for a transaction
[2013-07-22 15:36:22.924484] D [client-rpc-fops.c:2789:client_fdctx_destroy] 
0-gfs34-client-0: sending release on fd
[2013-07-22 15:36:22.924560] D [client-rpc-fops.c:2789:client_fdctx_destroy] 
0-gfs34-client-1: sending release on fd
[2013-07-22 15:36:22.943156] D 
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: 
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:22.943202] D 
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: 
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:22.943236] D [afr-self-heal-common.c:887:afr_mark_sources] 
0-gfs34-replicate-1: Number of sources: -1
[2013-07-22 15:36:22.943271] D 
[afr-self-heal-data.c:794:afr_lookup_select_read_child_by_txn_type] 
0-gfs34-replicate-1: /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po: 
Possible split-brain
[2013-07-22 15:36:22.943305] D 
[afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 
0-gfs34-replicate-1: returning read_child: 1
[2013-07-22 15:36:22.943336] D [afr-common.c:1380:afr_lookup_select_read_child] 
0-gfs34-replicate-1: Source selected as 1 for 
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:22.943374] D 
[afr-common.c:1117:afr_lookup_build_response_params] 0-gfs34-replicate-1: 
Building lookup response from 1
[2013-07-22 15:36:22.943409] D [afr-common.c:1265:afr_detect_self_heal_by_iatt] 
0-gfs34-replicate-1: size differs for 
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po 
[2013-07-22 15:36:22.943444] D 
[afr-common.c:1291:afr_detect_self_heal_by_split_brain_status] 
0-gfs34-replicate-1: split brain detected during lookup of
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po.
[2013-07-22 15:36:22.943478] D [afr-common.c:1426:afr_launch_self_heal] 
0-gfs34-replicate-1: background  data self-heal triggered. path: 
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po, reason:
lookup detected pending operations
[2013-07-22 15:36:23.272807] D 
[afr-self-heal-metadata.c:486:afr_sh_metadata_post_nonblocking_inodelk_cbk] 
0-gfs34-replicate-1: Non Blocking metadata inodelks done for
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po. Proceeding to FOP
[2013-07-22 15:36:23.272868] D [mem-pool.c:422:mem_get]  0-mem-pool: Mem pool 
is full. Callocing mem
[2013-07-22 15:36:23.272900] D 
[afr-self-heal-common.c:1930:afr_sh_common_lookup] 0-gfs34-replicate-1: looking 
up /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po on subvolume gfs34-client-2
[2013-07-22 15:36:23.272986] D 
[afr-self-heal-common.c:1930:afr_sh_common_lookup] 0-gfs34-replicate-1: looking 
up /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po on subvolume gfs34-client-3
[2013-07-22 15:36:23.273596] D [mem-pool.c:422:mem_get]  0-mem-pool: Mem pool 
is full. Callocing mem
[2013-07-22 15:36:23.273752] D [mem-pool.c:422:mem_get]  0-mem-pool: Mem pool 
is full. Callocing mem
[2013-07-22 15:36:23.273792] D 
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: 
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.273829] D 
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: 
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.273862] D [afr-self-heal-common.c:887:afr_mark_sources] 
0-gfs34-replicate-1: Number of sources: 2
[2013-07-22 15:36:23.273895] D [afr-lk-common.c:452:transaction_lk_op] 
0-gfs34-replicate-1: lk op is for a self heal
[2013-07-22 15:36:23.276705] D 
[afr-self-heal-metadata.c:61:afr_sh_metadata_done] 0-gfs34-replicate-1: 
proceeding to data check on /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.278390] D 
[afr-self-heal-data.c:1158:afr_sh_data_post_nonblocking_inodelk_cbk] 
0-gfs34-replicate-1: Non Blocking data inodelks done for
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po by 5c3e47ba. Proceeding to 
self-heal
[2013-07-22 15:36:23.278520] D [mem-pool.c:422:mem_get]  0-mem-pool: Mem pool 
is full. Callocing mem
[2013-07-22 15:36:23.278540] D [mem-pool.c:422:mem_get]  0-mem-pool: Mem pool 
is full. Callocing mem
[2013-07-22 15:36:23.280422] D [mem-pool.c:422:mem_get]  0-mem-pool: Mem pool 
is full. Callocing mem
[2013-07-22 15:36:23.281824] D [mem-pool.c:422:mem_get]  0-mem-pool: Mem pool 
is full. Callocing mem
[2013-07-22 15:36:23.282746] D 
[afr-self-heal-data.c:686:afr_sh_data_fxattrop_fstat_done] 0-gfs34-replicate-1: 
Pending matrix for: 5c3e47ba
[2013-07-22 15:36:23.282798] D 
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: 
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.282831] D 
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: 
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.282862] D [afr-self-heal-common.c:887:afr_mark_sources] 
0-gfs34-replicate-1: Number of sources: -1
[2013-07-22 15:36:23.282897] E 
[afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-gfs34-replicate-1: 
Unable to self-heal contents of 
'/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po'
(possible split-brain). Please delete the file from all but the preferred 
subvolume.- Pending matrix:  [ [ 0 0 ] [ 0 0 ] ]
[2013-07-22 15:36:23.282931] D [afr-self-heal-data.c:336:afr_sh_data_fail] 
0-gfs34-replicate-1: finishing failed data selfheal of 
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.282962] D [afr-lk-common.c:452:transaction_lk_op] 
0-gfs34-replicate-1: lk op is for a self heal
[2013-07-22 15:36:23.283575] E 
[afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-gfs34-replicate-1: 
background  data self-heal failed on 
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.283636] D 
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: 
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.283669] D 
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: 
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.283700] D [afr-self-heal-common.c:887:afr_mark_sources] 
0-gfs34-replicate-1: Number of sources: -1
[2013-07-22 15:36:23.283730] D 
[afr-self-heal-data.c:794:afr_lookup_select_read_child_by_txn_type] 
0-gfs34-replicate-1: /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po: 
Possible split-brain
[2013-07-22 15:36:23.283763] D 
[afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 
0-gfs34-replicate-1: returning read_child: 1
[2013-07-22 15:36:23.283794] D [afr-common.c:1380:afr_lookup_select_read_child] 
0-gfs34-replicate-1: Source selected as 1 for 
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.283828] D 
[afr-common.c:1117:afr_lookup_build_response_params] 0-gfs34-replicate-1: 
Building lookup response from 1
[2013-07-22 15:36:23.284755] W [afr-open.c:213:afr_open] 0-gfs34-replicate-1: 
failed to open as split brain seen, returning EIO
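
For anyone digging through the full log, here is a minimal sketch of how the 
afr_mark_sources records could be pulled out, so that the "Number of 
sources: -1" cases stand out. It assumes records start with a bracketed 
timestamp and may be wrapped over several lines, as in the excerpt above. The 
helper names (join_records, source_counts) are made up for illustration and 
are not part of GlusterFS.

```python
import re

# Assumption: every log record starts with a timestamp such as
# [2013-07-22 15:36:22.943236]; any other line continues the previous record.
RECORD_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\]")
SOURCES = re.compile(
    r"\[([\d-]+ [\d:.]+)\].*afr_mark_sources\]\s+(\S+):\s+"
    r"Number of sources: (-?\d+)"
)


def join_records(lines):
    """Re-join records that were wrapped over several lines."""
    record = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) and record:
            yield " ".join(record)
            record = []
        record.append(line)
    if record:
        yield " ".join(record)


def source_counts(lines):
    """Return (timestamp, xlator, count) for each afr_mark_sources record."""
    found = []
    for rec in join_records(lines):
        m = SOURCES.search(rec)
        if m:
            found.append((m.group(1), m.group(2), int(m.group(3))))
    return found


# Two records copied from the excerpt above, wrapped as in the mail.
sample = """\
[2013-07-22 15:36:22.943236] D [afr-self-heal-common.c:887:afr_mark_sources]
0-gfs34-replicate-1: Number of sources: -1
[2013-07-22 15:36:23.273862] D [afr-self-heal-common.c:887:afr_mark_sources]
0-gfs34-replicate-1: Number of sources: 2
""".splitlines()

for ts, xlator, count in source_counts(sample):
    print(ts, xlator, count)
```

To scan the whole file, open the log and pass the file object to 
source_counts() in place of sample.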



-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
address@hidden


