Hi,
I used the following test to track down the bad commit:
#!/bin/bash
. $(dirname $0)/../include.rc
. $(dirname $0)/../volume.rc
function trigger_mount_self_heal {
        find $M0 | xargs stat
}
cleanup;
TEST glusterd
TEST pidof glusterd
TEST $CLI volume create $V0 replica 2 $H0:$B0/${V0}{0,1}
TEST $CLI volume set $V0 cluster.background-self-heal-count 0
TEST $CLI volume start $V0
TEST glusterfs --volfile-id=/$V0 --volfile-server=$H0 $M0 --use-readdirp=no \
        --attribute-timeout=0 --entry-timeout=0
TEST touch $M0/a
TEST kill_brick $V0 $H0 $B0/${V0}0
TEST ln -s $M0/a $M0/s
TEST ! stat $B0/${V0}0/s
TEST stat $B0/${V0}1/s
TEST $CLI volume start $V0 force
EXPECT_WITHIN 20 "Y" glustershd_up_status
EXPECT_WITHIN 20 "1" afr_child_up_status_in_shd $V0 0
TEST $CLI volume heal $V0 full
TEST trigger_mount_self_heal
TEST stat $B0/${V0}0/s
TEST stat $B0/${V0}1/s
cleanup
According to git bisect run, the commit which introduced this problem is:
837422858c2e4ab447879a4141361fd382645406
commit 837422858c2e4ab447879a4141361fd382645406
Author: Anand Avati <address@hidden>
Date:   Thu Nov 21 06:48:17 2013 -0800

    core: fix errno for non-existent GFID

    When clients refer to a GFID which does not exist, the errno to
    be returned in ESTALE (and not ENOENT). Even though ENOENT might
    look "proper" most of the time, as the application eventually expects
    ENOENT even if a parent directory does not exist, not returning
    ESTALE results in resolvers (FUSE and GFAPI) to not retry resolution
    in uncached mode. This can result in spurious ENOENTs during
    concurrent path modification operations.

    Change-Id: I7a06ea6d6a191739f2e9c6e333a1969615e05936
    BUG: 1032894
    Signed-off-by: Anand Avati <address@hidden>
    Reviewed-on: http://review.gluster.org/6322
    Tested-by: Gluster Build System <address@hidden>
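
For anyone who wants to repeat the bisect, here is a rough sketch of a wrapper that could be handed to git bisect run. It is not the exact script I used: the test path (tests/bugs/symlink-heal.t), the build steps and the --run switch are all assumptions made for illustration. The only fixed part is the exit-code convention of git bisect run: 0 means good, 1 means bad, 125 means skip this commit.

```shell
#!/bin/bash
# Hypothetical bisect-step.sh: a sketch of a wrapper for `git bisect run`.
# The test path and the build commands below are assumptions, adjust as needed.

# Assumed location of the test case inside the source tree.
TEST_FILE=${TEST_FILE:-tests/bugs/symlink-heal.t}

# Translate build/test results into the exit codes `git bisect run` expects:
#   0   -> commit is good
#   1   -> commit is bad
#   125 -> commit cannot be judged (e.g. it does not build), skip it
bisect_exit_code() {
    local build_status=$1 test_status=$2
    if [ "$build_status" -ne 0 ]; then
        echo 125
    elif [ "$test_status" -ne 0 ]; then
        echo 1
    else
        echo 0
    fi
}

# Build and install the current checkout, then run the test against it.
run_step() {
    local build_status=0 test_status=0
    # Assumed build sequence; the test needs the freshly built binaries installed.
    { ./autogen.sh && ./configure && make -j4 && make install; } \
        >/dev/null 2>&1 || build_status=$?
    if [ "$build_status" -eq 0 ]; then
        prove -vf "$TEST_FILE" || test_status=$?
    fi
    return "$(bisect_exit_code "$build_status" "$test_status")"
}

# Only do real work when asked to, so the helpers can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    run_step
    exit $?
fi
```

With this saved as bisect-step.sh, the bisect would be started with git bisect start followed by a known-bad and a known-good revision, and then driven with git bisect run ./bisect-step.sh --run. Returning 125 for commits that fail to build keeps build breakage from being blamed for the regression.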
Affected branches: master, 3.5, 3.4.
I will be working with Venkatesh to get a fix for this onto all of these branches.
Good catch, Venkatesh! Thanks a lot for providing a simple test case to
re-create the issue :-).