[Gluster-devel] Question about afr/self-heal

From:	Brian Hirt
Subject:	[Gluster-devel] Question about afr/self-heal
Date:	Tue, 9 Dec 2008 10:34:08 -0700

Hello,

I'm running some tests with GlusterFS and so far I like what I'm seeing. I've set up a 4-node test system with AFR and Unify: node1 and node3 are replicated, and node2 and node4 are replicated. These two replicated pairs are then unified into a single brick. Both AFR and Unify are done on the client side. All of the servers run Ubuntu and GlusterFS 1.3.12 on an underlying ext3 filesystem.

During one of my tests I took down a server during an update/addition of a few thousand files. After the update was complete, I brought the downed node back up. After doing a directory listing on the client I could see all of the new files, but they all had a size of 0, and the updated files still had their old contents. When I opened these files on the client, the correct contents were returned and the previously downed node was then corrected for that file.

From searching through the email archives, this seems to be the intended behavior. However, in the filesystem's current state, my redundancy is lost for those changes until I open every file and directory on the client. In my configuration I intend to have many millions of files. Am I supposed to open every single one of them after a node goes down to get the replicas back in sync? There will often be times when servers are brought down for routine maintenance for 10-15 minutes at a time, and during that window only a few hundred files might change. What is the proper procedure for resynchronizing? How are other people handling this? I've seen a few comments about fsck in the mail archive that reference a path which doesn't exist in my GlusterFS distribution (possibly it's from the 1.4 branch).
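For reference, the brute-force "open every file" approach described above can be sketched as a crawl over the client mount. This is a minimal sketch, not an official GlusterFS procedure: it assumes, as observed in the test above, that lookups and opens through the client are what repair a stale replica, and the mount point `/mnt/glusterfs` is a hypothetical placeholder. It lists every directory and reads one byte of each regular file.

```python
import os
import sys


def crawl(mount_point: str) -> tuple[int, int]:
    """Walk a client mount, touching every directory and regular file.

    Listing each directory and opening/reading one byte of each regular
    file forces lookups and opens through the client, the access pattern
    that repaired stale files in the test described above.
    Returns (directories visited, files opened).
    """
    dirs = files = 0
    for root, _dirnames, filenames in os.walk(mount_point):
        dirs += 1
        for name in filenames:
            path = os.path.join(root, name)
            try:
                # Only read regular files; skip symlinks and special files.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                with open(path, "rb") as f:
                    f.read(1)
                files += 1
            except OSError as exc:
                # Keep going on errors such as ENOTCONN and report them.
                print(f"skip {path}: {exc}", file=sys.stderr)
    return dirs, files


if __name__ == "__main__":
    # Hypothetical default; pass the real client mount point as argv[1].
    target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/glusterfs"
    d, n = crawl(target)
    print(f"visited {d} directories, opened {n} files")
```

With many millions of files this touches everything, not just the few hundred that changed during maintenance, which is exactly the cost the question is about.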

Also, the log file is very verbose about the downed server. There are lots of messages like:

2008-12-09 11:18:08 E [tcp-client.c:190:tcp_connect] brick2: non-blocking connect() returned: 111 (Connection refused)
2008-12-09 11:18:08 W [client-protocol.c:332:client_protocol_xfer] brick2: not connected at the moment to submit frame type(1) op(34)
2008-12-09 11:18:08 E [client-protocol.c:4430:client_lookup_cbk] brick2: no proper reply from server, returning ENOTCONN
2008-12-09 11:18:08 E [tcp-client.c:190:tcp_connect] brick2: non-blocking connect() returned: 111 (Connection refused)
2008-12-09 11:18:08 W [client-protocol.c:332:client_protocol_xfer] brick2: not connected at the moment to submit frame type(1) op(9)
2008-12-09 11:18:08 E [client-protocol.c:2787:client_chmod_cbk] brick2: no proper reply from server, returning ENOTCONN

In some of my tests I'm seeing several hundred of these logged per second. Is there some way to make this a bit less verbose?
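To measure the message rate per second, severity and brick, a small parser can help. This is a sketch that assumes every log line follows the layout of the excerpt above (timestamp, one-letter level, `[file:line:function]`, translator name, message); the function name `rate_by_second` is hypothetical.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# "2008-12-09 11:18:08 E [tcp-client.c:190:tcp_connect] brick2: message"
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]) "
    r"\[(?P<where>[^\]]+)\] "
    r"(?P<xlator>[^:]+): "
)


def rate_by_second(lines):
    """Count log entries per (second, level, translator); skip other lines."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            counts[(m["ts"], m["level"], m["xlator"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2008-12-09 11:18:08 E [tcp-client.c:190:tcp_connect] brick2: "
        "non-blocking connect() returned: 111 (Connection refused)",
        "2008-12-09 11:18:08 W [client-protocol.c:332:client_protocol_xfer] "
        "brick2: not connected at the moment to submit frame type(1) op(34)",
    ]
    for key, n in rate_by_second(sample).most_common():
        print(key, n)
```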

I'm sorry if these are FAQs, but so far I've been unable to find anything on the wiki or mailing lists.


Thanks in advance for your help and for this great project.

Brian Hirt
