[Gluster-devel] (client) reconnect attempts cause load spike

gluster-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gluster-devel] (client) reconnect attempts cause load spike

From:	Rhesa Rozendaal
Subject:	[Gluster-devel] (client) reconnect attempts cause load spike
Date:	Sun, 08 Jul 2007 00:57:13 +0200
User-agent:	Thunderbird 1.5.0.12 (X11/20070604)

I just came back from dinner to find the client (patch-284, same spec file asbefore) pegging the machine:


13268 root      16   0  601m 527m  912 S  100 26.1 174:05.69 [glusterfs]

(the load on the machine was 35, so the 100% cpu doesn't really do it justice)

Here's the relevant portion of the server log:

2007-07-07 21:03:45 E [protocol.c:262:gf_block_unserialize_transport]libglusterfs/protocol: full_read of header failed: peer (192.168.64.186)

2007-07-07 21:03:45 C [tcp.c:81:tcp_disconnect] server: connection disconnected

2007-07-07 21:05:45 E [protocol.c:262:gf_block_unserialize_transport]libglusterfs/protocol: full_read of header failed: peer (192.168.64.186)

2007-07-07 21:05:45 C [tcp.c:81:tcp_disconnect] server: connection disconnected

2007-07-07 21:09:45 E [protocol.c:262:gf_block_unserialize_transport]libglusterfs/protocol: full_read of header failed: peer (192.168.64.186)

2007-07-07 21:09:45 C [tcp.c:81:tcp_disconnect] server: connection disconnected

(it actually started showing those from 20:33, repeating the same block onceevery 4 minutes or so).


The client log is probably more interesting:

2007-07-07 21:09:47 W [client-protocol.c:207:call_bail] client31: activatingbail-out. pending frames = 1. last sent = 2007-07-07 21:05:47. last received =1970-01-01 00:00:00transport-timeout = 1202007-07-07 21:09:47 C [client-protocol.c:215:call_bail] client31: bailingtransport2007-07-07 21:09:47 D [tcp.c:131:cont_hand] tcp: forcing poll/read/write tobreak on blocked socket (if any)2007-07-07 21:09:47 W [client-protocol.c:4043:client_protocol_cleanup]client31: cleaning up state in transport object 0x86900282007-07-07 21:09:47 W [client-protocol.c:4092:client_protocol_cleanup]client31: forced unwinding frame type(1) op(2)

2007-07-07 21:09:47 C [tcp.c:81:tcp_disconnect] client31: connection 
disconnected

2007-07-07 21:09:47 E [client-protocol.c:328:client_protocol_xfer] client31:transport_submit failed2007-07-07 21:09:47 W [client-protocol.c:4012:client_protocol_reconnect]client31: attempting reconnect

2007-07-07 21:09:47 D [tcp-client.c:71:tcp_connect] client31: socket fd = 4

2007-07-07 21:09:47 D [tcp-client.c:89:tcp_connect] client31: finalized onport `1018'2007-07-07 21:09:47 D [tcp-client.c:110:tcp_connect] client31: defaultingremote-port to 69962007-07-07 21:09:47 D [tcp-client.c:142:tcp_connect] client31: connect on 4 inprogress (non-blocking)2007-07-07 21:09:47 D [tcp-client.c:186:tcp_connect] client31: connection on 4success2007-07-07 21:09:47 D [client-protocol.c:4578:notify] client31: gotGF_EVENT_CHILD_UP2007-07-07 21:09:47 D [client-protocol.c:4341:client_protocol_handshake_reply]client31: reply frame has callid: 4242422007-07-07 21:09:47 D [client-protocol.c:4375:client_protocol_handshake_reply]client31: SETVOLUME on remote-host succeeded2007-07-07 21:09:48 W [client-protocol.c:4020:client_protocol_reconnect]client31: breaking reconnect chain

That block repeats 3 times (with 4 minute intervals), after which I see about600 lines like these:2007-07-07 21:10:02 D [inode.c:236:__destroy_inode] fuse/inode: destroyinode(221481587)2007-07-07 21:10:02 D [inode.c:236:__destroy_inode] fuse/inode: destroyinode(114199670)2007-07-07 21:10:02 D [inode.c:236:__destroy_inode] fuse/inode: destroyinode(221481586)2007-07-07 21:10:02 D [inode.c:236:__destroy_inode] fuse/inode: destroyinode(221481578)


[snip 100 similar lines]

2007-07-07 21:10:02 D [client-protocol.c:1829:client_forget] client01: forgetinode (221284204)2007-07-07 21:10:02 D [inode.c:236:__destroy_inode] client01/inode: destroyinode(221284204)2007-07-07 21:10:02 D [client-protocol.c:1829:client_forget] client03: forgetinode (355043120)2007-07-07 21:10:02 D [inode.c:236:__destroy_inode] client03/inode: destroyinode(355043120)2007-07-07 21:10:02 D [client-protocol.c:1829:client_forget] client02: forgetinode (84248694)2007-07-07 21:10:02 D [inode.c:236:__destroy_inode] client02/inode: destroyinode(84248694)2007-07-07 21:10:02 D [client-protocol.c:1829:client_forget] client31: forgetinode (92176667)2007-07-07 21:10:02 D [inode.c:236:__destroy_inode] client31/inode: destroyinode(92176667)2007-07-07 21:10:02 D [client-protocol.c:1829:client_forget] ns: forget inode(221284203)2007-07-07 21:10:02 D [inode.c:236:__destroy_inode] ns/inode: destroyinode(221284203)2007-07-07 21:10:02 D [inode.c:236:__destroy_inode] export/inode: destroyinode(221284203)


[snip 500 similar lines]

Note that those were all in the same second.

Then 3 minutes silence, after which the "activating bail-out" sequence startsagain, all the way through the forget/destroy inode block.

By now, I've killed the client, and will restart the "cp -ru" process I wasrunning. I'll report back if anything happens.

I doubt this is enough info for you to figure this out, so let me know ifthere's anything I can do to help pinpoint this odd reconnection behavior.


Rhesa

[Prev in Thread]

Current Thread

[Next in Thread]

[Gluster-devel] (client) reconnect attempts cause load spike, Rhesa Rozendaal <=
- Re: [Gluster-devel] (client) reconnect attempts cause load spike, Anand Avati, 2007/07/30

Prev by Date: [Gluster-devel] Gluster HPC 1.0 Release
Next by Date: [Gluster-devel] Bug report: Seg. Fault on unify when re-running dbench (patch 284)
Previous by thread: [Gluster-devel] Gluster HPC 1.0 Release
Next by thread: Re: [Gluster-devel] (client) reconnect attempts cause load spike
Index(es):
- Date
- Thread