From: gordan
Subject: Re: [Gluster-devel] Improving real world performance by moving files closer to their target workloads
Date: Wed, 21 May 2008 16:34:02 +0100 (BST)
User-agent: Alpine 1.10 (LRH 962 2008-03-14)

On Wed, 21 May 2008, Luke McGregor wrote:

I think I misunderstood the way you were proposing to implement a quorum. I
thought there was no state data stored.

What exactly are you referring to as "state data"?

Let me just confirm how you were proposing to implement this.

[numbers added for later reference]

1. (client) A node requests a write to a specific file and broadcasts this to the network.

2. (server) The server checks that no other nodes are claiming a lock on that file and replies accordingly; if the file is lockable, the server locks it.

Which server would this be? IIRC, every lock would need to be noted on at least 50%+1 of the servers.

3. (client) The node then waits for 50% + 1 of the nodes to respond and say that they can write.

4. (client) The node writes the file.

5. (client) The node broadcasts a file-completed message.

6. (server) Updates its locking database to free that file.

Does this look correct?

I think so, but I am not entirely clear WRT whether you are talking about one type of server or a separate server "class" for lock servers. I don't believe separate lock/metadata servers were ever proposed.

I'm not entirely sure whether it would be better to store a copy of all the locks on all the nodes, or whether the locks should only be stored on the nodes that have a copy of the file. The latter would make more sense, but it would make the "quorum" consist of only the nodes that have a copy of the file, not all the nodes in the cluster. The problem with this is that the list of all the nodes in the cluster is fixed (or can reasonably be treated as fixed), whereas in a system where files are replicating/migrating/expiring toward the nodes that use them most heavily, maintaining the list of nodes that hold any one file would become difficult.
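
To make the flow above concrete, here is a very rough sketch of steps 1-6 with the quorum taken over the nodes that hold (or are believed to hold) the file. The helpers (send, broadcast) and everything else are invented for illustration; none of this is existing GlusterFS code:

# Hypothetical client-side flow for steps 1-6 above; send(node, msg) and
# broadcast(nodes, msg) are placeholder transport helpers.

def quorum_write(path, data, file_holders, send, broadcast):
    quorum = len(file_holders) // 2 + 1        # 50% + 1 of the replica set

    # Steps 1-3: ask every holder for the lock, proceed only with a quorum.
    grants = [n for n in file_holders
              if send(n, ("lock", path)) == "granted"]
    if len(grants) < quorum:
        broadcast(grants, ("unlock", path))    # back off: release partial locks
        raise IOError("no lock quorum for %s" % path)

    try:
        # Step 4: perform the write on the nodes that granted the lock.
        broadcast(grants, ("write", path, data))
    finally:
        # Steps 5-6: announce completion so the holders free the lock.
        broadcast(grants, ("unlock", path))

Note that the quorum size is computed from the replica set, which is exactly the list that becomes hard to maintain once files start migrating around.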

If so, I have a few questions.

Is there any information stored by nodes on who is writing the file?

At the moment?
Not sure about unify (presumed to be the node that has the file).
In AFR, the first server in the list is the "lock server".

In the proposed new migrating translator? I'd have thought it would be handled similarly to unify, only with the lock information replicated to all the nodes storing the file, along with the file data itself. Details of who has the current file lock, along with meta-information about that lock, could be stored in xattrs (since that is what is already used for version metadata).
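
Roughly like this, purely as an illustration; the attribute name and JSON layout below are made up, not an existing GlusterFS xattr. (It only needs a backend filesystem with xattr support, which is already required for the version metadata.)

import json, os, time

LOCK_XATTR = "user.example.lock"    # hypothetical attribute name

def record_lock(path, owner, ttl=0.1):
    """Store who holds the lock and when it expires alongside the file."""
    meta = {"owner": owner, "expires": time.time() + ttl}
    os.setxattr(path, LOCK_XATTR, json.dumps(meta).encode())

def read_lock(path):
    """Return the recorded lock metadata, or None if no lock is set."""
    try:
        return json.loads(os.getxattr(path, LOCK_XATTR))
    except OSError:                 # attribute not present
        return None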

(If so, what happens when the lock fails? Won't the above model lock the file on
nodes which have no current lock, without the requester actually holding the lock?
I.e. node1 requests but node2 has the lock; won't some servers have granted the
lock to node1 and have that info stored?)

There are several ways this could be dealt with. In theory, all nodes that have the file should agree on the locks on that file (since the lock state is replicated to, and acknowledged by, all the nodes with that file). If this isn't the case, something went wrong. We can wait for the lock to expire (e.g. 100ms) and see if it gets refreshed. It shouldn't, or it means that somehow one server is getting updated without the change propagating to the other servers. The current AFR approach to resolving this is to clobber the out-of-sync files with new versions.

If this is not the case, what happens if some servers
don't receive the un-lock broadcast? Won't they still think that the file is
locked and respond on that basis the next time they are in a quorum?

That's why locks need timeouts and refreshes (to handle a node dying while it holds a lock), along with acks for lock/unlock approvals in normal operation.
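
A lease-style lock table on each node holding the file could look something like this (illustrative only; the 100ms figure is the one from above):

import time

LEASE = 0.1    # seconds a granted lock stays valid without a refresh

class LockTable:
    def __init__(self):
        self.locks = {}                         # path -> (owner, expiry)

    def grant(self, path, owner):
        holder = self.locks.get(path)
        if holder and holder[1] > time.time() and holder[0] != owner:
            return False                        # live lock held by someone else
        self.locks[path] = (owner, time.time() + LEASE)
        return True                             # granted, refreshed, or stale takeover

    def refresh(self, path, owner):
        # The holder calls this periodically; a dead holder simply stops,
        # and its lock expires on its own after LEASE seconds.
        if self.locks.get(path, (None, 0))[0] == owner:
            self.locks[path] = (owner, time.time() + LEASE)

    def release(self, path, owner):
        # Explicit unlock; in the real protocol this would be acked.
        if self.locks.get(path, (None, 0))[0] == owner:
            del self.locks[path]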

If we assume the simplest possible solution, where at least one copy of each file
is required, how would you identify a file which can be deleted from the system
without having to broadcast a query on every single file starting from the oldest?

You can't. I don't think there is an easy work-around. In the case of a single node, this shouldn't cause a huge load on the network, but when the network starts getting full AND there is still a relatively high amount of migration required for optimization, the load would go up dramatically.

However, migration doesn't have to happen online. If there isn't enough space to migrate to local disk, you can do a remote open, and sort out the freeing of disk space, and possibly the migration of the file to the local machine, asynchronously.
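
In sketch form (migrate_local, remote_open and the queue are placeholders, not real interfaces):

import queue, shutil

migration_queue = queue.Queue()    # drained by a background worker

def open_with_migration(path, size_needed, local_root,
                        migrate_local, remote_open):
    free = shutil.disk_usage(local_root).free
    if free >= size_needed:
        return migrate_local(path)              # room available: migrate inline
    # No room right now: serve the file remotely and defer the expensive
    # space-freeing (and possible migration) to a background job.
    migration_queue.put((path, size_needed))
    return remote_open(path)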

Obviously I agree that distributed metadata is a really good thing to have
for scalability and reliability. However, I am worried that the whole
broadcasting side of things is going to cause some huge problems in
implementing our migration project.

Unify already does something similar to find which node has the file in question.
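
I.e. something to the effect of (placeholder names, not the actual unify code):

def locate_file(path, nodes, send):
    """Ask every node whether it has `path`; return those that say yes."""
    return [n for n in nodes if send(n, ("lookup", path)) == "found"]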

I'm especially worried about how to solve
the old-file problem.

Yes, that one is not easy to solve, but as I said, if something needs expunging for space reasons, it can be done asynchronously. It should also, in theory, not become necessary until the network starts to get relatively full.

I'm also worried that every server in the network is
going to have to hold a fairly sizable set of metadata; this seems to be a
problem in terms of scaling.

The only metadata I can think of is the lock information, which can be stored in xattrs, the same as versioning for AFR. This hardly amounts to a lot of data, especially since the metadata would be stored on the same nodes as the files.

Gordan



