From: gordan
Subject: Re: [Gluster-devel] Improving real world performance by moving files closer to their target workloads
Date: Wed, 21 May 2008 16:34:02 +0100 (BST)
User-agent: Alpine 1.10 (LRH 962 2008-03-14)

On Wed, 21 May 2008, Luke McGregor wrote:

I think I misunderstood the way you were proposing to implement a quorum. I
thought there was no state data stored.

What exactly are you referring to as "state data"?

Let me just confirm how you were proposing to implement this.

[numbers added for later reference]

1. (client) A node requests a write to a specific file and broadcasts this to the network.

2. (server) The server checks that no other nodes are claiming a lock on that file and replies accordingly; if the file is lockable, the server locks it.

Which server would this be? IIRC, every lock would need to be noted on at least 50%+1 of the servers.

3. (client) The node then waits for 50% + 1 of the nodes to respond and say that they can write.

4. (client) The node writes the file.

5. (client) The node broadcasts a file-completed message.

6. (server) Updates its locking database to free that file.

Does this look correct?

I think so, but I am not entirely clear WRT whether you are talking about one type of server or a separate server "class" for lock servers. I don't believe separate lock/metadata servers were ever proposed.

I'm not entirely sure whether it would be better to store a copy of all the locks on all the nodes, or whether the locks should only be stored on the nodes that have a copy of the file. The latter would make more sense, but it would make the "quorum" consist of only the nodes that have a copy of the file, not all the nodes in the cluster. The problem with this is that the list of all the nodes in the cluster is fixed (or can reasonably be treated as fixed), whereas in a system where files are replicating/migrating/expiring toward the nodes that use them most heavily, maintaining the list of nodes that hold any one file would become difficult.
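
To make the flow above concrete, here is a very rough sketch of steps 1-6 with the quorum taken over the nodes that hold (or are believed to hold) the file. The helpers (send, broadcast) and everything else are invented for illustration; none of this is existing GlusterFS code:

# Hypothetical client-side flow for steps 1-6 above; send(node, msg) and
# broadcast(nodes, msg) are placeholder transport helpers.

def quorum_write(path, data, file_holders, send, broadcast):
    quorum = len(file_holders) // 2 + 1        # 50% + 1 of the replica set

    # Steps 1-3: ask every holder for the lock, proceed only with a quorum.
    grants = [n for n in file_holders
              if send(n, ("lock", path)) == "granted"]
    if len(grants) < quorum:
        broadcast(grants, ("unlock", path))    # back off: release partial locks
        raise IOError("no lock quorum for %s" % path)

    try:
        # Step 4: perform the write on the nodes that granted the lock.
        broadcast(grants, ("write", path, data))
    finally:
        # Steps 5-6: announce completion so the holders free the lock.
        broadcast(grants, ("unlock", path))

Note that the quorum size is computed from the replica set, which is exactly the list that becomes hard to maintain once files start migrating around.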

If so, I have a few questions.

Is there any information stored by nodes on who is writing the file?

At the moment?
Not sure about unify (presumed to be the node that has the file).
In AFR, the first server in the list is the "lock server".

In the proposed new migrating translator? I'd have thought it would be handled similarly to unify, only with the lock information replicated to all the nodes storing the file, along with the file data itself. Details of who has the current file lock, along with meta-information about that lock, could be stored in xattrs (since that is what is already used for version metadata).
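
Roughly like this, purely as an illustration; the attribute name and JSON layout below are made up, not an existing GlusterFS xattr. (It only needs a backend filesystem with xattr support, which is already required for the version metadata.)

import json, os, time

LOCK_XATTR = "user.example.lock"    # hypothetical attribute name

def record_lock(path, owner, ttl=0.1):
    """Store who holds the lock and when it expires alongside the file."""
    meta = {"owner": owner, "expires": time.time() + ttl}
    os.setxattr(path, LOCK_XATTR, json.dumps(meta).encode())

def read_lock(path):
    """Return the recorded lock metadata, or None if no lock is set."""
    try:
        return json.loads(os.getxattr(path, LOCK_XATTR))
    except OSError:                 # attribute not present
        return None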

(If so, what happens when the lock fails? Won't the above model lock the file on
nodes which have no current lock, without the requester actually holding the lock?
I.e. node1 requests but node2 has the lock; won't some servers have granted the
lock to node1 and have that info stored?)

There are several ways this could be dealt with. In theory, all nodes that have the file should agree on the locks on that file (since the lock state is replicated to, and acknowledged by, all the nodes with that file). If this isn't the case, something went wrong. We can wait for the lock to expire (e.g. 100ms) and see if it gets refreshed. It shouldn't, or it means that somehow one server is getting updated without the change propagating to the other servers. The current AFR approach to resolving this is to clobber the out-of-sync files with new versions.

If this is not the case, what happens if some servers
don't receive the un-lock broadcast? Won't they still think that the file is
locked and respond on that basis the next time they are in a quorum?

That's why locks need timeouts and refreshes (to handle a node dying while it holds a lock), along with acks for lock/unlock approvals in normal operation.
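
A lease-style lock table on each node holding the file could look something like this (illustrative only; the 100ms figure is the one from above):

import time

LEASE = 0.1    # seconds a granted lock stays valid without a refresh

class LockTable:
    def __init__(self):
        self.locks = {}                         # path -> (owner, expiry)

    def grant(self, path, owner):
        holder = self.locks.get(path)
        if holder and holder[1] > time.time() and holder[0] != owner:
            return False                        # live lock held by someone else
        self.locks[path] = (owner, time.time() + LEASE)
        return True                             # granted, refreshed, or stale takeover

    def refresh(self, path, owner):
        # The holder calls this periodically; a dead holder simply stops,
        # and its lock expires on its own after LEASE seconds.
        if self.locks.get(path, (None, 0))[0] == owner:
            self.locks[path] = (owner, time.time() + LEASE)

    def release(self, path, owner):
        # Explicit unlock; in the real protocol this would be acked.
        if self.locks.get(path, (None, 0))[0] == owner:
            del self.locks[path]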

If we assume the simplest possible solution, where at least one copy of each file
is required, how would you identify a file which can be deleted from the system
without having to broadcast a query on every single file starting from the oldest?

You can't. I don't think there is an easy work-around. In the case of a single node, this shouldn't cause a huge load on the network, but when the network starts getting full AND there is still a relatively high amount of migration required for optimization, the load would go up dramatically.

However, migration doesn't have to happen online. If there isn't enough space to migrate to local disk, you can do a remote open, and sort out the freeing of disk space, and possibly the migration of the file to the local machine, asynchronously.
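
In sketch form (migrate_local, remote_open and the queue are placeholders, not real interfaces):

import queue, shutil

migration_queue = queue.Queue()    # drained by a background worker

def open_with_migration(path, size_needed, local_root,
                        migrate_local, remote_open):
    free = shutil.disk_usage(local_root).free
    if free >= size_needed:
        return migrate_local(path)              # room available: migrate inline
    # No room right now: serve the file remotely and defer the expensive
    # space-freeing (and possible migration) to a background job.
    migration_queue.put((path, size_needed))
    return remote_open(path)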

Obviously I agree that distributed metadata is a really good thing to have
for scalability and reliability. However, I am worried that the whole
broadcasting side of things is going to cause some huge problems in
implementing our migration project.

Unify already does something similar to find which node has the file in question.
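
I.e. something to the effect of (placeholder names, not the actual unify code):

def locate_file(path, nodes, send):
    """Ask every node whether it has `path`; return those that say yes."""
    return [n for n in nodes if send(n, ("lookup", path)) == "found"]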

I'm especially worried about how to solve
the old-file problem.

Yes, that one is not easy to solve, but as I said, if something needs expunging for space reasons, it can be done asynchronously. It should also, in theory, not become necessary until the network starts to get relatively full.

I'm also worried that every server in the network is
going to have to hold a fairly sizable set of metadata; this seems to be a
problem in terms of scaling.

The only metadata I can think of is the lock information, which can be stored in xattrs, the same as versioning for AFR. This hardly amounts to a lot of data, especially since the metadata would be stored on the same nodes as the files.

Gordan



