
Re: [Gluster-devel] Replicate/AFR Using Broadcast/Multicast?


From: Gordan Bobic
Subject: Re: [Gluster-devel] Replicate/AFR Using Broadcast/Multicast?
Date: Wed, 13 Oct 2010 09:06:56 +0100
User-agent: Mozilla-Thunderbird 2.0.0.24 (X11/20100328)

Beat Rubischon wrote:
Hello!

Quoting <address@hidden> (13.10.10 09:11):

I would also expect to see network issues as a cluster grows. Isn't performance
degrading as the node count increases seen as a bigger issue?

I have pretty bad experience with multicast. Running several clusters in the
500-1000 node range in a single broadcast domain over the last year has shown
that broadcast or multicast can easily kill your fabric.

What sort of a cluster are you running with that many nodes? RHCS? Heartbeat? Something else entirely? In what arrangement?

Even the most expensive GigE switch chassis could be killed by 125+ MBytes/s
of traffic, which is almost nothing :-)

Sounds like a typical example of cost not being a good measure of quality and performance. :)

The inbound traffic must be routed to several outbound ports, which results in
congestion. Throttling and packet loss are the result, even for the normal
unicast traffic. Multicast or broadcast is a nice way to cause a denial of
service in a LAN.

If you have that many GlusterFS nodes, you have DoS-ed the storage network anyway purely by the write amplification of the inverse scaling. The idea is that you have a (mostly) dedicated VLAN for this.

In InfiniBand, multicast is typically realized by looping through the
destinations directly on the source HBA. It is one of the main targets of
current development, as multicast has a network pattern similar to the MPI
collectives. So there is no win in using multicast over such a fabric.

Sure, but historically in the networking space, non-ethernet technologies have always been niche, cost-ineffective in terms of price/performance, and have only had a temporary performance advantage.

At the moment the only way to achieve linear-ish scaling is to connect the nodes directly to each other, but that means that each node in an n-way cluster has to have at least n NICs (one to each other node plus at least one to the clients). That becomes impractical very quickly.
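To put rough numbers on that (a minimal back-of-the-envelope sketch; the function and node counts below are just illustrative, not anything taken from GlusterFS itself):

def full_mesh_cost(nodes):
    # One NIC per peer on the replication mesh, plus at least one
    # client-facing NIC per node.
    nics_per_node = (nodes - 1) + 1
    # A full mesh needs n*(n-1)/2 point-to-point replication links.
    replication_links = nodes * (nodes - 1) // 2
    return nics_per_node, replication_links

for n in (3, 5, 10, 20):
    nics, links = full_mesh_cost(n)
    print(f"{n:2d} nodes: {nics:2d} NICs per node, {links:3d} replication links")

Already at 10 nodes that is 10 NICs per storage box and 45 dedicated links, which is where the "impractical very quickly" bit comes from.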

Using parallel storage means you have communication from a large number of
nodes to a large number of servers. Using unicast looks bad at first glance,
but I'm confident it's the better solution overall.

The problem is that the write bandwidth is fundamentally limited to the speed of your interconnect, and this is shared between all the nodes. So if you are on a 1Gb ethernet, your total volume of writes between all the replicas cannot exceed 1Gb/s. If you have 10 nodes, that limits your write throughput to 100Mb/s (a rough sketch of this arithmetic follows the list below). The only way to scale is either by:

1) Putting ever more NICs into each storage chassis and turning the replication network into what is essentially a point-to-point network. Complex, inefficient and adding to the cost - 1Gb NICs are cheap, but 10Gb ones are still expensive. There is also a limit on how many NICs you can cram into a COTS storage node.

2) Upgrading to an ever faster interconnect (10Gb ethernet, Infiniband, etc.). Expensive and still doesn't scale as you add more storage nodes.
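To illustrate the shared-interconnect ceiling (a minimal sketch; the link rates and replica counts are illustrative assumptions, not measurements):

def usable_write_throughput(link_mbps, replicas):
    # Every client write has to reach each replica over the same shared
    # link, so usable write throughput is roughly link rate / replica count.
    return link_mbps / replicas

for link_mbps, label in ((1000, "1GbE"), (10000, "10GbE")):
    for n in (2, 5, 10):
        mbps = usable_write_throughput(link_mbps, n)
        print(f"{label}, {n:2d} replicas: ~{mbps:.0f} Mb/s of writes")

The 1GbE/10-node row is the 100Mb/s figure mentioned above; note that a 10GbE upgrade only raises the ceiling, it doesn't change the 1/n scaling.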

Right now more storage nodes means slower storage, and that should really be addressed.

Gordan


