Message-ID: <32121.1189207861@death>
Date:	Fri, 07 Sep 2007 16:31:01 -0700
From:	Jay Vosburgh <fubar@...ibm.com>
To:	Rick Jones <rick.jones2@...com>
cc:	Linux Network Development list <netdev@...r.kernel.org>
Subject: Re: error(s) in 2.6.23-rc5 bonding.txt ? 

Rick Jones <rick.jones2@...com> wrote:
[...]
>>         Note that this out of order delivery occurs when both the
>>         sending and receiving systems are utilizing a multiple
>>         interface bond.  Consider a configuration in which a
>>         balance-rr bond feeds into a single higher capacity network
>>         channel (e.g., multiple 100Mb/sec ethernets feeding a single
>>         gigabit ethernet via an etherchannel capable switch).  In this
>>         configuration, traffic sent from the multiple 100Mb devices to
>>         a destination connected to the gigabit device will not see
>>         packets out of order.  
>
>My first reaction was that this was incorrect - it didn't matter if the
>receiver was using a single link or not because the packets flowing across
>the multiple 100Mb links could hit the intermediate device out of order
>and so stay that way across the GbE link.

	Usually it does matter, or at least it did at the time I tested this.

	Usually, the even striping of traffic from the balance-rr mode
will deliver in-order to a single higher speed link (e.g., N 100Mb
feeding a single 1Gb).  I say "usually" because, although I don't see it
happen with the equipment I have, I'm willing to believe that there are
gizmos that would "bundle" packets arriving on the switch ports.

	The reordering (usually) occurs when packet coalescing stuff
(either interrupt mitigation on the device, or NAPI) happens at the
receiver end, after the packets are striped evenly into the interfaces,
e.g.,

	eth0	eth1	eth2
	P1	P2	P3
	P4	P5	P6
	P7	P8	P9

	and then eth0 goes and grabs a bunch of its packets, then eth1,
and eth2 do the same afterwards, so the received order ends up something
like P1, P4, P7, P2, P5, P8, P3, P6, P9.  In Ye Olde Dayes Of Yore, with
one packet per interrupt at 10 Mb/sec, this type of configuration
wouldn't reorder (or at least not as badly).
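	(Not from the original thread: the interleave above can be
reproduced with a toy Python model, not kernel code. Packets are striped
round-robin across the slaves, and the receiver then drains each
interface's queue in a batch, the way interrupt mitigation or a NAPI
poll grabs a bunch of packets at once. The function names are
illustrative only.)

```python
from collections import deque

def stripe_round_robin(packets, n_slaves):
    """Distribute packets across n_slaves in round-robin order,
    as balance-rr does on transmit."""
    queues = [deque() for _ in range(n_slaves)]
    for i, pkt in enumerate(packets):
        queues[i % n_slaves].append(pkt)
    return queues

def receive_coalesced(queues):
    """Drain each interface's queue fully before moving on to the
    next, mimicking a receiver that pulls a batch of packets per
    interrupt or poll rather than one packet at a time."""
    received = []
    for q in queues:
        while q:
            received.append(q.popleft())
    return received

packets = ["P%d" % i for i in range(1, 10)]   # P1 .. P9
queues = stripe_round_robin(packets, 3)       # eth0, eth1, eth2
print(receive_coalesced(queues))
# With 3 slaves: P1, P4, P7, P2, P5, P8, P3, P6, P9 -- the
# reordering described above.
```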

	The text probably is lacking in some detail, though.  The real
key is that the last sender before getting to the destination system has
to do the round-robin striping.  Most switches that I'm familiar with
don't offer round-robin as a load balance option for etherchannel
(again, I've never seen one that does, but I'm willing to believe one
exists), and thus won't evenly stripe traffic; instead they do some
math on the packets so that a given "connection" isn't split across
ports.
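	(Again not from the thread: a toy sketch of such a hash-based
policy. The switch derives the egress port from flow identifiers, so
every packet of a "connection" lands on the same port and cannot be
reordered across ports. The XOR-of-low-bytes hash mirrors the common
layer-2 transmit hash; real switches vary, and the names here are
illustrative.)

```python
def pick_port(src_mac, dst_mac, n_ports):
    """Choose an egress port from a hash of the flow's MAC
    addresses; same flow -> same hash -> same port."""
    return (src_mac[-1] ^ dst_mac[-1]) % n_ports

src = bytes.fromhex("001122334455")
dst = bytes.fromhex("66778899aabb")

# 100 packets of the same flow all map to one port.
ports = {pick_port(src, dst, 4) for _ in range(100)}
print(ports)  # a single port number
```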

	That said, it's certainly plausible that, for a given set of N
ethernets all enslaved to a single bonding balance-rr, the individual
ethernets could get out of sync, as it were (e.g., one running a fuller
tx ring, and thus running "behind" the others).  If bonding is the only
feeder of the devices, then for a continuous flow of traffic, all the
slaves will generally receive packets (from the kernel, for
transmission) at pretty much the same rate, and so they won't tend to
get ahead or behind.

	I haven't investigated this deeply for a few years, but
this is my recollection of what happened with the tests I did then.  I
did testing with multiple 100Mb devices feeding either other sets of
100Mb devices or single gigabit devices.  I'm willing to believe that
things have changed, and an N feeding into one configuration can
reorder, but I haven't seen it (or really looked for it; balance-rr
isn't much the rage these days).

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
