Message-ID: <OFE2CC8708.AE88EB86-ON882577AB.00682F38-882577AB.006AA7EF@us.ibm.com>
Date: Mon, 27 Sep 2010 12:24:51 -0700
From: David Stevens <dlstevens@...ibm.com>
To: Christoph Lameter <cl@...ux.com>
Cc: "David S. Miller" <davem@...emloft.net>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Bob Arendt <rda@...con.com>
Subject: Re: igmp: Staggered igmp report intervals for unsolicited igmp reports
Christoph Lameter <cl@...ux.com> wrote on 09/23/2010 08:37:48 AM:
>
> On Wed, 22 Sep 2010, David Stevens wrote:
>
> > >
> > > Also increase the frequency so that we get 10 reports sent over a
> > > few seconds.
> >
> > Except you want to conform and not conform at the same time. :-)
> > IGMPv2 should be: default count 2, interval 10 secs
> > IGMPv3 should be: default count 2, interval 1 sec
>
> This is during the period of unsolicited igmp reports. We do not know if
> this group is managed using V3 or V2 since no igmp query/report has been
> received yet.
The default is IGMPv3 unless a v2 querier is present. You can force
it to be IGMPv2 by having an IGMPv2 querier on the network or by using
the force_igmp_version tunable.
> > ...and no way is it a good idea to send 10 unsolicited reports on an
> > Ethernet.
>
> Why would that be an issue?
Because the traffic for all joins is multiplied by more than 3. If you're
joining 1 group, maybe that wouldn't be an issue, but what if I join
100, and what if hundreds of other hosts on that network do too? And
applications that dynamically join and leave groups may do this
"normally."
Even 3 reports on switched networks with low loss are really unnecessary
overkill; 10 is just wasted bandwidth.
> The IGMPv2 RFC has no strict limit and RFC3376
> mentions that the retransmission occurs "Robustness Variable" times
> minus one. Choosing 10 for the "Robustness Variable" is certainly ok.
Both of them specify the default value and say a querier is the
mechanism for changing that. If you want to follow the RFC, the default
is "2", not "10." While it'd be reasonable for a sysadmin to tune this
per-interface without a querier, it's not reasonable to make all linux
systems on all networks more than triple the number of reports they send
from the RFC-specified default. Right?!? :-)
> If we do not increase the number of reports but just limit the interval
> then the chance of outages of a second or so during mc group creation
> causing routers missing igmp reports is significantly increased.
If you can't send on a group for 1 second, all of the initial
IGMPv3 reports will be lost about half of the time if we make that
conformant (it looks like it now uses the 10 sec v2 time instead of the
1 sec v3 time it should). That's a problem IB needs to solve. Ideally,
you wouldn't want to return from the hardware join until you can actually
send the reports, but I expect there are locks held, and that can't be 1
second of spinning on a processor. So, I think you really should put a
queue in IB for that hardware multicast address and send those packets
when/if you get positive acknowledgement (much as is done for ARP
completion, but maybe queue more than 1) from the fabric that you can use
it. If you don't get any sort of ACK for that, then you can implement a
delay for it, but any fixed number you use may be either too big or too
small for a particular fabric.
+-DLS