netdev - Re: IGMP Join dropping multicast packets

Message-ID: <91bdcedb0903172050td2ef895he48168987ad94472@mail.gmail.com>
Date:	Tue, 17 Mar 2009 22:50:21 -0500
From:	Dave Boutcher <daveboutcher@...il.com>
To:	Eric Dumazet <dada1@...mosbay.com>
Cc:	netdev@...r.kernel.org
Subject: Re: IGMP Join dropping multicast packets

On Mon, Mar 16, 2009 at 2:01 PM, Eric Dumazet <dada1@...mosbay.com> wrote:
> Dave Boutcher wrote:
>> On Sat, Mar 14, 2009 at 9:37 PM, Eric Dumazet <dada1@...mosbay.com> wrote:
>>> Dave Boutcher wrote:
>>>> I'm running into an interesting problem with joining multiple
>>>> multicast feeds.  If you join multiple multicast feeds using
>>>> setsockopt(...,IP_ADD_MEMBERSHIP...) it causes packets on UNRELATED
>>>> multicast feeds to get dropped.  We have a multicast feed on a rock
>>>> solid network, and we were very surprised to see dropped packets.  The
>>>> cause was a different process/program being run by a different user
>>>> joining a bunch of multicast feeds.
>>> I could not reproduce the problem on my machines (bnx2 adapter), even after
>>> changing NUMSOCK from 55 to 200 in joiner.c
>>
>> Thanks for trying, Eric.  Based on your email I did some more testing
>> and thus far I've only recreated this on x86_64 arches, not on i386.
>> Which arch did you try it on?
>
> I tried both, 32 and 64 bit kernels. No problems so far.
>
> Could you post the Linux kernel .config of a machine that shows the problem, and its dmesg output?

Eric, since you could not recreate this, I tried some other hardware
I had lying around that has a NIC built into an AMD chipset.
I could not recreate the problem on that hardware.  I'm starting to
think this is an e1000 problem.  Both the e1000 and e1000e drivers
update the multicast hash table with the following logic:

       /* clear the old settings from the multicast hash table */

       for (i = 0; i < mta_reg_count; i++) {
               E1000_WRITE_REG_ARRAY(hw, MTA, i, 0);
               E1000_WRITE_FLUSH();
       }

       /* load any remaining addresses into the hash table */

       for (; mc_ptr; mc_ptr = mc_ptr->next) {
               hash_value = e1000_hash_mc_addr(hw, mc_ptr->da_addr);
               e1000_mta_set(hw, hash_value);
       }

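There's clearly a window, between the clear loop and the end of the
reload loop, where the NIC doesn't have the multicast addresses loaded,
so frames for groups that are already joined could be filtered out
during that time.  This may just be broken-as-designed.  If anyone
else happens to have some e1000 hardware and wants to see if you
can recreate this, I'd be curious.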
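
To make the suspected window concrete, here is a small user-space model
of the clear-then-reload sequence.  It is only a sketch, not driver
code: struct fake_nic, toy_hash() and the 128-register table size are
assumptions made for illustration, and toy_hash() is a simplified
stand-in, not the real e1000_hash_mc_addr().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed table size for this model: 128 x 32-bit registers = 4096 hash bits. */
#define MTA_REG_COUNT 128

/* Hypothetical stand-in for the NIC's multicast table array (MTA). */
struct fake_nic {
    uint32_t mta[MTA_REG_COUNT];
};

/* Simplified 12-bit hash over the last two MAC bytes.
 * Illustration only; NOT the real e1000_hash_mc_addr(). */
static uint32_t toy_hash(const uint8_t mac[6])
{
    return (((uint32_t)mac[4] >> 4) | ((uint32_t)mac[5] << 4)) & 0xFFF;
}

/* Set the table bit selected by a hash value. */
static void mta_set(struct fake_nic *nic, uint32_t hash)
{
    nic->mta[(hash >> 5) & (MTA_REG_COUNT - 1)] |= 1u << (hash & 0x1F);
}

/* Would the modelled NIC pass a frame sent to this multicast MAC? */
static bool nic_accepts(const struct fake_nic *nic, const uint8_t mac[6])
{
    uint32_t hash = toy_hash(mac);

    return (nic->mta[(hash >> 5) & (MTA_REG_COUNT - 1)] >> (hash & 0x1F)) & 1u;
}

/* Step 1 of the update: clear every register in the table. */
static void mta_clear(struct fake_nic *nic)
{
    memset(nic->mta, 0, sizeof(nic->mta));
}

/* Step 2 of the update: reload every address in the list. */
static void mta_load(struct fake_nic *nic, const uint8_t (*list)[6], size_t n)
{
    for (size_t i = 0; i < n; i++)
        mta_set(nic, toy_hash(list[i]));
}
```

In this model, calling mta_clear() and then checking nic_accepts() for
an address that was loaded earlier returns false until mta_load() has
run again.  That is the gap I suspect the hardware is exposed to while
the driver rewrites the table.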

Some other notes just FYI...

- RcvbufErrors in /proc/net/snmp doesn't get incremented when this happens
- there are no messages in dmesg
- frames get dropped when the program calls exit() and all the sockets
  get closed (and multicast joins dropped) as well as when the
  ADD_MEMBERSHIPs happen
- The problem happens even when adding a sleep(1) in between each of the
  ADD_MEMBERSHIP calls.

-- 
Dave B