Message-ID: <20130524154931.GA9245@sbohrermbp13-local.rgmadvisors.com>
Date:	Fri, 24 May 2013 10:49:31 -0500
From:	Shawn Bohrer <shawn.bohrer@...il.com>
To:	netdev@...r.kernel.org
Cc:	Or Gerlitz <or.gerlitz@...il.com>,
	Hadar Hen Zion <hadarh@...lanox.com>,
	Rony Efraim <ronye@...lanox.com>,
	Amir Vadai <amirv@...lanox.com>
Subject: 3.10.0-rc2 mlx4 not receiving packets for some multicast groups

I just started testing the 3.10 kernel; previously we were on 3.4, so
there is a fairly large jump.  I've additionally applied the following
four patches to the 3.10.0-rc2 kernel that I'm testing:

https://patchwork.kernel.org/patch/2484651/
https://patchwork.kernel.org/patch/2484671/
https://patchwork.kernel.org/patch/2484681/
https://patchwork.kernel.org/patch/2484641/

I don't know whether those patches are related to my issue, but I
plan to try reproducing without them soon.

Our applications listen on a number of multicast addresses.  In this
case I'm listening to about 350 different addresses per machine,
across many different processes, with usually one socket per address.
The problem is that some of the sockets receive no data while others
do, even though they all should.  If I put the device in promiscuous
mode, I start receiving data on all of my sockets.  Running netstat -g
shows all of my memberships, so it appears to me that the kernel and
the switch think I've joined the groups, but the card may be filtering
the data.  This is with:

05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

# ethtool -i eth4
driver: mlx4_en
version: 2.0 (Dec 2011)
firmware-version: 2.11.500
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
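
For anyone trying to reproduce this, here is a minimal sketch of the
kind of receiver described above: one UDP socket per multicast group,
each joined with IP_ADD_MEMBERSHIP, then a check for which groups
deliver nothing within a timeout.  It is not the actual application
code.  The function names (build_mreq, join_group, silent_groups), the
port, the interface address, and the 5 second wait are all
hypothetical and would need adjusting to the real feeds.

```python
import select
import socket
import time


def build_mreq(group, iface_ip="0.0.0.0"):
    """Pack a struct ip_mreq: group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def join_group(group, port, iface_ip="0.0.0.0"):
    """Open a UDP socket bound to (group, port) and join the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Binding to the group address (works on Linux) keeps traffic for
    # other groups on the same port out of this socket.
    sock.bind((group, port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, iface_ip))
    return sock


def silent_groups(groups, port, iface_ip="0.0.0.0", wait=5.0):
    """Join every group and return those that delivered no packet in `wait` s."""
    socks = {join_group(g, port, iface_ip): g for g in groups}
    pending = dict(socks)
    deadline = time.monotonic() + wait
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        ready, _, _ = select.select(list(pending), [], [], remaining)
        for s in ready:
            s.recvfrom(65535)   # one packet is enough to mark the group alive
            del pending[s]
    quiet = sorted(pending.values())
    for s in socks:
        s.close()
    return quiet
```

Calling silent_groups() with the list of expected groups, once normally
and once with the interface in promiscuous mode, would show whether the
same groups go quiet only when the NIC's multicast filter is in effect.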

The other strange part is that I've got multiple machines all running
the same kernel and not all of them are experiencing the issue.  At
one point they were all working fine, but the issue appeared after I
rebooted one of the machines, and multiple reboots later it is still in
this bad state.  Rebooting that machine back to 3.4 causes it to work
as expected, but no luck under 3.10.  I've now got two machines in this
bad state, and in both cases it started immediately after a reboot.

Does anyone have any ideas?

Thanks,
Shawn
