Message-ID: <87vd5z5mh1.fsf@small.ssi.corp>
Date:	Tue, 21 Sep 2010 14:07:22 +0200
From:	arno@...isbad.org (Arnaud Ebalard)
To:	Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
	Jesse Brandeburg <jesse.brandeburg@...el.com>,
	Bruce Allan <bruce.w.allan@...el.com>,
	Alex Duyck <alexander.h.duyck@...el.com>,
	PJ Waskiewicz <peter.p.waskiewicz.jr@...el.com>,
	John Ronciak <john.ronciak@...el.com>
Cc:	netdev@...r.kernel.org, Brian Haley <brian.haley@...com>,
	Alexey Kuznetsov <kuznet@....inr.ac.ru>,
	Stefan Rompf <sux@...lof.de>,
	David Miller <davem@...emloft.net>
Subject: [BUG,E1000E] first packets after device is reported up are silently dropped

Hi,

When the link is reported up again (after plugging a cable) by the
driver (E1000E) the first packets sent immediately after that event
are sometimes *silently* dropped by the hardware.

Before describing the tests, here is some information on the hardware
and software. Don't hesitate to ask if you need more:
 
  Kernel: 2.6.35.4
 
  Hardware: Intel 82567LM (rev3) Gigabit adapter on a DELL E4300
  Here is what ethtool reports for the driver:
   driver: e1000e
   version: 1.0.2-k4
   firmware-version: 1.7-7
   bus-info: 0000:00:19.0

  Switch: tested with a Cisco Catalyst 2960 (100Mbit/s), Planex
          FX08-Mini (100Mbit/s), Planex 5-port Gigabit
 
The setup is pretty simple: two different userland tools (umip and
netplug) monitor netlink NEWLINK events and respectively send an ICMPv6
Router Solicitation packet and an IPv4 DHCP request as soon as they
learn that the interface is UP and RUNNING:

00:21:70:bd:ef:fc > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 62: :: > ff02::2: ICMP6, router solicitation, length 8
00:21:70:bd:ef:fc > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 
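
For readers who want to reproduce the trigger condition, the NEWLINK
monitoring described above can be sketched in Python. This is not how
umip or netplug are implemented; it is a minimal illustration assuming a
Linux host. The names link_events and monitor are hypothetical, and the
constants are copied from <linux/rtnetlink.h> and <linux/if.h>.

```python
import socket
import struct

# Constants from <linux/rtnetlink.h> and <linux/if.h>.
RTM_NEWLINK = 16
RTMGRP_LINK = 1
IFF_UP = 0x1
IFF_RUNNING = 0x40

NLMSG_HDR = struct.Struct("=IHHII")   # struct nlmsghdr: len, type, flags, seq, pid
IFINFOMSG = struct.Struct("=BxHiII")  # struct ifinfomsg: family, pad, type, index, flags, change


def link_events(data):
    """Yield (ifindex, up_and_running) for each RTM_NEWLINK message in data."""
    wanted = IFF_UP | IFF_RUNNING
    offset = 0
    while offset + NLMSG_HDR.size <= len(data):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data, offset)
        if length < NLMSG_HDR.size or offset + length > len(data):
            break  # truncated or malformed message
        if msg_type == RTM_NEWLINK and length >= NLMSG_HDR.size + IFINFOMSG.size:
            _family, _type, index, flags, _change = IFINFOMSG.unpack_from(
                data, offset + NLMSG_HDR.size)
            yield index, (flags & wanted) == wanted
        offset += (length + 3) & ~3  # netlink messages are 4-byte aligned


def monitor():
    """Print link state changes until interrupted (hypothetical helper, Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, RTMGRP_LINK))  # (pid, multicast groups)
    while True:
        for index, ready in link_events(sock.recv(65535)):
            print(index, "UP+RUNNING" if ready else "not ready")
```

Calling monitor() and plugging the cable shows the moment at which a
tool following this approach would emit its first packet.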

The idea is to have address autoconfiguration performed as soon as
possible. When the first packet is lost, retransmission occurs only
after a few seconds, so the net result is a long delay.

I noticed that sometimes the first few packets emitted by the tools are
not answered. I ran tcpdump on the other side: nothing arrives. I even
checked the LED on the switch: it does not blink.

I first tried adding a few ms of delay between the reception of the
NEWLINK and the emission of the packets. The longer the delay, the
better it seems to work, but even at *550ms* I still managed to have the
initial packet dropped from time to time.
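
To turn "the higher the better" into numbers, the outcome of each
plug/unplug trial could be tallied per delay value. The helper below is
a minimal sketch; the name drop_rates and the (delay_ms, answered)
record format are assumptions made for illustration, and no measured
data is implied.

```python
from collections import defaultdict


def drop_rates(trials):
    """Return {delay_ms: fraction of trials whose first packet went unanswered}.

    trials is an iterable of (delay_ms, answered) pairs, one per link-up event.
    """
    counts = defaultdict(lambda: [0, 0])  # delay_ms -> [dropped, total]
    for delay_ms, answered in trials:
        counts[delay_ms][1] += 1
        if not answered:
            counts[delay_ms][0] += 1
    return {delay: dropped / total
            for delay, (dropped, total) in sorted(counts.items())}
```

Feeding it one record per cable plug, with the delay that was configured
and whether a reply was seen, gives a drop fraction per delay that can
be compared across switches.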

I then spent time in the kernel (net/core/dev.c, net/sched/sch_generic.c,
drivers/net/e1000e/netdev.c) following the first packet to see where it
gets dropped. I ended up in e1000_xmit_frame(), in which everything
seems to be ok. AFAICT, the packet is delivered to the hardware and then
silently killed for some unknown reason.
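
One cheap cross-check from userland is to snapshot the interface's
transmit counters under /sys/class/net/<iface>/statistics/ just before
and just after the first packet is sent. The sketch below assumes those
standard sysfs files are present; read_tx_counters and counter_delta are
hypothetical names, and how e1000e updates these counters in this
particular situation has not been verified here.

```python
from pathlib import Path

# Standard per-interface counters exposed by the kernel in sysfs.
COUNTERS = ("tx_packets", "tx_bytes", "tx_errors", "tx_dropped",
            "tx_carrier_errors")


def read_tx_counters(iface, root="/sys/class/net"):
    """Return the current transmit counters of iface as a dict of ints."""
    stats = Path(root) / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def counter_delta(before, after):
    """Return how much each counter grew between two snapshots."""
    return {name: after[name] - before[name] for name in before}
```

If tx_packets grows while nothing reaches the wire and the error
counters stay flat, that would be consistent with the packet being lost
after it has been handed to the hardware.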

I added various debug statements in the code (custom printk(),
calls to e1000e_dump()) to try and understand what can be different in
the driver's state when the first packet is delivered and when it is
not. I found nothing interesting.

I am currently out of ideas. Is this a known bug? What can be happening?
If you have patches you want me to test to get additional info, don't
hesitate!

Cheers,

a+
