netdev - Re: Problems with dropped packets on bonded interface for 3.x kernels


Message-ID: <1321857123.17419.2.camel@edumazet-laptop>
Date:	Mon, 21 Nov 2011 07:32:03 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	netdev@...r.kernel.org
Subject: Re: Problems with dropped packets on bonded interface for 3.x
 kernels

On Sunday, 20 November 2011 at 23:16 -0600, Albert Chin wrote:
> I'm running Ubuntu 11.10 on an Intel SR2625URLXR system with an Intel
> S5520UR motherboard and an internal Intel E1G44HT (I340-T4) Quad Port
> Server Adapter. I am seeing dropped packets on a bonded interface,
> comprised of two GigE ports on the Intel E1G44HT Quad Port Server
> Adapter. The following kernels exhibit this problem:
>   3.0.0-12-server, 3.0.0-13-server, 3.1.0-2-server, 3.2.0-rc2
> Installing Fedora 16 with a 3.1.1-1.fc16.x86_64 also showed dropped
> packets.
> 
> I also tried RHEL6 with a 2.6.32-131.17.1.el6.x86_64 kernel and didn't
> see any dropped packets. Testing an older 2.6.32-28.55-generic Ubuntu
> kernel also didn't show any dropped packets.
> 
> So, with 2.6, I don't see dropped packets, but everything including
> 3.0 and after show dropped packets.
> 
> # ifconfig bond0
> bond0     Link encap:Ethernet  HWaddr 00:1b:21:d3:f6:0a  
>           inet6 addr: fe80::21b:21ff:fed3:f60a/64 Scope:Link
>           UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
>           RX packets:225 errors:0 dropped:186 overruns:0 frame:0
>           TX packets:231 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0 
>           RX bytes:25450 (25.4 KB)  TX bytes:28368 (28.3 KB)
> 
> With lacp_rate=fast, I see higher packet loss than with
> lacp_rate=slow. I've tried bonding t
> 
> This server has the following network controllers for the two internal
> NICs:
>   # lspci -vv
>   01:00.0 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
>   01:00.1 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
> 
> And it has the following network controllers for the four NICs on the
> I340-T4 PCI-E card:
>   # lspci -vv
>   0a:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
>   0a:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
>   0a:00.2 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
>   0a:00.3 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
> 
> I tried bonding the two 82575EB NICs rather than two NICs on the 82580
> but see the same dropped packet issue.
> 
> I have replaced the cables, tested each port individually on the
> switch without bonding, and don't see any reason to expect hardware as
> the issue. The switch is a Summit Extreme 400-48t.
> 
> I am using a 802.3ad configuration:
> # cat /proc/net/bonding/bond0
> Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
> 
> Bonding Mode: IEEE 802.3ad Dynamic link aggregation
> Transmit Hash Policy: layer2 (0)
> MII Status: up
> MII Polling Interval (ms): 100
> Up Delay (ms): 200
> Down Delay (ms): 0
> 
> 802.3ad info
> LACP rate: fast
> Aggregator selection policy (ad_select): stable
> Active Aggregator Info:
>         Aggregator ID: 1
>         Number of ports: 1
>         Actor Key: 17
>         Partner Key: 24
>         Partner Mac Address: 00:04:96:18:54:d5
> 
> Slave Interface: eth4
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: 00:1b:21:d3:f6:0a
> Aggregator ID: 1
> Slave queue ID: 0
> 
> Slave Interface: eth5
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: 00:1b:21:d3:f6:0b
> Aggregator ID: 2
> Slave queue ID: 0
> 
> Anyone have any ideas?
> 

Old kernels were dropping some packets (unknown protocols...) without
counting them.

You could use tcpdump to identify what these dropped packets are :)

The following patch, which added the counting, went into 2.6.37:

commit caf586e5f23cebb2a68cbaf288d59dbbf2d74052
Author: Eric Dumazet <eric.dumazet@...il.com>
Date:   Thu Sep 30 21:06:55 2010 +0000

    net: add a core netdev->rx_dropped counter
    
    In various situations, a device provides a packet to our stack and we
    drop it before it enters protocol stack :
    - softnet backlog full (accounted in /proc/net/softnet_stat)
    - bad vlan tag (not accounted)
    - unknown/unregistered protocol (not accounted)
    
    We can handle a per-device counter of such dropped frames at core level,
    and automatically adds it to the device provided stats (rx_dropped), so
    that standard tools can be used (ifconfig, ip link, cat /proc/net/dev)
    
    This is a generalization of commit 8990f468a (net: rx_dropped
    accounting), thus reverting it.
    
    Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
    Signed-off-by: David S. Miller <davem@...emloft.net>
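
As the commit message notes, the new counter is folded into rx_dropped, so
it shows up in /proc/net/dev. A minimal Python sketch for reading it per
interface follows. The function name parse_rx_dropped and the SAMPLE text
are hypothetical; SAMPLE is built from the ifconfig numbers quoted above and
assumes the usual /proc/net/dev layout (two header lines, then
"iface: rx_bytes rx_packets rx_errs rx_drop ..."), so check it against the
real file on your system.

```python
def parse_rx_dropped(proc_net_dev_text):
    """Return {interface: rx_dropped} parsed from /proc/net/dev content."""
    dropped = {}
    # The first two lines are column headers; data lines look like
    # "  bond0: <rx bytes> <rx packets> <rx errs> <rx drop> ...".
    for line in proc_net_dev_text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, stats = line.split(":", 1)
        fields = stats.split()
        if len(fields) < 4:
            continue
        dropped[name.strip()] = int(fields[3])  # 4th receive column is "drop"
    return dropped


# Illustrative sample only (assumed layout, values taken from the ifconfig
# output quoted in this thread); not captured from a real machine.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
 bond0:   25450     225    0  186    0     0          0         0    28368     231    0    0    0     0       0          0
"""

if __name__ == "__main__":
    try:
        with open("/proc/net/dev") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on non-Linux systems
    for name, count in sorted(parse_rx_dropped(text).items()):
        print(f"{name}: rx_dropped={count}")
```

Running it twice a few seconds apart and comparing the values shows which
interface is accumulating drops while tcpdump is capturing.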

