Message-ID: <1403842621.15139.2.camel@edumazet-glaptop2.roam.corp.google.com>
Date:	Thu, 26 Jun 2014 21:17:01 -0700
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Tom Herbert <therbert@...gle.com>
Cc:	Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: Performance loss with GRO enabled on tunnels

On Thu, 2014-06-26 at 17:59 -0700, Tom Herbert wrote:
> I'm seeing quite a performance difference with GRO enabled/disabled on
> the tun interface for GRE and IPIP. The physical interface is bnx2x with
> LRO enabled and GRO disabled. Looks like the tun interface inherits GRO
> as an "always supported SW feature".
> 
> 200 connection TCP_RR with GRE and no GRO on tun0
>   1.06046e+06 tps
>   71.06% CPU utilization
> 
> With GRO enabled on tun0
>   406879 tps
>   28.14% CPU utilization
> 
> Given that CPU utilization is not particularly high, I would guess
> things are being slowed down by something like lock contention.
> 
> Generally, I wonder if there's really any value in enabling GRO on the
> tunnel interface anyway; it seems like GRO is going to be most beneficial
> if we do this at the physical device. If we're aggregating at the
> tunnel interface, we've already done a lot of processing on the
> individual packets.

GRO is enabled on all interfaces by default.

There is nothing special about tunnels that would warrant turning it off
by default.
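
For context, every net_device gets GRO because register_netdevice() enables
the software offload features unconditionally. A rough sketch of the relevant
lines, assuming the definitions in include/linux/netdev_features.h and the
setup in net/core/dev.c (exact code may differ by kernel version):

    /* include/linux/netdev_features.h: the software offloads (GSO and GRO) */
    #define NETIF_F_SOFT_FEATURES	(NETIF_F_GSO | NETIF_F_GRO)

    /* net/core/dev.c, register_netdevice(): every device, tunnels included,
     * advertises and enables GSO/GRO in software.
     */
    dev->hw_features |= NETIF_F_SOFT_FEATURES;
    dev->features    |= NETIF_F_SOFT_FEATURES;

That is the "always supported SW feature" you are seeing on the tun interface.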

Here at Google, we cooked up patches to address this problem; please take a
look at Google-Bug-Id: 13655458


commit 974ca81a5948be5d99be679296f2929d70131422
Author: Eric Dumazet <edumazet@...gle.com>
Date:   Sat May 31 17:36:15 2014 -0700

    net-gro: use one gro_cell per cpu
    
    GRO layer used on GRE tunnels uses skb->queue_mapping as a hash
    to select a gro_cell from an array of 8 available cells.
    
    Unfortunately, queue_mapping can be cleared before reaching this
    selection, and even if it is not cleared, it's possible that all GRE
    packets are received into a single RX queue on the NIC.
    
    If we believe RPS/RFS did a good job, then we should simply use
    one gro_cell per cpu, and use the current cpu id instead of a field
    taken from the skb. This helps because the overhead of TCP checksum
    validation can be spread among all cpus.
    
    Tested:
     Ran a tunnel on lpk51/lpk52 (bnx2x 2x10Gb NIC), and ran
     super_netperf 200 -t TCP_STREAM to check that no lock contention
     was happening on the gro_cell's spinlock.
    Effort: net-gro
    Google-Bug-Id: 13655458
    Change-Id: Ie260771992c40bb23dcdd53476954f8436fdf097
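
To make the change concrete, here is a minimal sketch of the receive path
with one gro_cell per cpu, written against the public gro_cells helpers
(include/net/gro_cells.h). This is an illustration, not the patch itself, so
locking and backlog/drop accounting may differ:

#include <linux/netdevice.h>
#include <linux/percpu.h>
#include <linux/skbuff.h>

struct gro_cell {
	struct sk_buff_head	napi_skbs;
	struct napi_struct	napi;
};

struct gro_cells {
	struct gro_cell __percpu *cells;	/* one cell per cpu */
};

/* Called from the tunnel (gre/ipip) receive path, in BH context. */
static int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
{
	struct net_device *dev = skb->dev;
	struct gro_cell *cell;

	if (!gcells->cells || skb_cloned(skb) ||
	    !(dev->features & NETIF_F_GRO))
		return netif_rx(skb);

	/* Old behaviour: cell = &cells[skb_get_rx_queue(skb) & mask], which
	 * degenerates to a single, contended cell when queue_mapping is
	 * cleared or all GRE traffic lands on one NIC RX queue.
	 * New behaviour: trust RPS/RFS and pick the local cpu's cell.
	 */
	cell = this_cpu_ptr(gcells->cells);

	if (skb_queue_len(&cell->napi_skbs) > netdev_max_backlog) {
		atomic_long_inc(&dev->rx_dropped);
		kfree_skb(skb);
		return NET_RX_DROP;
	}

	__skb_queue_tail(&cell->napi_skbs, skb);

	/* Kick NAPI so the poll loop feeds napi_gro_receive() with the
	 * queued skbs and aggregation happens on this cpu.
	 */
	if (skb_queue_len(&cell->napi_skbs) == 1)
		napi_schedule(&cell->napi);

	return NET_RX_SUCCESS;
}

Since each cell is now touched only by its own cpu, there is no shared
spinlock left to fight over, which is consistent with the lock contention
suspected above.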

