Message-ID: <20161117155753.17b76f5a@redhat.com>
Date:   Thu, 17 Nov 2016 15:57:53 +0100
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Eric Dumazet <eric.dumazet@...il.com>
Cc:     Rick Jones <rick.jones2@....com>, netdev@...r.kernel.org,
        brouer@...hat.com
Subject: Re: Netperf UDP issue with connected sockets

On Thu, 17 Nov 2016 06:17:38 -0800
Eric Dumazet <eric.dumazet@...il.com> wrote:

> On Thu, 2016-11-17 at 14:42 +0100, Jesper Dangaard Brouer wrote:
> 
> > I can see that the qdisc layer does not activate xmit_more in this case.
> >   
> 
> Sure. Not enough pressure from the sender(s).
> 
> The bottleneck is not the NIC or qdisc in your case, meaning that the BQL
> limit is kept at a small value.
> 
> (BTW not all NICs have expensive doorbells)

I believe this NIC (mlx5, 50G edition) does.
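A rough way to see why the doorbell matters here: when the qdisc has more packets queued it hands them to the driver with xmit_more set, and a driver that honours the hint can defer the MMIO tail-pointer write (the doorbell) until the last packet of the burst. With too little sender pressure, every packet is its own burst and pays the full doorbell cost. The sketch below models this; the cycle counts (2200 base, 200 per doorbell) and batch sizes are hypothetical, chosen only for illustration, and the model assumes a driver that rings exactly once per burst.

```python
# Hypothetical cost model: the doorbell (MMIO tail-pointer write) is paid once
# per burst, all other TX work is paid per packet. The cycle counts used in
# the demo are illustrative assumptions, not measurements from this thread.

def per_packet_cycles(base_cycles: float, doorbell_cycles: float, batch: int) -> float:
    """Average cycles per packet when one doorbell covers `batch` packets."""
    if batch < 1:
        raise ValueError("batch must be >= 1")
    return base_cycles + doorbell_cycles / batch


def doorbells_rung(n_packets: int, batch: int) -> int:
    """Count doorbells for a driver that skips the MMIO write while
    xmit_more is set and rings only on the last packet of each burst."""
    rung = 0
    for i in range(n_packets):
        # xmit_more is true unless this packet ends a burst or is the last one.
        xmit_more = (i % batch != batch - 1) and (i != n_packets - 1)
        if not xmit_more:
            rung += 1
    return rung


if __name__ == "__main__":
    for b in (1, 2, 8):
        print(b, per_packet_cycles(2200, 200, b), doorbells_rung(1000, b))
```

With batch 1 (no xmit_more) every packet rings the doorbell; larger bursts amortize the same doorbell cost over more packets.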

I'm seeing UDP TX of 1656017.55 pps, which per packet is:
2414 cycles(tsc) 603.86 ns
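The per-packet figures follow from the packet rate: nanoseconds per packet is 1e9 divided by pps, and cycles per packet is that time multiplied by the TSC rate. The TSC rate is not stated in the message; the 3.9976 GHz used below is an assumption back-computed from the two reported numbers (2414 cycles / 603.86 ns).

```python
# Convert a measured packet rate into per-packet time and TSC cycles.
# TSC_GHZ is an assumption inferred from the figures in the text
# (2414 cycles / 603.86 ns ~= 3.9976 GHz); the message does not state it.

TSC_GHZ = 3.9976


def ns_per_packet(pps: float) -> float:
    """Nanoseconds spent per packet at a given packets-per-second rate."""
    return 1e9 / pps


def cycles_per_packet(pps: float, tsc_ghz: float = TSC_GHZ) -> float:
    """TSC cycles per packet: nanoseconds times cycles-per-nanosecond."""
    return ns_per_packet(pps) * tsc_ghz


if __name__ == "__main__":
    pps = 1656017.55
    # prints: 2414 cycles(tsc) 603.86 ns
    print(f"{cycles_per_packet(pps):.0f} cycles(tsc) {ns_per_packet(pps):.2f} ns")
```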

perf top shows (with my own udp_flood tool, which avoids __ip_select_ident):

 Samples: 56K of event 'cycles', Event count (approx.): 51613832267
   Overhead  Command        Shared Object        Symbol
 +    8.92%  udp_flood      [kernel.vmlinux]     [k] _raw_spin_lock
   - _raw_spin_lock
      + 90.78% __dev_queue_xmit
      + 7.83% dev_queue_xmit
      + 1.30% ___slab_alloc
 +    5.59%  udp_flood      [kernel.vmlinux]     [k] skb_set_owner_w
 +    4.77%  udp_flood      [mlx5_core]          [k] mlx5e_sq_xmit
 +    4.09%  udp_flood      [kernel.vmlinux]     [k] fib_table_lookup
 +    4.00%  swapper        [mlx5_core]          [k] mlx5e_poll_tx_cq
 +    3.11%  udp_flood      [kernel.vmlinux]     [k] __ip_route_output_key_hash
 +    2.49%  swapper        [kernel.vmlinux]     [k] __slab_free

In this setup the spinlock in __dev_queue_xmit should be uncongested.
An uncongested spin_lock+unlock costs 32 cycles(tsc) 8.198 ns on this system.

But 8.92% of the time is spent on it, which corresponds to a per-packet
cost of 215 cycles (2414*0.0892). That is far above the 32 cycles an
uncongested lock should cost, so something else is going on... I claim
this mysterious extra cost is the tailptr/doorbell.
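The estimate above can be reproduced as plain arithmetic: take the perf overhead share as a fraction of the per-packet cycle budget, then subtract the uncongested lock cost to see how much is left unexplained. This is a minimal sketch that assumes the sampled perf percentage maps linearly onto per-packet cost, which is only approximately true for a statistical profile.

```python
# Attribute a perf "Overhead" share to the per-packet cycle budget and compare
# it with a separately measured uncongested baseline. Inputs are the figures
# quoted in the message; linear attribution is an assumption.

def attributed_cycles(total_cycles_per_pkt: float, overhead_fraction: float) -> float:
    """Cycles per packet implied by a perf overhead fraction."""
    return total_cycles_per_pkt * overhead_fraction


def unexplained_cycles(total_cycles_per_pkt: float, overhead_fraction: float,
                       baseline_cycles: float) -> float:
    """Attributed cycles minus the expected uncongested cost."""
    return attributed_cycles(total_cycles_per_pkt, overhead_fraction) - baseline_cycles


if __name__ == "__main__":
    attributed = attributed_cycles(2414, 0.0892)
    extra = unexplained_cycles(2414, 0.0892, 32)
    # prints: 215 cycles attributed, 183 above baseline
    print(f"{attributed:.0f} cycles attributed, {extra:.0f} above baseline")
```

Roughly 183 of the ~215 attributed cycles are not accounted for by an uncongested lock/unlock, which is the gap the doorbell explanation is meant to cover.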

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer