Message-ID: <a7cfa596d6b9979c67fa8dbd633f7dd8293337a1.camel@mellanox.com>
Date:   Thu, 1 Nov 2018 20:23:19 +0000
From:   Saeed Mahameed <saeedm@...lanox.com>
To:     "aaron.lu@...el.com" <aaron.lu@...el.com>,
        "brouer@...hat.com" <brouer@...hat.com>
CC:     "pstaszewski@...are.pl" <pstaszewski@...are.pl>,
        "eric.dumazet@...il.com" <eric.dumazet@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Tariq Toukan <tariqt@...lanox.com>,
        "ilias.apalodimas@...aro.org" <ilias.apalodimas@...aro.org>,
        "yoel@...knet.dk" <yoel@...knet.dk>,
        "mgorman@...hsingularity.net" <mgorman@...hsingularity.net>
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal users
 traffic

On Thu, 2018-11-01 at 23:27 +0800, Aaron Lu wrote:
> On Thu, Nov 01, 2018 at 10:22:13AM +0100, Jesper Dangaard Brouer
> wrote:
> ... ...
> > Section copied out:
> > 
> >   mlx5e_poll_tx_cq
> >   |          
> >    --16.34%--napi_consume_skb
> >              |          
> >              |--12.65%--__free_pages_ok
> >              |          |          
> >              |           --11.86%--free_one_page
> >              |                     |          
> >              |                     |--10.10%--queued_spin_lock_slowpath
> >              |                     |          
> >              |                      --0.65%--_raw_spin_lock
> 
> This callchain looks like it is freeing pages of order higher than 0:
> __free_pages_ok is only called for pages whose order is greater than
> 0.

mlx5 RX uses only order-0 pages, so I don't know where these high-order
TX SKBs are coming from.
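
For reference, here is a minimal userspace sketch of the dispatch Aaron
describes. It assumes the 4.19 layout of __free_pages(), where order 0
goes to free_unref_page() and the per-CPU lists, and anything larger goes
to __free_pages_ok() and then free_one_page() under zone->lock. The enum
and model_free_path() are hypothetical names made up for this
illustration; none of this is kernel code.

```c
#include <assert.h>

/* Hypothetical labels for the two free paths; not kernel symbols. */
enum free_path {
	FREE_PATH_PCP,       /* order 0: per-CPU page lists, no zone->lock here */
	FREE_PATH_ZONE_LOCK  /* order > 0: __free_pages_ok() -> free_one_page() */
};

/* Assumed upper bound on the order, only used to sanity-check the input. */
#define MODEL_MAX_ORDER 11

/*
 * Model of the order check assumed to sit in __free_pages() on 4.19:
 * only pages with order > 0 take the path seen in the callchain.
 */
static enum free_path model_free_path(unsigned int order)
{
	assert(order < MODEL_MAX_ORDER);
	return order == 0 ? FREE_PATH_PCP : FREE_PATH_ZONE_LOCK;
}
```

Under that assumption, model_free_path(0) never reports the zone-lock
path, which is why seeing __free_pages_ok in the profile points at
order > 0 pages.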

> 
> >              |          
> >              |--1.55%--page_frag_free
> >              |          
> >               --1.44%--skb_release_data
> > 
> > 
> > Let me explain what (I think) happens.  The mlx5 driver RX-page
> > recycle mechanism is not effective in this workload, and pages have
> > to go through the page allocator.  The lock contention happens
> > during the mlx5 DMA TX completion cycle, and the page allocator
> > cannot keep up at these speeds.
> > 
> > One solution is to extend the page allocator with a bulk free API.
> > (This has been on my TODO list for a long time, but I don't have a
> > micro-benchmark that tricks the driver page-recycle into failing.)
> > It should fit nicely, as I can see that kmem_cache_free_bulk() does
> > get activated (bulk freeing SKBs), which means that DMA TX
> > completion does have a bulk of packets.
> > 
> > We can (and should) also improve the page recycle scheme in the
> > driver.  After LPC, I have a project with Tariq and Ilias (Cc'ed) to
> > improve the page_pool, and we will attempt to generalize this for
> > both high-end mlx5 and more low-end ARM64 boards (macchiatobin and
> > espressobin).
> > 
> > The MM people are working in parallel to improve the performance of
> > order-0 page returns.  Thus, the explicit page bulk free API might
> > actually become less important.  I actually think Aaron (Cc'ed) has
> > a patchset he would like you to test, which removes the (zone->)lock
> > you hit in free_one_page().
> 
> Thanks Jesper.
> 
> Yes, the said patchset is in this branch:
> https://github.com/aaronlu/linux no_merge_cluster_alloc_4.19-rc5
> 
> But as I said above, I think the lock contention here is for
> order > 0 pages so my current patchset will not work here,
> unfortunately.
> 
> BTW, Mel Gorman has suggested an alternative way to improve the page
> allocator's scalability, and I'm working on it right now.  It will
> improve scalability for pages of all orders.  I might be able to post
> it some time next week, and I will CC all of you when it's ready.
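
To make the bulk free idea quoted above concrete, here is a rough
userspace sketch of why batching helps: freeing N pages one at a time
takes the lock N times, while a bulk free takes it once per batch.
Everything here is a toy model under stated assumptions. struct toy_zone,
toy_free_one() and toy_free_bulk() are hypothetical names, the lock is a
C11 atomic_flag spinlock standing in for zone->lock, and no such bulk
free API is claimed to exist in the page allocator.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a zone: one lock guarding a free-page count. */
struct toy_zone {
	atomic_flag lock;            /* plays the role of zone->lock */
	unsigned long nr_free;       /* pages returned so far */
	unsigned long lock_acquires; /* how many times the lock was taken */
};

static void toy_zone_init(struct toy_zone *z)
{
	atomic_flag_clear(&z->lock);
	z->nr_free = 0;
	z->lock_acquires = 0;
}

static void toy_lock(struct toy_zone *z)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&z->lock, memory_order_acquire))
		;
	z->lock_acquires++;
}

static void toy_unlock(struct toy_zone *z)
{
	atomic_flag_clear_explicit(&z->lock, memory_order_release);
}

/* Per-page free: one lock round trip for every page. */
static void toy_free_one(struct toy_zone *z)
{
	toy_lock(z);
	z->nr_free++;
	toy_unlock(z);
}

/* Bulk free: one lock round trip for the whole batch of nr pages. */
static void toy_free_bulk(struct toy_zone *z, size_t nr)
{
	if (nr == 0)
		return;
	toy_lock(z);
	z->nr_free += nr;
	toy_unlock(z);
}
```

In this model a TX completion that returns 64 pages costs 64 lock
acquisitions with toy_free_one() and a single one with toy_free_bulk().
The real saving would depend on how much work has to happen under the
lock per page, which this sketch does not model.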
