Message-ID: <20181101152716.GA13895@intel.com>
Date:   Thu, 1 Nov 2018 23:27:16 +0800
From:   Aaron Lu <aaron.lu@...el.com>
To:     Jesper Dangaard Brouer <brouer@...hat.com>
Cc:     Paweł Staszewski <pstaszewski@...are.pl>,
        Eric Dumazet <eric.dumazet@...il.com>,
        netdev <netdev@...r.kernel.org>,
        Tariq Toukan <tariqt@...lanox.com>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Yoel Caspersen <yoel@...knet.dk>,
        Mel Gorman <mgorman@...hsingularity.net>
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal
 users traffic

On Thu, Nov 01, 2018 at 10:22:13AM +0100, Jesper Dangaard Brouer wrote:
... ...
> Section copied out:
> 
>   mlx5e_poll_tx_cq
>   |          
>    --16.34%--napi_consume_skb
>              |          
>              |--12.65%--__free_pages_ok
>              |          |          
>              |           --11.86%--free_one_page
>              |                     |          
>              |                     |--10.10%--queued_spin_lock_slowpath
>              |                     |          
>              |                      --0.65%--_raw_spin_lock

This callchain looks like it is freeing pages of order higher than 0:
__free_pages_ok is only called for pages whose order is greater than 0.
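
For readers not familiar with that code path, here is a rough user-space
model of the order check in __free_pages() as I understand it in 4.19.
It is only a sketch: the model_* names and the two counters are made up
for illustration, refcount handling is left out, and the real order-0
path can still take zone->lock when a per-CPU list is drained in bulk.

```c
#include <assert.h>

/* Hypothetical counters standing in for the two free paths. */
static unsigned long pcp_frees;       /* order 0: goes to a per-CPU list */
static unsigned long zone_lock_frees; /* order > 0: takes zone->lock */

/* Stand-in for free_unref_page(): order-0 pages land on a per-CPU list. */
static void model_free_unref_page(void)
{
	pcp_frees++;
}

/*
 * Stand-in for __free_pages_ok() -> free_one_page(), which takes
 * zone->lock for every call. Only reached for order > 0.
 */
static void model_free_pages_ok(unsigned int order)
{
	assert(order > 0);
	zone_lock_frees++;
}

/* Mirrors the order check in __free_pages() (refcounting omitted). */
static void model_free_pages(unsigned int order)
{
	if (order == 0)
		model_free_unref_page();
	else
		model_free_pages_ok(order);
}
```

So every model_free_pages() call with order > 0 bumps zone_lock_frees,
which corresponds to the free_one_page/queued_spin_lock_slowpath part of
the profile above, while order-0 frees only bump pcp_frees.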

>              |          
>              |--1.55%--page_frag_free
>              |          
>               --1.44%--skb_release_data
> 
> 
> Let me explain what (I think) happens.  The mlx5 driver RX-page recycle
> mechanism is not effective in this workload, and pages have to go
> through the page allocator.  The lock contention happens during mlx5
> DMA TX completion cycle.  And the page allocator cannot keep up at
> these speeds.
> 
> One solution is extend page allocator with a bulk free API.  (This have
> been on my TODO list for a long time, but I don't have a
> micro-benchmark that trick the driver page-recycle to fail).  It should
> fit nicely, as I can see that kmem_cache_free_bulk() does get
> activated (bulk freeing SKBs), which means that DMA TX completion do
> have a bulk of packets. 
> 
> We can (and should) also improve the page recycle scheme in the driver.
> After LPC, I have a project with Tariq and Ilias (Cc'ed) to improve the
> page_pool, and we will (attempt) to generalize this, for both high-end
> mlx5 and more low-end ARM64-boards (macchiatobin and espressobin).
> 
> The MM-people is in parallel working to improve the performance of
> order-0 page returns.  Thus, the explicit page bulk free API might
> actually become less important.  I actually think (Cc.) Aaron have a
> patchset he would like you to test, which removes the (zone->)lock
> you hit in free_one_page().

Thanks Jesper.

Yes, the said patchset is in this branch:
https://github.com/aaronlu/linux no_merge_cluster_alloc_4.19-rc5

But as I said above, I think the lock contention here is for
order > 0 pages so my current patchset will not work here, unfortunately.

BTW, Mel Gorman has suggested an alternative way to improve the page
allocator's scalability, and I'm working on it right now. It will
improve scalability for pages of all orders. I might be able to post it
some time next week, and will CC all of you when it's ready.
