lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 28 Jul 2023 20:39:24 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Alexander Lobakin <aleksander.lobakin@...el.com>, "David S. Miller"
	<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski
	<kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
CC: Maciej Fijalkowski <maciej.fijalkowski@...el.com>, Larysa Zaremba
	<larysa.zaremba@...el.com>, Alexander Duyck <alexanderduyck@...com>, Jesper
 Dangaard Brouer <hawk@...nel.org>, Ilias Apalodimas
	<ilias.apalodimas@...aro.org>, Simon Horman <simon.horman@...igine.com>,
	<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next 6/9] page_pool: avoid calling no-op externals
 when possible

On 2023/7/27 22:43, Alexander Lobakin wrote:
> Turned out page_pool_put{,_full}_page() can burn quite a bunch of cycles
> even when on DMA-coherent platforms (like x86) with no active IOMMU or
> swiotlb, just for the call ladder.
> Indeed, it's
> 
> page_pool_put_page()
>   page_pool_put_defragged_page()                  <- external
>     __page_pool_put_page()
>       page_pool_dma_sync_for_device()             <- non-inline
>         dma_sync_single_range_for_device()
>           dma_sync_single_for_device()            <- external
>             dma_direct_sync_single_for_device()
>               dev_is_dma_coherent()               <- exit
> 
> For the inline functions, no guarantees the compiler won't uninline them
> (they're clearly not one-liners and sometimes compilers uninline even
> 2 + 2). The first external call is necessary, but the rest 2+ are done
> for nothing each time, plus a bunch of checks here and there.
> Since Page Pool mappings are long-term and for one "device + addr" pair
> dma_need_sync() will always return the same value (basically, whether it
> belongs to an swiotlb pool), addresses can be tested once right after
> they're obtained and the result can be reused until the page is unmapped.
> Define the new PP DMA sync operation type, which will mean "do DMA syncs
> for the device, but only when needed" and turn it on by default when the
> driver asks to sync pages. When a page is mapped, check whether it needs
> syncs and if so, replace that "sync when needed" back to "always do
> syncs" globally for the whole pool (better safe than sorry). As long as
> the pool has no pages requiring DMA syncs, this cuts off a good piece
> of calls and checks. When at least one page required it, the pool
> conservatively falls back to "always call sync functions", no per-page
> verdicts. It's a fairly rare case anyway that only a few pages would
> require syncing.
> On my x86_64, this gives from 2% to 5% performance benefit with no
> negative impact for cases when IOMMU is on and the shortcut can't be
> used.
> 

It seems other subsystem may have the similar problem as page_pool,
is it possible to implement this kind of trick in the dma subsystem
instead of every subsystem inventing their own trick?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ