[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <84115706-6eac-d4c9-a815-6def8675fb56@intel.com>
Date: Mon, 3 Jul 2023 15:38:01 +0200
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Alexander Duyck <alexander.duyck@...il.com>
CC: "David S. Miller" <davem@...emloft.net>, Eric Dumazet
<edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
<pabeni@...hat.com>, Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
Larysa Zaremba <larysa.zaremba@...el.com>, Yunsheng Lin
<linyunsheng@...wei.com>, Alexander Duyck <alexanderduyck@...com>, "Jesper
Dangaard Brouer" <hawk@...nel.org>, Ilias Apalodimas
<ilias.apalodimas@...aro.org>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op
externals when possible
From: Alexander Duyck <alexander.duyck@...il.com>
Date: Fri, 30 Jun 2023 11:28:30 -0700
> On Fri, Jun 30, 2023 at 8:34 AM Alexander Lobakin
> <aleksander.lobakin@...el.com> wrote:
[...]
>> On my setup and with patch #4, I have literally 0 allocations once a
>> ring is filled. This means dma_need_sync() is not called at all during
>> Rx, while sync_for_device() would be called all the time.
>> When pages go through ptr_ring, sometimes new allocations happen, but
>> still the number of times dma_need_sync() is called is thousands times
>> lower.
>
> I see, so you are using it as a screener for pages as they are added
> to the pool. However the first time somebody trips the dma_need_sync
> then everybody in the pool is going to be getting hit with the sync
> code.
Right. "Better safe than sorry". If at least one page needs sync, let's
drop the shortcut. It won't make it worse than the mainline anyway.
>
>>> dma_need_sync for every frame then maybe we should look at folding it
>>> into page_pool_dma_sync_for_device itself since that is the only
>>> consumer of it it or just fold it into the PP_FLAG_DMA_SYNC_DEV if
>>> statement after the flag check rather than adding yet another flag
>>> that will likely always be true for most devices. Otherwise you are
>>
>> What you suggest is either calling dma_need_sync() each time a page is
>> requested or introducing a flag to store it somewhere in struct page to
>> allow some optimization for really-not-common-cases when dma_need_sync()
>> might return different values due to swiotlb etc. Did I get it right?
>
> Yeah, my thought would be to have a flag in the page to indicate if it
> will need the sync bits or not. Then you could look at exposing that
> to the drivers as well so that they could cut down on their own
> overhead. We could probably look at just embedding a flag in the lower
> bits of the DMA address stored in the page since I suspect at a
> minimum the resultant DMA address for a page would always be at least
> aligned to a long if not a full page.
As for drivers, I could add a wrapper like page_pool_need_sync() to test
for DMA_SYNC_DEV. I'm not sure it's worth it to check on a per-page basis.
Also, having that bit in the struct page forces us to always fetch it to
a cacheline. Right now, if the sync is skipped, this also avoids
touching struct page or at least postpones it. page_address() (for
&xdp_buff or build_skb()) doesn't touch it.
As for possible implementation, I also thought of the lowest bit of DMA
address. It probably can be lower than %PAGE_SIZE in some cases (at
least non-PP), but not to the 1-byte granularity.
But again, we need some group decision on if it's worth to do on a
per-page basis :)
Thanks,
Olek
Powered by blists - more mailing lists