Message-ID: <875xkj1t70.fsf@toke.dk>
Date: Sat, 08 Mar 2025 15:40:51 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Yunsheng Lin <yunshenglin0825@...il.com>, Yunsheng Lin
<linyunsheng@...wei.com>, davem@...emloft.net, kuba@...nel.org,
pabeni@...hat.com
Cc: zhangkun09@...wei.com, liuyonglong@...wei.com, fanghaiqing@...wei.com,
Alexander Lobakin <aleksander.lobakin@...el.com>, Robin Murphy
<robin.murphy@....com>, Alexander Duyck <alexander.duyck@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>, Gaurav Batra
<gbatra@...ux.ibm.com>, Matthew Rosato <mjrosato@...ux.ibm.com>, IOMMU
<iommu@...ts.linux.dev>, MM <linux-mm@...ck.org>, Alexei Starovoitov
<ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, Jesper Dangaard
Brouer <hawk@...nel.org>, John Fastabend <john.fastabend@...il.com>,
Matthias Brugger <matthias.bgg@...il.com>, AngeloGioacchino Del Regno
<angelogioacchino.delregno@...labora.com>, netdev@...r.kernel.org,
intel-wired-lan@...ts.osuosl.org, bpf@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-mediatek@...ts.infradead.org, Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH net-next v11 0/4] fix the DMA API misuse problem for
page_pool
Yunsheng Lin <yunshenglin0825@...il.com> writes:
> On 3/7/2025 10:15 PM, Toke Høiland-Jørgensen wrote:
>
> ...
>
>>
>> You are making this incredibly complicated. You've basically implemented
>> a whole new slab allocator for those page_pool_item objects, and you're
>> tracking every page handed out by the page pool instead of just the ones
>> that are DMA-mapped. None of this is needed.
>>
>> I took a stab at implementing the xarray-based tracking first suggested
>> by Mina[0]:
>
> I did discuss Mina's suggestion with Ilias below in case you didn't
> notice:
> https://lore.kernel.org/all/0ef315df-e8e9-41e8-9ba8-dcb69492c616@huawei.com/
I didn't; thanks for the pointer. See below.
> Anyway, it is great that you took the effort to actually implement
> the idea so we have a more concrete comparison here.
:)
>>
>> https://git.kernel.org/toke/c/e87e0edf9520
>>
>> And, well, it's 50 lines of extra code, none of which are in the fast
>> path.
>
> I wonder what is the overhead for the xarray idea regarding the
> time_bench_page_pool03_slow() testcase before we begin to discuss
> if xarray idea is indeed possible.
Well, just running that benchmark shows no impact:
|                               | Baseline |        | xarray |        |
|                               |   Cycles |     ns | Cycles |     ns |
|-------------------------------+----------+--------+--------+--------|
| no-softirq-page_pool01        |       20 |  5.713 |     19 |  5.516 |
| no-softirq-page_pool02        |       56 | 15.560 |     57 | 15.864 |
| no-softirq-page_pool03        |      225 | 62.763 |    222 | 61.728 |
| tasklet_page_pool01_fast_path |       19 |  5.399 |     19 |  5.505 |
| tasklet_page_pool02_ptr_ring  |       54 | 15.090 |     54 | 15.018 |
| tasklet_page_pool03_slow      |      238 | 66.134 |    239 | 66.498 |
...however, the benchmark doesn't actually do any DMA mapping, so it's
not super surprising that it doesn't show any difference: it's not
exercising any of the xarray code. Your series shows a difference on
this benchmark only because it does the page_pool_item allocation
regardless of whether DMA is used or not.
I guess we should try to come up with a micro-benchmark that does
exercise the DMA code. Or just hack up the xarray patch to do the
tracking regardless, for benchmarking purposes.
>> Jesper has kindly helped with testing that it works for normal packet
>> processing, but I haven't yet verified that it resolves the original
>> crash. Will post the patch to the list once I have verified this (help
>> welcome!).
>
> RFC seems like a good way to show and discuss the basic idea.
Sure, I can send it as an RFC straight away if you prefer. Note that I'm
on my way to netdevconf, though, so will probably have limited time to
pay attention to this for the next week or so.
> I only took a glance at git code above, it seems reusing the
> _pp_mapping_pad for pp_dma_index seems like a wrong direction
> as mentioned in discussion with Ilias above as the field might
> be used when a page is mmap'ed to user space, and reusing that
> field in 'struct page' seems to disable the tcp_zerocopy feature,
> see the below commit from Eric:
> https://github.com/torvalds/linux/commit/577e4432f3ac810049cb7e6b71f4d96ec7c6e894
>
> Also, I am not sure if a page_pool owned page can be spliced into the fs
> subsystem yet, but if it does, I am not sure how is reusing the
> page->mapping possible if that page is called in __filemap_add_folio()?
>
> https://elixir.bootlin.com/linux/v6.14-rc5/source/mm/filemap.c#L882
Hmm, so I did look at the mapping field, but concluded using it wouldn't
interfere with anything relevant as long as it's reset back to zero
before the page is returned to the page allocator. However, I definitely
missed the TCP zero-copy thing, and other things as well, it would seem
(cf. the discussion you referred to above).
However, I did consider alternatives: AFAICT there should be space in
the pp_magic field (used for the PP_SIGNATURE), so that with a bit of
care we can stick an ID into the upper bits and still avoid ending up
with a value that could look like a valid pointer.
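To make the bit-packing idea concrete, here is a minimal userspace sketch,
not the actual patch. Every name, the shift, and the ID width are
hypothetical and chosen only for illustration; the signature constant is a
stand-in for PP_SIGNATURE. The sketch keeps the ID in a bounded field so the
upper bits of the word stay zero; whether that is enough to keep the value
from resembling a valid pointer depends on the architecture, which this does
not attempt to model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only (not taken from the kernel):
 *   bits 0..7   signature (stand-in for PP_SIGNATURE)
 *   bits 8..31  ID used to look up the tracked DMA mapping
 *   bits 32..63 always zero
 */
#define SKETCH_SIGNATURE UINT64_C(0x40)
#define SKETCH_ID_SHIFT  8
#define SKETCH_ID_BITS   24
#define SKETCH_ID_MASK \
	(((UINT64_C(1) << SKETCH_ID_BITS) - 1) << SKETCH_ID_SHIFT)

/* Combine the signature with an ID; IDs wider than the field are truncated. */
static inline uint64_t sketch_magic_encode(uint32_t id)
{
	return SKETCH_SIGNATURE |
	       (((uint64_t)id << SKETCH_ID_SHIFT) & SKETCH_ID_MASK);
}

/* Extract the ID from a magic word. */
static inline uint32_t sketch_magic_id(uint64_t magic)
{
	return (uint32_t)((magic & SKETCH_ID_MASK) >> SKETCH_ID_SHIFT);
}

/* The signature check must ignore the ID bits, otherwise any page that
 * carries an ID would no longer be recognised as a page pool page. */
static inline bool sketch_magic_is_pp(uint64_t magic)
{
	return (magic & ~SKETCH_ID_MASK) == SKETCH_SIGNATURE;
}
```

The main point is the last helper: once an ID shares the word with the
signature, every existing comparison against the bare signature has to mask
the ID bits out first.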
I didn't implement that initially because I wasn't sure it was
necessary, but seeing as it is, I will take another look at it. I have
one or two other ideas if this turns out not to pan out.
-Toke