netdev - Re: [RFC 00/12] net: huge page backed page

Message-ID: <8b50a49e-5df8-dccd-154e-4423f0e8eda5@redhat.com>
Date: Wed, 12 Jul 2023 14:43:32 +0200
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: Yunsheng Lin <linyunsheng@...wei.com>, Jakub Kicinski <kuba@...nel.org>,
 Jesper Dangaard Brouer <jbrouer@...hat.com>
Cc: brouer@...hat.com, netdev@...r.kernel.org, almasrymina@...gle.com,
 hawk@...nel.org, ilias.apalodimas@...aro.org, edumazet@...gle.com,
 dsahern@...il.com, michael.chan@...adcom.com, willemb@...gle.com
Subject: Re: [RFC 00/12] net: huge page backed page_pool



On 12/07/2023 13.47, Yunsheng Lin wrote:
> On 2023/7/12 8:08, Jakub Kicinski wrote:
>> On Tue, 11 Jul 2023 17:49:19 +0200 Jesper Dangaard Brouer wrote:
>>> I see you have discovered that the next bottleneck is IOTLB misses.
>>> One of the techniques for reducing IOTLB misses is using huge pages,
>>> called "super-pages" in the article (below), whose authors report that
>>> this trick doesn't work on AMD (Pacifica arch).
>>>
>>> I think you have convinced me that the pp_provider idea makes sense for
>>> *this* use-case, because it feels natural to extend PP with
>>> mitigations for IOTLB misses. (But I'm not 100% sure it fits Mina's
>>> use-case.)
>>
>> We're on the same page then (no pun intended).
>>
>>> What is your page refcnt strategy for these huge pages? I assume it
>>> relies on the PP frags scheme, e.g. using page->pp_frag_count.
>>> Is this correctly understood?
>>
>> Oh, I split the page into individual 4k pages after DMA mapping.
>> There's no need for the host memory to be a huge page. I mean,
>> the actual kernel identity mapping is a huge page AFAIU, and the
>> struct pages are allocated, anyway. We just need it to be a huge
>> page at DMA mapping time.
>>
>> So the pages from the huge page provider only differ from normal
>> alloc_page() pages by the fact that they are a part of a 1G DMA
>> mapping.

So, Jakub, you are saying the PP refcnts are still done "as usual" on
individual pages.
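A minimal userspace sketch of the scheme Jakub describes, assuming the provider owns one 1 GiB region behind a single DMA mapping: each 4 KiB page gets base + offset as its DMA address and keeps an ordinary per-page refcount, so no pp_frag_count-style accounting spans the huge page. The IOTLB helper shows the arithmetic behind the win: covering 1 GiB takes 2^30 / 2^12 = 262,144 translations with 4 KiB pages versus one with a 1 GiB page, provided the IOMMU really maps the region with a 1 GiB page. `toy_page`, `toy_split_mapping()` and the other names are hypothetical, not page_pool code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy userspace model; every name here is hypothetical, not a kernel API. */
#define TOY_SMALL_PAGE (4ULL << 10)  /* 4 KiB pages handed to the stack */
#define TOY_HUGE_MAP   (1ULL << 30)  /* one 1 GiB DMA mapping */
#define TOY_NR_PAGES   (TOY_HUGE_MAP / TOY_SMALL_PAGE)

struct toy_page {
	uint64_t dma_addr; /* device address of this 4 KiB page */
	int refcnt;        /* ordinary per-page refcount, no frag counting */
};

/* Carve the first n 4 KiB pages out of one huge mapping at base_dma. */
static void toy_split_mapping(uint64_t base_dma, struct toy_page *pages,
			      uint64_t n)
{
	assert(n <= TOY_NR_PAGES);
	for (uint64_t i = 0; i < n; i++) {
		pages[i].dma_addr = base_dma + i * TOY_SMALL_PAGE;
		pages[i].refcnt = 1;
	}
}

/* Drop one reference; returns 1 when this page alone can be recycled. */
static int toy_put_page(struct toy_page *page)
{
	assert(page->refcnt > 0);
	return --page->refcnt == 0;
}

/* Translations (IOTLB entries) needed to cover bytes at page size pgsz. */
static uint64_t toy_iotlb_entries(uint64_t bytes, uint64_t pgsz)
{
	return (bytes + pgsz - 1) / pgsz;
}
```

For example, after toy_split_mapping(base, pages, 4), pages[3].dma_addr is base + 3 * 4096, and dropping the last reference on pages[1] affects only that page: in this model the DMA mapping is shared, the refcount is not.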

> 
> If it is about DMA mapping, is it possible to use dma_map_sg()
> to create one big contiguous DMA mapping for many discontiguous
> 4k pages, to avoid allocating a huge page?
> 
> As the comment says:
> "The scatter gather list elements are merged together (if possible)
> and tagged with the appropriate dma address and length."
> 
> https://elixir.free-electrons.com/linux/v4.16.18/source/arch/arm/mm/dma-mapping.c#L1805
> 

This is interesting for two reasons.

(1) whether this DMA merging helps with IOTLB misses (?)

(2) PP could use dma_map_sg() to amortize dma_map call cost.
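As a rough way to think about (1), here is a toy userspace model of what the quoted comment describes, assuming an IOMMU that hands out back-to-back IOVAs: discontiguous 4 KiB pages collapse into one or a few DMA segments, split only by a maximum segment length standing in for the device's segment-size limit. The struct and function names are hypothetical, not the kernel scatterlist API. In this model merging reduces the number of segments the device sees, not the number of 4 KiB translations: each page still needs its own IOVA-to-physical entry because the physical pages are not contiguous, whereas a larger IOTLB entry would need physically contiguous, suitably aligned backing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy IOMMU model; names are hypothetical, not the kernel scatterlist API. */
#define TOY_PAGE_SIZE 4096ULL

struct toy_sg {
	uint64_t phys;     /* physical address; arbitrary, hidden by the IOMMU */
	uint64_t len;      /* bytes, a multiple of TOY_PAGE_SIZE */
	uint64_t dma_addr; /* IOVA assigned by toy_map_sg() */
};

struct toy_seg {
	uint64_t dma_addr; /* start of a merged DMA segment */
	uint64_t len;      /* its length in bytes */
};

/*
 * Assign consecutive IOVAs from iova_base, then merge elements whose IOVA
 * ranges touch, as long as a segment stays within max_seg_len.
 * segs must hold n entries. Returns the number of merged segments.
 */
static size_t toy_map_sg(struct toy_sg *sg, size_t n, uint64_t iova_base,
			 uint64_t max_seg_len, struct toy_seg *segs)
{
	uint64_t iova = iova_base;
	size_t nseg = 0;

	for (size_t i = 0; i < n; i++) {
		assert(sg[i].len % TOY_PAGE_SIZE == 0 && sg[i].len <= max_seg_len);
		sg[i].dma_addr = iova;
		iova += sg[i].len;

		struct toy_seg *last = nseg ? &segs[nseg - 1] : NULL;
		if (last && last->dma_addr + last->len == sg[i].dma_addr &&
		    last->len + sg[i].len <= max_seg_len) {
			last->len += sg[i].len; /* extend the previous segment */
		} else {
			segs[nseg].dma_addr = sg[i].dma_addr;
			segs[nseg].len = sg[i].len;
			nseg++;
		}
	}
	return nseg;
}

/* 4 KiB IOVA->phys translations needed: one per page, merged or not. */
static uint64_t toy_translations(const struct toy_sg *sg, size_t n)
{
	uint64_t pages = 0;

	for (size_t i = 0; i < n; i++)
		pages += sg[i].len / TOY_PAGE_SIZE;
	return pages;
}
```

With four scattered 4 KiB pages and a generous limit, the model yields a single 16 KiB segment but still four translations; lowering max_seg_len to 8 KiB yields two segments.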

For case (2), __page_pool_alloc_pages_slow() already does bulk allocation
of pages (alloc_pages_bulk_array_node()) and then loops over the pages
to DMA map them individually. It seems like an obvious win to use
dma_map_sg() here?
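The loop shape described above, next to the batched alternative, as a toy call-counting model. Assumptions: each mapping call carries a fixed overhead, and the batched path stands in for a single dma_map_sg() over the bulk-allocated pages; `toy_map_call()` and the refill helpers are hypothetical, not page_pool internals.

```c
#include <assert.h>
#include <stddef.h>

/* Toy call-counting model; names are hypothetical, not page_pool internals. */
struct toy_map_stats {
	unsigned long calls; /* DMA-mapping calls made */
	unsigned long pages; /* pages covered by those calls */
};

/* Stand-in for one mapping call covering n pages. */
static void toy_map_call(struct toy_map_stats *st, size_t n)
{
	st->calls++;
	st->pages += n;
}

/* Current shape: after the bulk allocation, map each page on its own. */
static void toy_refill_per_page(struct toy_map_stats *st, size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; i++)
		toy_map_call(st, 1);
}

/* Suggested shape: one scatter-gather style call for the whole bulk. */
static void toy_refill_batched(struct toy_map_stats *st, size_t nr_pages)
{
	if (nr_pages > 0)
		toy_map_call(st, nr_pages);
}
```

In this model, refilling n pages takes n mapping calls per-page versus one batched. With a per-call overhead c and per-page work w, that is roughly n(c + w) versus c + nw, a saving of (n - 1)c. A real conversion would also need to handle a failed batched mapping, which the per-page loop deals with one page at a time.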

--Jesper