Message-ID: <74430f6d-b253-989c-4a31-059ffbe83063@huawei.com>
Date: Fri, 14 Jul 2023 21:05:11 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Jakub Kicinski <kuba@...nel.org>, Jesper Dangaard Brouer
<jbrouer@...hat.com>
CC: <brouer@...hat.com>, <netdev@...r.kernel.org>, <almasrymina@...gle.com>,
<hawk@...nel.org>, <ilias.apalodimas@...aro.org>, <edumazet@...gle.com>,
<dsahern@...il.com>, <michael.chan@...adcom.com>, <willemb@...gle.com>
Subject: Re: [RFC 00/12] net: huge page backed page_pool
On 2023/7/13 1:01, Jakub Kicinski wrote:
> On Wed, 12 Jul 2023 14:43:32 +0200 Jesper Dangaard Brouer wrote:
>> On 12/07/2023 13.47, Yunsheng Lin wrote:
>>> On 2023/7/12 8:08, Jakub Kicinski wrote:
>>>> Oh, I split the page into individual 4k pages after DMA mapping.
>>>> There's no need for the host memory to be a huge page. I mean,
>>>> the actual kernel identity mapping is a huge page AFAIU, and the
>>>> struct pages are allocated, anyway. We just need it to be a huge
>>>> page at DMA mapping time.
>>>>
>>>> So the pages from the huge page provider only differ from normal
>>>> alloc_page() pages by the fact that they are a part of a 1G DMA
>>>> mapping.
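If I follow the above, the provider maps the whole 1G region once and
then derives each 4k page's DMA address from that single mapping.
A minimal sketch of the idea (the names and the allocation call are
only for illustration; a real 1G chunk would come from hugetlb/CMA
rather than alloc_pages()):

        /* Sketch only: map one 1G region once, then hand out the
         * individual 4k struct pages with dma_addr derived from
         * the single big mapping.
         */
        struct page *huge, *p;
        dma_addr_t base;
        int i;

        huge = alloc_pages(GFP_KERNEL, get_order(SZ_1G)); /* illustrative */
        base = dma_map_page(dev, huge, 0, SZ_1G, DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dev, base))
                return -ENOMEM;

        for (i = 0; i < SZ_1G / PAGE_SIZE; i++) {
                p = huge + i;           /* the i-th 4k page */
                page_pool_set_dma_addr(p, base + i * PAGE_SIZE);
        }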
>>
>> So, Jakub, you are saying the PP refcnts are still done "as usual" on
>> individual pages.
>
> Yes - other than coming from a specific 1G of physical memory
> the resulting pages are really pretty ordinary 4k pages.
>
>>> If it is about DMA mapping, is it possible to use dma_map_sg()
>>> to get one big contiguous DMA mapping for a lot of discontiguous
>>> 4k pages, to avoid allocating a big huge page?
>>>
>>> As the comment:
>>> "The scatter gather list elements are merged together (if possible)
>>> and tagged with the appropriate dma address and length."
>>>
>>> https://elixir.free-electrons.com/linux/v4.16.18/source/arch/arm/mm/dma-mapping.c#L1805
>>>
>>
>> This is interesting for two reasons.
>>
>> (1) whether this DMA merging helps with IOTLB misses (?)
>
> Maybe I misunderstand how IOMMU / virtual addressing works, but I don't
> see how one can merge mappings from physically non-contiguous pages.
> IOW we can't get 1G-worth of random 4k pages and hope that thru some
> magic they get strung together and share an IOTLB entry (if that's
> where Yunsheng's suggestion was going..)
From __arm_lpae_map(), it does seem that the SMMU on arm can install
PTEs at different levels to point to pages of different sizes.
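A rough sketch of what I mean (untested; the iova base is hypothetical
and iommu_map()'s signature varies by kernel version): with an IOMMU
the IOVA side can be contiguous even when the physical pages are not,
and where the physical side also happens to be contiguous and aligned,
io-pgtable can install a larger block PTE (2M/1G), which is where an
IOTLB saving would come from:

        /* Sketch: map discontiguous physical pages to one contiguous
         * IOVA range; io-pgtable picks the PTE level based on the size
         * and alignment of each mapping.
         */
        unsigned long iova = iova_base; /* assumed pre-allocated */
        int i;

        for (i = 0; i < nr_pages; i++)
                iommu_map(domain, iova + i * PAGE_SIZE,
                          page_to_phys(pages[i]), PAGE_SIZE,
                          IOMMU_READ | IOMMU_WRITE, GFP_KERNEL);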
>
>> (2) PP could use dma_map_sg() to amortize dma_map call cost.
>>
>> For case (2) __page_pool_alloc_pages_slow() already does bulk allocation
>> of pages (alloc_pages_bulk_array_node()), and then loops over the pages
>> to DMA map them individually. It seems like an obvious win to use
>> dma_map_sg() here?
For mapping, the above should work; the tricky problem is that we need to
ensure all pages belonging to the same big DMA mapping are released before
we can do the DMA unmapping.
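One way I can picture handling that (untested sketch; the tracking
struct below is hypothetical and not in the current page_pool code):
keep one refcount per big sg mapping, hold a reference for every page
handed out, and only unmap when the last page comes back:

        /* Hypothetical per-mapping tracking for one dma_map_sg() call. */
        struct pp_sg_map {
                struct device *dev;
                struct scatterlist *sgl;
                int nents;              /* as passed to dma_map_sg() */
                refcount_t pages_out;   /* pages still outstanding */
        };

        /* Called when a page belonging to this mapping is released. */
        static void pp_sg_put(struct pp_sg_map *m)
        {
                if (!refcount_dec_and_test(&m->pages_out))
                        return;
                dma_unmap_sg(m->dev, m->sgl, m->nents, DMA_BIDIRECTIONAL);
                kfree(m->sgl);
                kfree(m);
        }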
>
> That could well be worth investigating!