Message-ID: <CAGsJ_4xb--mwsPHVFXzcpnZ29Wh8N-OTZNyNVW2CZd-U00A_ww@mail.gmail.com>
Date: Tue, 9 Dec 2025 19:37:03 +0800
From: Barry Song <21cnbao@...il.com>
To: Ryan Roberts <ryan.roberts@....com>
Cc: gao xu <gaoxu2@...or.com>, "sumit.semwal@...aro.org" <sumit.semwal@...aro.org>,
Benjamin Gaignard <benjamin.gaignard@...labora.com>, Brian Starkey <Brian.Starkey@....com>,
John Stultz <jstultz@...gle.com>, "T.J. Mercier" <tjmercier@...gle.com>,
Christian König <christian.koenig@....com>,
"linux-media@...r.kernel.org" <linux-media@...r.kernel.org>,
"dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
"linaro-mm-sig@...ts.linaro.org" <linaro-mm-sig@...ts.linaro.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "surenb@...gle.com" <surenb@...gle.com>,
zhouxiaolong <zhouxiaolong9@...or.com>
Subject: Re: [RFC] dma-buf: system_heap: add PTE_CONT for larger contiguous
On Mon, Dec 8, 2025 at 6:38 PM Ryan Roberts <ryan.roberts@....com> wrote:
>
> On 08/12/2025 09:52, Barry Song wrote:
> > On Mon, Dec 8, 2025 at 5:41 PM gao xu <gaoxu2@...or.com> wrote:
> >>
> >> commit 04c7adb5871a ("dma-buf: system_heap: use larger contiguous mappings
> >> instead of per-page mmap") facilitates the use of PTE_CONT. The system_heap
> >> allocates pages of order 4 and 8 that meet the alignment requirements for
> >> PTE_CONT, enabling PTE_CONT for larger contiguous mappings.
> >
> > Unfortunately, we don't have pte_cont for architectures other than
> > AArch64. On the other hand, AArch64 isn't automatically mapping
> > cont_pte for mmap. It might be better if this were done
> > automatically by the ARM code.
>
> Yes indeed; CONT_PTE_MASK and PTE_CONT are arm64-specific macros that cannot be
> used outside of the arm64 arch code.
>
> >
> > Ryan (Cc'ed) is the expert on automatically setting cont_pte for
> > contiguous mapping, so let's ask for some advice from Ryan.
>
> arm64 arch code will automatically and transparently apply PTE_CONT whenever it
> detects suitable conditions. Those suitable conditions include:
>
> - physically contiguous block of 64K, aligned to 64K
> - virtually contiguous block of 64K, aligned to 64K
> - 64K block has the same access permissions
> - 64K block all belongs to the same folio
> - not a special mapping
>
> The last 2 requirements are the tricky ones here: We require that every page in
> the block belongs to the same folio because a contiguous mapping only maintains a
> single access and dirty bit for the whole 64K block, so we are losing fidelity
> vs per-page mappings. But the kernel tracks access/dirty per folio, so the extra
> fidelity we get for per-page mappings is ignored by the kernel anyway if the
> contiguous mapping only maps pages from a single folio. We reject special
> mappings because they are not backed by a folio at all.
>
> For your case, remap_pfn_range() will create special mappings so we will never
> set the PTE_CONT bit.
>
> Likely we are being a bit too conservative here and we may be able to relax this
> requirement if we know that nothing will ever consume the access/dirty
> information for special mappings? I'm not sure if that is the case in general though
> - it would need some investigation.
>
> With that issue resolved, there is still a second issue; there are 2 ways the
> arm64 arch code detects suitable contiguous mappings. The primary way is via a
> call to set_ptes(). This is part of the "PTE batching" API and explicitly tells the
> implementation that all the conditions are met (including the memory being
> backed by a folio). This is the most efficient approach. See contpte_set_ptes().
>
> There is a second (hacky) approach which attempts to recognise when the last PTE
> of a contiguous block is set and automatically "fold" the mapping. See
> contpte_try_fold(). This approach has a cost because (for systems without
> BBML2_NOABORT) we have to issue a TLBI when we fold the range.
>
> For remap_pfn_range(), we would be relying on the second approach since it is
> not currently batched (and could not use set_ptes() as currently spec'ed due to
> there being no folio). If we are going to add support for contiguous pfn-mapped
> PTEs, it would be preferable to add equivalent batching APIs (or relax set_ptes()).
>
Thanks a lot, Ryan. It seems quite tricky to support automatic cont_pte.
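For reference, the conditions listed above can be sketched as a single
predicate. This is a userspace model only, assuming 4K base pages (so 16 PTEs
per 64K block). struct cont_candidate and cont_eligible() are hypothetical
names made up for illustration; they are not kernel APIs and this is not how
the arm64 code is structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry: 4K base pages, 16 PTEs per contiguous block = 64K. */
#define TOY_PAGE_SHIFT 12
#define TOY_CONT_PTES  16
#define TOY_CONT_SIZE  ((uint64_t)TOY_CONT_PTES << TOY_PAGE_SHIFT)

/* Hypothetical summary of one candidate 64K range (not a kernel type). */
struct cont_candidate {
	uint64_t vaddr;   /* start of the virtual range */
	uint64_t paddr;   /* start of the physical range */
	bool same_prot;   /* all 16 pages share the same access permissions */
	bool same_folio;  /* all 16 pages belong to one folio */
	bool special;     /* pfn-mapped, e.g. via remap_pfn_range() */
};

/* Returns true only if every condition from the list above holds. */
static bool cont_eligible(const struct cont_candidate *c)
{
	assert(c != NULL);

	if (c->paddr & (TOY_CONT_SIZE - 1))   /* physically aligned to 64K */
		return false;
	if (c->vaddr & (TOY_CONT_SIZE - 1))   /* virtually aligned to 64K */
		return false;
	if (!c->same_prot)
		return false;
	if (!c->same_folio)
		return false;
	if (c->special)                       /* special mappings are rejected */
		return false;
	return true;
}
```

In this model a folio-backed, 64K-aligned range passes, while the same range
marked special (as a remap_pfn_range() mapping would be) fails on the last
check alone.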
> I think this would be a useful improvement, but it's not as straightforward as
> adding PTE_CONT in system_heap_mmap().
Since it's just a driver, I'm not sure whether using CONFIG_ARM64 there is
acceptable. However, I can find many existing instances of it in drivers:
drivers % git grep CONFIG_ARM64 | wc -l
127
On the other hand, a corner case is when the dma-buf is partially unmapped.
I assume cont_pte can still be automatically unfolded, even for
special mappings?
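To make the partial-unmap question concrete, here is a toy userspace model of
folding and unfolding a 16-entry block. Everything in it is an assumption for
illustration: the toy_* names and the bit layout are hypothetical, and the
counter only stands in for the TLBI cost of folding mentioned above. It shows
what "unfold before touching one entry" means; it does not show whether the
arm64 code does this for special mappings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CONT_PTES 16
#define TOY_VALID     (1u << 0)  /* hypothetical bit layout, model only */
#define TOY_CONT      (1u << 1)

struct toy_block {
	uint32_t pte[TOY_CONT_PTES];
	unsigned int fold_tlbis;     /* stand-in for the TLBI issued on fold */
};

static bool toy_block_is_cont(const struct toy_block *b)
{
	return (b->pte[0] & TOY_CONT) != 0;
}

/* Fold: only if all 16 entries are valid and none is contiguous yet. */
static void toy_try_fold(struct toy_block *b)
{
	for (size_t i = 0; i < TOY_CONT_PTES; i++)
		if (!(b->pte[i] & TOY_VALID) || (b->pte[i] & TOY_CONT))
			return;
	for (size_t i = 0; i < TOY_CONT_PTES; i++)
		b->pte[i] |= TOY_CONT;
	b->fold_tlbis++;
}

/* Unfold: drop the contiguous bit from the whole block, never from one entry. */
static void toy_try_unfold(struct toy_block *b)
{
	if (!toy_block_is_cont(b))
		return;
	for (size_t i = 0; i < TOY_CONT_PTES; i++)
		b->pte[i] &= ~TOY_CONT;
}

/* Per-entry set: attempt a fold when the last entry of the block is written. */
static void toy_set_pte(struct toy_block *b, size_t idx)
{
	assert(idx < TOY_CONT_PTES);
	b->pte[idx] |= TOY_VALID;
	if (idx == TOY_CONT_PTES - 1)
		toy_try_fold(b);
}

/* Partial unmap: unfold the block first, then clear the single entry. */
static void toy_clear_pte(struct toy_block *b, size_t idx)
{
	assert(idx < TOY_CONT_PTES);
	toy_try_unfold(b);
	b->pte[idx] = 0;
}
```

Filling entries 0..15 one at a time folds the block once; clearing any single
entry afterwards leaves the other 15 valid but no longer contiguous.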
Thanks
Barry