linux-kernel - Re: [PATCH v4 03/13] drm/shmem-helper: Map huge pages in fault handlers

Message-ID: <e19cdd3a-0d33-4c06-9c9a-7e9e2df51c4b@collabora.com>
Date: Mon, 10 Nov 2025 15:39:37 +0100
From: Loïc Molinari <loic.molinari@...labora.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 Jani Nikula <jani.nikula@...ux.intel.com>,
 Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
 Rodrigo Vivi <rodrigo.vivi@...el.com>, Tvrtko Ursulin
 <tursulin@...ulin.net>, Boris Brezillon <boris.brezillon@...labora.com>,
 Rob Herring <robh@...nel.org>, Steven Price <steven.price@....com>,
 Liviu Dudau <liviu.dudau@....com>, Melissa Wen <mwen@...lia.com>,
 Maíra Canal <mcanal@...lia.com>,
 Hugh Dickins <hughd@...gle.com>, Baolin Wang
 <baolin.wang@...ux.alibaba.com>, Andrew Morton <akpm@...ux-foundation.org>,
 Al Viro <viro@...iv.linux.org.uk>, Mikołaj Wasiak
 <mikolaj.wasiak@...el.com>, Christian Brauner <brauner@...nel.org>,
 Nitin Gote <nitin.r.gote@...el.com>, Andi Shyti
 <andi.shyti@...ux.intel.com>, Jonathan Corbet <corbet@....net>,
 Christopher Healy <healych@...zon.com>, Bagas Sanjaya
 <bagasdotme@...il.com>, linux-kernel@...r.kernel.org,
 dri-devel@...ts.freedesktop.org, intel-gfx@...ts.freedesktop.org,
 linux-mm@...ck.org, linux-doc@...r.kernel.org, kernel@...labora.com
Subject: Re: [PATCH v4 03/13] drm/shmem-helper: Map huge pages in fault
 handlers

Hi Matthew,

On 17/10/2025 23:42, Matthew Wilcox wrote:
> On Thu, Oct 16, 2025 at 01:17:07PM +0200, Loïc Molinari wrote:
>>> It looks to me like we have an opportunity to do better here by
>>> adding a vmf_insert_pfns() interface.  I don't think we should delay
>>> your patch series to add it, but let's not forget to do that; it can
>>> have very good performance effects on ARM to use contptes.
>>
>> Agreed. I initially wanted to provide such an interface based on set_ptes()
>> to benefit from arm64 contptes but thought it'd better be a distinct patch
>> series.
> 
> Agreed.
> 
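The batching idea behind a vmf_insert_pfns() interface can be sketched in userspace C. This is a minimal model that assumes the interface would consume physically contiguous PFN runs, the way set_ptes() installs nr PTEs for consecutive pages. vmf_insert_pfns() does not exist yet, and split_pfn_runs() and struct pfn_run are hypothetical names used only for illustration. The sketch shows how a page array splits into runs that could each be installed with one call instead of one vmf_insert_pfn() per page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a physically contiguous PFN range that a batched helper
 * (in the spirit of set_ptes(), which installs nr PTEs for consecutive
 * pages) could install in a single call. */
struct pfn_run {
	unsigned long first_pfn;
	size_t nr;
};

/* Hypothetical helper: split pfns[0..n) into maximal runs of consecutive
 * PFNs. out must have room for n entries. Returns the number of runs. */
static size_t split_pfn_runs(const unsigned long *pfns, size_t n,
			     struct pfn_run *out)
{
	size_t nruns = 0;

	assert(n == 0 || (pfns != NULL && out != NULL));

	for (size_t i = 0; i < n; i++) {
		if (nruns > 0 &&
		    pfns[i] == out[nruns - 1].first_pfn + out[nruns - 1].nr) {
			/* Physically follows the previous page: extend the run. */
			out[nruns - 1].nr++;
		} else {
			/* Discontinuity: start a new run at this PFN. */
			out[nruns].first_pfn = pfns[i];
			out[nruns].nr = 1;
			nruns++;
		}
	}
	return nruns;
}
```

For PFNs {100, 101, 102, 200, 201, 300} this yields three runs of 3, 2 and 1 pages, so three batched calls would replace six single-page insertions. Whether a run also qualifies for the arm64 contiguous bit depends on its length and alignment, which this sketch does not model.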
>>>
>>>> @@ -617,8 +645,9 @@ static vm_fault_t drm_gem_shmem_fault(struct vm_fault *vmf)
>>> [...]
>>>> -		ret = vmf_insert_pfn(vma, vmf->address, page_to_pfn(page));
>>>> +	if (drm_gem_shmem_map_pmd(vmf, vmf->address, pages[page_offset])) {
>>>> +		ret = VM_FAULT_NOPAGE;
>>>> +		goto out;
>>>>    	}
>>>
>>> Does this actually work?
>>
>> Yes, it does. Huge pages are successfully mapped from both map_pages and
>> fault handlers. Anything wrong with it?
> 
> No, I just wasn't sure that this would work correctly.
> 
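The checks a PMD fault path has to pass before installing one huge mapping can be modelled as below. This is a sketch, not the body of drm_gem_shmem_map_pmd() (the quoted hunk does not show it). It assumes 4 KiB base pages and 2 MiB PMDs, and can_map_pmd() and the MODEL_* constants are hypothetical. The key condition is that the fault address and the physical address of the backing page sit at the same offset within a PMD, so the whole 2 MiB range maps linearly.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed geometry: 4 KiB base pages and 2 MiB PMD entries, as on x86-64
 * and on arm64 with a 4 KiB granule. */
#define MODEL_PAGE_SHIFT 12UL
#define MODEL_PMD_SHIFT  21UL
#define MODEL_PMD_SIZE   (1UL << MODEL_PMD_SHIFT)
#define MODEL_PMD_MASK   (~(MODEL_PMD_SIZE - 1))

/* Hypothetical model of the conditions for mapping the fault at addr with
 * a single PMD entry instead of a single PTE. */
static bool can_map_pmd(unsigned long addr, unsigned long vm_start,
			unsigned long vm_end, unsigned long pfn,
			bool folio_is_pmd_sized)
{
	unsigned long paddr = pfn << MODEL_PAGE_SHIFT;
	unsigned long huge_start = addr & MODEL_PMD_MASK;

	assert(vm_start < vm_end);

	/* The backing folio must cover a whole PMD. */
	if (!folio_is_pmd_sized)
		return false;
	/* The entire 2 MiB virtual range must lie inside the VMA. */
	if (huge_start < vm_start || huge_start + MODEL_PMD_SIZE > vm_end)
		return false;
	/* Virtual and physical offsets within the PMD must match. */
	return (addr & ~MODEL_PMD_MASK) == (paddr & ~MODEL_PMD_MASK);
}
```

If any check fails, the handler would fall back to a regular single-page PTE mapping.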
>> There seems to be another issue, though. There are failures [1], all
>> looking like this one [2]. I think it's because map_pages is called with the
>> RCU read lock held, while the DRM GEM map_pages handler must lock the GEM
>> object with dma_resv_lock() before accessing its pages. The locking doc says:
>> "If it's not possible to reach a page without blocking, filesystem should
>> skip it." Dropping the RCU read lock in the handler seems wrong, and doing
>> without a map_pages implementation would be unfortunate. What would you
>> recommend here?
> 
> I'm not familiar with GEM locking, so let me describe briefly how
> pagecache locking works.
> 
> Calling mmap bumps the refcount on the inode.  That keeps the inode
> around while the page fault handler runs.  For each folio, we
> get a refcount on it, then we trylock it.  Then we map each page in the
> folio.
> 
> So maybe you can trylock the GEM object?  It isn't clear to me whether
> you want finer grained locking than that.  If the trylock fails, no big
> deal, you just fall through to the fault path (with the slightly more
> heavy-weight locking that allows you to sleep).
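Matthew's suggestion can be modelled with a non-blocking lock: the fault-around path takes the object lock with a trylock and, if that fails, maps nothing and lets the regular fault handler take the sleeping lock. The sketch below assumes a C11 atomic_flag as a stand-in for the GEM object's dma_resv lock (in the kernel, dma_resv_trylock() plays that role). struct model_gem_obj and the model_* functions are hypothetical, and only the locking decision is modelled, not the page insertion itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GEM object: resv_lock models the dma_resv
 * lock, pages_mapped counts pages installed by the fault-around path. */
struct model_gem_obj {
	atomic_flag resv_lock;
	size_t npages;
	size_t pages_mapped;
};

/* Models dma_resv_trylock(): never sleeps, returns true on success. */
static bool model_resv_trylock(struct model_gem_obj *obj)
{
	return !atomic_flag_test_and_set_explicit(&obj->resv_lock,
						  memory_order_acquire);
}

static void model_resv_unlock(struct model_gem_obj *obj)
{
	atomic_flag_clear_explicit(&obj->resv_lock, memory_order_release);
}

/* Models a map_pages (fault-around) handler running under the RCU read
 * lock: it must not block, so it proceeds only if the trylock succeeds.
 * Returns the number of pages mapped; 0 means "fall through to the
 * regular fault handler, which may sleep on the lock". */
static size_t model_map_pages(struct model_gem_obj *obj,
			      size_t start, size_t end)
{
	size_t mapped = 0;

	if (!model_resv_trylock(obj))
		return 0; /* Contended: skip fault-around entirely. */

	for (size_t i = start; i < end && i < obj->npages; i++)
		mapped++; /* Page i would be mapped here. */

	obj->pages_mapped += mapped;
	model_resv_unlock(obj);
	return mapped;
}
```

Skipping on contention is safe because fault-around is only an optimization: the faulting page is still handled by the ordinary fault path.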

I proposed a v5 of the series using dma_resv_trylock(). It still fails
later, because the vmf_insert_pfn*() functions end up taking locks too. I'm
not sure how to fix that yet, so I proposed a v6 with no fault-around path
and will get back to it (along with contptes) later.

Loïc