linux-kernel - Re: [PATCH] mm: support large mapping building for tmpfs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4c5ea64d-c33c-4cf5-8e71-08bc50a5f940@redhat.com>
Date: Wed, 2 Jul 2025 13:55:15 +0200
From: David Hildenbrand <david@...hat.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>, akpm@...ux-foundation.org,
 hughd@...gle.com
Cc: ziy@...dia.com, lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com,
 npache@...hat.com, ryan.roberts@....com, dev.jain@....com,
 baohua@...nel.org, vbabka@...e.cz, rppt@...nel.org, surenb@...gle.com,
 mhocko@...e.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: support large mapping building for tmpfs

On 02.07.25 13:38, David Hildenbrand wrote:
> 
>>> So by mapping more in a single page fault, you end up increasing "RSS".
>>> But I wouldn't
>>> call that "expected". I rather suspect that nobody will really care :)
>>
>> But tmpfs is a little special here. It uses the 'huge=' option to
>> control large folio allocation. So, I think users should know they want
>> to use large folios and build the whole mapping for the large folios.
>> That is why I call it 'expected'.
> 
> Well, if your distribution decides to set huge= on /tmp or something
> like that, your application might have very little saying in that, right? :)
> 
> Again, I assume it's fine, but we might find surprises on the way.
> 
>>>
>>> The thing is, when you *allocate* a new folio, it must adhere at least to
>>> pagecache alignment (e.g., cannot place an order-2 folio at pgoff 1) --
>>
>> Yes, agree.
>>
>>> that is what
>>> thp_vma_suitable_order() checks. Otherwise you cannot add it to the
>>> pagecache.
>>
>> But this alignment is not done by thp_vma_suitable_order().
>>
>> For tmpfs, it will check the alignment in shmem_suitable_orders() via:
>> "
>> 	if (!xa_find(&mapping->i_pages, &aligned_index,
>> 			aligned_index + pages - 1, XA_PRESENT))
>> "
> 
> That's not really alignment check, that's just checking whether a
> suitable folio order spans already-present entries, no?
> 
> Finding suitable orders is still up to other code IIUC.
> 
>>
>> For other fs systems, it will check the alignment in
>> __filemap_get_folio() via:
>> "
>> 	/* If we're not aligned, allocate a smaller folio */
>> 	if (index & ((1UL << order) - 1))
>> 		order = __ffs(index);
>> "
>>
>>> But once you *obtain* a folio from the pagecache and are supposed to map it
>>> into the page tables, that must already hold true.
>>>
>>> So you should be able to just blindly map whatever is given to you here
>>> AFAIKS.
>>>
>>> If you would get a pagecache folio that violates the linear page offset
>>> requirement
>>> at that point, something else would have messed up the pagecache.
>>
>> Yes. But the comments from thp_vma_suitable_order() is not about the
>> pagecache alignment, it says "the order-aligned addresses in the VMA map
>> to order-aligned offsets within the file",
> 
> Let's dig, it's confusing.
> 
> The code in question is:
> 
> if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
> 		hpage_size >> PAGE_SHIFT))
> 
> So yes, I think this tells us: if we would have a PMD THP in the
> pagecache, would we be able to map it with a PMD. If not, then don't
> bother with allocating a PMD THP.
> 
> Of course, this also applies to other orders, but for PMD THPs it's
> probably most relevant: if we cannot even map it through a PMD, then
> probably it could be a wasted THP.
> 
> So yes, I agree: if we are both no missing something, then this
> primarily relevant for the PMD case.
> 
> And it's more about "optimization" than "correctness" I guess?

Correction: only if a caller doesn't assume that this is an implicit 
pagecache alignment check. Not sure if that might be the case for shmem 
when it calls thp_vma_suitable_order() with a VMA ...

-- 
Cheers,

David / dhildenb