Date: Wed, 14 Feb 2024 11:35:50 -0500
From: Zi Yan <ziy@...dia.com>
To: David Hildenbrand <david@...hat.com>
Cc: Ryan Roberts <ryan.roberts@....com>,
 "\"Pankaj Raghav (Samsung)\"" <kernel@...kajraghav.com>, linux-mm@...ck.org,
 "\"Matthew Wilcox (Oracle)\"" <willy@...radead.org>,
 Yang Shi <shy828301@...il.com>, Yu Zhao <yuzhao@...gle.com>,
 "\"Kirill A . Shutemov\"" <kirill.shutemov@...ux.intel.com>,
 "Michal Koutný" <mkoutny@...e.com>,
 Roman Gushchin <roman.gushchin@...ux.dev>,
 "\"Zach O'Keefe\"" <zokeefe@...gle.com>, Hugh Dickins <hughd@...gle.com>,
 Mcgrof Chamberlain <mcgrof@...nel.org>,
 Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org,
 cgroups@...r.kernel.org, linux-fsdevel@...r.kernel.org,
 linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v4 0/7] Split a folio to any lower order folios

On 14 Feb 2024, at 5:55, David Hildenbrand wrote:

> On 14.02.24 11:50, Ryan Roberts wrote:
>> On 13/02/2024 22:31, Zi Yan wrote:
>>> On 13 Feb 2024, at 17:21, David Hildenbrand wrote:
>>>
>>>> On 13.02.24 22:55, Zi Yan wrote:
>>>>> From: Zi Yan <ziy@...dia.com>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> File folio supports any order and multi-size THP is upstreamed[1], so both
>>>>> file and anonymous folios can be >0 order. Currently, split_huge_page()
>>>>> only splits a huge page to order-0 pages, but splitting to orders higher than
>>>>> 0 is going to better utilize large folios. In addition, Large Block
>>>>> Sizes in XFS support would benefit from it[2]. This patchset adds support for
>>>>> splitting a large folio to any lower order folios and uses it during file
>>>>> folio truncate operations.
>>>>>
>>>>> For Patch 6, Hugh did not like my approach to minimize the number of
>>>>> folios for truncate[3]. I would like to get more feedback, especially
>>>>> from FS people, on it to decide whether to keep it or not.
>>>>
>>>> I'm curious, would it make sense to exclude the "more" controversial parts (i.e., patch #6) for now, and focus on the XFS use case only?
>>>
>>> Sure. Patch 6 was there to make use of split_huge_page_to_list_to_order().
>>> Now we have multi-size THP and XFS use cases, it can be dropped.
>>
>> What are your plans for how to determine when to split THP and to what order? I
>> don't see anything in this series that would split anon THP to non-zero order?
>>
>> We have talked about using hints from user space in the past (e.g.  mremap,
>> munmap, madvise, etc). But chrome has a use case where it temporarily mprotects
>> a single (4K) page as part of garbage collection (IIRC). If you eagerly split on
>> that hint, you will have lost the benefits of the large folio when it later
>> mprotects back to the original setting.
>
> Not only that, splitting will make some of these operations more expensive, possibly with no actual benefit.
>
>>
>> I guess David will suggest this would be a good use case for the khugepaged-lite
>> mechanism we have been talking about. I dunno - it seems wasteful to split then
>> collapse again.
>
> I agree. mprotect() and even madvise(), ... might not be good candidates for splitting. mremap() likely is, if the folio is mapped exclusively. MADV_DONTNEED/munmap()/mlock() might be good candidates (again, if mapped exclusively). This will need a lot of thought I'm afraid (as you say, deferred splitting is another example).

My initial use case was splitting a 1GB THP into 2MB THPs, but 1GB THP is not
upstream yet, so for now this might only be used by XFS. For anonymous large
folios, we will use it once we find a justified use case. One I can think of:
when a PMD-mapped THP has to be split and the result can use a HW/SW-favored
order, such as the order of a 64KB or 32KB folio (to be able to use contig
PTEs), we split to that order; otherwise, we still split to order-0.
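
A minimal userspace sketch of that order selection, assuming 4KB base pages.
favored_sizes, size_to_order() and pick_split_order() are hypothetical names
used only for illustration; none of them exist in this patchset or in the
kernel:

```c
#include <assert.h>
#include <stddef.h>

#define BASE_PAGE_SHIFT 12	/* assumption: 4KB base pages */

/* Convert a power-of-two folio size in bytes to a folio order. */
unsigned int size_to_order(size_t bytes)
{
	unsigned int order = 0;

	assert(bytes >= ((size_t)1 << BASE_PAGE_SHIFT));
	assert((bytes & (bytes - 1)) == 0);	/* must be a power of two */

	while ((((size_t)1 << BASE_PAGE_SHIFT) << order) < bytes)
		order++;
	return order;
}

/* Hypothetical HW/SW-favored folio sizes, largest first. */
static const size_t favored_sizes[] = { 64 * 1024, 32 * 1024 };

/*
 * Pick the order to split a folio of cur_order to: the largest favored
 * order strictly below cur_order, or order-0 if there is none.
 */
unsigned int pick_split_order(unsigned int cur_order)
{
	size_t i;

	for (i = 0; i < sizeof(favored_sizes) / sizeof(favored_sizes[0]); i++) {
		unsigned int order = size_to_order(favored_sizes[i]);

		if (order < cur_order)
			return order;
	}
	return 0;
}
```

Under these assumptions, a PMD-sized 2MB folio (order 9) would be split to
order 4 (64KB), and a folio that is already order 3 would fall back to order-0.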

--
Best Regards,
Yan, Zi
