Message-ID: <2de0617e-d1d7-49ec-9cb8-206eaf37caed@arm.com>
Date:   Tue, 5 Dec 2023 09:34:23 +0000
From:   Ryan Roberts <ryan.roberts@....com>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Yin Fengwei <fengwei.yin@...el.com>,
        David Hildenbrand <david@...hat.com>,
        Yu Zhao <yuzhao@...gle.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Yang Shi <shy828301@...il.com>,
        "Huang, Ying" <ying.huang@...el.com>, Zi Yan <ziy@...dia.com>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Itaru Kitayama <itaru.kitayama@...il.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        John Hubbard <jhubbard@...dia.com>,
        David Rientjes <rientjes@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Hugh Dickins <hughd@...gle.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>,
        Barry Song <21cnbao@...il.com>,
        Alistair Popple <apopple@...dia.com>, linux-mm@...ck.org,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8 00/10] Multi-size THP for anonymous memory

On 04/12/2023 19:30, Andrew Morton wrote:
> On Mon,  4 Dec 2023 10:20:17 +0000 Ryan Roberts <ryan.roberts@....com> wrote:
> 
>> Hi All,
>>
>>
>> Prerequisites
>> =============
>>
>> Some work items identified as being prerequisites are listed on page 3 at [9].
>> The summary is:
>>
>> | item                          | status                  |
>> |:------------------------------|:------------------------|
>> | mlock                         | In mainline (v6.7)      |
>> | madvise                       | In mainline (v6.6)      |
>> | compaction                    | v1 posted [10]          |
>> | numa balancing                | Investigated: see below |
>> | user-triggered page migration | In mainline (v6.7)      |
>> | khugepaged collapse           | In mainline (NOP)       |
> 
> What does "prerequisites" mean here?  Won't compile without?  Kernel
> crashes without?  Nice-to-have-after?  Please expand on this.

Short answer: It's supposed to mean things that either need to be done to prevent the mm from regressing (in both correctness and performance) when multi-size THP is present but disabled, or things that need to be done to make the mm robust (but not necessarily optimally performant) when multi-size THP is enabled. But in reality, all of the things on the list could be reclassified as "nice-to-have-after", IMHO; their absence will cause neither compilation errors nor runtime errors.

Longer answer: When I first started looking at this, I was advised that there were likely a number of corners which made assumptions about large folios always being PMD-sized, and which, if not found and fixed, could lead to stability issues. At the time I was also pursuing a strategy of multi-size THP being a compile-time feature with no runtime control, so I decided it was important for multi-size THP to not effectively disable other features (e.g. various madvise ops used to ignore PTE-mapped large folios). This list represents all the things that I could find based on code review, as well as things suggested by others, and in the end, they all fall into that last category of "PTE-mapped large folios effectively disable existing features". But given we now have runtime controls to opt in to multi-size THP, I'm not sure we need to classify these as prerequisites. I didn't want to make that decision unilaterally, though, given this list has previously been discussed and agreed by others.
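
To make the "runtime controls" point concrete, here is a minimal sketch of how one might inspect the per-size opt-in state from userspace. It assumes the per-size sysfs layout /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled, with the active value shown in square brackets; the helper names selected_mode() and mthp_modes() are hypothetical and exist only for illustration.

```python
import glob
import re

# Assumed sysfs location of the per-size controls; adjust if your kernel differs.
THP_SYSFS = "/sys/kernel/mm/transparent_hugepage"


def selected_mode(text):
    """Return the bracketed (active) value from a sysfs choice string."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def mthp_modes(root=THP_SYSFS):
    """Map each hugepages-<size>kB directory name to its active 'enabled' mode."""
    modes = {}
    for path in sorted(glob.glob(f"{root}/hugepages-*kB/enabled")):
        size = path.split("/")[-2]  # e.g. "hugepages-64kB"
        with open(path) as f:
            modes[size] = selected_mode(f.read())
    return modes


if __name__ == "__main__":
    # Prints nothing on kernels that do not expose per-size controls.
    for size, mode in mthp_modes().items():
        print(size, mode)
```

On a kernel without these controls the glob matches nothing and the result is an empty dict, which is the "present but disabled" case from a userspace point of view.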

It's also worth noting that in the case of compaction, that's already a problem for large folios in the page cache; large folios will be skipped.

> 
> I looked at [9], but access is denied.

Sorry about that; it's owned by David Rientjes so I can't fix that for you. It's a PDF of a slide with the following table:

+-------------------------------+------------------------------------------------------------------------+--------------+--------------------+
| Item                          | Description                                                            | Assignee     | Status             |
+-------------------------------+------------------------------------------------------------------------+--------------+--------------------+
| mlock                         | Large, pte-mapped folios are ignored when mlock is requested.          | Yin, Fengwei | In mainline (v6.7) |
|                               | Code comment for mlock_vma_folio() says "...filter out pte mappings    |              |                    |
|                               | of THPs which cannot be consistently counted: a pte mapping of the     |              |                    |
|                               | THP head cannot be distinguished by the page alone."                   |              |                    |
| madvise                       | MADV_COLD, MADV_PAGEOUT, MADV_FREE: For large folios, code assumes     | Yin, Fengwei | In mainline (v6.6) |
|                               | exclusive only if mapcount==1, else skips remainder of operation.      |              |                    |
|                               | For large, pte-mapped folios, exclusive folios can have mapcount       |              |                    |
|                               | up to nr_pages and still be exclusive. Even better; don't split        |              |                    |
|                               | the folio if it fits entirely within the range.                        |              |                    |
| compaction                    | Raised at LSFMM: Compaction skips non-order-0 pages.                   | Zi Yan       | v1 posted          |
|                               | Already problem for page-cache pages today.                            |              |                    |
| numa balancing                | Large, pte-mapped folios are ignored by numa-balancing code. Commit    | John Hubbard | Investigated:      |
|                               | comment (e81c480): "We're going to have THP mapped with PTEs. It       |              | Not prerequisite   |
|                               | will confuse numabalancing. Let's skip them for now."                  |              |                    |
| user-triggered page migration | mm/migrate.c (migrate_pages syscall) We don't want to migrate folio    | Kefeng Wang  | In mainline (v6.7) |
|                               | that is shared.                                                        |              |                    |
| khugepaged collapse           | collapse small-sized THP to PMD-sized THP in khugepaged/MADV_COLLAPSE. | Ryan Roberts | In mainline (NOP)  |
|                               | Kirill thinks khugepaged should already be able to collapse            |              |                    |
|                               | small large folios to PMD-sized THP; verification required.            |              |                    |
+-------------------------------+------------------------------------------------------------------------+--------------+--------------------+
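
The madvise row is the easiest one to illustrate with a toy model. The sketch below is not kernel code: ToyFolio and its methods are hypothetical, and the kernel cannot enumerate the owners of a folio this way. It only shows why a "mapcount == 1" test misclassifies a large folio that is PTE-mapped by a single process.

```python
class ToyFolio:
    """Toy model of a large folio whose pages are mapped individually by PTEs."""

    def __init__(self, nr_pages):
        self.nr_pages = nr_pages
        # mappers[i] holds the ids of the (hypothetical) processes mapping page i.
        self.mappers = [set() for _ in range(nr_pages)]

    def map_page(self, pid, index):
        """Record that process `pid` maps page `index` with a PTE."""
        self.mappers[index].add(pid)

    def total_mapcount(self):
        """Sum of per-page mappings, analogous to a folio-wide mapcount."""
        return sum(len(m) for m in self.mappers)

    def naive_exclusive(self):
        """The assumption described in the table: exclusive only if mapcount == 1."""
        return self.total_mapcount() == 1

    def truly_exclusive(self):
        """Ground truth in the toy model: at most one distinct process maps it."""
        return len(set().union(*self.mappers)) <= 1


def fully_mapped_by_one_process(nr_pages=16, pid=1):
    """Build a folio where one process maps every page via a PTE."""
    folio = ToyFolio(nr_pages)
    for i in range(nr_pages):
        folio.map_page(pid, i)
    return folio
```

With fully_mapped_by_one_process(16), total_mapcount() is 16, naive_exclusive() is False and truly_exclusive() is True, which is the "mapcount up to nr_pages and still be exclusive" case from the table.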

Thanks,
Ryan

> 
>> [9] https://drive.google.com/file/d/1GnfYFpr7_c1kA41liRUW5YtCb8Cj18Ud/view?usp=sharing&resourcekey=0-U1Mj3-RhLD1JV6EThpyPyA
> 
> 
