linux-kernel - Re: [v2 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent


Message-ID: <YjTT5Meqdn8fiuC2@casper.infradead.org>
Date:   Fri, 18 Mar 2022 18:48:04 +0000
From:   Matthew Wilcox <willy@...radead.org>
To:     Yang Shi <shy828301@...il.com>
Cc:     Dave Chinner <david@...morbit.com>, vbabka@...e.cz,
        kirill.shutemov@...ux.intel.com, linmiaohe@...wei.com,
        songliubraving@...com, riel@...riel.com, ziy@...dia.com,
        akpm@...ux-foundation.org, tytso@....edu, adilger.kernel@...ger.ca,
        darrick.wong@...cle.com, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org,
        linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [v2 PATCH 0/8] Make khugepaged collapse readonly FS THP more
 consistent

On Fri, Mar 18, 2022 at 11:04:29AM -0700, Yang Shi wrote:
> I agree once page cache huge page is fully supported,
> READ_ONLY_THP_FOR_FS could be deprecated. But actually this patchset
> makes khugepaged collapse file THP more consistently. It guarantees
> the THP could be collapsed as long as file THP is supported and
> configured properly and there is suitable file vmas, it is not
> guaranteed by the current code. So it should be useful even though
> READ_ONLY_THP_FOR_FS is gone IMHO.

I don't know if it's a good thing or not.  Experiments with 64k
PAGE_SIZE on arm64 show some benchmarks improving and others regressing.
Just because we _can_ collapse a 2MB range of pages into a single 2MB
page doesn't mean we _should_.  I suspect the right size folio for any
given file will depend on the access pattern.  For example, dirtying a
few bytes in a folio will result in the entire folio being written back.
Is that what you want?  Maybe!  It may prompt the filesystem to defragment
that range, which would be good.  On the other hand, if you're bandwidth
limited, it may decrease your performance.  And if your media has limited
write endurance, it may result in your drive wearing out more quickly.
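
To put rough numbers on the writeback point above, here is a minimal
arithmetic sketch.  It assumes the whole folio is written back whenever any
part of it is dirtied, as described in the paragraph above.  The function
name and the 16-byte write are hypothetical, chosen only for illustration;
this is not kernel code.

```python
KIB = 1024
MIB = 1024 * KIB


def writeback_amplification(folio_size: int, dirtied_bytes: int) -> float:
    """Bytes written back per byte dirtied.

    Assumption: dirtying any part of a folio causes the entire folio
    to be written back.
    """
    if dirtied_bytes <= 0 or dirtied_bytes > folio_size:
        raise ValueError("dirtied_bytes must be in 1..folio_size")
    return folio_size / dirtied_bytes


# Hypothetical workload: a 16-byte update landing in a single folio.
for label, size in (("4 KiB", 4 * KIB), ("64 KiB", 64 * KIB), ("2 MiB", 2 * MIB)):
    factor = writeback_amplification(size, 16)
    print(f"{label:>6} folio: {factor:>9,.0f}x write amplification")
```

Under that assumption the same 16-byte update costs 256x with a 4 KiB
folio, 4,096x with a 64 KiB folio, and 131,072x with a 2 MiB folio.
Whether that matters depends on the bandwidth and endurance concerns above.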

Changing the heuristics should come with data.  Preferably from a wide
range of systems and use cases.  I know that's hard to do, but how else
can we proceed?

And I think you ignored my point that READ_ONLY_THP_FOR_FS required
no changes to filesystems.  It was completely invisible to them, by
design.  Now this patchset requires each filesystem to do something.
That's not a great step.

P.S. khugepaged currently does nothing if a range contains a compound
page.  It assumes that the page is compound because it's now a THP.
Large folios break that assumption, so khugepaged will now never
collapse a range which includes large folios.  Thanks to commit
    mm/filemap: Support VM_HUGEPAGE for file mappings
we'll always try to bring in PMD-sized pages for MADV_HUGEPAGE, so
it _probably_ doesn't matter.  But it's something we should watch
for as filesystems grow support for large folios.
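
The behaviour described in the P.S. can be pictured with a toy model.  This
is only a sketch of the rule as stated above (a range containing any
compound page is skipped); it is not the kernel's actual code.  The list of
per-page orders and the function name are hypothetical, and 512 base pages
per PMD assumes 4 KiB pages with a 2 MiB PMD.

```python
# Assumption: 4 KiB base pages and a 2 MiB PMD, i.e. 512 pages per range.
HPAGE_PMD_NR = 512


def is_collapse_candidate(page_orders: list[int]) -> bool:
    """Toy model of the rule described above.

    page_orders[i] is the order of the folio backing page i of the range
    (0 means an ordinary, non-compound page).  The range is considered
    only if it is PMD-sized and contains no compound page at all.
    """
    if len(page_orders) != HPAGE_PMD_NR:
        return False
    return all(order == 0 for order in page_orders)


# A range made up entirely of order-0 pages is a candidate.
small_pages = [0] * HPAGE_PMD_NR

# The same range where one order-4 (16-page) large folio was read in.
with_large_folio = [0] * HPAGE_PMD_NR
with_large_folio[32:48] = [4] * 16

print(is_collapse_candidate(small_pages))       # True
print(is_collapse_candidate(with_large_folio))  # False
```

In this model a single large folio anywhere in the range is enough to rule
the whole range out, which is why it matters as more filesystems start
creating large folios.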