linux-kernel - Re: [PATCH v8 01/10] fs: Allow fine-grained control of folio sizes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240705141414.72yy6m75aajmlhvt@quentin>
Date: Fri, 5 Jul 2024 14:14:14 +0000
From: "Pankaj Raghav (Samsung)" <kernel@...kajraghav.com>
To: Ryan Roberts <ryan.roberts@....com>
Cc: Dave Chinner <david@...morbit.com>,
	Matthew Wilcox <willy@...radead.org>, chandan.babu@...cle.com,
	djwong@...nel.org, brauner@...nel.org, akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org, yang@...amperecomputing.com,
	linux-mm@...ck.org, john.g.garry@...cle.com,
	linux-fsdevel@...r.kernel.org, hare@...e.de, p.raghav@...sung.com,
	mcgrof@...nel.org, gost.dev@...sung.com, cl@...amperecomputing.com,
	linux-xfs@...r.kernel.org, hch@....de, Zi Yan <zi.yan@...t.com>
Subject: Re: [PATCH v8 01/10] fs: Allow fine-grained control of folio sizes

> >>
> >>> If the device is
> >>> asking for a blocksize > PAGE_SIZE and CONFIG_TRANSPARENT_HUGEPAGE is
> >>> not set, you should also decline to mount the filesystem.
> >>
> >> What does CONFIG_TRANSPARENT_HUGEPAGE have to do with filesystems
> >> being able to use large folios?
> >>
> >> If that's an actual dependency of using large folios, then we're at
> >> the point where the mm side of large folios needs to be divorced
> >> from CONFIG_TRANSPARENT_HUGEPAGE and always supported.
> >> Alternatively, CONFIG_TRANSPARENT_HUGEPAGE needs to selected by the
> >> block layer and also every filesystem that wants to support
> >> sector/blocks sizes larger than PAGE_SIZE.  IOWs, large folio
> >> support needs to *always* be enabled on systems that say
> >> CONFIG_BLOCK=y.
> > 
> > Why CONFIG_BLOCK? I think it is enough if it comes from the FS side
> > right? And for now, the only FS that needs that sort of bs > ps 
> > guarantee is XFS with this series. Other filesystems such as bcachefs 
> > that call mapping_set_large_folios() only enable it as an optimization
> > and it is not needed for the filesystem to function.
> > 
> > So this is my conclusion from the conversation:
> > - Add a dependency in Kconfig on THP for XFS until we fix the dependency
> >   of large folios on THP
> 
> THP isn't supported on some arches, so isn't this effectively saying XFS can no
> longer be used with those arches, even if the bs <= ps? I think while pagecache
> large folios depend on THP, you need to make this a mount-time check in the FS?
> 
> But ideally, MAX_PAGECACHE_ORDER would be set to 0 for
> !CONFIG_TRANSPARENT_HUGEPAGE so you can just check against that and don't have
> to worry about THP availability directly.

Yes, that would be better. We should have a way to probe it during mount
time without requiring any address_space mapping. We could have a helper
something as follows:

static inline unsigned int mapping_max_folio_order_supported()
{
    if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
      return 0;
    return MAX_PAGECACHE_ORDER;
}

This could be used by the FS to verify during mount time.

> 
> Willy; Why is MAX_PAGECACHE_ORDER set to 8 when THP is disabled currently?
> 

This appeared in this patch with the following comment:
https://lore.kernel.org/linux-fsdevel/20230710130253.3484695-8-willy@infradead.org/
 
+/*
+ * There are some parts of the kernel which assume that PMD entries
+ * are exactly HPAGE_PMD_ORDER.  Those should be fixed, but until then,
+ * limit the maximum allocation order to PMD size.  I'm not aware of any
+ * assumptions about maximum order if THP are disabled, but 8 seems like
+ * a good order (that's 1MB if you're using 4kB pages)
+ */ 

> > - Add a BUILD_BUG_ON(XFS_MAX_BLOCKSIZE > MAX_PAGECACHE_ORDER)
> > - Add a WARN_ON_ONCE() and clamp the min and max value in
> >   mapping_set_folio_order_range() ?
> > 
> > Let me know what you all think @willy, @dave and @ryan.
> > 
> > --
> > Pankaj
>