linux-kernel - Re: [PATCH v8 01/10] fs: Allow fine-grained control of folio sizes

Message-ID: <20240709165047.GS1998502@frogsfrogsfrogs>
Date: Tue, 9 Jul 2024 09:50:47 -0700
From: "Darrick J. Wong" <djwong@...nel.org>
To: "Pankaj Raghav (Samsung)" <kernel@...kajraghav.com>
Cc: david@...morbit.com, willy@...radead.org, ryan.roberts@....com,
	linux-kernel@...r.kernel.org, yang@...amperecomputing.com,
	linux-mm@...ck.org, john.g.garry@...cle.com,
	linux-fsdevel@...r.kernel.org, hare@...e.de, p.raghav@...sung.com,
	mcgrof@...nel.org, gost.dev@...sung.com, cl@...amperecomputing.com,
	linux-xfs@...r.kernel.org, hch@....de, Zi Yan <zi.yan@...t.com>,
	akpm@...ux-foundation.org, chandan.babu@...cle.com
Subject: Re: [PATCH v8 01/10] fs: Allow fine-grained control of folio sizes

On Tue, Jul 09, 2024 at 04:29:07PM +0000, Pankaj Raghav (Samsung) wrote:
> For now, this is the only patch that is blocking for the next version.
> 
> Based on the discussion, is the following logical @ryan, @dave and
> @willy?
> 
> - We give explicit VM_WARN_ONCE if we try to set folio order range if
>   the THP is disabled, min and max is greater than MAX_PAGECACHE_ORDER.
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 14e1415f7dcf4..313c9fad61859 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -394,13 +394,24 @@ static inline void mapping_set_folio_order_range(struct address_space *mapping,
>                                                  unsigned int min,
>                                                  unsigned int max)
>  {
> -       if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> +       if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> +               VM_WARN_ONCE(1, 
> +       "THP needs to be enabled to support mapping folio order range");
>                 return;
> +       }
>  
> -       if (min > MAX_PAGECACHE_ORDER)
> +       if (min > MAX_PAGECACHE_ORDER) {
> +               VM_WARN_ONCE(1, 
> +       "min order > MAX_PAGECACHE_ORDER. Setting min_order to MAX_PAGECACHE_ORDER");
>                 min = MAX_PAGECACHE_ORDER;
> -       if (max > MAX_PAGECACHE_ORDER)
> +       }
> +
> +       if (max > MAX_PAGECACHE_ORDER) {
> +               VM_WARN_ONCE(1, 
> +       "max order > MAX_PAGECACHE_ORDER. Setting max_order to MAX_PAGECACHE_ORDER");
>                 max = MAX_PAGECACHE_ORDER;
> +       }
> +
>         if (max < min)
>                 max = min;
> 
> - We make THP an explicit dependency for XFS:
> 
> diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
> index d41edd30388b7..be2c1c0e9fe8b 100644
> --- a/fs/xfs/Kconfig
> +++ b/fs/xfs/Kconfig
> @@ -5,6 +5,7 @@ config XFS_FS
>         select EXPORTFS
>         select LIBCRC32C
>         select FS_IOMAP
> +       select TRANSPARENT_HUGEPAGE
>         help
>           XFS is a high performance journaling filesystem which originated
>           on the SGI IRIX platform.  It is completely multi-threaded, can
> 
> OR
> 
> We create a helper in page cache that FSs can use to check if a specific
> order can be supported at mount time:

I like this solution better; if XFS is going to drop support for o[ld]d
architectures I think we need /some/ sort of notice period.  Or at least
a better story than "we want to support 64k fsblocks on x64 so we're
withdrawing support even for 4k fsblocks and smallish filesystems on
m68k".

You probably don't want bs>ps support to block on some arcane discussion
about 32-bit, right? ;)

> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 14e1415f7dcf..9be775ef11a5 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -374,6 +374,14 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
>  #define MAX_XAS_ORDER          (XA_CHUNK_SHIFT * 2 - 1)
>  #define MAX_PAGECACHE_ORDER    min(MAX_XAS_ORDER, PREFERRED_MAX_PAGECACHE_ORDER)
>  
> +
> +static inline unsigned int mapping_max_folio_order_supported()
> +{
> +    if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> +      return 0;

Shouldn't this line be indented by two tabs, not six spaces?

> +    return MAX_PAGECACHE_ORDER;
> +}

Alternately, should this return the max folio size in bytes?

static inline size_t mapping_max_folio_size(void)
{
	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
		return 1U << (PAGE_SHIFT + MAX_PAGECACHE_ORDER);
	return PAGE_SIZE;
}

Then the validation looks like:

	const size_t	max_folio_size = mapping_max_folio_size();

	if (mp->m_sb.sb_blocksize > max_folio_size) {
		xfs_warn(mp,
 "block size (%u bytes) not supported; maximum folio size is %zu.",
				mp->m_sb.sb_blocksize, max_folio_size);
		error = -ENOSYS;
		goto out_free_sb;
	}

(Don't mind me bikeshedding here.)
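
For anyone who wants to poke at the arithmetic outside the kernel, here is a
minimal userspace sketch of the byte-based check above. It is only an
illustration: the SKETCH_* constants and sketch_* names are hypothetical, the
page shift and maximum order are assumed values (the kernel derives the real
ones per architecture and config), and a bool parameter stands in for
IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed values for illustration only; not taken from any real config. */
#define SKETCH_PAGE_SHIFT		12u	/* 4 KiB pages */
#define SKETCH_MAX_PAGECACHE_ORDER	9u	/* hypothetical cap */

/* Largest folio size in bytes; a single page when large folios are off. */
static size_t sketch_max_folio_size(bool thp_enabled)
{
	if (thp_enabled)
		return (size_t)1 << (SKETCH_PAGE_SHIFT + SKETCH_MAX_PAGECACHE_ORDER);
	return (size_t)1 << SKETCH_PAGE_SHIFT;
}

/* Returns 0 if one folio can hold a block of this size, -1 otherwise. */
static int sketch_check_blocksize(unsigned int blocksize, bool thp_enabled)
{
	size_t max_folio_size = sketch_max_folio_size(thp_enabled);

	if (blocksize > max_folio_size) {
		fprintf(stderr,
 "block size (%u bytes) not supported; maximum folio size is %zu.\n",
			blocksize, max_folio_size);
		return -1;
	}
	return 0;
}
```

With these assumed values, a 64k block size is rejected when large folios are
unavailable and accepted when they are, while a 4k block size passes either
way.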

> +
> 
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index b8a93a8f35cac..e2be8743c2c20 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1647,6 +1647,15 @@ xfs_fs_fill_super(
>                         goto out_free_sb;
>                 }
>  
> +               if (mp->m_sb.sb_blocklog - PAGE_SHIFT >
> +                   mapping_max_folio_order_supported()) {
> +                       xfs_warn(mp,
> +"Block Size (%d bytes) is not supported. Check MAX_PAGECACHE_ORDER",
> +                       mp->m_sb.sb_blocksize);

You might as well print MAX_PAGECACHE_ORDER here to make analysis
easier on less-familiar architectures:

			xfs_warn(mp,
 "block size (%u bytes) is not supported; max folio size is %u.",
					mp->m_sb.sb_blocksize,
					1U << (PAGE_SHIFT + mapping_max_folio_order_supported()));

(I wrote this comment first.)
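
The order-based variant of the check comes down to the following arithmetic,
sketched here in userspace C. The names are hypothetical and the 4 KiB page
shift is an assumption. The sketch guards the subtraction explicitly because it
does not rely on any surrounding "block size is larger than a page" test, and
it converts an order to bytes (not pages) for reporting.

```c
#include <stdbool.h>

/* Assumed page shift for illustration only (4 KiB pages). */
#define SKETCH_PAGE_SHIFT	12u

/* Minimum folio order for a 2^blocklog byte block; 0 if it fits in a page. */
static unsigned int sketch_min_order(unsigned int blocklog)
{
	return blocklog > SKETCH_PAGE_SHIFT ? blocklog - SKETCH_PAGE_SHIFT : 0;
}

/* Size in bytes of a folio of the given order. */
static unsigned long sketch_order_to_bytes(unsigned int order)
{
	return 1UL << (SKETCH_PAGE_SHIFT + order);
}

/* True if the block size needs no more than max_order. */
static bool sketch_order_supported(unsigned int blocklog, unsigned int max_order)
{
	return sketch_min_order(blocklog) <= max_order;
}
```

Under that assumption a 64k block (blocklog 16) needs order 4, so it is
refused when the maximum supported order is 0.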

--D

> +                       error = -ENOSYS;
> +                       goto out_free_sb;
> +               }
> +
>                 xfs_warn(mp,
>  "EXPERIMENTAL: V5 Filesystem with Large Block Size (%d bytes) enabled.",
>                         mp->m_sb.sb_blocksize);
> 
> 
> --
> Pankaj