linux-kernel - Re: [PATCH v7 11/11] xfs: enable block size larger than page size support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240617163136.ozxrlxljmblcgny3@quentin>
Date: Mon, 17 Jun 2024 16:31:36 +0000
From: "Pankaj Raghav (Samsung)" <kernel@...kajraghav.com>
To: Christoph Hellwig <hch@....de>
Cc: Dave Chinner <david@...morbit.com>, djwong@...nel.org,
	chandan.babu@...cle.com, brauner@...nel.org,
	akpm@...ux-foundation.org, willy@...radead.org, mcgrof@...nel.org,
	linux-mm@...ck.org, hare@...e.de, linux-kernel@...r.kernel.org,
	yang@...amperecomputing.com, Zi Yan <zi.yan@...t.com>,
	linux-xfs@...r.kernel.org, p.raghav@...sung.com,
	linux-fsdevel@...r.kernel.org, gost.dev@...sung.com,
	cl@...amperecomputing.com, john.g.garry@...cle.com
Subject: Re: [PATCH v7 11/11] xfs: enable block size larger than page size
 support

On Mon, Jun 17, 2024 at 08:51:04AM +0200, Christoph Hellwig wrote:
> On Mon, Jun 17, 2024 at 11:29:42AM +1000, Dave Chinner wrote:
> > > > +	if (mp->m_sb.sb_blocksize > PAGE_SIZE)
> > > > +		igeo->min_folio_order = mp->m_sb.sb_blocklog - PAGE_SHIFT;
> > > > +	else
> > > > +		igeo->min_folio_order = 0;
> > > >  }
> > > 
> > > The minimum folio order isn't really part of the inode (allocation)
> > > geometry, is it?
> > 
> > I suggested it last time around instead of calculating the same
> > constant on every inode allocation. We're already storing in-memory
> > strunct xfs_inode allocation init values in this structure. e.g. in
> > xfs_inode_alloc() we see things like this:
> 
> While new_diflags2 isn't exactly inode geometry, it at least is part
> of the inode allocation.  Folio min order for file data has nothing
> to do with this at all.
> 
> > The only other place we might store it is the struct xfs_mount, but
> > given all the inode allocation constants are already in the embedded
> > mp->m_ino_geo structure, it just seems like a much better idea to
> > put it will all the other inode allocation constants than dump it
> > randomly into the struct xfs_mount....
> 
> Well, it is very closely elated to say the m_blockmask field in
> struct xfs_mount.  The again modern CPUs tend to get a you simple
> subtraction for free in most pipelines doing other things, so I'm
> not really sure it's worth caching for use in inode allocation to
> start with, but I don't care strongly about that.

But there will also be an extra conditional apart from subtraction
right? 

Initially it was something like this:

@@ -73,6 +73,7 @@ xfs_inode_alloc(
 	xfs_ino_t		ino)
 {
 	struct xfs_inode	*ip;
+	int			min_order = 0;
 
 	/*
 	 * XXX: If this didn't occur in transactions, we could drop GFP_NOFAIL
@@ -88,7 +89,8 @@ xfs_inode_alloc(
 	/* VFS doesn't initialise i_mode or i_state! */
 	VFS_I(ip)->i_mode = 0;
 	VFS_I(ip)->i_state = 0;
-	mapping_set_large_folios(VFS_I(ip)->i_mapping);
+	min_order = max(min_order, ilog2(mp->m_sb.sb_blocksize) - PAGE_SHIFT);
+	mapping_set_folio_orders(VFS_I(ip)->i_mapping, min_order, MAX_PAGECACHE_ORDER);
 
 	XFS_STATS_INC(mp, vn_active);
 	ASSERT(atomic_read(&ip->i_pincount) == 0);
@@ -313,6 +315,7 @@ xfs_reinit_inode(
 	dev_t			dev = inode->i_rdev;
 	kuid_t			uid = inode->i_uid;
 	kgid_t			gid = inode->i_gid;
+	int			min_order = 0;
 
 	error = inode_init_always(mp->m_super, inode);
 
@@ -323,7 +326,8 @@ xfs_reinit_inode(
 	inode->i_rdev = dev;
 	inode->i_uid = uid;
 	inode->i_gid = gid;
-	mapping_set_large_folios(inode->i_mapping);
+	min_order = max(min_order, ilog2(mp->m_sb.sb_blocksize) - PAGE_SHIFT);
+	mapping_set_folio_orders(inode->i_mapping, min_order, MAX_PAGECACHE_ORDER);
 	return error;
 }

It does introduce a conditional in the inode allocation hot path so I
went with what Chinner proposed as it is something we use when we
initialize an inode.