Message-ID: <ZnDEME/qMAzqli8l@dread.disaster.area>
Date: Tue, 18 Jun 2024 09:18:08 +1000
From: Dave Chinner <david@...morbit.com>
To: Christoph Hellwig <hch@....de>
Cc: "Pankaj Raghav (Samsung)" <kernel@...kajraghav.com>, djwong@...nel.org,
chandan.babu@...cle.com, brauner@...nel.org,
akpm@...ux-foundation.org, willy@...radead.org, mcgrof@...nel.org,
linux-mm@...ck.org, hare@...e.de, linux-kernel@...r.kernel.org,
yang@...amperecomputing.com, Zi Yan <zi.yan@...t.com>,
linux-xfs@...r.kernel.org, p.raghav@...sung.com,
linux-fsdevel@...r.kernel.org, gost.dev@...sung.com,
cl@...amperecomputing.com, john.g.garry@...cle.com
Subject: Re: [PATCH v7 11/11] xfs: enable block size larger than page size
support

On Mon, Jun 17, 2024 at 08:51:04AM +0200, Christoph Hellwig wrote:
> On Mon, Jun 17, 2024 at 11:29:42AM +1000, Dave Chinner wrote:
> > > > + if (mp->m_sb.sb_blocksize > PAGE_SIZE)
> > > > + igeo->min_folio_order = mp->m_sb.sb_blocklog - PAGE_SHIFT;
> > > > + else
> > > > + igeo->min_folio_order = 0;
> > > > }
> > >
> > > The minimum folio order isn't really part of the inode (allocation)
> > > geometry, is it?
> >
> > I suggested it last time around instead of calculating the same
> > constant on every inode allocation. We're already storing in-memory
> > struct xfs_inode allocation init values in this structure. e.g. in
> > xfs_inode_alloc() we see things like this:
>
> While new_diflags2 isn't exactly inode geometry, it at least is part
> of the inode allocation. Folio min order for file data has nothing
> to do with this at all.

Yet ip->i_diflags2 is *not* initialised in xfs_init_new_inode()
when we physically allocate and initialise a new inode. It is set
for all inodes when they are allocated in memory, regardless of
their use.

Whether that is the right thing to do or not is a separate issue -
xfs_inode_from_disk() will overwrite it in every inode read case
that isn't a create.

Indeed, we could do the folio order initialisation in
xfs_setup_inode() where we set up the mapping gfp mask, but that
doesn't change the fact we set it up for every inode that is
instantiated in memory or that we want it pre-calculated...

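As a rough illustration of the pre-calculation pattern being discussed,
here is a minimal userspace sketch, not the patch itself. The sketch_*
names and the fixed 4 KiB page shift are assumptions made for the
example; only the if/else mirrors the quoted hunk.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the precalculated inode constants. */
struct sketch_ino_geo {
	unsigned int min_folio_order;
};

/* Run once at mount time: the only place the branch is evaluated. */
static void sketch_geo_init(struct sketch_ino_geo *igeo, unsigned int blocklog)
{
	assert(blocklog < 32);	/* sanity bound for this sketch only */

	if (blocklog > SKETCH_PAGE_SHIFT)
		igeo->min_folio_order = blocklog - SKETCH_PAGE_SHIFT;
	else
		igeo->min_folio_order = 0;
}

/* Run for every inode instantiated in memory: a plain load. */
static unsigned int sketch_setup_inode(const struct sketch_ino_geo *igeo)
{
	return igeo->min_folio_order;
}
```

With a 16 KiB block size (blocklog 14) the cached order is 2; with any
block size at or below the assumed page size it is 0.
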
> > The only other place we might store it is the struct xfs_mount, but
> > given all the inode allocation constants are already in the embedded
> > mp->m_ino_geo structure, it just seems like a much better idea to
> > put it with all the other inode allocation constants than dump it
> > randomly into the struct xfs_mount....
>
> Well, it is very closely related to, say, the m_blockmask field in
> struct xfs_mount.

Not really. The block mask is a property of the filesystem and is used
primarily for converting lengths between units of FSB and byte
counts. It is used all over the place, and the only guaranteed
common structure that all those callers have access to is the
xfs_mount.
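
To illustrate the kind of conversion the block mask is used for, a
minimal userspace sketch follows. The sketch_* names are hypothetical
and this is not the kernel's code; it assumes the mask is the block
size minus one and that the block size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-mount block size fields. */
struct sketch_mount {
	unsigned int blocklog;	/* log2 of the filesystem block size */
	uint64_t blockmask;	/* block size - 1 */
};

static struct sketch_mount sketch_mount_init(unsigned int blocklog)
{
	struct sketch_mount mp;

	assert(blocklog < 64);	/* keep the shift well defined */
	mp.blocklog = blocklog;
	mp.blockmask = ((uint64_t)1 << blocklog) - 1;
	return mp;
}

/* Bytes to blocks, rounding up to cover a partial trailing block. */
static uint64_t sketch_b_to_fsb(const struct sketch_mount *mp, uint64_t bytes)
{
	return (bytes + mp->blockmask) >> mp->blocklog;
}

/* Blocks to bytes. */
static uint64_t sketch_fsb_to_b(const struct sketch_mount *mp, uint64_t fsb)
{
	return fsb << mp->blocklog;
}

/* Byte offset within a block. */
static uint64_t sketch_b_fsb_offset(const struct sketch_mount *mp, uint64_t bytes)
{
	return bytes & mp->blockmask;
}
```

Every helper needs only the mount-wide values, which is why callers
anywhere in the filesystem can use them.
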

OTOH, the folio order is only used for regular files to tell the
page cache how to behave. The scope of the folio order setup is the
same as mapping_set_gfp_mask() - it is only used in one place and
used for inode configuration. I may have called the structure "inode
geometry" because that described what it contained when I first
implemented it, but that doesn't mean that is all it can
contain. It contains static, precalculated inode configuration
values, and that is what we are adding here...

> Then again, modern CPUs tend to get you a simple
> subtraction for free in most pipelines doing other things, so I'm
> not really sure it's worth caching for use in inode allocation to
> start with, but I don't care strongly about that.

It's not the cost of a subtraction that is the problem -
precalculation is about avoiding a potential branch misprediction in
a hot path that would stall the CPU. If there were no branches, it
wouldn't be an issue, but this value cannot be calculated without at
least one branch in the logic.

-Dave.
--
Dave Chinner
david@...morbit.com