Message-ID: <20230531-symmetrie-absender-8e9af6834753@brauner>
Date: Wed, 31 May 2023 13:15:43 +0200
From: Christian Brauner <brauner@...nel.org>
To: Amir Goldstein <amir73il@...il.com>
Cc: chenzhiyin <zhiyin.chen@...el.com>, viro@...iv.linux.org.uk,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
nanhai.zou@...el.com
Subject: Re: [PATCH] fs.h: Optimize file struct to prevent false sharing

On Tue, May 30, 2023 at 01:02:06PM +0300, Amir Goldstein wrote:
> On Tue, May 30, 2023 at 12:31 PM Christian Brauner <brauner@...nel.org> wrote:
> >
> > On Mon, May 29, 2023 at 10:06:26PM -0400, chenzhiyin wrote:
> > > In the syscall test of UnixBench, performance regression occurred
> > > due to false sharing.
> > >
> > > The lock and atomic members, including file::f_lock, file::f_count
> > > and file::f_pos_lock are highly contended and frequently updated
> > > in the high-concurrency test scenarios. perf c2c identified one
> > > affected read access, file::f_op.
> > > To prevent false sharing, the layout of file struct is changed as
> > > follows:
> > > (A) f_lock, f_count and f_pos_lock are put together to share the
> > > same cache line.
> > > (B) The read mostly members, including f_path, f_inode, f_op are
> > > put into a separate cache line.
> > > (C) f_mode is put together with f_count, since they are used
> > > frequently at the same time.
> > >
> > > The optimization has been validated in the syscall test of
> > > UnixBench. The performance gain is 30~50% when the number of parallel
> > > jobs is 16.
> > >
> > > Signed-off-by: chenzhiyin <zhiyin.chen@...el.com>
> > > ---
> >
> > Sounds interesting, but can we see the actual numbers, please?
> > So struct file is marked with __randomize_layout which seems to make
> > this whole reordering pointless or at least only useful if the
> > structure randomization Kconfig is turned off. Is there any precedent
> > to optimizing structures that are marked as randomizable?
>
> Good question!
>
> Also, is the impressive improvement gained only with (A)+(B)+(C)?
>
> (A) and (B) make sense, but something about the claim (C) does not sit right.
> Can you explain this claim?
>
> Putting the read mostly f_mode with frequently updated f_count seems
> counter to the goal of your patch.
> Aren't f_mode and f_flags just as frequently accessed as f_op?
> Shouldn't f_mode belong with the read-mostly members?
>
> What am I missing?

I think that f_mode will be more heavily used because it's checked
every time you call an fget variant. For example, f_mode is used to
check whether the file you're about to get a reference to is an O_PATH
file. Depending on the fget variant the caller used, the caller is
denied or allowed a reference on that file based on whether FMODE_PATH
is set. So you have

	if (unlikely(file->f_mode & mask))
	if (unlikely(!get_file_rcu(file))) // this just tries to bump f_count

every time you call an fget variant, which should be substantial. Other
places are fdget_pos(), where f_mode is also checked right after an
fdget()...
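
To make the access pattern concrete, here is a minimal user-space
sketch. It is not the kernel code: struct demo_file, demo_fget(),
DEMO_FMODE_PATH and the 64-byte CACHE_LINE are hypothetical stand-ins,
and the layout only mirrors points (B) and (C) of the patch description
under those assumptions. It shows f_mode being read immediately before
the attempt to bump f_count, which is why the two are candidates for
sharing a cache line, while the read-mostly pointers sit on another one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_LINE 64             /* assumed cache line size */
#define DEMO_FMODE_PATH 0x4000u   /* illustrative stand-in for FMODE_PATH */

/* Hypothetical, heavily simplified stand-in for struct file. */
struct demo_file {
	/* Frequently written reference count, with f_mode next to it. */
	_Alignas(CACHE_LINE) atomic_long f_count;
	unsigned int f_mode;

	/* Read-mostly members start on a separate cache line. */
	_Alignas(CACHE_LINE) const void *f_op;
	void *f_inode;
};

/* f_mode shares a line with f_count; f_op does not. */
static_assert(offsetof(struct demo_file, f_mode) / CACHE_LINE ==
	      offsetof(struct demo_file, f_count) / CACHE_LINE,
	      "f_mode should share a cache line with f_count");
static_assert(offsetof(struct demo_file, f_op) / CACHE_LINE !=
	      offsetof(struct demo_file, f_count) / CACHE_LINE,
	      "f_op should not share a cache line with f_count");

/*
 * Same shape as the fget check quoted above: test f_mode against the
 * mask first, then try to take a reference unless the count already
 * dropped to zero.
 */
static struct demo_file *demo_fget(struct demo_file *file, unsigned int mask)
{
	long old;

	if (file->f_mode & mask)
		return NULL;

	old = atomic_load(&file->f_count);
	do {
		if (old == 0)
			return NULL;	/* file is already being freed */
	} while (!atomic_compare_exchange_weak(&file->f_count, &old, old + 1));

	return file;
}
```

With this layout, a caller that passes DEMO_FMODE_PATH as the mask gets
NULL for a file whose f_mode has that bit set and never touches f_count;
otherwise the f_mode read and the f_count update hit the same line.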