[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d3736f199973c0c4284f421850bdf8852dc193d8.camel@kernel.org>
Date: Mon, 30 Sep 2024 16:12:11 -0400
From: Jeff Layton <jlayton@...nel.org>
To: Thomas Gleixner <tglx@...utronix.de>, John Stultz <jstultz@...gle.com>,
Stephen Boyd <sboyd@...nel.org>, Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>, Steven
Rostedt <rostedt@...dmis.org>, Masami Hiramatsu <mhiramat@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Jonathan Corbet
<corbet@....net>, Chandan Babu R <chandan.babu@...cle.com>, "Darrick J.
Wong" <djwong@...nel.org>, Theodore Ts'o <tytso@....edu>, Andreas Dilger
<adilger.kernel@...ger.ca>, Chris Mason <clm@...com>, Josef Bacik
<josef@...icpanda.com>, David Sterba <dsterba@...e.com>, Hugh Dickins
<hughd@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>, Chuck Lever
<chuck.lever@...cle.com>, Vadim Fedorenko <vadim.fedorenko@...ux.dev>
Cc: Randy Dunlap <rdunlap@...radead.org>, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-xfs@...r.kernel.org,
linux-ext4@...r.kernel.org, linux-btrfs@...r.kernel.org,
linux-nfs@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v8 01/11] timekeeping: move multigrain timestamp floor
handling into timekeeper
On Mon, 2024-09-30 at 21:43 +0200, Thomas Gleixner wrote:
> On Thu, Sep 19 2024 at 18:50, Jeff Layton wrote:
> > The fix for this is to establish a floor value for the coarse-grained
> > clock. When stamping a file with a fine-grained timestamp, we update
> > the floor value with the current monotonic time (using cmpxchg). Then
> > later, when a coarse-grained timestamp is requested, check whether the
> > floor is later than the current coarse-grained time. If it is, then the
> > kernel will return the floor value (converted to realtime) instead of
> > the current coarse-grained clock. That allows us to maintain the
> > ordering guarantees.
> >
> > My original implementation of this tracked the floor value in
> > fs/inode.c (also using cmpxchg), but that caused a performance
> > regression, mostly due to multiple calls into the timekeeper functions
> > with seqcount loops. By adding the floor to the timekeeper we can get
> > that back down to 1 seqcount loop.
> >
> > Let me know if you have more questions about this, or suggestions about
> > how to do this better. The timekeeping code is not my area of expertise
> > (obviously) so I'm open to doing this a better way if there is one.
>
> The comments I made about races and the clock_settime() inconsistency
> vs. the change log aside, I don't see room for improvement there.
>
> What worries me is the atomic_cmpxchg() under contention on large
> machines, but as it is not a cmpxchg() loop it might be not completely
> horrible.
>
Thanks for taking a look.
The code does try to avoid requesting a fine-grained timestamp unless
there is no other option. In practice, that means that a
write()+stat()+write() has to occur within the same coarse-grained
timer tick. That obviously does happen, but I'm hoping it's not very
frequent on real-world workloads.
The main place where we see a performance hit from this set is the
will-it-scale pipe test, which does 1-byte writes and reads on a pipe
as fast as possible.
The previous patchset tracked the floor inside fs/inode.c, and each
timestamp update on the pipe inode could make multiple trips into the
timekeeper seqcount loops. That added extra barriers and slowed things
down.
This patchset moves the code into the timekeeper, which means only a
single barrier. That gets the cost down, but it's not entirely free.
There is some small cost to dealing with the floor, regardless.
--
Jeff Layton <jlayton@...nel.org>
Powered by blists - more mailing lists