Message-ID: <20200920232303.GW12096@dread.disaster.area>
Date: Mon, 21 Sep 2020 09:23:03 +1000
From: Dave Chinner <david@...morbit.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Matthew Wilcox <willy@...radead.org>,
Michael Larabel <Michael@...haellarabel.com>,
Matthieu Baerts <matthieu.baerts@...sares.net>,
Amir Goldstein <amir73il@...il.com>,
Ted Ts'o <tytso@...gle.com>,
Andreas Dilger <adilger.kernel@...ger.ca>,
Ext4 Developers List <linux-ext4@...r.kernel.org>,
Jan Kara <jack@...e.cz>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: Kernel Benchmarking
On Thu, Sep 17, 2020 at 12:47:16PM -0700, Linus Torvalds wrote:
> On Thu, Sep 17, 2020 at 12:27 PM Matthew Wilcox <willy@...radead.org> wrote:
> >
> > Ah, I see what you mean. Hold the i_mmap_rwsem for write across,
> > basically, the entirety of truncate_inode_pages_range().
>
> I really suspect that will be entirely unacceptable for latency
> reasons, but who knows. In practice, nobody actually truncates a file
> _while_ it's mapped, that's just crazy talk.
>
> But almost every time I go "nobody actually does this", I tend to be
> surprised by just how crazy some loads are, and it turns out that
> _somebody_ does it, and has a really good reason for doing odd things,
> and has been doing it for years because it worked really well and
> solved some odd problem.
>
> So the "hold it for the entirety of truncate_inode_pages_range()"
> thing seems to be a really simple approach, and nice and clean, but it
> makes me go "*somebody* is going to do bad things and complain about
> page fault latencies".
I don't think there's a major concern here, because that's what we
are already doing at the filesystem level. We do it because some
filesystems need to serialise IO to the inode -before- calling
truncate_setsize(). e.g. (see the sketch after this list):
- we have to wait for inflight direct IO that may be beyond the new
EOF to drain before we start changing where EOF lies.
- we have data vs metadata ordering requirements that mean we have
to ensure dirty data is stable before we change the inode size.
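To illustrate the shape of it (hand-waving here - fs_setattr_size()
is a made-up helper for this sketch, not any filesystem's actual
code, but the calls it makes are the generic kernel ones):

#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pagemap.h>

static int fs_setattr_size(struct inode *inode, loff_t newsize)
{
	struct address_space *mapping = inode->i_mapping;
	int error;

	/* wait for in-flight direct IO that may be beyond the new
	 * EOF to drain before we start moving EOF */
	inode_dio_wait(inode);

	/* make dirty data in the affected range stable before the
	 * size change becomes visible (data vs metadata ordering) */
	error = filemap_write_and_wait_range(mapping, newsize, LLONG_MAX);
	if (error)
		return error;

	/* change i_size and toss the page cache beyond the new EOF */
	truncate_setsize(inode, newsize);
	return 0;
}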
Hence we've already been locking out page faults for the entire
truncate operation for a few years on both XFS and ext4. We haven't
heard of any problems resulting from truncate-related page fault
latencies....
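The shape of that serialisation is roughly the following - the
fs_inode_info/FS_I()/i_mmap_lock names are made-up stand-ins for
what XFS does with its MMAPLOCK and ext4 with an equivalent
per-inode rwsem, not the actual kernel code:

#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/rwsem.h>

struct fs_inode_info {
	struct rw_semaphore	i_mmap_lock;	/* faults vs truncate */
	struct inode		vfs_inode;
};

static inline struct fs_inode_info *FS_I(struct inode *inode)
{
	return container_of(inode, struct fs_inode_info, vfs_inode);
}

/* truncate side: page faults are excluded for the whole operation,
 * including the IO ordering work sketched earlier */
static int fs_truncate(struct inode *inode, loff_t newsize)
{
	struct fs_inode_info *fi = FS_I(inode);
	int error;

	down_write(&fi->i_mmap_lock);
	error = fs_setattr_size(inode, newsize);
	up_write(&fi->i_mmap_lock);
	return error;
}

/* fault side: taken shared, so faults only ever wait on a running
 * truncate, not on each other */
static vm_fault_t fs_filemap_fault(struct vm_fault *vmf)
{
	struct fs_inode_info *fi = FS_I(file_inode(vmf->vma->vm_file));
	vm_fault_t ret;

	down_read(&fi->i_mmap_lock);
	ret = filemap_fault(vmf);
	up_read(&fi->i_mmap_lock);
	return ret;
}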
FWIW, if the fs layer is already providing this level of IO
exclusion w.r.t. address space access, does it need to be replicated
at the address space level?
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com