Message-ID: <20200920232303.GW12096@dread.disaster.area>
Date:   Mon, 21 Sep 2020 09:23:03 +1000
From:   Dave Chinner <david@...morbit.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Michael Larabel <Michael@...haellarabel.com>,
        Matthieu Baerts <matthieu.baerts@...sares.net>,
        Amir Goldstein <amir73il@...il.com>,
        Ted Ts'o <tytso@...gle.com>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Jan Kara <jack@...e.cz>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: Kernel Benchmarking

On Thu, Sep 17, 2020 at 12:47:16PM -0700, Linus Torvalds wrote:
> On Thu, Sep 17, 2020 at 12:27 PM Matthew Wilcox <willy@...radead.org> wrote:
> >
> > Ah, I see what you mean.  Hold the i_mmap_rwsem for write across,
> > basically, the entirety of truncate_inode_pages_range().
> 
> I really suspect that will be entirely unacceptable for latency
> reasons, but who knows. In practice, nobody actually truncates a file
> _while_ it's mapped, that's just crazy talk.
> 
> But almost every time I go "nobody actually does this", I tend to be
> surprised by just how crazy some loads are, and it turns out that
> _somebody_ does it, and has a really good reason for doing odd things,
> and has been doing it for years because it worked really well and
> solved some odd problem.
> 
> So the "hold it for the entirety of truncate_inode_pages_range()"
> thing seems to be a really simple approach, and nice and clean, but it
> makes me go "*somebody* is going to do bad things and complain about
> page fault latencies".

I don't think there's a major concern here, because that's what we
are already doing at the filesystem level. We do it because some
filesystems need to serialise IO to the inode -before- calling
truncate_setsize(), e.g. (a rough code sketch of this ordering
follows the list):

- we have to wait for inflight direct IO that may be beyond the new
  EOF to drain before we start changing where EOF lies.

- we have data vs metadata ordering requirements that mean we have
  to ensure dirty data is stable before we change the inode size.
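
Roughly, the ordering looks like this. To be clear, this is an
illustrative sketch and not code from either tree: "i_mmap_sem" and
MY_I() stand in for ext4's ei->i_mmap_sem / XFS's XFS_MMAPLOCK, and
fs_setsize() is a made-up name, but the generic helpers it calls
(inode_dio_wait(), filemap_write_and_wait_range(),
truncate_setsize()) are real:

#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/rwsem.h>

struct my_inode {
	struct rw_semaphore	i_mmap_sem;	/* like ext4's i_mmap_sem */
	struct inode		vfs_inode;
};

static inline struct my_inode *MY_I(struct inode *inode)
{
	return container_of(inode, struct my_inode, vfs_inode);
}

static int fs_setsize(struct inode *inode, loff_t newsize)
{
	int error;

	/* Block page faults on this inode for the whole truncate. */
	down_write(&MY_I(inode)->i_mmap_sem);

	/*
	 * Wait for in-flight direct IO - which may be beyond the new
	 * EOF - to drain before we start changing where EOF lies.
	 */
	inode_dio_wait(inode);

	/*
	 * Data vs metadata ordering: dirty data must be stable before
	 * the inode size change is made.
	 */
	error = filemap_write_and_wait_range(inode->i_mapping, newsize,
					     LLONG_MAX);
	if (error)
		goto out_unlock;

	/* Only now move EOF and toss the now-dead page cache. */
	truncate_setsize(inode, newsize);
	/* ... filesystem-specific block freeing would go here ... */

out_unlock:
	up_write(&MY_I(inode)->i_mmap_sem);
	return error;
}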

Hence we've already been locking out page faults for the entire
truncate operation for a few years now on both XFS and ext4. We
haven't heard of any problems resulting from truncate-related page
fault latencies....
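
For reference, the fault-side half of that exclusion looks much like
ext4_filemap_fault()/xfs_filemap_fault() did around v5.9 (reusing the
MY_I()/i_mmap_sem stand-ins from the sketch above):

static vm_fault_t my_filemap_fault(struct vm_fault *vmf)
{
	struct inode *inode = file_inode(vmf->vma->vm_file);
	vm_fault_t ret;

	/*
	 * Shared hold: faults run concurrently with each other, but
	 * are all excluded while truncate holds the rwsem exclusive.
	 */
	down_read(&MY_I(inode)->i_mmap_sem);
	ret = filemap_fault(vmf);
	up_read(&MY_I(inode)->i_mmap_sem);

	return ret;
}

That's the whole mechanism: truncate takes the rwsem exclusive,
every fault takes it shared.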

FWIW, if the fs layer is already providing this level of IO
exclusion w.r.t. address space access, does it need to be replicated
at the address space level?

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
