Message-ID: <20200920232303.GW12096@dread.disaster.area>
Date:   Mon, 21 Sep 2020 09:23:03 +1000
From:   Dave Chinner <david@...morbit.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Michael Larabel <Michael@...haellarabel.com>,
        Matthieu Baerts <matthieu.baerts@...sares.net>,
        Amir Goldstein <amir73il@...il.com>,
        Ted Ts'o <tytso@...gle.com>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Jan Kara <jack@...e.cz>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: Kernel Benchmarking

On Thu, Sep 17, 2020 at 12:47:16PM -0700, Linus Torvalds wrote:
> On Thu, Sep 17, 2020 at 12:27 PM Matthew Wilcox <willy@...radead.org> wrote:
> >
> > Ah, I see what you mean.  Hold the i_mmap_rwsem for write across,
> > basically, the entirety of truncate_inode_pages_range().
> 
> I really suspect that will be entirely unacceptable for latency
> reasons, but who knows. In practice, nobody actually truncates a file
> _while_ it's mapped, that's just crazy talk.
> 
> But almost every time I go "nobody actually does this", I tend to be
> surprised by just how crazy some loads are, and it turns out that
> _somebody_ does it, and has a really good reason for doing odd things,
> and has been doing it for years because it worked really well and
> solved some odd problem.
> 
> So the "hold it for the entirety of truncate_inode_pages_range()"
> thing seems to be a really simple approach, and nice and clean, but it
> makes me go "*somebody* is going to do bad things and complain about
> page fault latencies".

I don't think there's a major concern here, because that's what we
are already doing at the filesystem level. We do it because some
filesystems need to serialise IO to the inode -before- calling
truncate_setsize(), e.g. (a rough code sketch of this ordering
follows the list):

- we have to wait for inflight direct IO that may be beyond the new
  EOF to drain before we start changing where EOF lies.

- we have data vs metadata ordering requirements that mean we have
  to ensure dirty data is stable before we change the inode size.
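
Roughly, the ordering looks like this. To be clear, this is an
illustrative sketch and not code from either tree: "i_mmap_sem" and
MY_I() stand in for ext4's ei->i_mmap_sem / XFS's XFS_MMAPLOCK, and
fs_setsize() is a made-up name, but the generic helpers it calls
(inode_dio_wait(), filemap_write_and_wait_range(),
truncate_setsize()) are real:

#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/rwsem.h>

struct my_inode {
	struct rw_semaphore	i_mmap_sem;	/* like ext4's i_mmap_sem */
	struct inode		vfs_inode;
};

static inline struct my_inode *MY_I(struct inode *inode)
{
	return container_of(inode, struct my_inode, vfs_inode);
}

static int fs_setsize(struct inode *inode, loff_t newsize)
{
	int error;

	/* Block page faults on this inode for the whole truncate. */
	down_write(&MY_I(inode)->i_mmap_sem);

	/*
	 * Wait for in-flight direct IO - which may be beyond the new
	 * EOF - to drain before we start changing where EOF lies.
	 */
	inode_dio_wait(inode);

	/*
	 * Data vs metadata ordering: dirty data must be stable before
	 * the inode size change is made.
	 */
	error = filemap_write_and_wait_range(inode->i_mapping, newsize,
					     LLONG_MAX);
	if (error)
		goto out_unlock;

	/* Only now move EOF and toss the now-dead page cache. */
	truncate_setsize(inode, newsize);
	/* ... filesystem-specific block freeing would go here ... */

out_unlock:
	up_write(&MY_I(inode)->i_mmap_sem);
	return error;
}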

Hence we've already been locking out page faults for the entire
truncate operation for a few years now on both XFS and ext4. We
haven't heard of any problems resulting from truncate-related page
fault latencies....
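
For reference, the fault-side half of that exclusion looks much like
ext4_filemap_fault()/xfs_filemap_fault() did around v5.9 (reusing the
MY_I()/i_mmap_sem stand-ins from the sketch above):

static vm_fault_t my_filemap_fault(struct vm_fault *vmf)
{
	struct inode *inode = file_inode(vmf->vma->vm_file);
	vm_fault_t ret;

	/*
	 * Shared hold: faults run concurrently with each other, but
	 * are all excluded while truncate holds the rwsem exclusive.
	 */
	down_read(&MY_I(inode)->i_mmap_sem);
	ret = filemap_fault(vmf);
	up_read(&MY_I(inode)->i_mmap_sem);

	return ret;
}

That's the whole mechanism: truncate takes the rwsem exclusive,
every fault takes it shared.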

FWIW, if the fs layer is already providing this level of IO
exclusion w.r.t. address space access, does it need to be replicated
at the address space level?

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
