Message-ID: <20200622220354.GU2005@dread.disaster.area>
Date: Tue, 23 Jun 2020 08:03:54 +1000
From: Dave Chinner <david@...morbit.com>
To: "J. Bruce Fields" <bfields@...ldses.org>
Cc: Masayoshi Mizuma <msys.mizuma@...il.com>,
Eric Sandeen <sandeen@...deen.net>,
"Darrick J. Wong" <darrick.wong@...cle.com>,
Christoph Hellwig <hch@...radead.org>,
Theodore Ts'o <tytso@....edu>,
Andreas Dilger <adilger.kernel@...ger.ca>,
Alexander Viro <viro@...iv.linux.org.uk>,
Masayoshi Mizuma <m.mizuma@...fujitsu.com>,
linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-xfs <linux-xfs@...r.kernel.org>, jlayton@...hat.com
Subject: Re: [PATCH] fs: i_version mntopt gets visible through /proc/mounts
On Mon, Jun 22, 2020 at 05:26:12PM -0400, J. Bruce Fields wrote:
> On Sun, Jun 21, 2020 at 09:54:08AM +1000, Dave Chinner wrote:
> > On Fri, Jun 19, 2020 at 09:56:33PM -0400, J. Bruce Fields wrote:
> > > On Sat, Jun 20, 2020 at 11:49:57AM +1000, Dave Chinner wrote:
> > > > However, other people have different opinions on this matter (and we
> > > > know that from the people who considered XFS v4 -> v5 going slower
> > > > because iversion a major regression), and so we must acknowledge
> > > > those opinions even if we don't agree with them.
> > >
> > > Do you have any of those reports handy? Were there numbers?
> >
> > e.g. RH BZ #1355813 when v5 format was enabled by default in RHEL7.
> > Numbers were 40-47% performance degradation for in-cache writes
> > caused by the original IVERSION implementation using iozone. There
were others I recall, all related to similar high-IOP small random
> > writes workloads typical of databases....
>
> Thanks, that's an interesting bug! Though a bit tangled. This is where
> you identified the change attribute as the main culprit:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1355813#c42
>
> The test was running at 70,000 writes/s (2.2GB/s), so it was one
> transaction per write() syscall: timestamp updates. On CRC
> enabled filesystems, we have a change counter for NFSv4 - it
> gets incremented on every write() syscall, even when the
> timestamp doesn't change. That's the difference in behaviour and
> hence performance in this test.
>
> In RHEL8, or anything post-v4.16, the frequency of change attribute
> updates should be back down to that of timestamp updates on this
> workload. So it'd be interesting to repeat that experiment now.
Yup, which in itself has been a problem for similar workloads.
There's a reason we now recommend the use of lazytime for high
performance database workloads that can do hundreds of thousands of
small write IOs a second...
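
For reference, a minimal sketch of enabling lazytime (the device, mount
point and fs type below are illustrative placeholders, not from the
original report):

```shell
# Mount with lazytime so timestamp updates are kept in memory and only
# flushed periodically or when the inode is written back anyway:
mount -o lazytime /dev/sdb1 /data

# Or persistently, via an /etc/fstab entry:
# /dev/sdb1  /data  xfs  defaults,lazytime  0 2
```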
> The bug was reporting results from in-house testing, and doesn't show
> any evidence that particular regression was encountered by users; Eric said:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1355813#c52
>
> Root cause of this minor in-memory regression was inode
> versioning behavior; as it's unlikely to have real-world effects
> (and has been open for years with no customer complaints) I'm
> closing this WONTFIX to get it off the radar.
It's just the first one I found - bugzilla's search engine is slow and
less than useful. We know that real applications have hit this, and we
know that even the overhead of timestamp updates on writes is way too
high for them.
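
For context, the workload class in question - one small write() syscall
per record, all serviced from cache - can be sketched with dd (the path,
record size and count here are illustrative, not the iozone parameters
from the bug):

```shell
# 100000 write() syscalls of 64 bytes each, all in-cache; with eager
# i_version semantics every one of them forced a transaction, on top of
# any timestamp update.
dd if=/dev/zero of=/tmp/iversion-test bs=64 count=100000 2>/dev/null
stat -c %s /tmp/iversion-test   # 6400000 bytes from 100000 write() calls
```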
> The typical user may just skip an upgrade or otherwise work around the
> problem rather than root-causing it like this, so absence of reports
> isn't conclusive. I understand wanting to err on the side of caution.
Yup, it's a generic problem - just because we've worked around or
mitigated the most common situations where it impacts performance,
that doesn't mean those mitigations work for everyone....
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com