linux-kernel - Re: [LKP] [ext4] fde872682e: fsmark.files_per

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190313153056.GB672@mit.edu>
Date:   Wed, 13 Mar 2019 11:30:56 -0400
From:   "Theodore Ts'o" <tytso@....edu>
To:     huang ying <huang.ying.caritas@...il.com>
Cc:     kernel test robot <rong.a.chen@...el.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        LKP ML <lkp@...org>, Huang Ying <ying.huang@...el.com>
Subject: Re: [LKP] [ext4] fde872682e: fsmark.files_per_sec -38.0% regression

On Wed, Mar 13, 2019 at 03:26:39PM +0800, huang ying wrote:
> >
> >
> > commit: fde872682e175743e0c3ef939c89e3c6008a1529 ("ext4: force inode writes when nfsd calls commit_metadata()")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> It appears that this is a performance regression caused by a
> functionality fixing.  So we should ignore this?

Yes, this is a correctness issue that we discovered while tracking
down user data loss issue after a crash of the NFS server, so this is
a change we have to keep.  When the NFS folks added the
commit_metadata() hook, they didn't realize that the fallback path in
nfsd/vfs.c using sync_inode_metadata() doesn't work on all file
systems --- and in particular doesn't work for ext3 and ext4 because
of how we do journalling.

It only applies to NFS serving, not local ext4 use cases, so most ext4
users won't be impacted on it; only those who export those file
systems using NFS.

I do have some plans on how to claw back the performance hit.  The
good news is that it won't require an on-disk format change; the bad
news is that it's a non-trivial change to how journalling works, and
it's not something we can backport to the stable kernel series.  It's
something we're going to have to leave to a distribution who is
willing to do a lot of careful regression testing, once the change is
available, maybe in 3 months or so.

     	 	      	  	      	  - Ted

P.S.  I *believe* all other file systems should be OK, and I didn't
want to impose a performance tax on all other file systems (such as
btrfs), so I fixed it in an ext4-specific way.  The more
general/conservative change would be to fall back to using fsync in
nfs/vfs.c:commit_metadata() unless the file system specifically set a
superblock flag indicating that using sync_inode_metadata is safe.
OTOH we lived with this flaw in ext3/ext4 for *years* without anyone
noticing or complaining, so....