lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190313153056.GB672@mit.edu>
Date:   Wed, 13 Mar 2019 11:30:56 -0400
From:   "Theodore Ts'o" <tytso@....edu>
To:     huang ying <huang.ying.caritas@...il.com>
Cc:     kernel test robot <rong.a.chen@...el.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        LKP ML <lkp@...org>, Huang Ying <ying.huang@...el.com>
Subject: Re: [LKP] [ext4] fde872682e: fsmark.files_per_sec -38.0% regression

On Wed, Mar 13, 2019 at 03:26:39PM +0800, huang ying wrote:
> >
> >
> > commit: fde872682e175743e0c3ef939c89e3c6008a1529 ("ext4: force inode writes when nfsd calls commit_metadata()")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> It appears that this is a performance regression caused by a
> functionality fixing.  So we should ignore this?

Yes, this is a correctness issue that we discovered while tracking
down user data loss issue after a crash of the NFS server, so this is
a change we have to keep.  When the NFS folks added the
commit_metadata() hook, they didn't realize that the fallback path in
nfsd/vfs.c using sync_inode_metadata() doesn't work on all file
systems --- and in particular doesn't work for ext3 and ext4 because
of how we do journalling.

It only applies to NFS serving, not local ext4 use cases, so most ext4
users won't be impacted on it; only those who export those file
systems using NFS.

I do have some plans on how to claw back the performance hit.  The
good news is that it won't require an on-disk format change; the bad
news is that it's a non-trivial change to how journalling works, and
it's not something we can backport to the stable kernel series.  It's
something we're going to have to leave to a distribution who is
willing to do a lot of careful regression testing, once the change is
available, maybe in 3 months or so.

     	 	      	  	      	  - Ted

P.S.  I *believe* all other file systems should be OK, and I didn't
want to impose a performance tax on all other file systems (such as
btrfs), so I fixed it in an ext4-specific way.  The more
general/conservative change would be to fall back to using fsync in
nfs/vfs.c:commit_metadata() unless the file system specifically set a
superblock flag indicating that using sync_inode_metadata is safe.
OTOH we lived with this flaw in ext3/ext4 for *years* without anyone
noticing or complaining, so....

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ