Message-ID: <20180305204836.qznlcm6uwurfs2n4@quack2.suse.cz>
Date:   Mon, 5 Mar 2018 21:48:36 +0100
From:   Jan Kara <jack@...e.cz>
To:     Dexuan Cui <decui@...rosoft.com>
Cc:     "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        Jan Kara <jack@...e.cz>, Amir Goldstein <amir73il@...il.com>,
        Miklos Szeredi <mszeredi@...hat.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        "'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>,
        Jork Loeser <Jork.Loeser@...rosoft.com>
Subject: Re: Any known soft lockup issue with vfs_write()->fsnotify()?

Hi!

On Fri 02-03-18 22:28:50, Dexuan Cui wrote:
> Recently people have been hitting a soft lockup issue with
> vfs_write()->fsnotify(). The detailed call traces are available at:
> https://github.com/coreos/bugs/issues/2356
> https://github.com/coreos/bugs/issues/2364

I haven't seen them yet.

> The kernel versions showing up the issue are:
> 4.14.11-coreos 
> 4.14.19-coreos
> 4.13.0-1009 -- this is the kernel with which I'm personally seeing the lockup.
> 
> I have not got a chance to try the latest mainline kernel yet.

It would be good to try a 4.15 kernel to see whether the recent fixes from
Miklos fix your problem. They should be present in the 4.14.11/19 kernels
as well, but one never knows...

> Before the lockup error message suddenly appears, Linux has been running
> fine for many hours.  I have NOT found a consistent way to reproduce the
> lockup yet.
> 
> It looks like the kernel is stuck in fsnotify(), when it tries to take the
> fsnotify_mark_srcu lock.

It is not possible that we would 'hang' in srcu_read_lock() - that is
just a read of one variable and an increment of another. We'd have to be
looping somewhere, and the watchdog would have to happen to hit us at that
place every time. Weird. Are you sure RIP points to srcu_read_lock?
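
To illustrate the "read of one variable and increment of another" point,
here is a minimal userspace sketch, assuming C11 atomics. It is only loosely
modeled on the SRCU read side and is not the kernel implementation: the
struct and the model_* function names are hypothetical, and the real code
uses per-CPU counters and a memory barrier that are left out here. What it
shows is that the reader entry path contains no loop and no wait, so by
itself it has nowhere to get stuck.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model; NOT the kernel's struct srcu_struct. */
struct model_srcu {
    atomic_uint  idx;             /* current reader index; only bit 0 is used */
    atomic_ulong lock_count[2];   /* readers that entered, per index */
    atomic_ulong unlock_count[2]; /* readers that left, per index */
};

/* Reader entry: one load and one increment. No loop, no waiting. */
static int model_srcu_read_lock(struct model_srcu *s)
{
    int idx = (int)(atomic_load(&s->idx) & 1u);

    atomic_fetch_add(&s->lock_count[idx], 1);
    return idx;
}

/* Reader exit: one increment on the slot chosen at entry. */
static void model_srcu_read_unlock(struct model_srcu *s, int idx)
{
    assert(idx == 0 || idx == 1);
    atomic_fetch_add(&s->unlock_count[idx], 1);
}
```

In this model any waiting would be on the updater side, which compares the
lock and unlock counts; a reader only ever bumps a counter and moves on.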

> "git log fs/notify/fsnotify.c" on the latest mainline shows that some
> recent patches might help.
> 
> I'd like to check if this is a known issue.

As I've mentioned above, so far I haven't seen reports like this...

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
