Message-ID: <BL0PR2101MB11085E320B6928A5A59DEF4ECADF0@BL0PR2101MB1108.namprd21.prod.outlook.com>
Date:   Thu, 8 Mar 2018 15:07:55 +0000
From:   Haiyang Zhang <haiyangz@...rosoft.com>
To:     Jan Kara <jack@...e.cz>, Dexuan Cui <decui@...rosoft.com>
CC:     "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        Amir Goldstein <amir73il@...il.com>,
        Miklos Szeredi <mszeredi@...hat.com>,
        "'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>,
        Jork Loeser <Jork.Loeser@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>
Subject: RE: Any known soft lockup issue with vfs_write()->fsnotify()?

There was another report of the same issue on CoreOS (4.14.11-coreos). The host/guest is AWS G4, so the problem is not limited to Azure VMs. It doesn't happen on older kernels such as 4.4. Maybe the problem is related to some recent changes in fsnotify or other fs code?

Soft lockup kernel panic reboot on AWS instance on fsnotify and vfs_write  #2356
	https://github.com/coreos/bugs/issues/2356

Thanks,
- Haiyang

> -----Original Message-----
> From: Jan Kara <jack@...e.cz>
> Sent: Monday, March 5, 2018 3:49 PM
> To: Dexuan Cui <decui@...rosoft.com>
> Cc: linux-fsdevel@...r.kernel.org; Jan Kara <jack@...e.cz>; Amir Goldstein
> <amir73il@...il.com>; Miklos Szeredi <mszeredi@...hat.com>; Haiyang
> Zhang <haiyangz@...rosoft.com>; 'linux-kernel@...r.kernel.org'
> <linux-kernel@...r.kernel.org>; Jork Loeser <Jork.Loeser@...rosoft.com>
> Subject: Re: Any known soft lockup issue with vfs_write()->fsnotify()?
> 
> Hi!
> 
> On Fri 02-03-18 22:28:50, Dexuan Cui wrote:
> > Recently people have been hitting a soft lockup issue with vfs_write()->fsnotify().
> > The detailed call traces are available at:
> > https://github.com/coreos/bugs/issues/2356
> > https://github.com/coreos/bugs/issues/2364
> 
> I haven't seen them yet.
> 
> > The kernel versions showing the issue are:
> > 4.14.11-coreos
> > 4.14.19-coreos
> > 4.13.0-1009 -- this is the kernel with which I'm personally seeing the lockup.
> >
> > I have not got a chance to try the latest mainline kernel yet.
> 
> It would be good to try a 4.15 kernel to see whether the recent fixes from Miklos
> resolve your problem. They should be present in the 4.14.11/19 kernels as well,
> but one never knows...
> 
> > Before the lockup error message suddenly appears, Linux has been
> > running fine for many hours.  I have NOT found a consistent way to
> > reproduce the lockup yet.
> >
> > It looks like the kernel is stuck in fsnotify(), when it tries to take the
> > fsnotify_mark_srcu lock.
> 
> It is not possible that we would 'hang' in srcu_read_lock() - that is just a read of
> one variable and increment of another. We'd have to be looping somewhere
> and watchdog would have to happen to hit us always at that place. Weird. Are
> you sure RIP points to srcu_read_lock?
> 
> > "git log fs/notify/fsnotify.c" on the latest mainline shows that some
> > recent patches might help.
> >
> > I'd like to check if this is a known issue.
> 
> As I've mentioned above, so far I haven't seen reports like this...
> 
> 								Honza
> --
> Jan Kara <jack@...e.com>
> SUSE Labs, CR
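
For reference, Jan's description of srcu_read_lock() above ("a read of one variable and increment of another") can be modelled in userspace roughly as follows. This is only a simplified sketch, not the kernel source: the names model_srcu, model_read_lock, model_read_unlock and model_readers are hypothetical, the per-CPU counters are collapsed into one shared pair, and the kernel's explicit memory barriers are replaced by C11 sequentially consistent atomics.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical userspace model of an SRCU domain. The kernel keeps the
 * lock/unlock counters per CPU; here they are a single shared pair.
 */
struct model_srcu {
    atomic_uint  idx;              /* which counter slot readers use now */
    atomic_ulong lock_count[2];    /* read-side entries per slot */
    atomic_ulong unlock_count[2];  /* read-side exits per slot */
};

/* Enter a read-side section: one load and one increment, no loop. */
static int model_read_lock(struct model_srcu *sp)
{
    int idx = (int)(atomic_load(&sp->idx) & 1u);

    atomic_fetch_add(&sp->lock_count[idx], 1);
    return idx;
}

/* Leave a read-side section that was entered on slot idx. */
static void model_read_unlock(struct model_srcu *sp, int idx)
{
    atomic_fetch_add(&sp->unlock_count[idx], 1);
}

/* Number of readers still inside a section on slot idx. */
static unsigned long model_readers(struct model_srcu *sp, int idx)
{
    return atomic_load(&sp->lock_count[idx]) -
           atomic_load(&sp->unlock_count[idx]);
}
```

Nothing on this read-side entry path spins or waits, which is why a soft lockup reported exactly there would more plausibly mean the CPU is looping in the caller and the watchdog keeps sampling it at the same spot.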
