Date:	Fri, 8 Jul 2016 13:05:40 +0000
From:	Trond Myklebust <trondmy@...marydata.com>
To:	Seth Forshee <seth.forshee@...onical.com>
CC:	Chinner Dave <david@...morbit.com>,
	Jeff Layton <jlayton@...hat.com>,
	Schumaker Anna <anna.schumaker@...app.com>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Tycho Andersen <tycho.andersen@...onical.com>
Subject: Re: Hang due to nfs letting tasks freeze with locked inodes


> On Jul 8, 2016, at 08:55, Trond Myklebust <trondmy@...marydata.com> wrote:
> 
> 
>> On Jul 8, 2016, at 08:48, Seth Forshee <seth.forshee@...onical.com> wrote:
>> 
>> On Fri, Jul 08, 2016 at 09:53:30AM +1000, Dave Chinner wrote:
>>> On Wed, Jul 06, 2016 at 06:07:18PM -0400, Jeff Layton wrote:
>>>> On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote:
>>>>> We're seeing a hang when freezing a container with an nfs bind mount while
>>>>> running iozone. Two iozone processes were hung with this stack trace.
>>>>> 
>>>>> [] schedule+0x35/0x80
>>>>> [] schedule_preempt_disabled+0xe/0x10
>>>>> [] __mutex_lock_slowpath+0xb9/0x130
>>>>> [] mutex_lock+0x1f/0x30
>>>>> [] do_unlinkat+0x12b/0x2d0
>>>>> [] SyS_unlink+0x16/0x20
>>>>> [] entry_SYSCALL_64_fastpath+0x16/0x71
>>>>> 
>>>>> This seems to be due to another iozone thread frozen during unlink with
>>>>> this stack trace:
>>>>> 
>>>>> [] __refrigerator+0x7a/0x140
>>>>> [] nfs4_handle_exception+0x118/0x130 [nfsv4]
>>>>> [] nfs4_proc_remove+0x7d/0xf0 [nfsv4]
>>>>> [] nfs_unlink+0x149/0x350 [nfs]
>>>>> [] vfs_unlink+0xf1/0x1a0
>>>>> [] do_unlinkat+0x279/0x2d0
>>>>> [] SyS_unlink+0x16/0x20
>>>>> [] entry_SYSCALL_64_fastpath+0x16/0x71
>>>>> 
>>>>> Since nfs is allowing the thread to be frozen with the inode locked,
>>>>> it's preventing other threads trying to lock the same inode from
>>>>> freezing. It seems like a bad idea for nfs to be doing this.
>>>>> 
>>>> 
>>>> Yeah, known problem. Not a simple one to fix though.
>>> 
>>> Actually, it is simple to fix.
>>> 
>>> <insert broken record about suspend should be using freeze_super(),
>>> not sys_sync(), to suspend filesystem operations>
>>> 
>>> i.e. the VFS blocks new operations from starting, and then the
>>> NFS client simply needs to implement ->freeze_fs to drain all its
>>> active operations before returning. Problem solved.
>> 
>> No, this won't solve my problem. We're not doing a full suspend; rather,
>> we're using a freezer cgroup to freeze a subset of processes. We don't
>> want to fully freeze the filesystem.
> 
> …and therein lies the rub. The whole cgroup freezer stuff assumes that you can safely deactivate a bunch of processes that may or may not hold state in the filesystem. That’s definitely not OK when you hold locks, etc., that can affect processes that lie outside the cgroup (and/or outside the NFS client itself).
> 

In case it wasn’t clear, I’m not just talking about VFS mutexes here. I’m also talking about all the other stuff, a lot of which the kernel has no control over, including POSIX file locking, share locks, leases/delegations, etc.

Trond
