linux-kernel - Re: Killing process in D state on mount to dead NFS server. (when process is in fsync)

Message-ID: <CAAMCDeeRWTEXu_UTWJ_aC_6Pb3286ijZByeDpwKwAeMqGBAODQ@mail.gmail.com>
Date:	Fri, 1 Aug 2014 20:50:13 -0500
From:	Roger Heflin <rogerheflin@...il.com>
To:	Jeff Layton <jlayton@...chiereds.net>
Cc:	NeilBrown <neilb@...e.de>, Ben Greear <greearb@...delatech.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
	Kernel development list <linux-kernel@...r.kernel.org>,
	linux-mm@...ck.org, linux-fsdevel@...r.kernel.org
Subject: Re: Killing process in D state on mount to dead NFS server. (when
 process is in fsync)

Doesn't NFS have an intr mount option that allows kill -9 to work?
Whenever I have had it set, kill -9 has appeared to work after about
30 seconds or so. Without it, kill -9 does not work when the NFS
server is missing.
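
When comparing behaviour with and without the option, it helps to confirm
that the task really is in uninterruptible sleep ("D") before and after
sending the signal. A minimal sketch, assuming the /proc/<pid>/stat layout
described in proc(5), where the state letter is the first field after the
parenthesised command name. The helper names and the sample line are made
up for illustration:

```python
def task_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split after the LAST ')'.
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def is_uninterruptible(stat_line: str) -> bool:
    """True if the task is in 'D' (uninterruptible sleep) state."""
    return task_state(stat_line) == "D"


if __name__ == "__main__":
    # Hypothetical, shortened sample line; on a live system read
    # /proc/<pid>/stat for the stuck process instead.
    sample = "3805 (my writer) D 1 3805 3805 0 -1 4194304"
    print(task_state(sample), is_uninterruptible(sample))
```

If the state is still "D" some time after kill -9, the signal has been
queued but the task has not yet reached a point where it can act on it.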



On Fri, Aug 1, 2014 at 8:21 PM, Jeff Layton <jlayton@...chiereds.net> wrote:
> On Fri, 1 Aug 2014 07:50:53 +1000
> NeilBrown <neilb@...e.de> wrote:
>
>> On Thu, 31 Jul 2014 14:20:07 -0700 Ben Greear <greearb@...delatech.com> wrote:
>>
>> > -----BEGIN PGP SIGNED MESSAGE-----
>> > Hash: SHA1
>> >
>> > On 07/31/2014 01:42 PM, NeilBrown wrote:
>> > > On Thu, 31 Jul 2014 11:00:35 -0700 Ben Greear <greearb@...delatech.com> wrote:
>> > >
>> > >> So, this has been asked all over the interweb for years and years, but the best answer I can find is to reboot the system or create a fake NFS server
>> > >> somewhere with the same IP as the gone-away NFS server.
>> > >>
>> > >> The problem is:
>> > >>
>> > >> I have some mounts to an NFS server that no longer exists (crashed/powered down).
>> > >>
>> > >> I have some processes stuck trying to write to files open on these mounts.
>> > >>
>> > >> I want to kill the process and unmount.
>> > >>
>> > >> umount -l will make the mount go away, sort of.  But the process is still hung.  umount -f complains:
>> > >>
>> > >>   umount2:  Device or resource busy
>> > >>   umount.nfs: /mnt/foo: device is busy
>> > >>
>> > >> kill -9 does not work on the process.
>> > >
>> > > Kill -1 should work (since about 2.6.25 or so).
>> >
>> > That is -[ONE], right?  Assuming so, it did not work for me.
>>
>> No, it was "-9" .... sorry, I really shouldn't be let out without my proof
>> reader.
>>
>> However the 'stack' is sufficient to see what is going on.
>>
>> The problem is that it is blocked inside the "VM" well away from NFS and
>> there is no way for NFS to say "give up and go home".
>>
>> I'd suggest that is a bug.   I cannot see any justification for fsync to not
>> be killable.
>> It wouldn't be too hard to create a patch to make it so.
>> It would be a little harder to examine all call paths and create a
>> convincing case that the patch was safe.
>> It might be a herculean task to convince others that it was the right thing
>> to do.... so let's start with that one.
>>
>> Hi Linux-mm and fs-devel people.  What do people think of making "fsync" and
>> variants "KILLABLE" ??
>>
>> I probably only need a little bit of encouragement to write a patch....
>>
>> Thanks,
>> NeilBrown
>>
>
>
> It would be good to fix this in some fashion once and for all, and the
> wait_on_page_writeback wait is a major source of pain for a lot of
> people.
>
> So to summarize...
>
> The problem in a nutshell is that Ben has some cached writes to the
> NFS server, but the server has gone away (presumably forever). The
> question is -- how do we communicate to the kernel that that server
> isn't coming back and that those dirty pages should be invalidated so
> that we can umount the filesystem?
>
> Allowing fsync/close to be killable sounds reasonable to me as at least
> a partial solution. Both close(2) and fsync(2) are allowed to return
> EINTR according to the POSIX spec. Allowing a kill -9 there seems
> like it should be fine, and maybe we ought to even consider letting it
> be susceptible to lesser signals.
>
> That still leaves some open questions though...
>
> Is that enough to fix it? You'd still have the dirty pages lingering
> around, right? Would a umount -f presumably work at that point?
>
>> >
>> > Kernel is 3.14.4+, with some extra patches, but probably nothing that
>> > influences this particular behaviour.
>> >
>> > [root@...005-14010010 ~]# cat /proc/3805/stack
>> > [<ffffffff811371ba>] sleep_on_page+0x9/0xd
>> > [<ffffffff8113738e>] wait_on_page_bit+0x71/0x78
>> > [<ffffffff8113769a>] filemap_fdatawait_range+0xa2/0x16d
>> > [<ffffffff8113780e>] filemap_write_and_wait_range+0x3b/0x77
>> > [<ffffffffa0f04734>] nfs_file_fsync+0x37/0x83 [nfs]
>> > [<ffffffff811a8d32>] vfs_fsync_range+0x19/0x1b
>> > [<ffffffff811a8d4b>] vfs_fsync+0x17/0x19
>> > [<ffffffffa0f05305>] nfs_file_flush+0x6b/0x6f [nfs]
>> > [<ffffffff81183e46>] filp_close+0x3f/0x71
>> > [<ffffffff8119c8ae>] __close_fd+0x80/0x98
>> > [<ffffffff81183de5>] SyS_close+0x1c/0x3e
>> > [<ffffffff815c55f9>] system_call_fastpath+0x16/0x1b
>> > [<ffffffffffffffff>] 0xffffffffffffffff
>> > [root@...005-14010010 ~]# kill -1 3805
>> > [root@...005-14010010 ~]# cat /proc/3805/stack
>> > [<ffffffff811371ba>] sleep_on_page+0x9/0xd
>> > [<ffffffff8113738e>] wait_on_page_bit+0x71/0x78
>> > [<ffffffff8113769a>] filemap_fdatawait_range+0xa2/0x16d
>> > [<ffffffff8113780e>] filemap_write_and_wait_range+0x3b/0x77
>> > [<ffffffffa0f04734>] nfs_file_fsync+0x37/0x83 [nfs]
>> > [<ffffffff811a8d32>] vfs_fsync_range+0x19/0x1b
>> > [<ffffffff811a8d4b>] vfs_fsync+0x17/0x19
>> > [<ffffffffa0f05305>] nfs_file_flush+0x6b/0x6f [nfs]
>> > [<ffffffff81183e46>] filp_close+0x3f/0x71
>> > [<ffffffff8119c8ae>] __close_fd+0x80/0x98
>> > [<ffffffff81183de5>] SyS_close+0x1c/0x3e
>> > [<ffffffff815c55f9>] system_call_fastpath+0x16/0x1b
>> > [<ffffffffffffffff>] 0xffffffffffffffff
>> >
>> > Thanks,
>> > Ben
>> >
>> > > If it doesn't, please report the kernel version and cat /proc/$PID/stack
>> > > for some processes that cannot be killed.
>> > >
>> > > NeilBrown
>> > >
>> > >>
>> > >>
>> > >> Aside from bringing a fake NFS server back up on the same IP, is there any other way to get these mounts unmounted and the processes killed without
>> > >> rebooting?
>> > >>
>> > >> Thanks, Ben
>> > >>
>> > >
>> >
>> >
>> > - --
>> > Ben Greear <greearb@...delatech.com>
>> > Candela Technologies Inc  http://www.candelatech.com
>> >
>> > -----BEGIN PGP SIGNATURE-----
>> > Version: GnuPG v1.4.13 (GNU/Linux)
>> > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>> >
>> > iQEcBAEBAgAGBQJT2rLiAAoJELbHqkYeJT4OqPgH/0taKW6Be90c1mETZf9yeqZF
>> > YMLZk8XC2wloEd9nVz//mXREmiu18Hc+5p7Upd4Os21J2P4PBMGV6P/9DMxxehwH
>> > YX1HKha0EoAsbO5ILQhbLf83cRXAPEpvJPgYHrq6xjlKB8Q8OxxND37rY7kl19Zz
>> > sdAw6GiqHICF3Hq1ATa/jvixMluDnhER9Dln3wOdAGzmmuFYqpTsV4EwzbKKqInJ
>> > 6C15q+cq/9aYh6usN6z2qJhbHgqM9EWcPL6jOrCwX4PbC1XjKHekpFN0t9oKQClx
>> > qSPuweMQ7fP4IBd2Ke8L/QlyOVblAKSE7t+NdrjfzLmYPzyHTyfLABR/BI053to=
>> > =/9FJ
>> > -----END PGP SIGNATURE-----
>>
>
>
> --
> Jeff Layton <jlayton@...chiereds.net>
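
For anyone collecting similar reports, the /proc/$PID/stack output quoted
above can be checked mechanically for the pattern discussed in this thread:
a task waiting on a page bit underneath an NFS fsync. This is only a rough
sketch. The regular expression assumes the "[<address>] symbol+0xoff/0xlen
[module]" line format shown in Ben's trace, and the function names are
invented for illustration:

```python
import re

# One frame as printed in the trace quoted in this thread, e.g.
#   [<ffffffffa0f04734>] nfs_file_fsync+0x37/0x83 [nfs]
FRAME = re.compile(
    r"^\[<[0-9a-f]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?$"
)


def parse_stack(text):
    """Return a list of (symbol, module_or_None) for each parsable frame."""
    frames = []
    for line in text.splitlines():
        m = FRAME.match(line.strip())
        if m:  # lines such as the 0xffffffffffffffff terminator are skipped
            frames.append((m.group(1), m.group(2)))
    return frames


def blocked_in_nfs_writeback_wait(frames):
    """True if the task waits on a page bit with an nfs frame on the stack."""
    symbols = [sym for sym, _ in frames]
    return "wait_on_page_bit" in symbols and any(
        mod == "nfs" for _, mod in frames
    )


if __name__ == "__main__":
    # Three lines copied from the trace in this thread.
    sample = (
        "[<ffffffff8113738e>] wait_on_page_bit+0x71/0x78\n"
        "[<ffffffffa0f04734>] nfs_file_fsync+0x37/0x83 [nfs]\n"
        "[<ffffffffffffffff>] 0xffffffffffffffff\n"
    )
    print(blocked_in_nfs_writeback_wait(parse_stack(sample)))
```

Reading /proc/<pid>/stack normally requires root, as in the session above.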