Message-ID: <875zk0e7cd.fsf@notabene.neil.brown.name>
Date:   Mon, 04 Nov 2019 12:52:18 +1100
From:   NeilBrown <neilb@...e.de>
To:     Stephan <stephanwib@...glemail.com>, linux-kernel@...r.kernel.org
Subject: Re: Process waiting on NFS transitions to uninterruptable sleep when receiving a signal with custom signal handler

On Mon, Oct 28 2019, Stephan wrote:

> Hello everyone,
>
> I have asked this question on Stackoverflow a while ago but
> unfortunately nobody had an idea on this.
>
> I am currently doing some research on how we can extend the monitoring
> solution for Linux in our datacenter in order to detect inaccessible
> NFS mounts. My idea was to look for NFS mounts in /proc/self/mountinfo
> and then for each mount, call alarm(), issue a synchronous
> interruptible call via stat()/fsstat() or similar, and in case of an
> alarm, return an error in the signal handler. However, I experienced
> the following behaviour which I am not sure how to explain or debug.
>
> It turned out that when a process is waiting in the stat system call on a
> mountpoint of a disconnected NFS server, it responds to signals as
> expected. For example, one can exit it by pressing Ctrl+C, or it displays
> "Alarm clock" and ends when the alarm timer fires. The same applies
> e.g. to SIGUSR1/2, leading the program to display "User defined signal
> 1" (or "2") and end. I suspect these messages come from a general
> signal dispatcher inside glibc, but it would be nice to hear some
> details on how this works.

The messages come from your shell (e.g. bash).  The process exits with a
status that means "I was killed by signal XX", and bash reports that.
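
A minimal sketch of what a parent (such as a shell) sees when a child dies from an unhandled signal. This is not how bash itself is implemented; the helper name describe_exit and the child one-liner are illustrative assumptions. Python's subprocess module reports death by signal N as a return code of -N, and signal.strsignal() asks the C library for the signal's description.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a shell-like report (hypothetical helper)."""
    if returncode < 0:
        # A negative return code means the child was terminated by that signal.
        signum = -returncode
        return f"killed by signal {signum} ({signal.strsignal(signum)})"
    return f"exited with status {returncode}"


# The child sends SIGUSR1 to itself. No handler is installed, so the default
# action (terminate the process) applies.
child_code = "import os, signal; os.kill(os.getpid(), signal.SIGUSR1)"
proc = subprocess.run([sys.executable, "-c", child_code])

report = describe_exit(proc.returncode)
print(report)
```

The child prints nothing itself; the description is produced entirely by the parent from the wait status.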

>
> In all cases in which a custom signal handler was registered, the
> process transitions to an uninterruptible sleep state when a signal
> for this custom handler is scheduled; leading to no other signal being
> processed anymore. Of course this applies to SIGALRM as well when the
> alarm() timer sends the signal. All signals show up in
> /proc/PID/status as below:

In these cases, NFS does a 'killable' wait.  That means that the only way to
interrupt the wait is to kill the process (so that it dies).
One justification for this is that there is no error that POSIX allows
stat (or other calls) to return if it takes "too long".  So the
system call cannot just fail - instead the whole process needs to die.

So you want your monitoring process to fork, and then access the
filesystem from the child.  If that takes too long, kill the child from
the parent - or set an alarm in the child and let it kill itself.

The parent can then respond to the fact that the child didn't exit
cleanly.
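
The fork-and-kill approach described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested monitoring tool: the function name probe_path, its parameters, and the returned strings are all hypothetical, and the check argument exists only so the timeout path can be exercised without a dead NFS server.

```python
import os
import signal
import time


def probe_path(path, timeout_s=5.0, poll_s=0.05, check=os.stat):
    """Run check(path) in a forked child; return 'ok', 'error', or 'timeout'.

    All names here are hypothetical. The child installs no signal handlers,
    so SIGKILL from the parent is fatal and can end a killable wait.
    """
    pid = os.fork()
    if pid == 0:
        # Child: touch the filesystem, then report the result via exit status.
        try:
            check(path)
            os._exit(0)
        except BaseException:
            os._exit(1)

    # Parent: poll for the child until the deadline passes.
    deadline = time.monotonic() + timeout_s
    while True:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            break
        if time.monotonic() >= deadline:
            # Took too long: kill the child, then reap it.
            os.kill(pid, signal.SIGKILL)
            _, status = os.waitpid(pid, 0)
            break
        time.sleep(poll_s)

    if os.WIFSIGNALED(status):
        return "timeout"  # the child did not exit cleanly
    return "ok" if os.WEXITSTATUS(status) == 0 else "error"
```

A caller might loop over the NFS mount points found in /proc/self/mountinfo and call probe_path(mount_point) for each one. The alternative mentioned above, calling alarm() in the child without installing a handler, would also work, since the default action for SIGALRM is to terminate the process.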

NeilBrown
