Message-id: <169922107188.24305.7903791112230110428@noble.neil.brown.name>
Date:   Mon, 06 Nov 2023 08:51:11 +1100
From:   "NeilBrown" <neilb@...e.de>
To:     "Donald Buczek" <buczek@...gen.mpg.de>
Cc:     "Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
        linux-fsdevel@...r.kernel.org
Subject: Re: Heisenbug: I/O freeze can be resolved by cat $task/cmdline of
 unrelated process

On Sun, 05 Nov 2023, Donald Buczek wrote:
....
> 
>      for task in /proc/*/task/*; do
>          echo  "# # $task: $(cat $task/comm) : $(cat $task/cmdline | xargs -0 echo)"
>          cmd cat $task/stack
>      done
> 
> which can further be reduced to
> 
>      for task in /proc/*/task/*; do echo $task $(cat $task/cmdline | xargs -0 echo); done
> 
> This is absolutely reproducible. The above line unblocks the system reliably.
> 
> Another remarkable thing: We've modified above code to do the
> processes slowly one by one and checking after each step if I/O
> resumed.  And each time we've tested that, it was one of the 64 nfsd
> processes (but not the very first one tried).  While the system
> exports filesystems, we have absolutely no reason to assume that any
> client actually tries to access this NFS server.  Additionally, when
> the full script is run, the stack traces show all nfsd tasks in their
> normal idle state ( [<0>] svc_recv+0x7bd/0x8d0 [sunrpc] ).
> 
> Does anybody have an idea how a `cat /proc/PID/cmdline` on a specific
> assumed-to-be-idle nfsd thread could have such a "healing" effect?

/proc/PID/cmdline for an nfsd thread is empty.  So it probably isn't
accessing 'cmdline' specifically that unblocks, but any (or almost any)
proc file for the process might help.
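
One way to check that guess, as a minimal sketch only: repeat the reduced
loop but read a different per-task file.  The helper name touch_task_file
is hypothetical, and whether reading "status" or "comm" has the same
effect as reading "cmdline" is exactly the open question.

```shell
# Hypothetical helper: read one named proc file for each task directory.
# $1 is the file name (e.g. status, comm, cmdline); the remaining
# arguments are task directories such as /proc/PID/task/TID.
touch_task_file() {
    file=$1
    shift
    for task in "$@"; do
        # Discard the content; only the act of reading matters here.
        if cat "$task/$file" >/dev/null 2>&1; then
            echo "$task"
        fi
    done
}
```

For example, "touch_task_file status /proc/*/task/*" reads status for
every task and prints each task directory it was able to read.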

You say that *after* accessing cmdline, the "stack" file shows a normal
stack trace.  It might be interesting to see if that same stack is
present *before* accessing cmdline.  But my guess is that nfsd is mostly
a distraction.
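
A minimal sketch of that before/after comparison, assuming it is run as
root (reading /proc/PID/stack normally requires it).  The helper name
probe_task is hypothetical, and the sketch only compares the two
snapshots as text.

```shell
# Hypothetical helper: snapshot a task's stack, read its cmdline, then
# snapshot the stack again and report whether it changed.
# $1 is a task directory such as /proc/PID/task/TID.
probe_task() {
    task=$1
    before=$(cat "$task/stack" 2>/dev/null)
    cat "$task/cmdline" >/dev/null 2>&1
    after=$(cat "$task/stack" 2>/dev/null)
    if [ "$before" = "$after" ]; then
        echo "$task: stack unchanged"
    else
        echo "$task: stack changed"
        printf 'before:\n%s\nafter:\n%s\n' "$before" "$after"
    fi
}
```

Running it one task at a time, in the same order as the slow loop,
would show what the nfsd thread's stack looked like just before the
read that unblocks I/O.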

It would help to see the full "echo t > /proc/sysrq-trigger" list of all
process stacks.  That should reveal where the blockage is.

NeilBrown


> 
> I'm well aware, that, for example, a hardware problem might result in
> just anything and that the question might not be answerable at all.
> If so: please excuse the noise.
> 
> Thanks
> Donald
> -- 
> Donald Buczek
> buczek@...gen.mpg.de
> Tel: +49 30 8413 1433
> 
> 
