lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zip7qnyExaCNZhSy@tissot.1015granger.net>
Date: Thu, 25 Apr 2024 11:50:02 -0400
From: Chuck Lever <chuck.lever@...cle.com>
To: Chris Packham <Chris.Packham@...iedtelesis.co.nz>,
        Neil Brown <neilb@...e.de>
Cc: Jeff Layton <jlayton@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Linux NFS Mailing List <linux-nfs@...r.kernel.org>,
        netdev <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: kernel BUG at net/sunrpc/svc.c:570 after updating from v5.15.153
 to v5.15.155

On Wed, Apr 24, 2024 at 02:03:22PM +0000, Chuck Lever III wrote:
> 
> > On Apr 24, 2024, at 9:33 AM, Chuck Lever III <chuck.lever@...cle.com> wrote:
> > 
> >> On Apr 24, 2024, at 3:42 AM, Chris Packham <Chris.Packham@...iedtelesis.co.nz> wrote:
> >> 
> >> On 24/04/24 13:38, Chris Packham wrote:
> >>> 
> >>> On 24/04/24 12:54, Chris Packham wrote:
> >>>> Hi Jeff, Chuck, Greg,
> >>>> 
> >>>> After updating one of our builds along the 5.15.y LTS branch our 
> >>>> testing caught a new kernel bug. Output below.
> >>>> 
> >>>> I haven't dug into it yet but wondered if it rang any bells.
> >>> 
> >>> A bit more info. This is happening at "reboot" for us. Our embedded 
> >>> devices use a bit of a hacked up reboot process so that they come back 
> >>> faster in the case of a failure.
> >>> 
> >>> It doesn't happen with a proper `systemctl reboot` or with a SYSRQ+B
> >>> 
> >>> I can trigger it with `killall -9 nfsd` which I'm not sure is a 
> >>> completely legit thing to do to kernel threads but it's probably close 
> >>> to what our customized reboot does.
> >> 
> >> I've bisected between v5.15.153 and v5.15.155 and identified commit 
> >> dec6b8bcac73 ("nfsd: Simplify code around svc_exit_thread() call in 
> >> nfsd()") as the first bad commit. Based on the context that seems to 
> >> line up with my reproduction. I'm wondering if perhaps something got 
> >> missed out of the stable track? Unfortunately I'm not able to run a more 
> >> recent kernel with all of the nfs related setup that is being used on  
> >> the system in question.
> > 
> > Thanks for bisecting, that would have been my first suggestion.
> > 
> > The backport included all of the NFSD patches up to v6.2, but
> > there might be a missing server-side SunRPC patch.
> 
> So dec6b8bcac73 ("nfsd: Simplify code around svc_exit_thread()
> call in  nfsd()") is from v6.6, so it was applied to v5.15.y
> only to get a subsequent NFSD fix to apply.
> 
> The immediately previous upstream commit is missing:
> 
>   390390240145 ("nfsd: don't allow nfsd threads to be signalled.")
> 
> For testing, I've applied this to my nfsd-5.15.y branch here:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
> 
> However even if that fixes the reported crash, this suggests
> that after v6.6, nfsd threads are not going to respond to
> "killall -9 nfsd".

I cannot reproduce a crash when using "killall -9 nfsd" on the fixed
kernel above. On that kernel, the killall command does not terminate
the nfsd threads either.

Either we can apply 390390240145 to 5.15.y, or if the imperviousness
to "kill" is considered a regression, I can look into stripping out
dec6b8bcac73 and the subsequent commits that depend on it.


-- 
Chuck Lever

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ