Date: Wed, 10 Apr 2024 11:18:04 +0200
From: "Linux regression tracking (Thorsten Leemhuis)"
 <regressions@...mhuis.info>
To: Bjorn Andersson <andersson@...nel.org>, Tejun Heo <tj@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
 Linux regressions mailing list <regressions@...ts.linux.dev>
Subject: Re: Bug 218665 - nohz_full=0 prevents kernel from booting

On 08.04.24 00:52, Bjorn Andersson wrote:
> On Tue, Apr 02, 2024 at 10:17:16AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
>>
>> I noticed a regression report in bugzilla.kernel.org. As many (most?)
>> kernel developers don't keep an eye on it, I decided to forward it by mail.
>>
>> Tejun, apparently it's caused by a change of yours.
>>
>> Note, you have to use bugzilla to reach the reporter, as I sadly[1]
>> cannot CC them in mails like this.
>>
>> Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=218665 :
>>
>>> booting the current kernel (6.9.0-rc1, master/712e1425) on x86_64
>>> with nohz_full=0 causes a page fault and prevents the kernel from
>>> booting.
> [...]
> In addition to this report, I have finally bisected another regression
> to the same commit:
> 
> I start neovim, send SIGSTOP (i.e. ^Z) to it, start another neovim
> instance and upon sending SIGSTOP to that instance all of userspace
> locks up - 100% reproducible.
> 
> The kernel seems to continue to operate, and tapping the power button
> dislodges the lockup and I get a clean shutdown.
> 
> This is seen on multiple Arm64 (Qualcomm) machines with upstream
> defconfig since commit '5797b1c18919 ("workqueue: Implement system-wide
> nr_active enforcement for unbound workqueues")'.

Hmmm, I had hoped Tejun would reply and share an opinion on whether
these problems are related, but that didn't happen. :-/ So let me at
least ask one thing that might help to answer that: is the machine
using CPU isolation, as in the two other reports about problems caused
by this commit (see https://bugzilla.kernel.org/show_bug.cgi?id=218665
and https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/ for
details)?

Ciao, Thorsten
