Date: Sun, 23 Jun 2024 07:28:04 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Helge Deller <deller@....de>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>, stable@...r.kernel.org
Cc: patches@...ts.linux.dev, linux-kernel@...r.kernel.org,
 torvalds@...ux-foundation.org, akpm@...ux-foundation.org, shuah@...nel.org,
 patches@...nelci.org, lkft-triage@...ts.linaro.org, pavel@...x.de,
 jonathanh@...dia.com, f.fainelli@...il.com, sudipm.mukherjee@...il.com,
 srw@...dewatkins.net, rwarsow@....de, conor@...nel.org,
 allen.lkml@...il.com, broonie@...nel.org, Oleg Nesterov <oleg@...hat.com>,
 linux-parisc@...r.kernel.org,
 "James E.J. Bottomley" <James.Bottomley@...senPartnership.com>
Subject: Re: [PATCH 6.1 000/217] 6.1.95-rc1 review [parisc64/C3700 boot
 failures]

On 6/22/24 08:13, Helge Deller wrote:
> On 6/22/24 16:58, Guenter Roeck wrote:
>> [ Copying parisc maintainers - maybe they can test on real hardware ]
>>
>> On 6/19/24 05:54, Greg Kroah-Hartman wrote:
>>> This is the start of the stable review cycle for the 6.1.95 release.
>>> There are 217 patches in this series, all will be posted as a response
>>> to this one.  If anyone has any issues with these being applied, please
>>> let me know.
>>>
>>> Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000.
>>> Anything received after that time might be too late.
>>>
>> ...
>>> Oleg Nesterov <oleg@...hat.com>
>>>      zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
>>>
>>
>> I cannot explain it, but this patch causes all my parisc64 (C3700)
>> boot tests to crash. There are lots of memory corruption BUGs such as
>>
>> [    0.000000] =============================================================================
>> [    0.000000] BUG kmalloc-96 (Not tainted): Padding overwritten. 0x0000000043411dd0-0x0000000043411f5f @offset=3536
>>
>> ultimately followed by
>>
>> [    0.462562] Unaligned handler failed, ret = -14
>> ...
>> [    0.469160]  IAOQ[0]: idr_alloc_cyclic+0x48/0x118
>> [    0.469372]  IAOQ[1]: idr_alloc_cyclic+0x54/0x118
>> [    0.469548]  RP(r2): __kernfs_new_node.constprop.0+0x160/0x420
>> [    0.469782] Backtrace:
>> [    0.469928]  [<00000000404af108>] __kernfs_new_node.constprop.0+0x160/0x420
>> [    0.470285]  [<00000000404b0cac>] kernfs_new_node+0xbc/0x118
>> [    0.470523]  [<00000000404b158c>] kernfs_create_empty_dir+0x54/0xf0
>> [    0.470756]  [<00000000404b665c>] sysfs_create_mount_point+0x4c/0xb0
>> [    0.470996]  [<00000000401181cc>] cgroup_init+0x5b4/0x738
>> [    0.471213]  [<0000000040102220>] start_kernel+0x1238/0x1308
>> [    0.471429]  [<0000000040107c90>] start_parisc+0x188/0x1d0
>> ...
>> [    0.474956] Kernel panic - not syncing: Attempted to kill the idle task!
>> SeaBIOS wants SYSTEM RESET.
>>
>> This is with qemu v9.0.1.
> 
> Just to be sure, did you test the same kernel on physical hardware as well?
> 
> Please note that 64-bit hppa (C3700) support in qemu was just recently added
> and is still considered experimental.
> So, maybe it's not a bug in the source, but in qemu...?!?
> 

Following up on this for everyone: Helge doesn't see the problem on real hardware.
I can make the problem disappear by doing any one of the following:
- Use gcc 13.3 instead of 12.3
- Disable CONFIG_KUNIT
- Enable CONFIG_PAGE_POISONING (without actually turning it on at runtime)

Overall, that suggests some kind of heisenbug, most likely in qemu,
unrelated to the commit above.
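
For anyone who wants to narrow this down further, here is a minimal sketch
of the test matrix implied by the three workarounds listed above. It only
enumerates configurations; it does not build or boot anything. The names
BASELINE, ALTERNATIVES, build_matrix and changed_factors are hypothetical
and exist only for this illustration. The baseline values are inferred from
the list (gcc 12.3, CONFIG_KUNIT enabled, CONFIG_PAGE_POISONING disabled).

```python
from itertools import product

# Failing baseline as inferred from the report (an assumption of this sketch):
# gcc 12.3, CONFIG_KUNIT enabled, CONFIG_PAGE_POISONING disabled.
BASELINE = {"gcc": "12.3", "CONFIG_KUNIT": "y", "CONFIG_PAGE_POISONING": "n"}

# The alternative value for each factor, taken from the workaround list.
ALTERNATIVES = {"gcc": "13.3", "CONFIG_KUNIT": "n", "CONFIG_PAGE_POISONING": "y"}


def build_matrix():
    """Return every combination of baseline/alternative values per factor."""
    keys = list(BASELINE)
    choices = [(BASELINE[k], ALTERNATIVES[k]) for k in keys]
    return [dict(zip(keys, combo)) for combo in product(*choices)]


def changed_factors(config):
    """List the factors in which a configuration differs from the baseline."""
    return [k for k in BASELINE if config[k] != BASELINE[k]]


if __name__ == "__main__":
    for config in build_matrix():
        diff = changed_factors(config)
        label = "baseline" if not diff else "changed: " + ", ".join(diff)
        print(config, "->", label)
```

The report only covers the baseline and the three single-factor changes;
the remaining combinations are untested and listed purely for completeness.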

Thanks, and sorry for the noise.

Guenter

