lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 18 Nov 2021 00:32:33 +0100
From:   Holger Hoffstätte <holger@...lied-asynchrony.com>
To:     Justin Forbes <jmforbes@...uxtx.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org,
        akpm@...ux-foundation.org, linux@...ck-us.net, shuah@...nel.org,
        patches@...nelci.org, lkft-triage@...ts.linaro.org, pavel@...x.de,
        jonathanh@...dia.com, f.fainelli@...il.com, stable@...r.kernel.org
Subject: Re: [PATCH 5.15 000/923] 5.15.3-rc3 review

On 2021-11-17 22:32, Justin Forbes wrote:
> On Wed, Nov 17, 2021 at 11:19:15AM +0100, Greg Kroah-Hartman wrote:
>> This is the start of the stable review cycle for the 5.15.3 release.
>> There are 923 patches in this series, all will be posted as a response
>> to this one.  If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Fri, 19 Nov 2021 10:14:52 +0000.
>> Anything received after that time might be too late.
>>
>> The whole patch series can be found in one patch at:
>> 	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.3-rc3.gz
>> or in the git tree and branch at:
>> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
>> and the diffstat can be found below.
>>
> 
> I replied to Bruno's original message to lkml which has CKI artifacts
> for the issue, but I am still seeing it with rc3 on x86:
> 
> [    4.435551] BUG: unable to handle page fault for address: ffffb381402d7de0
> [    4.437498] #PF: supervisor read access in kernel mode
> [    4.438937] #PF: error_code(0x0000) - not-present page
> [    4.440373] PGD 100000067 P4D 100000067 PUD 1001d7067 PMD 100a1f067 PTE 0
> [    4.442269] Oops: 0000 [#1] SMP PTI
> [    4.443256] CPU: 1 PID: 1 Comm: systemd Not tainted 5.15.3-0.rc3.1.fc35.x86_64 #1
> [    4.445230] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-3.fc34 04/01/2014
> [    4.447514] RIP: 0010:__unwind_start+0x10b/0x1e0
> [    4.448749] Code: af fb ff 85 c0 75 d2 eb c0 65 48 8b 04 25 c0 fb 01 00 48 39 c6 0f 84 86 00 00 00 48 8b 86 98 23 00 00 48 8d 78 38 48 89 7d 38 <48> 8b 50 28 48 89 55 40 48 8b 40 30 48 89 45 48 48 3d 80 43 00 a1
> [    4.453406] RSP: 0018:ffffb38140017c18 EFLAGS: 00010006
> [    4.454672] RAX: ffffb381402d7db8 RBX: ffffb381402d7db8 RCX: 0000000000000000
> [    4.456370] RDX: 0000000000000000 RSI: ffff9b5080c08000 RDI: ffffb381402d7df0
> [    4.458065] RBP: ffffb38140017c38 R08: 0000000000000040 R09: 0000000000005000
> [    4.459689] R10: 8000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [    4.461306] R13: ffff9b5080c08c74 R14: 000000000000024b R15: 0000000000000001
> [    4.462857] FS:  00007f8d7729c340(0000) GS:ffff9b51f7d00000(0000) knlGS:0000000000000000
> [    4.464613] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    4.465825] CR2: ffffb381402d7de0 CR3: 0000000100244004 CR4: 0000000000770ee0
> [    4.467301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    4.468789] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    4.470217] PKRU: 55555554
> [    4.470777] Call Trace:
> [    4.471280]  <TASK>
> [    4.471718]  __get_wchan+0x35/0x80
> [    4.472415]  get_wchan+0x65/0x80
> [    4.473085]  do_task_stat+0xcd9/0xde0
> [    4.473821]  proc_single_show+0x4d/0xb0
> [    4.474583]  seq_read_iter+0x120/0x4b0
> [    4.475327]  seq_read+0xed/0x120
> [    4.475973]  ? cap_convert_nscap+0x160/0x1b0
> [    4.476832]  vfs_read+0x95/0x190
> [    4.477472]  ksys_read+0x4f/0xc0
> [    4.478115]  do_syscall_64+0x3b/0x90
> [    4.478830]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [    4.479823] RIP: 0033:0x7f8d77e2c31c
> [    4.480537] Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 f9 49 f9 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 4f 4a f9 ff 48
> [    4.484140] RSP: 002b:00007ffc2434e8c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [    4.485608] RAX: ffffffffffffffda RBX: 000055aa6dc4f650 RCX: 00007f8d77e2c31c
> [    4.486991] RDX: 0000000000000400 RSI: 000055aa6dcaf960 RDI: 0000000000000005
> [    4.488376] RBP: 00007f8d77f00300 R08: 0000000000000000 R09: 0000000000000001
> [    4.489761] R10: 0000000000001000 R11: 0000000000000246 R12: 00007f8d7729c0f8
> [    4.491159] R13: 0000000000000d68 R14: 00007f8d77eff700 R15: 0000000000000d68
> [    4.492545]  </TASK>
> [    4.492982] Modules linked in: xfs crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_console virtio_blk virtio_net net_failover failover qemu_fw_cfg pkcs8_key_parser
> [    4.496354] CR2: ffffb381402d7de0
> [    4.497010] ---[ end trace dc5691b47f8ba15b ]---
> [    4.497913] RIP: 0010:__unwind_start+0x10b/0x1e0
> [    4.498822] Code: af fb ff 85 c0 75 d2 eb c0 65 48 8b 04 25 c0 fb 01 00 48 39 c6 0f 84 86 00 00 00 48 8b 86 98 23 00 00 48 8d 78 38 48 89 7d 38 <48> 8b 50 28 48 89 55 40 48 8b 40 30 48 89 45 48 48 3d 80 43 00 a1
> [    4.502401] RSP: 0018:ffffb38140017c18 EFLAGS: 00010006
> [    4.503418] RAX: ffffb381402d7db8 RBX: ffffb381402d7db8 RCX: 0000000000000000
> [    4.504803] RDX: 0000000000000000 RSI: ffff9b5080c08000 RDI: ffffb381402d7df0
> [    4.506185] RBP: ffffb38140017c38 R08: 0000000000000040 R09: 0000000000005000
> [    4.507582] R10: 8000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [    4.508956] R13: ffff9b5080c08c74 R14: 000000000000024b R15: 0000000000000001
> [    4.510339] FS:  00007f8d7729c340(0000) GS:ffff9b51f7d00000(0000) knlGS:0000000000000000
> [    4.511914] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    4.513032] CR2: ffffb381402d7de0 CR3: 0000000100244004 CR4: 0000000000770ee0
> [    4.514420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    4.515803] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    4.517182] PKRU: 55555554
> [    4.517724] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> [    4.519317] Kernel Offset: 0x20000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [    4.521398] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
> 

This is great! Several people (incl. me) have seen the _exact same_ trace, but with
BMQ/PDS (custom CPU schedulers) so we suspected a locking issue/incompatibility in
get_wchan()'s spinlocking & task diddling compared to CFS. The fact that this happens
with vanilla means it's a generic problem with either: "sched: Add wrapper for get_wchan()
to keep task blocked" or "x86: Fix get_wchan() to support the ORC unwinder" or both.
I have been running with a dummy implementation of get_wchan that just returns 0
(effectively disabling wchan) and 5.15.3-rc3 has been rock-solid again.

Maybe just revert all the wchan stuff and let it stew in mainline a bit longer?

-h

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ