linux-kernel - Re: [PATCH 5.15 000/923] 5.15.3-rc3 review

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wjHbKfck1Ws4Y0pUZ7bxdjU9eh2WK0EFsv65utfeVkT9Q@mail.gmail.com>
Date:   Wed, 17 Nov 2021 15:50:17 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Holger Hoffstätte <holger@...lied-asynchrony.com>,
        Qi Zheng <zhengqi.arch@...edance.com>,
        Kees Cook <keescook@...omium.org>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>
Cc:     Justin Forbes <jmforbes@...uxtx.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Guenter Roeck <linux@...ck-us.net>,
        Shuah Khan <shuah@...nel.org>, patches@...nelci.org,
        lkft-triage@...ts.linaro.org, Pavel Machek <pavel@...x.de>,
        Jon Hunter <jonathanh@...dia.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        stable <stable@...r.kernel.org>
Subject: Re: [PATCH 5.15 000/923] 5.15.3-rc3 review

Sorry for top-posting and quoting this all, but the actual people
involved with the wchan changes don't seem to be on the participant
list.

And if people see this in stable kernels, I suspect it happens in my
kernel too, but there are more users of stable backports.

I already complained about the locking for the get_wchan() patch for
other reasons, but it looks like this is an actual unwinder problem
that triggers this issue.

Which I'm not AT ALL surprised about. In my locking complaint, I
explicitly mentioned that we've had unwinder issues before, and that
it's probably a really bad idea to call some random unwinder inside a
critical lock.

I really don't think the WCHAN code should use unwinders at all. It's
too damn fragile, and it's too easily triggered from user space.

So I think we need to revert all the wchan changes. Not just in
stable, but in mainline too.

And I hope people start taking the whole "unwinding is so fraught with
problems that you shouldn't do it in any normal circumstances"
seriously.

We've had toolchain issues with unwinding markers being wrong or incomplete.

We've had our own asm files not have all the proper annotations.

And most importantly, we've had unwinding code that is written by
people who don't take these issues seriously enough, and just blindly
follow possibly corrupted or wrong pointers.

Ironically, in the scheduler pull request for this all, the claim was
"Make wchan() more robust".

Which was very much the opposite of what it actually did.

              Linus

On Wed, Nov 17, 2021 at 3:32 PM Holger Hoffstätte
<holger@...lied-asynchrony.com> wrote:
>
> On 2021-11-17 22:32, Justin Forbes wrote:
> > On Wed, Nov 17, 2021 at 11:19:15AM +0100, Greg Kroah-Hartman wrote:
> >> This is the start of the stable review cycle for the 5.15.3 release.
> >> There are 923 patches in this series, all will be posted as a response
> >> to this one.  If anyone has any issues with these being applied, please
> >> let me know.
> >>
> >> Responses should be made by Fri, 19 Nov 2021 10:14:52 +0000.
> >> Anything received after that time might be too late.
> >>
> >> The whole patch series can be found in one patch at:
> >>      https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.3-rc3.gz
> >> or in the git tree and branch at:
> >>      git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> >> and the diffstat can be found below.
> >>
> >
> > I replied to Bruno's original message to lkml which has CKI artifacts
> > for the issue, but I am still seeing it with rc3 on x86:
> >
> > [    4.435551] BUG: unable to handle page fault for address: ffffb381402d7de0
> > [    4.437498] #PF: supervisor read access in kernel mode
> > [    4.438937] #PF: error_code(0x0000) - not-present page
> > [    4.440373] PGD 100000067 P4D 100000067 PUD 1001d7067 PMD 100a1f067 PTE 0
> > [    4.442269] Oops: 0000 [#1] SMP PTI
> > [    4.443256] CPU: 1 PID: 1 Comm: systemd Not tainted 5.15.3-0.rc3.1.fc35.x86_64 #1
> > [    4.445230] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-3.fc34 04/01/2014
> > [    4.447514] RIP: 0010:__unwind_start+0x10b/0x1e0
> > [    4.448749] Code: af fb ff 85 c0 75 d2 eb c0 65 48 8b 04 25 c0 fb 01 00 48 39 c6 0f 84 86 00 00 00 48 8b 86 98 23 00 00 48 8d 78 38 48 89 7d 38 <48> 8b 50 28 48 89 55 40 48 8b 40 30 48 89 45 48 48 3d 80 43 00 a1
> > [    4.453406] RSP: 0018:ffffb38140017c18 EFLAGS: 00010006
> > [    4.454672] RAX: ffffb381402d7db8 RBX: ffffb381402d7db8 RCX: 0000000000000000
> > [    4.456370] RDX: 0000000000000000 RSI: ffff9b5080c08000 RDI: ffffb381402d7df0
> > [    4.458065] RBP: ffffb38140017c38 R08: 0000000000000040 R09: 0000000000005000
> > [    4.459689] R10: 8000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > [    4.461306] R13: ffff9b5080c08c74 R14: 000000000000024b R15: 0000000000000001
> > [    4.462857] FS:  00007f8d7729c340(0000) GS:ffff9b51f7d00000(0000) knlGS:0000000000000000
> > [    4.464613] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    4.465825] CR2: ffffb381402d7de0 CR3: 0000000100244004 CR4: 0000000000770ee0
> > [    4.467301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [    4.468789] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [    4.470217] PKRU: 55555554
> > [    4.470777] Call Trace:
> > [    4.471280]  <TASK>
> > [    4.471718]  __get_wchan+0x35/0x80
> > [    4.472415]  get_wchan+0x65/0x80
> > [    4.473085]  do_task_stat+0xcd9/0xde0
> > [    4.473821]  proc_single_show+0x4d/0xb0
> > [    4.474583]  seq_read_iter+0x120/0x4b0
> > [    4.475327]  seq_read+0xed/0x120
> > [    4.475973]  ? cap_convert_nscap+0x160/0x1b0
> > [    4.476832]  vfs_read+0x95/0x190
> > [    4.477472]  ksys_read+0x4f/0xc0
> > [    4.478115]  do_syscall_64+0x3b/0x90
> > [    4.478830]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [    4.479823] RIP: 0033:0x7f8d77e2c31c
> > [    4.480537] Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 f9 49 f9 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 4f 4a f9 ff 48
> > [    4.484140] RSP: 002b:00007ffc2434e8c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> > [    4.485608] RAX: ffffffffffffffda RBX: 000055aa6dc4f650 RCX: 00007f8d77e2c31c
> > [    4.486991] RDX: 0000000000000400 RSI: 000055aa6dcaf960 RDI: 0000000000000005
> > [    4.488376] RBP: 00007f8d77f00300 R08: 0000000000000000 R09: 0000000000000001
> > [    4.489761] R10: 0000000000001000 R11: 0000000000000246 R12: 00007f8d7729c0f8
> > [    4.491159] R13: 0000000000000d68 R14: 00007f8d77eff700 R15: 0000000000000d68
> > [    4.492545]  </TASK>
> > [    4.492982] Modules linked in: xfs crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_console virtio_blk virtio_net net_failover failover qemu_fw_cfg pkcs8_key_parser
> > [    4.496354] CR2: ffffb381402d7de0
> > [    4.497010] ---[ end trace dc5691b47f8ba15b ]---
> > [    4.497913] RIP: 0010:__unwind_start+0x10b/0x1e0
> > [    4.498822] Code: af fb ff 85 c0 75 d2 eb c0 65 48 8b 04 25 c0 fb 01 00 48 39 c6 0f 84 86 00 00 00 48 8b 86 98 23 00 00 48 8d 78 38 48 89 7d 38 <48> 8b 50 28 48 89 55 40 48 8b 40 30 48 89 45 48 48 3d 80 43 00 a1
> > [    4.502401] RSP: 0018:ffffb38140017c18 EFLAGS: 00010006
> > [    4.503418] RAX: ffffb381402d7db8 RBX: ffffb381402d7db8 RCX: 0000000000000000
> > [    4.504803] RDX: 0000000000000000 RSI: ffff9b5080c08000 RDI: ffffb381402d7df0
> > [    4.506185] RBP: ffffb38140017c38 R08: 0000000000000040 R09: 0000000000005000
> > [    4.507582] R10: 8000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > [    4.508956] R13: ffff9b5080c08c74 R14: 000000000000024b R15: 0000000000000001
> > [    4.510339] FS:  00007f8d7729c340(0000) GS:ffff9b51f7d00000(0000) knlGS:0000000000000000
> > [    4.511914] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    4.513032] CR2: ffffb381402d7de0 CR3: 0000000100244004 CR4: 0000000000770ee0
> > [    4.514420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [    4.515803] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [    4.517182] PKRU: 55555554
> > [    4.517724] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> > [    4.519317] Kernel Offset: 0x20000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [    4.521398] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
> >
>
> This is great! Several people (incl. me) have seen the _exact same_ trace, but with
> BMQ/PDS (custom CPU schedulers) so we suspected a locking issue/incompatibility in
> get_wchan()'s spinlocking & task diddling compared to CFS. The fact that this happens
> with vanilla means it's a generic problem with either: "sched: Add wrapper for get_wchan()
> to keep task blocked" or "x86: Fix get_wchan() to support the ORC unwinder" or both.
> I have been running with a dummy implementation of get_wchan that just returns 0
> (effectively disabling wchan) and 5.15.3-rc3 has been rock-solid again.
>
> Maybe just revert all the wchan stuff and let it stew in mainline a bit longer?
>
> -h