lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191206031917.GA7304@paulmck-ThinkPad-P72>
Date:   Thu, 5 Dec 2019 19:19:17 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Tejun Heo <tj@...nel.org>, jiangshanlai@...il.com,
        linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: Workqueues splat due to ending up on wrong CPU

On Thu, Dec 05, 2019 at 06:48:05AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 05, 2019 at 11:32:13AM +0100, Peter Zijlstra wrote:
> > On Thu, Dec 05, 2019 at 11:29:28AM +0100, Peter Zijlstra wrote:
> > > On Wed, Dec 04, 2019 at 12:11:50PM -0800, Paul E. McKenney wrote:
> > > 
> > > > And the good news is that I didn't see the workqueue splat, though my
> > > > best guess is that I had about a 13% chance of not seeing it due to
> > > > random chance (and I am currently trying an idea that I hope will make
> > > > it more probable).  But I did get a couple of new complaints about RCU
> > > > being used illegally from an offline CPU.  Splats below.
> > > 
> > > Shiny!
> 
> And my attempt to speed things up did succeed, but the success was limited
> to finding more places where rcutorture chokes on CPUs being slow to boot.
> Fixing those and trying again...

And I finally got a clean run, so I am doing an overnight high-stress
baseline without your patch.  The hope is that this ups the probability
of occurrence such that I can get decent stats on your patch much more
quickly.

I got several more lockdep-RCU splats similar to the one I posted earlier.

I also got the following NULL-pointer dereference out of ACPI.  This had
surprisingly little effect on rcutorture, other than preventing it from
going down cleanly once the test had completed.  No idea whether it is
related, though the acpi_soft_cpu_online() in the stack trace makes me
suspect that it might be.

							Thanx, Paul

------------------------------------------------------------------------

[   97.122142] BUG: kernel NULL pointer dereference, address: 0000000000000028
[   97.224145] #PF: supervisor read access in kernel mode
[   97.301663] #PF: error_code(0x0000) - not-present page
[   97.376716] PGD 0 P4D 0 
[   97.413183] Oops: 0000 [#1] PREEMPT SMP PTI
[   97.472911] CPU: 1 PID: 13 Comm: cpuhp/1 Not tainted 5.4.0-rc1+ #124
[   97.562978] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[   97.684860] RIP: 0010:acpi_processor_get_platform_limit+0x5b/0xf0
[   97.772315] Code: 48 c7 c6 5b d6 71 a7 e8 d3 4c fc ff 83 f8 05 74 0b 85 c0 c6 05 21 7c 34 02 01 75 78 48 8b 04 24 48 8b 93 78 02 00 00 89 43 18 <8b> 4a 28 48 39 c8 73 2e 48 83 bb a0 03 00 00 00 74 24 48 8d 04 40
[   98.042953] RSP: 0000:ffffb4490007bdf8 EFLAGS: 00010246
[   98.118508] RAX: 0000000000000000 RBX: ffff8cf91ef09800 RCX: 0000000000000000
[   98.220612] RDX: 0000000000000000 RSI: ffff8cf91f32c910 RDI: 000000000002c910
[   98.322913] RBP: 00000000000000b3 R08: 0000000000000000 R09: 0000000000000000
[   98.423893] R10: 0000000000000001 R11: ffffb4490007bbe0 R12: ffff8cf91f316500
[   98.525024] R13: 0000000000000001 R14: ffffffffa68c1630 R15: 00000000000000cd
[   98.627690] FS:  0000000000000000(0000) GS:ffff8cf91f300000(0000) knlGS:0000000000000000
[   98.740706] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   98.819911] CR2: 0000000000000028 CR3: 0000000010e1e000 CR4: 00000000000006e0
[   98.918566] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   99.017110] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   99.115445] Call Trace:
[   99.148986]  acpi_processor_ppc_has_changed+0x25/0x80
[   99.218892]  acpi_soft_cpu_online+0x4e/0xc0
[   99.276406]  cpuhp_invoke_callback+0xb5/0xa30
[   99.335921]  cpuhp_thread_fun+0x179/0x210
[   99.390697]  ? cpuhp_thread_fun+0x34/0x210
[   99.445818]  smpboot_thread_fn+0x169/0x240
[   99.501944]  kthread+0xf3/0x130
[   99.545137]  ? sort_range+0x20/0x20
[   99.592636]  ? kthread_cancel_delayed_work_sync+0x10/0x10
[   99.665536]  ret_from_fork+0x3a/0x50
[   99.715131] Modules linked in:
[   99.755972] CR2: 0000000000000028
[   99.800216] ---[ end trace 4f464e92083c0f67 ]---
[   99.862848] RIP: 0010:acpi_processor_get_platform_limit+0x5b/0xf0
[   99.945285] Code: 48 c7 c6 5b d6 71 a7 e8 d3 4c fc ff 83 f8 05 74 0b 85 c0 c6 05 21 7c 34 02 01 75 78 48 8b 04 24 48 8b 93 78 02 00 00 89 43 18 <8b> 4a 28 48 39 c8 73 2e 48 83 bb a0 03 00 00 00 74 24 48 8d 04 40
  100.200302] RSP: 0000:ffffb4490007bdf8 EFLAGS: 00010246
[  100.268022] RAX: 0000000000000000 RBX: ffff8cf91ef09800 RCX: 0000000000000000
[  100.361897] RDX: 0000000000000000 RSI: ffff8cf91f32c910 RDI: 000000000002c910
[  100.454643] RBP: 00000000000000b3 R08: 0000000000000000 R09: 0000000000000000
[  100.548611] R10: 0000000000000001 R11: ffffb4490007bbe0 R12: ffff8cf91f316500
[  100.642858] R13: 0000000000000001 R14: ffffffffa68c1630 R15: 00000000000000cd
[  100.739082] FS:  0000000000000000(0000) GS:ffff8cf91f300000(0000) knlGS:0000000000000000
[  100.844924] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  100.919117] CR2: 0000000000000028 CR3: 0000000010e1e000 CR4: 00000000000006e0
[  101.011402] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  101.105372] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ