lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 30 Jul 2022 11:49:41 +0800
From:   Schspa Shi <schspa@...il.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Lai Jiangshan <jiangshanlai@...il.com>, tj@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] workqueue: Use active mask for new worker when pool is
 DISASSOCIATED


Peter Zijlstra <peterz@...radead.org> writes:

> On Wed, Jul 13, 2022 at 05:52:58PM +0800, Lai Jiangshan wrote:
>> 
>> 
>> CC Peter.
>> Peter has changed the CPU binding code in workqueue.c.
>
> [ 1622.829091] WARNING: CPU: 3 PID: 31 at kernel/sched/core.c:7756 sched_cpu_dying+0x74/0x204
> [ 1622.829374] CPU: 3 PID: 31 Comm: migration/3 Tainted: P           O      5.10.59-rt52 #2
> 									^^^^^^^^^^^^^^^^^^^^^
>
> I think we can ignore this as being some ancient kernel. Please try
> something recent.

Hi peter:

I spent a few days writing a test case and reproduced the problem on
kernel 5.19. I think it's time for us to review the V3 patch for a fix.

The V3 patch is at
https://lore.kernel.org/all/20220714031645.28004-1-schspa@gmail.com/
Please help to review it.

Test branch as:
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tag/?h=v5.19-rc8-rt8

I think this code is new enough to demonstrate that the problem persists.

The log as fellowing:

[ 3103.198684] ------------[ cut here ]------------
[ 3103.198684] Dying CPU not properly vacated!
[ 3103.198684] WARNING: CPU: 1 PID: 23 at kernel/sched/core.c:9575 sched_cpu_dying.cold+0xc/0xc3
[ 3103.198684] Modules linked in: work_test(O)
[ 3103.198684] CPU: 1 PID: 23 Comm: migration/1 Tainted: G           O      5.19.0-rc8-rt8 #1
[ 3103.198684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 3103.198684] Stopper: multi_cpu_stop+0x0/0xf0 <- stop_machine_cpuslocked+0x132/0x170
[ 3103.198684] RIP: 0010:sched_cpu_dying.cold+0xc/0xc3
[ 3103.198684] Code: 00 e9 a1 c1 40 ff 48 c7 c7 48 91 94 82 e8 99 29 00 00 48 c7 c7 00 5e 53 83 e9 e3 10 50 ff 48 c7 c7 98 91 94 82 e8 4f ec ff ff <0f> 0b 44 8b ab 10 0a 00 00 8b 4b 04 48 c7 c6 cd 37 93 82 48 c7 c7
[ 3103.198684] RSP: 0000:ffffc900000dbda0 EFLAGS: 00010086
[ 3103.198684] RAX: 0000000000000000 RBX: ffff88813bcaa280 RCX: 0000000000000000
[ 3103.198684] RDX: 0000000000000003 RSI: ffffffff82953971 RDI: 00000000ffffffff
[ 3103.198684] RBP: 0000000000000082 R08: 00000000000021ed R09: ffffc900000dbd38
[ 3103.198684] R10: 0000000000000001 R11: ffffffffffffffff R12: 0000000000000060
[ 3103.198684] R13: ffffffff810a9040 R14: ffffffff82c555c0 R15: 0000000000000000
[ 3103.198684] FS:  0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
[ 3103.198684] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3103.198684] CR2: 00007f85acd18010 CR3: 0000000102578000 CR4: 0000000000350ee0
[ 3103.198684] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3103.198684] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3103.198684] Call Trace:
[ 3103.198684]  <TASK>
[ 3103.198684]  ? sched_cpu_wait_empty+0x60/0x60
[ 3103.198684]  cpuhp_invoke_callback+0x3a4/0x5f0
[ 3103.198684]  take_cpu_down+0x71/0xd0
[ 3103.198684]  multi_cpu_stop+0x5c/0xf0
[ 3103.198684]  ? stop_machine_yield+0x10/0x10
[ 3103.198684]  cpu_stopper_thread+0x82/0x130
[ 3103.198684]  smpboot_thread_fn+0x1bb/0x2b0
[ 3103.198684]  ? sort_range+0x20/0x20
[ 3103.198684]  kthread+0xfe/0x120
[ 3103.198684]  ? kthread_complete_and_exit+0x20/0x20
[ 3103.198684]  ret_from_fork+0x1f/0x30
[ 3103.198684]  </TASK>
[ 3103.198684] Kernel panic - not syncing: panic_on_warn set ...
[ 3103.198684] CPU: 1 PID: 23 Comm: migration/1 Tainted: G           O      5.19.0-rc8-rt8 #1
[ 3103.198684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 3103.198684] Stopper: multi_cpu_stop+0x0/0xf0 <- stop_machine_cpuslocked+0x132/0x170
[ 3103.198684] Call Trace:
[ 3103.198684]  <TASK>
[ 3103.198684]  dump_stack_lvl+0x34/0x48
[ 3103.198684]  panic+0xf8/0x299
[ 3103.198684]  ? sched_cpu_dying.cold+0xc/0xc3
[ 3103.198684]  __warn.cold+0x43/0xba
[ 3103.198684]  ? sched_cpu_dying.cold+0xc/0xc3
[ 3103.198684]  report_bug+0x9d/0xc0
[ 3103.198684]  handle_bug+0x3c/0x70
[ 3103.198684]  exc_invalid_op+0x14/0x70
[ 3103.198684]  asm_exc_invalid_op+0x16/0x20
[ 3103.198684] RIP: 0010:sched_cpu_dying.cold+0xc/0xc3

-- 
BRs
Schspa Shi

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ