Message-ID: <20170920162447.5j5kcs3t6kzbilql@linutronix.de>
Date:   Wed, 20 Sep 2017 18:24:47 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     peterz@...radead.org, mingo@...hat.com,
        linux-kernel@...r.kernel.org, tglx@...utronix.de
Subject: Re: native_smp_send_reschedule() splat from rt_mutex_lock()?

On 2017-09-18 09:51:10 [-0700], Paul E. McKenney wrote:
> Hello!
Hi,

> [11072.586518] sched: Unexpected reschedule of offline CPU#6!
> [11072.587578] ------------[ cut here ]------------
> [11072.588563] WARNING: CPU: 0 PID: 59 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x37/0x40
> [11072.591543] Modules linked in:
> [11072.591543] CPU: 0 PID: 59 Comm: rcub/10 Not tainted 4.14.0-rc1+ #1
> [11072.610596] Call Trace:
> [11072.611531]  resched_curr+0x61/0xd0
> [11072.611531]  switched_to_rt+0x8f/0xa0
> [11072.612647]  rt_mutex_setprio+0x25c/0x410
> [11072.613591]  task_blocks_on_rt_mutex+0x1b3/0x1f0
> [11072.614601]  rt_mutex_slowlock+0xa9/0x1e0
> [11072.615567]  rt_mutex_lock+0x29/0x30
> [11072.615567]  rcu_boost_kthread+0x127/0x3c0

> In theory, I could work around this by excluding CPU-hotplug operations
> while doing RCU priority boosting, but in practice I am very much hoping
> that there is a more reasonable solution out there...

So in CPUHP_TEARDOWN_CPU / take_cpu_down() / __cpu_disable() the CPU is
marked as offline and interrupt handling is disabled. Later, in
CPUHP_AP_SCHED_STARTING / sched_cpu_dying(), all tasks are migrated away.
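
The splat itself comes from the offline-CPU check at the top of the x86
reschedule IPI; roughly (a simplified sketch of arch/x86/kernel/smp.c, the
exact code may differ between kernel versions):

static void native_smp_send_reschedule(int cpu)
{
	if (unlikely(cpu_is_offline(cpu))) {
		WARN(1, "sched: Unexpected reschedule of offline CPU#%d!\n", cpu);
		return;
	}
	apic->send_IPI(cpu, RESCHEDULE_VECTOR);
}

So any reschedule IPI aimed at a CPU that has already passed
__cpu_disable() will trigger it.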

Did this hit a task that had not yet been migrated away from the dying CPU
during a CPU-hotplug operation? In theory, a futex_unlock() of an RT task
could also produce such a backtrace.
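
As for the path in the backtrace: boosting the priority of a task that is
still on the dying CPU's runqueue ends up in resched_curr(), which sends
the reschedule IPI to that CPU; roughly (simplified from
kernel/sched/core.c, not the exact implementation):

void resched_curr(struct rq *rq)
{
	struct task_struct *curr = rq->curr;
	int cpu = cpu_of(rq);

	if (test_tsk_need_resched(curr))
		return;

	if (cpu == smp_processor_id()) {
		set_tsk_need_resched(curr);
		set_preempt_need_resched();
		return;
	}

	/* remote CPU: kick it with the reschedule IPI */
	if (set_nr_and_not_polling(curr))
		smp_send_reschedule(cpu);
}

If rq here belongs to the CPU that is going down and the boosted task has
not been migrated away yet, smp_send_reschedule() hits the check above.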

> 							Thanx, Paul
> 

Sebastian
