Message-ID: <YjIKQBIbJR/kRR+N@linutronix.de>
Date: Wed, 16 Mar 2022 17:03:12 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: sched_core_balance() releasing interrupts with pi_lock held
On 2022-03-15 17:46:06 [-0400], Steven Rostedt wrote:
> On Tue, 8 Mar 2022 16:14:55 -0500
> Steven Rostedt <rostedt@...dmis.org> wrote:
>
> > Hi Peter,
>
> Have you had time to look into this?
yes, I can confirm that it is a problem ;) So I did this:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 33ce5cd113d8..56c286aaa01f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5950,7 +5950,6 @@ static bool try_steal_cookie(int this, int that)
 	unsigned long cookie;
 	bool success = false;
 
-	local_irq_disable();
 	double_rq_lock(dst, src);
 
 	cookie = dst->core->core_cookie;
@@ -5989,7 +5988,6 @@ static bool try_steal_cookie(int this, int that)
 
 unlock:
 	double_rq_unlock(dst, src);
-	local_irq_enable();
 
 	return success;
 }
@@ -6019,7 +6017,7 @@ static void sched_core_balance(struct rq *rq)
 
 	preempt_disable();
 	rcu_read_lock();
-	raw_spin_rq_unlock_irq(rq);
+	raw_spin_rq_unlock(rq);
 	for_each_domain(cpu, sd) {
 		if (need_resched())
 			break;
@@ -6027,7 +6025,7 @@ static void sched_core_balance(struct rq *rq)
 		if (steal_cookie_task(cpu, sd))
 			break;
 	}
-	raw_spin_rq_lock_irq(rq);
+	raw_spin_rq_lock(rq);
 	rcu_read_unlock();
 	preempt_enable();
 }
which looked right but RT still falls apart:
| =====================================
| WARNING: bad unlock balance detected!
| 5.17.0-rc8-rt14+ #10 Not tainted
| -------------------------------------
| gcc/2608 is trying to release lock ((lock)) at:
| [<ffffffff8135a150>] folio_add_lru+0x60/0x90
| but there are no more locks to release!
|
| other info that might help us debug this:
| 4 locks held by gcc/2608:
| #0: ffff88826ea6efe0 (&sb->s_type->i_mutex_key#12){++++}-{3:3}, at: xfs_ilock+0x90/0xd0
| #1: ffff88826ea6f1a0 (mapping.invalidate_lock#2){++++}-{3:3}, at: page_cache_ra_unbounded+0x8e/0x1f0
| #2: ffff88852aba8d18 ((lock)#3){+.+.}-{2:2}, at: folio_add_lru+0x2a/0x90
| #3: ffffffff829a5140 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xe0
|
| stack backtrace:
| CPU: 18 PID: 2608 Comm: gcc Not tainted 5.17.0-rc8-rt14+ #10
| Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
| Call Trace:
| <TASK>
| dump_stack_lvl+0x4a/0x62
| lock_release.cold+0x32/0x37
| rt_spin_unlock+0x17/0x80
| folio_add_lru+0x60/0x90
| filemap_add_folio+0x53/0xa0
| page_cache_ra_unbounded+0x1c3/0x1f0
| filemap_get_pages+0xe3/0x5b0
| filemap_read+0xc5/0x2f0
| xfs_file_buffered_read+0x6b/0x1a0
| xfs_file_read_iter+0x6a/0xd0
| new_sync_read+0x11b/0x1a0
| vfs_read+0x134/0x1d0
| ksys_read+0x68/0xf0
| do_syscall_64+0x59/0x80
| entry_SYSCALL_64_after_hwframe+0x44/0xae
| RIP: 0033:0x7f3feab7310e
It is always the local_lock that breaks apart. Based on the "locks held"
list and the lock it is trying to release, it looks like the lock was
acquired on CPU-A and released on CPU-B.
> Thanks,
>
> -- Steve
Sebastian