Message-ID: <Zy52VDrG48EgrbtS@slm.duckdns.org>
Date: Fri, 8 Nov 2024 10:36:36 -1000
From: Tejun Heo <tj@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: David Vernet <void@...ifault.com>, linux-kernel@...r.kernel.org,
sched-ext@...a.com, kernel-team@...a.com,
Ingo Molnar <mingo@...hat.com>
Subject: Re: [PATCH sched_ext/for-6.12-fixes] sched_ext: Call
__balance_callbacks() from __scx_task_iter_rq_unlock()
Hello, Peter.
Sorry about the delay.
On Fri, Nov 01, 2024 at 03:12:18PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 01, 2024 at 12:36:34AM +0100, Peter Zijlstra wrote:
> > On Wed, Oct 30, 2024 at 11:41:39AM -1000, Tejun Heo wrote:
> >
> > > --- a/kernel/sched/ext.c
> > > +++ b/kernel/sched/ext.c
> > > @@ -1315,6 +1315,8 @@ static void scx_task_iter_start(struct s
> > > static void __scx_task_iter_rq_unlock(struct scx_task_iter *iter)
> > > {
> > > if (iter->locked) {
> > > + /* ->switched_from() may have scheduled balance callbacks */
> > > + __balance_callbacks(iter->rq);
> > > task_rq_unlock(iter->rq, iter->locked, &iter->rf);
> > > iter->locked = NULL;
> > > }
> >
> > I think you need to unpin/repin around it. The balance callbacks like to
> > drop rq->lock at times.
>
> Maybe something like so.. I'm not sure it's an improvement.
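If I'm reading the unpin/repin suggestion right, it would look something
like the sketch below. This is just my reading of it, not your actual
patch, and it assumes rq_unpin_lock()/rq_repin_lock() are the right tools
here:

static void __scx_task_iter_rq_unlock(struct scx_task_iter *iter)
{
	if (iter->locked) {
		/*
		 * ->switched_from() may have scheduled balance callbacks.
		 * They may drop and re-take rq->lock, so unpin across them
		 * and repin before task_rq_unlock(), which expects the
		 * lock to still be pinned.
		 */
		rq_unpin_lock(iter->rq, &iter->rf);
		__balance_callbacks(iter->rq);
		rq_repin_lock(iter->rq, &iter->rf);
		task_rq_unlock(iter->rq, iter->locked, &iter->rf);
		iter->locked = NULL;
	}
}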
I can actually reproduce the problem easily by making tasks switch between
DL and SCX, e.g. if I run `stress-ng --schedmix 32` while running any SCX
scheduler:
rq->balance_callback && rq->balance_callback != &balance_push_callback
WARNING: CPU: 5 PID: 2784 at kernel/sched/sched.h:1729 do_sched_yield+0x10a/0x130
...
Sched_ext: simple (enabling+all)
RIP: 0010:do_sched_yield+0x10a/0x130
Code: 84 66 e8 7e e8 07 a2 f0 00 48 83 c4 08 5b 41 5e 5d e9 0a 4f f1 00 cc c6 05 09 fb3
RSP: 0018:ffffc900030c3ef0 EFLAGS: 00010082
RAX: 0000000000000046 RBX: ffff8887fab70380 RCX: 0000000000000027
RDX: 0000000000000000 RSI: ffffffff811def79 RDI: ffff8887fab5b448
cb=0xffff8887fad5b040
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: ffffffff811dee91 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8887fab40000 R15: 0000000000000018
FS: 00007fc8772d8000(0000) GS:ffff8887fab40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005607f90c3078 CR3: 000000012aa1c000 CR4: 0000000000350eb0
Call Trace:
<TASK>
__x64_sys_sched_yield+0xa/0x20
do_syscall_64+0x7b/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fc87a30ea8b
Code: 73 01 c3 48 8b 0d 85 72 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 008
RSP: 002b:00007ffde081a928 EFLAGS: 00000202 ORIG_RAX: 0000000000000018
RAX: ffffffffffffffda RBX: 00007fc86ee1ab18 RCX: 00007fc87a30ea8b
RDX: 0000000000000001 RSI: 000000000000001f RDI: 000000000000001b
RBP: 00007ffde081ab90 R08: 000000000000001b R09: 00000000000003e8
R10: 00007ffde081a8f0 R11: 0000000000000202 R12: 0000000000005b81
R13: 00000000000061e8 R14: 0000000000000000 R15: 0000000000000003
</TASK>
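FWIW, the warning comes from the balance callback sanity check in
rq_pin_lock() (kernel/sched/sched.h). From memory it is roughly the
following; the exact surrounding code may differ:

static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
{
	rf->cookie = lockdep_pin_lock(__rq_lockp(rq));

#ifdef CONFIG_SCHED_DEBUG
	rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
	rf->clock_update_flags = 0;
	/*
	 * Catch balance callbacks that were queued under rq->lock but
	 * never flushed before the lock was dropped.
	 */
	SCHED_WARN_ON(rq->balance_callback &&
		      rq->balance_callback != &balance_push_callback);
#endif
}

IOW, the scx task iterator left a callback queued by ->switched_from()
pending past task_rq_unlock(), and the next rq_pin_lock() (here from
sched_yield()) tripped over it.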
Your patch makes the issue go away. Please feel free to add:
Tested-by: Tejun Heo <tj@...nel.org>
If you want me to turn it into a proper patch and apply it, please let me
know.
Thanks.
--
tejun