Message-ID: <CAJWu+opPO18zgQZPDqPkALXqU4Tn=ohPPDM=jmdpL=z1J=PJhA@mail.gmail.com>
Date: Thu, 22 Aug 2024 09:15:36 -0400
From: Joel Fernandes <joelaf@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...nel.org, tj@...nel.org, void@...ifault.com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com, rostedt@...dmis.org,
bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
linux-kernel@...r.kernel.org,
"Joel Fernandes (Google)" <joel@...lfernandes.org>
Subject: Re: [PATCH 0/9] sched: Prepare for sched_ext
On Thu, Aug 22, 2024 at 8:58 AM Joel Fernandes <joelaf@...gle.com> wrote:
>
> On Wed, Aug 21, 2024 at 5:41 PM Joel Fernandes <joelaf@...gle.com> wrote:
> >
> > On Tue, Aug 13, 2024 at 6:50 PM Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > Hi,
> > >
> > > These patches apply on top of the EEVDF series (queue/sched/core), which
> > > re-arranges the fair pick_task() functions to make them state invariant such
> > > that they can easily be restarted upon picking (and dequeueing) a delayed task.
> > >
> > > The same is required to push (the final) put_prev_task() beyond pick_task(),
> > > like we do for sched_core already.
> > >
> > > This in turn is done to prepare for sched_ext, which wants a final callback to
> > > be in possession of the next task, such that it can tell if the context switch
> > > will leave the sched_class.
> > >
> > > As such, this all re-arranges the current order of:
> > >
> > > put_prev_task(rq, prev);
> > > next = pick_next_task(rq); /* implies set_next_task(.first=true); */
> > >
> > > to something like:
> > >
> > > next = pick_task(rq)
> > > if (next != prev) {
> > > put_prev_task(rq, prev, next);
> > > set_next_task(rq, next, true);
> > > }
> > >
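(Inline note just to check my understanding of the new ordering: below is a
rough C sketch of what I think the pick path ends up looking like once
put_prev_task() also gets handed the next task. The wrapper name and the exact
signatures are my guesses from the description above, not necessarily what the
patches use.)

    static struct task_struct *
    pick_next_task_sketch(struct rq *rq, struct task_struct *prev)
    {
            struct task_struct *next;

            /* Pick first, without putting prev back beforehand. */
            next = pick_task(rq);

            if (next != prev) {
                    /*
                     * prev is only put now, and the callback can see next,
                     * so a class (e.g. sched_ext) can tell whether the
                     * context switch leaves the class.
                     */
                    put_prev_task(rq, prev, next);
                    set_next_task(rq, next, true);  /* .first = true */
            }

            return next;
    }
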
> > > The patches do a fair bit of cleaning up, notably a bunch of sched_core
> > > stuff -- Joel, could you please test this, because the self-tests we
> > > have are hardly adequate.
> > >
> > > The EEVDF stuff was supposed to be merged already, but since Valentin seems to
> > > be doing a read-through, I figured I'd give him a little extra time. A complete
> > > set can be found at:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/prep
> > >
> >
> > So I booted the queue.git sched/core branch on a newish Chromebook
> > (after applying 700 patches to make it boot and spending 2 days on it,
> > since we boot old kernels -- I wasn't joking when I said I would carve
> > out some time for you this week :P).
> >
> > With sched/core, it boots fine with core scheduling disabled, but
> > when core scheduling is enabled I am getting hard hangs and
> > occasionally get to the login screen if I'm lucky. So there's
> > definitely something wonky in the sched/core branch with core sched.
> > I could not get a trace or logs yet, since once it hangs I have to
> > hard power off.
> >
> > I could bisect it tomorrow though since it looks like a manageable
> > set of patches on 6.11-rc1. Or did you already figure out the issue?
> >
> > I am based on:
> > commit aef6987d89544d63a47753cf3741cabff0b5574c
> > Author: Peter Zijlstra <peterz@...radead.org>
> > Date: Thu Jun 20 13:16:49 2024 +0200
> >
> > sched/eevdf: Propagate min_slice up the cgroup hierarchy
>
> One of these 29 in sched/core broke core scheduling and causes hangs.
> Haven't narrowed it down to which one yet. Not much time today. Will
> probably try to collect some logs.
> https://hastebin.com/share/uqubojiqiy.yaml
>
> Also, I realized I should apply the 9 in this set too. But at the very
> least it appears the above 29 broke core-sched as far as bisection goes,
> probably the delayed-dequeue or task-pick rework?
>
> I will try the sched/prep branch now, which has the 9 in this set too.

Same issue with sched/prep, which has these 9. Looks like it hung on the
rq lock. Picked up a dmesg this time:
[ 13.856485] Hardware name: Google XXXXXX
[ 13.856487] RIP: 0010:queued_spin_lock_slowpath+0x140/0x260
[ 13.856496] RSP: 0018:ffff91d90253b9b8 EFLAGS: 00000046
[ 13.856498] RAX: 0000000000000000 RBX: ffff8b4f3792e880 RCX: ffff8b4f37b2f6c0
[ 13.856499] RDX: ffff8b4f37b80000 RSI: fffffffffffffff8 RDI: ffff8b4f3792e880
[ 13.856500] RBP: ffff91d90253b9d8 R08: 0000000000000002 R09: ffff8b4dc2bbc3c0
[ 13.856501] R10: 0000000000000005 R11: 0000000000000005 R12: ffff8b4f37b00000
[ 13.856502] R13: 0000000000000000 R14: 0000000000340000 R15: 0000000000340000
[ 13.856504] FS: 0000788e7f133c40(0000) GS:ffff8b4f37b00000(0000) knlGS:0000000000000000
[ 13.856505] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 13.856507] CR2: 00005a4901a073f8 CR3: 00000001131b2000 CR4: 0000000000750ef0
[ 13.856508] PKRU: 55555554
[ 13.856508] Call Trace:
[ 13.856512] <NMI>
[ 13.856516] ? nmi_cpu_backtrace+0x101/0x130
[ 13.856521] ? nmi_cpu_backtrace_handler+0x11/0x20
[ 13.856524] ? nmi_handle+0x59/0x160
[ 13.856526] ? queued_spin_lock_slowpath+0x140/0x260
[ 13.856528] ? default_do_nmi+0x46/0x110
[ 13.856530] ? exc_nmi+0xb1/0x110
[ 13.856532] ? end_repeat_nmi+0xf/0x53
[ 13.856534] ? queued_spin_lock_slowpath+0x140/0x260
[ 13.856535] ? queued_spin_lock_slowpath+0x140/0x260
[ 13.856537] ? queued_spin_lock_slowpath+0x140/0x260
[ 13.856538] </NMI>
[ 13.856539] <TASK>
[ 13.856540] raw_spin_rq_lock_nested+0x4c/0x80
[ 13.856543] sched_balance_rq+0x15ff/0x1860
[ 13.856548] sched_balance_newidle+0x193/0x390
[ 13.856550] balance_fair+0x25/0x40
[ 13.856553] __schedule+0x899/0x1110
[ 13.856555] ? timerqueue_add+0x86/0xa0
[ 13.856558] ? hrtimer_start_range_ns+0x225/0x2f0
[ 13.856560] schedule+0x5e/0x90
[ 13.856562] schedule_hrtimeout_range_clock+0xc2/0x130
[ 13.856564] ? __pfx_hrtimer_wakeup+0x10/0x10
[ 13.856566] do_epoll_wait+0x627/0x6b0
[ 13.856571] ? __pfx_ep_autoremove_wake_function+0x10/0x10
[ 13.856574] __x64_sys_epoll_wait+0x50/0x80
[ 13.856577] do_syscall_64+0x6a/0xe0
[ 13.856580] ? clear_bhb_loop+0x45/0xa0

Let me know if you have clues on which, but I will dig further and see
which patch(es) cause it. Thanks,
- Joel