Message-ID: <CAJWu+opPO18zgQZPDqPkALXqU4Tn=ohPPDM=jmdpL=z1J=PJhA@mail.gmail.com>
Date: Thu, 22 Aug 2024 09:15:36 -0400
From: Joel Fernandes <joelaf@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...nel.org, tj@...nel.org, void@...ifault.com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com, rostedt@...dmis.org,
bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
linux-kernel@...r.kernel.org,
"Joel Fernandes (Google)" <joel@...lfernandes.org>
Subject: Re: [PATCH 0/9] sched: Prepare for sched_ext
On Thu, Aug 22, 2024 at 8:58 AM Joel Fernandes <joelaf@...gle.com> wrote:
>
> On Wed, Aug 21, 2024 at 5:41 PM Joel Fernandes <joelaf@...gle.com> wrote:
> >
> > On Tue, Aug 13, 2024 at 6:50 PM Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > Hi,
> > >
> > > These patches apply on top of the EEVDF series (queue/sched/core), which
> > > re-arranges the fair pick_task() functions to make them state invariant such
> > > that they can easily be restarted upon picking (and dequeueing) a delayed task.
> > >
> > > The same is required to push (the final) put_prev_task() beyond pick_task(),
> > > like we do for sched_core already.
> > >
> > > This in turn is done to prepare for sched_ext, which wants a final callback to
> > > be in possession of the next task, such that it can tell if the context switch
> > > will leave the sched_class.
> > >
> > > As such, this all re-arranges the current order of:
> > >
> > > put_prev_task(rq, prev);
> > > next = pick_next_task(rq); /* implies set_next_task(.first=true); */
> > >
> > > to something like:
> > >
> > > next = pick_task(rq)
> > > if (next != prev) {
> > > put_prev_task(rq, prev, next);
> > > set_next_task(rq, next, true);
> > > }
> > >
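(Inline note just to check my understanding of the new ordering: below is a
rough C sketch of what I think the pick path ends up looking like once
put_prev_task() also gets handed the next task. The wrapper name and the exact
signatures are my guesses from the description above, not necessarily what the
patches use.)

    static struct task_struct *
    pick_next_task_sketch(struct rq *rq, struct task_struct *prev)
    {
            struct task_struct *next;

            /* Pick first, without putting prev back beforehand. */
            next = pick_task(rq);

            if (next != prev) {
                    /*
                     * prev is only put now, and the callback can see next,
                     * so a class (e.g. sched_ext) can tell whether the
                     * context switch leaves the class.
                     */
                    put_prev_task(rq, prev, next);
                    set_next_task(rq, next, true);  /* .first = true */
            }

            return next;
    }
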
> > > The patches do a fair bit of cleaning up, notably a bunch of sched_core
> > > stuff -- Joel, could you please test this, because the self-tests we
> > > have are hardly adequate.
> > >
> > > The EEVDF stuff was supposed to be merged already, but since Valentin seems to
> > > be doing a read-through, I figured I'd give him a little extra time. A complete
> > > set can be found at:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/prep
> > >
> >
> > So I booted the queue.git sched/core branch on a newish Chromebook
> > (after applying 700 patches to make it boot and spending 2 days on it,
> > since we boot old kernels -- I wasn't joking when I said I would carve
> > out some time for you this week :P).
> >
> > With sched/core, it boots fine with core scheduling disabled, but
> > when core scheduling is enabled I am getting hard hangs and
> > occasionally get to the login screen if I'm lucky. So there's
> > definitely something wonky in the sched/core branch with core sched.
> > I could not get a trace or logs yet, since once it hangs I have to
> > hard power off.
> >
> > I could bisect it tomorrow though since it looks like a manageable
> > set of patches on 6.11-rc1. Or did you already figure out the issue?
> >
> > I am based on:
> > commit aef6987d89544d63a47753cf3741cabff0b5574c
> > Author: Peter Zijlstra <peterz@...radead.org>
> > Date: Thu Jun 20 13:16:49 2024 +0200
> >
> > sched/eevdf: Propagate min_slice up the cgroup hierarchy
>
> One of these 29 in sched/core broke core scheduling and causes hangs.
> Haven't narrowed it down to which one yet. Not much time today. Will
> probably try to collect some logs.
> https://hastebin.com/share/uqubojiqiy.yaml
>
> Also, I realized I should apply the 9 in this set too. But at the very
> least it appears the above 29 broke core-sched as far as bisection goes,
> probably the delayed-dequeue or task-pick rework?
>
> I will try the sched/prep branch now, which has the 9 in this set too.

Same issue with sched/prep, which has these 9. Looks like it hung on the
rq lock. Picked up a dmesg this time:
[ 13.856485] Hardware name: Google XXXXXX
[ 13.856487] RIP: 0010:queued_spin_lock_slowpath+0x140/0x260
[ 13.856496] RSP: 0018:ffff91d90253b9b8 EFLAGS: 00000046
[ 13.856498] RAX: 0000000000000000 RBX: ffff8b4f3792e880 RCX: ffff8b4f37b2f6c0
[ 13.856499] RDX: ffff8b4f37b80000 RSI: fffffffffffffff8 RDI: ffff8b4f3792e880
[ 13.856500] RBP: ffff91d90253b9d8 R08: 0000000000000002 R09: ffff8b4dc2bbc3c0
[ 13.856501] R10: 0000000000000005 R11: 0000000000000005 R12: ffff8b4f37b00000
[ 13.856502] R13: 0000000000000000 R14: 0000000000340000 R15: 0000000000340000
[ 13.856504] FS: 0000788e7f133c40(0000) GS:ffff8b4f37b00000(0000) knlGS:0000000000000000
[ 13.856505] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 13.856507] CR2: 00005a4901a073f8 CR3: 00000001131b2000 CR4: 0000000000750ef0
[ 13.856508] PKRU: 55555554
[ 13.856508] Call Trace:
[ 13.856512] <NMI>
[ 13.856516] ? nmi_cpu_backtrace+0x101/0x130
[ 13.856521] ? nmi_cpu_backtrace_handler+0x11/0x20
[ 13.856524] ? nmi_handle+0x59/0x160
[ 13.856526] ? queued_spin_lock_slowpath+0x140/0x260
[ 13.856528] ? default_do_nmi+0x46/0x110
[ 13.856530] ? exc_nmi+0xb1/0x110
[ 13.856532] ? end_repeat_nmi+0xf/0x53
[ 13.856534] ? queued_spin_lock_slowpath+0x140/0x260
[ 13.856535] ? queued_spin_lock_slowpath+0x140/0x260
[ 13.856537] ? queued_spin_lock_slowpath+0x140/0x260
[ 13.856538] </NMI>
[ 13.856539] <TASK>
[ 13.856540] raw_spin_rq_lock_nested+0x4c/0x80
[ 13.856543] sched_balance_rq+0x15ff/0x1860
[ 13.856548] sched_balance_newidle+0x193/0x390
[ 13.856550] balance_fair+0x25/0x40
[ 13.856553] __schedule+0x899/0x1110
[ 13.856555] ? timerqueue_add+0x86/0xa0
[ 13.856558] ? hrtimer_start_range_ns+0x225/0x2f0
[ 13.856560] schedule+0x5e/0x90
[ 13.856562] schedule_hrtimeout_range_clock+0xc2/0x130
[ 13.856564] ? __pfx_hrtimer_wakeup+0x10/0x10
[ 13.856566] do_epoll_wait+0x627/0x6b0
[ 13.856571] ? __pfx_ep_autoremove_wake_function+0x10/0x10
[ 13.856574] __x64_sys_epoll_wait+0x50/0x80
[ 13.856577] do_syscall_64+0x6a/0xe0
[ 13.856580] ? clear_bhb_loop+0x45/0xa0

Let me know if you have clues on which, but I will dig further and see
which patch(es) cause it. Thanks,
- Joel