Message-ID: <20240910140739.GI4723@noisy.programming.kicks-ass.net>
Date: Tue, 10 Sep 2024 16:07:39 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Sven Schnelle <svens@...ux.ibm.com>
Cc: mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org,
kprateek.nayak@....com, wuyun.abel@...edance.com,
youssefesmat@...omium.org, tglx@...utronix.de, efault@....de
Subject: Re: [PATCH 00/24] Complete EEVDF
On Tue, Sep 10, 2024 at 02:21:05PM +0200, Sven Schnelle wrote:
> Sven Schnelle <svens@...ux.ibm.com> writes:
>
> > Peter Zijlstra <peterz@...radead.org> writes:
> >
> >> Hi all,
> >>
> >> So after much delay this is hopefully the final version of the EEVDF patches.
> >> They've been sitting in my git tree for ever it seems, and people have been
> >> testing it and sending fixes.
> >>
> >> I've spent the last two days testing and fixing cfs-bandwidth, and as far
> >> as I know that was the very last issue holding it back.
> >>
> >> These patches apply on top of queue.git sched/dl-server, which I plan on merging
> >> in tip/sched/core once -rc1 drops.
> >>
> >> I'm hoping to then merge all this (+- the DVFS clock patch) right before -rc2.
> >>
> >>
> >> Aside from a ton of bug fixes -- thanks all! -- new in this version is:
> >>
> >> - split up the huge delay-dequeue patch
> >> - tested/fixed cfs-bandwidth
> >> - PLACE_REL_DEADLINE -- preserve the relative deadline when migrating
> >> - SCHED_BATCH is equivalent to RESPECT_SLICE
> >> - propagate min_slice up cgroups
> >> - CLOCK_THREAD_DVFS_ID
> >
> > I'm seeing crashes/warnings like the following on s390 with linux-next 20240909:
> >
> > Sometimes the system doesn't manage to print an oops, this one is the best I got:
> >
> > [..]
> > This happens when running the strace test suite. The system normally has
> > 128 CPUs. With this configuration the crash doesn't happen, but when
> > disabling all but four CPUs and running 'make check -j16' in the strace
> > test suite the crash is almost always reproducible.
I noted: Comm: prctl-sched-cor, which is testing core scheduling, right?
Only today I've merged a fix for that:
c662e2b1e8cf ("sched: Fix sched_delayed vs sched_core")
Could you double check if merging tip/sched/core into your next tree
helps anything at all?