Message-ID: <CANDhNCo+G4_t8jYU-QNPz42uZsKdMgEmTnr8pYSKbgm26NJUCg@mail.gmail.com>
Date: Wed, 23 Jul 2025 15:42:35 -0700
From: John Stultz <jstultz@...gle.com>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: LKML <linux-kernel@...r.kernel.org>, Joel Fernandes <joelagnelf@...dia.com>,
Qais Yousef <qyousef@...alina.io>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Valentin Schneider <vschneid@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Zimuzo Ezeozue <zezeozue@...gle.com>, Mel Gorman <mgorman@...e.de>, Will Deacon <will@...nel.org>,
Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>,
"Paul E. McKenney" <paulmck@...nel.org>, Metin Kaya <Metin.Kaya@....com>,
Xuewen Yan <xuewen.yan94@...il.com>, K Prateek Nayak <kprateek.nayak@....com>,
Thomas Gleixner <tglx@...utronix.de>, Daniel Lezcano <daniel.lezcano@...aro.org>,
Suleiman Souhlal <suleiman@...gle.com>, kuyo chang <kuyo.chang@...iatek.com>, hupu <hupu.gm@...il.com>,
kernel-team@...roid.com
Subject: Re: [RFC][PATCH v20 0/6] Donor Migration for Proxy Execution (v20)

On Wed, Jul 23, 2025 at 7:44 AM Juri Lelli <juri.lelli@...hat.com> wrote:
> On 22/07/25 07:05, John Stultz wrote:
> > Issues still to address with the full series:
> > * There’s a new quirk from recent changes for dl_server that
> > is causing the ksched_football test in the full series to hang
> > at boot. I’ve bisected and reverted the change for now, but I
> > need to better understand what’s going wrong.
>
> After our quick chat on IRC, I remembered that there were additional two
> fixes for dl-server posted, but still not on tip.
>
> https://lore.kernel.org/lkml/20250615131129.954975-1-kuyo.chang@mediatek.com/
> https://lore.kernel.org/lkml/20250627035420.37712-1-yangyicong@huawei.com/
>
> So I went ahead and pushed them to
>
> git@...hub.com:jlelli/linux.git upstream/fix-dlserver
>
> Could you please check if any (or both together) of the two topmost
> changes do any good to the issue you are seeing?

Thanks for sharing these! Unfortunately they don't seem to help. :/

I'm still digging down into the behavior. I'm not 100% sure the
problem isn't just my test logic starving itself (after creating
NR_CPU RT spinners, it's not surprising that creating new threads might
be tough if the non-RT kthreadd can't get scheduled), but I don't quite
see how the dl_server patch cccb45d7c429 ("sched/deadline: Less
agressive dl_server handling") would be the cause of the dramatic
behavioral change, especially as this test was also functional prior to
the dl_server logic landing. Also, it's odd that just re-adding the
dl_server_stop() call removed from dequeue_entities() seems to make it
work again. So I clearly need to dig more to understand the behavior.
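
To make the starvation concern above concrete, here is a toy model in
Python. It is only a sketch under stated assumptions: it is not the
ksched_football test and not kernel code. The function name fair_ticks(),
the tick-based CPU model, and the 50/1000 reservation are all hypothetical
and chosen purely for illustration. The model has one CPU with an
always-runnable RT spinner and one fair (non-RT) task, and it counts how
many ticks the fair task gets with and without a dl_server-style
bandwidth reservation.

```python
def fair_ticks(total_ticks, server_runtime=0, server_period=1):
    """Count the ticks a fair (non-RT) task receives on a single CPU
    that also hosts an always-runnable RT spinner.

    Toy assumption: within every window of `server_period` ticks, the
    first `server_runtime` ticks are reserved for the fair task (a crude
    stand-in for a dl_server-style reservation). The RT spinner wins
    every other tick.
    """
    granted = 0
    for tick in range(total_ticks):
        # Position of this tick inside the current reservation window.
        if tick % server_period < server_runtime:
            granted += 1  # reserved slot: the fair task runs
        # otherwise the RT spinner keeps the CPU
    return granted


# No reservation at all: the fair task never runs.
print(fair_ticks(1000))
# Hypothetical 50-in-1000 reservation: the fair task makes slow progress.
print(fair_ticks(1000, server_runtime=50, server_period=1000))
```

In this model, a fair task such as kthreadd makes no progress at all
without the reservation, and only slow progress with it, so thread
creation during the test would depend on the reservation actually
being in effect.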

Thanks again for your suggestions! I'm going to dig further and let
folks know when I figure this detail out.

thanks
-john