Message-ID: <cd10d32b514d2792659fe03ad1235771982a6e2f.camel@mediatek.com>
Date: Wed, 23 Jul 2025 22:22:37 +0000
From: Pierce Wen (溫彥翔) <Pierce.Wen@...iatek.com>
To: "juri.lelli@...hat.com" <juri.lelli@...hat.com>,
	Kuyo Chang (張建文) <Kuyo.Chang@...iatek.com>
CC: "bsegall@...gle.com" <bsegall@...gle.com>, "vschneid@...hat.com"
	<vschneid@...hat.com>, "dietmar.eggemann@....com" <dietmar.eggemann@....com>,
	"peterz@...radead.org" <peterz@...radead.org>, "rostedt@...dmis.org"
	<rostedt@...dmis.org>, "mingo@...hat.com" <mingo@...hat.com>,
	"vincent.guittot@...aro.org" <vincent.guittot@...aro.org>, "mgorman@...e.de"
	<mgorman@...e.de>, "jstultz@...gle.com" <jstultz@...gle.com>,
	"matthias.bgg@...il.com" <matthias.bgg@...il.com>,
	"linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-mediatek@...ts.infradead.org"
	<linux-mediatek@...ts.infradead.org>, AngeloGioacchino Del Regno
	<angelogioacchino.delregno@...labora.com>
Subject: Re: [RFC PATCH 1/1] sched/deadline: Fix RT task potential starvation
 when expiry time passed

On Sat, 2025-06-21 at 10:55 +0800, Kuyo Chang wrote:
> On Fri, 2025-06-20 at 17:22 +0200, Juri Lelli wrote:
> > 
> > On 20/06/25 11:00, Kuyo Chang wrote:
> > 
> > ...
> > 
> > > 
> > 
> > Thanks for the additional explanation.
> > 
> > The way I understand it now is the following (of course please correct
> > me if I am still not getting it :)
> > 
> > - a dl_server is actively servicing NORMAL tasks, but suffers lot of IRQ
> >   load and cannot make much progress
> > - it does anyway make progress, but it reaches update_curr_dl_se@...ottle
> >   only when its current deadline is past rq_clock
> > - dl_runtime_exceeded() branch is entered, but start_dl_timer() fails as
> >   the computed act is still in the past
> > - enqueue_dl_entity(REPLENISH) call replenish_dl_entity() which tries to
> >   add runtime and advance the deadline, but time moved on so far that
> >   deadline is still behind rq_clock() and so "DL replenish ..." is
> >   printed
> > - replenish_dl_new_period() updates runtime and deadline from current
> >   clock and the dl-server is put back to run (so it continues to run
> >   over/starve FIFO tasks)
> > 
> 
> Yes, "DL replenish ..." is the critical clue for identifying the root
> cause of this issue.
> 
> > It looks like your proposed fix might work in this particular corner
> > case, but I am not 100% comfortable with not trying to replenish
> > properly (catch up with runtime) at all. I wonder if we might then start
> > missing some other corner case. Maybe we could try to catch this
> > particular corner case before even attempting to start the dl_timer,
> > since we know it will fail, and do something at that point?
> > 
> 
> You can consider the patch more as an error-proofing mechanism, and so
> far, it has been working well on our platform.
> However, it might be better to catch this particular corner case in
> advance to prevent the issue.
> > Thanks,
> > Juri
> > 
> 

Hi all,

I wanted to follow up on the potential RT task starvation issue and ask
whether there have been any further updates or feedback.

To recap, with some additional context:

1. As discussed in the thread (see
https://lore.kernel.org/all/CANDhNCqYCpdhYS9afdKeY34Bmw8MXyqKWCSTxOZNLTjYrUaVXg@mail.gmail.com/
), it has been demonstrated that using a scaled timer can induce RT
starvation under certain conditions.

2. The delta_exec calculation relies on the clock_task member of struct
rq, which is affected by IRQ time on the runqueue. If IRQ time becomes
excessively long in some corner case, this could also lead to RT
starvation.

3. Based on these observations, we strongly recommend adopting a
recovery patch to address these critical scenarios and prevent RT task
starvation, especially in cases where the current logic may not be
sufficient.
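
For illustration only, here is a rough user-space sketch of the replenish
behaviour summarised in the quoted text above: catch up period by period,
and if the deadline is still behind the clock, restart from the current
clock with a full budget. This is a toy model, not the kernel code. The
names struct toy_dl_se and toy_replenish(), the nanosecond values, and the
assumption that the relative deadline equals the period are all made up
for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the few deadline-entity fields involved (hypothetical). */
struct toy_dl_se {
	int64_t  runtime;    /* remaining budget in ns; may be <= 0 */
	uint64_t deadline;   /* absolute deadline in ns */
	uint64_t dl_runtime; /* budget granted per period, ns */
	uint64_t dl_period;  /* period in ns (relative deadline assumed equal) */
};

/*
 * Toy replenish: add budget and push the deadline one period at a time
 * until the budget is positive. If the deadline is still behind "now",
 * give up on catching up and restart from the current clock.
 *
 * Returns true when the restart-from-now path was taken, which is the
 * situation in which the entity gets a fresh full budget right away.
 */
static bool toy_replenish(struct toy_dl_se *se, uint64_t now)
{
	/* Guard against a loop that could never terminate. */
	assert(se->dl_runtime > 0 && se->dl_period > 0);

	while (se->runtime <= 0) {
		se->deadline += se->dl_period;
		se->runtime += (int64_t)se->dl_runtime;
	}

	if (se->deadline < now) {
		/* Lagged too much: new period starting at the current clock. */
		se->deadline = now + se->dl_period;
		se->runtime = (int64_t)se->dl_runtime;
		return true;
	}
	return false;
}
```

With runtime 0, deadline 1000, dl_runtime 50, dl_period 1000 and now 5000,
a single catch-up step moves the deadline to 2000, which is still behind
5000, so the toy entity is restarted with deadline 6000 and a full budget
of 50. In this model, an entity whose consumed runtime grows much more
slowly than the clock it is compared against keeps taking this path
instead of being throttled.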

Best regards,  
Pierce.
