Message-ID: <20230911132535.GA24480@lorien.usersys.redhat.com>
Date: Mon, 11 Sep 2023 09:25:35 -0400
From: Phil Auld <pauld@...hat.com>
To: Hao Jia <jiahao.os@...edance.com>
Cc: mingo@...hat.com, peterz@...radead.org, mingo@...nel.org,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH 0/2] Fix nohz_full vs rt bandwidth
Hi Hao,
On Mon, Sep 11, 2023 at 11:39:02AM +0800 Hao Jia wrote:
> On 2023/9/8 Phil Auld wrote:
> > On Fri, Sep 08, 2023 at 10:57:26AM +0800 Hao Jia wrote:
> > > On 2023/9/7 Phil Auld wrote:
> > > > Hi Hao,
...
> > > >
> > > > Are you actually hitting this in the real world?
> > > >
> > > > We, for example, no longer enable RT_GROUP_SCHED so this is a non-issue
> > > > for our use cases. I'd recommend considering that. (Does it even
> > > > work with cgroup2?)
> > > >
> > >
> > > Yes, it has always been there. Regardless of whether RT_GROUP_SCHED is
> > > enabled or not, rt bandwidth is always enabled. If RT_GROUP_SCHED is not
> > > enabled, all rt tasks in the system form a single group, with rt_runtime
> > > 950000 and rt_period 1000000. So rt bandwidth is always enabled by default.
> >
> > Sure, there is that. But I think Daniel is actively trying to remove it.
> >
>
> Thank you for your reply. Maybe I'm missing something. Can you give me some
> links to discussions about it?
>
Sure, try this one:
https://lore.kernel.org/lkml/cover.1693510979.git.bristot@kernel.org/
> > Also I'm not sure you answered my question. Are you actually hitting this
> > in the real world? I'd be tempted to think this is a mis-configuration or
> > mis-use of RT. Plus you can disable that throttling and use stalld to catch
> > cases where the rt task goes out of control.
> >
>
> > Are you actually hitting this in the real world?
>
> I tested on my machine using the default settings (rt_runtime is 950000 and
> rt_period is 1000000). The rt task is supposed to be throttled after
> running for 0.95 seconds, but with NO_HZ_FULL it may only be throttled
> after running for about 1.4 seconds. This only delays the rt_bandwidth
> throttle; no warning is triggered.
Yes, you can hit this in testing. I'm asking whether it's causing issues for your
real-world application, or whether this is just a theoretical problem you can
contrive a test for. Are you actually hitting this when running your workload?
From what you are showing (a test setup), I'm guessing no.
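A toy model of the delay Hao describes, as a minimal Python sketch under stated assumptions (none of it is taken from the patches): an RT task runs continuously from t=0, its runtime is only charged at discrete accounting points, and throttling can only be noticed at one of those points. `throttle_time_us` is a hypothetical name. A 1 ms interval stands in for a regular HZ=1000 tick, a 1 s interval stands in for a stopped tick on a nohz_full CPU, and the 400 ms offset of the first accounting point is picked only so the numbers line up with the ~1.4 s Hao reported.

```python
def throttle_time_us(budget_us: int, first_account_us: int, interval_us: int) -> int:
    """Time (us since the RT task started) at which the overrun is first noticed.

    Toy model (hypothetical): runtime is charged only at
    first_account_us, first_account_us + interval_us, ... and the task is
    throttled at the first of those points where charged time >= budget_us.
    """
    if interval_us <= 0 or first_account_us <= 0:
        raise ValueError("accounting points must lie after t=0")
    if first_account_us >= budget_us:
        # The very first accounting point already exceeds the budget.
        return first_account_us
    # Further intervals needed to reach the budget (ceiling division).
    k = -(-(budget_us - first_account_us) // interval_us)
    return first_account_us + k * interval_us


if __name__ == "__main__":
    budget = 950_000  # default rt_runtime quoted in this thread
    print(throttle_time_us(budget, 1_000, 1_000))        # 950000: 1 ms accounting
    print(throttle_time_us(budget, 400_000, 1_000_000))  # 1400000: 1 s accounting
```

With fine-grained accounting the overrun is caught at exactly 0.95 s; with coarse accounting it surfaces at 1.4 s. The model ignores the per-period replenishment done by the rt period timer, so treat it as an intuition aid only.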
>
>
> > Plus you can disable that throttling and use stalld to catch cases where
> > the rt task goes out of control.
>
> IIRC, if we disable rt_bandwidth, an rt task can run indefinitely, which may
> cause cfs task starvation and hung_task warnings. This may be the reason why
> rt_bandwidth is enabled by default (rt_runtime is 950000 and rt_period is
> 1000000).
That's what stalld is for. Some rt applications don't like giving up 5% of
the cpu time when they don't really need to.
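For reference, the 5% comes straight from the two global sysctls: (1000000 - 950000) / 1000000. A minimal Python sketch of that arithmetic, assuming a Linux host for the optional procfs read; `rt_share` and `read_rt_limits` are hypothetical helper names. A value of -1 in /proc/sys/kernel/sched_rt_runtime_us switches the RT bandwidth limit off, which is the setup to pair with stalld.

```python
from pathlib import Path
from typing import Optional, Tuple

# Global RT bandwidth sysctls on Linux (values in microseconds).
RUNTIME_PATH = Path("/proc/sys/kernel/sched_rt_runtime_us")
PERIOD_PATH = Path("/proc/sys/kernel/sched_rt_period_us")


def rt_share(runtime_us: int, period_us: int) -> Optional[float]:
    """Fraction of each period RT tasks may use; None if throttling is off."""
    if period_us <= 0:
        raise ValueError("period must be positive")
    if runtime_us < 0:
        # sched_rt_runtime_us = -1 means no RT bandwidth limit.
        return None
    return runtime_us / period_us


def read_rt_limits() -> Tuple[int, int]:
    """Read (runtime_us, period_us) from procfs (hypothetical helper)."""
    return int(RUNTIME_PATH.read_text()), int(PERIOD_PATH.read_text())


if __name__ == "__main__":
    # Defaults quoted in this thread: 950000 us of every 1000000 us.
    share = rt_share(950_000, 1_000_000)
    print(f"RT: {share:.0%}, reserved for non-RT: {1 - share:.0%}")  # RT: 95%, reserved for non-RT: 5%
    try:
        runtime, period = read_rt_limits()
    except (OSError, ValueError):
        print("RT bandwidth sysctls not readable here")
    else:
        live = rt_share(runtime, period)
        print("this host:", "unthrottled" if live is None else f"{live:.0%}")
```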
Cheers,
Phil