Message-ID: <20230911132535.GA24480@lorien.usersys.redhat.com>
Date:   Mon, 11 Sep 2023 09:25:35 -0400
From:   Phil Auld <pauld@...hat.com>
To:     Hao Jia <jiahao.os@...edance.com>
Cc:     mingo@...hat.com, peterz@...radead.org, mingo@...nel.org,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH 0/2] Fix nohz_full vs rt bandwidth


Hi Hao,

On Mon, Sep 11, 2023 at 11:39:02AM +0800 Hao Jia wrote:
> On 2023/9/8 Phil Auld wrote:
> > On Fri, Sep 08, 2023 at 10:57:26AM +0800 Hao Jia wrote:
> > > On 2023/9/7 Phil Auld wrote:
> > > > Hi Hao,

...

> > > > 
> > > > Are you actually hitting this in the real world?
> > > > 
> > > > We, for example, no longer enable RT_GROUP_SCHED so this is a non-issue
> > > > for our use cases.  I'd recommend considering that. (Does it even
> > > > work with cgroup2?)
> > > > 
> > > 
> > > Yes, it has always been there. Regardless of whether RT_GROUP_SCHED is
> > > enabled or not, rt bandwidth is always enabled. If RT_GROUP_SCHED is not
> > > enabled, all rt tasks in the system are a group, and rt_runtime is 950000,
> > > and rt_period is 1000000. So rt bandwidth is always enabled by default.
> > 
> > Sure, there is that. But I think Daniel is actively trying to remove it.
> > 
> 
> Thank you for your reply. Maybe I'm missing something. Can you give me some
> links to discussions about it?
>

Sure, try this one:
      https://lore.kernel.org/lkml/cover.1693510979.git.bristot@kernel.org/


> > Also I'm not sure you answered my question. Are you actually hitting this
> > in the real world?  I'd be tempted to think this is a mis-configuration or
> > mis-use of RT.  Plus you can disable that throttling and use stalld to catch
> > cases where the rt task goes out of control.
> > 
> 
> > Are you actually hitting this in the real world?
> 
> I tested on my machine using default settings (rt_runtime is 950000, and
> rt_period is 1000000). The rt task is supposed to be throttled after
> running for 0.95 seconds, but due to the influence of NO_HZ_FULL, it may be
> throttled after running for about 1.4 seconds. This will only cause the
> rt_bandwidth throttle to be delayed, but no warning will be triggered.

Yes, you can hit this in testing.  I'm asking whether it's causing issues for
your real-world application, or whether it's just a theoretical problem you can
contrive a test for.  Are you actually hitting this when running your workload?
From what you are showing (a test setup) I'm guessing no.
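
To put the numbers quoted above side by side, here is a minimal sketch of the arithmetic. It only uses the figures from this thread (the 950000 us default budget and the roughly 1.4 s observed run time); the function and constant names are made up for illustration and are not kernel identifiers.

```python
# Figures quoted in this thread; the 1.4 s value is approximate.
RT_RUNTIME_US = 950_000    # default rt_runtime budget per period
OBSERVED_US = 1_400_000    # approximate run time before throttling under NO_HZ_FULL


def throttle_overrun_us(observed_us: int, runtime_us: int) -> int:
    """Return how long the task ran past its budget, in microseconds."""
    return max(0, observed_us - runtime_us)


overrun = throttle_overrun_us(OBSERVED_US, RT_RUNTIME_US)
print(f"overrun: {overrun} us ({overrun / RT_RUNTIME_US:.0%} past the budget)")
```

That is a delayed throttle in a test setup, which is a different thing from a workload that misbehaves because of it.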

> 
> 
> > Plus you can disable that throttling and use stalld to catch cases where
> > the rt task goes out of control.
> 
> IIRC, if we disable rt_bandwidth, the rt task is always running, which may
> cause cfs task starvation and hung_task warnings. This may be the reason why
> rt_bandwidth is enabled by default (rt_runtime is 950000, and rt_period is
> 1000000).

That's what stalld is for.  Some rt applications don't like giving up 5% of
the cpu time when they don't really need to.
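
For reference, the 5% comes straight from the two defaults quoted above. A rough sketch, assuming the usual sysctl files /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us are present (it falls back to the defaults if they cannot be read); the helper names are hypothetical.

```python
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/kernel")


def read_sysctl_int(name: str, default: int) -> int:
    """Read an integer sysctl, falling back to `default` if it is unreadable."""
    try:
        return int((SYSCTL_DIR / name).read_text().strip())
    except (OSError, ValueError):
        return default


def non_rt_share(runtime_us: int, period_us: int) -> float:
    """Fraction of each period left to non-rt tasks; 0.0 when throttling is off."""
    if runtime_us < 0:  # a runtime of -1 disables rt throttling
        return 0.0
    return max(0.0, 1.0 - runtime_us / period_us)


runtime = read_sysctl_int("sched_rt_runtime_us", 950_000)
period = read_sysctl_int("sched_rt_period_us", 1_000_000)
print(f"left for non-rt tasks: {non_rt_share(runtime, period):.1%}")
```

With the runtime set to -1 nothing is held back for non-rt tasks, which is the case where you would want stalld watching for starvation.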


Cheers,
Phil

