Message-ID: <9581428d-13ba-114c-7594-c2357a8ccf11@bytedance.com>
Date:   Tue, 12 Sep 2023 10:35:46 +0800
From:   Hao Jia <jiahao.os@...edance.com>
To:     Phil Auld <pauld@...hat.com>
Cc:     mingo@...hat.com, peterz@...radead.org, mingo@...nel.org,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH 0/2] Fix nohz_full vs rt bandwidth



On 2023/9/11 Phil Auld wrote:
> 
> Hi Hao,
> 
> On Mon, Sep 11, 2023 at 11:39:02AM +0800 Hao Jia wrote:
>> On 2023/9/8 Phil Auld wrote:
>>> On Fri, Sep 08, 2023 at 10:57:26AM +0800 Hao Jia wrote:
>>>> On 2023/9/7 Phil Auld wrote:
>>>>> Hi Hao,
> 
> ...
> 
>>>>>
>>>>> Are you actually hitting this in the real world?
>>>>>
>>>>> We, for example, no longer enable RT_GROUP_SCHED so this is a non-issue
>>>>> for our use cases.  I'd recommend considering that. (Does it even
>>>>> work with cgroup2?)
>>>>>
>>>>
>>>> Yes, it has always been there. Regardless of whether RT_GROUP_SCHED is
>>>> enabled or not, rt bandwidth is always enabled. If RT_GROUP_SCHED is not
>>>> enabled, all rt tasks in the system form one group, with rt_runtime 950000
>>>> and rt_period 1000000. So rt bandwidth is always enabled by default.
>>>
>>> Sure, there is that. But I think Daniel is actively trying to remove it.
>>>
>>
>> Thank you for your reply. Maybe I'm missing something. Can you give me some
>> links to discussions about it?
>>
> 
> Sure, try this one:
>        https://lore.kernel.org/lkml/cover.1693510979.git.bristot@kernel.org/
> 

Thanks for the information you shared.

> 
>>> Also I'm not sure you answered my question. Are you actually hitting this
>>> in the real world?  I'd be tempted to think this is a mis-configuration or
>>> mis-use of RT.  Plus you can disable that throttling and use stalld to catch
>>> cases where the rt task goes out of control.
>>>
>>
>>> Are you actually hitting this in the real world?
>>
>> I tested on my machine using default settings (rt_runtime is 950000 and
>> rt_period is 1000000). The rt task is supposed to be throttled after
>> running for 0.95 seconds, but due to the influence of NO_HZ_FULL, it may be
>> throttled after running for about 1.4 seconds. This will only cause the
>> rt_bandwidth throttle to be delayed, but no warning will be triggered.
> 
> Yes, you can hit this in testing.  I'm asking if it's causing your real-world
> application issues or is this just a theoretical problem you can contrive a
> test for?  Are you actually hitting this when running your workload?
> From what you are showing (a test setup) I'm guessing no.
> 

Right, we don't see this issue in our production environment. The number
of rt tasks there is very small and their running time is very short, so
the rt_bandwidth throttle will not be triggered unless an rt task goes
out of control.
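
For reference, here is a minimal sketch of the arithmetic behind the default
settings quoted in this thread. It assumes the limits are read from the
sched_rt_runtime_us and sched_rt_period_us sysctls; the helper names
rt_share() and read_int() are made up for illustration and are not kernel
interfaces.

```python
from pathlib import Path

# sysctl files that expose the global rt bandwidth limits on Linux.
RUNTIME_PATH = Path("/proc/sys/kernel/sched_rt_runtime_us")
PERIOD_PATH = Path("/proc/sys/kernel/sched_rt_period_us")


def rt_share(runtime_us: int, period_us: int) -> float:
    """Fraction of each period rt tasks may consume before being throttled."""
    if period_us <= 0:
        raise ValueError("period must be positive")
    if runtime_us < 0:
        # A runtime of -1 disables rt throttling entirely.
        return 1.0
    return min(runtime_us / period_us, 1.0)


def read_int(path: Path, default: int) -> int:
    """Read an integer sysctl, falling back to `default` if it is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return default


if __name__ == "__main__":
    # Fall back to the defaults mentioned in this thread (hypothetical helper).
    runtime = read_int(RUNTIME_PATH, 950000)
    period = read_int(PERIOD_PATH, 1000000)
    share = rt_share(runtime, period)
    print(f"rt share: {share:.2%}, left for non-rt tasks: {1 - share:.2%}")
```

With the defaults, rt tasks get 95% of each 1 second period and the remaining
5% is left for non-rt tasks.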

Thanks,
Hao

>>
>>
>>> Plus you can disable that throttling and use stalld to catch cases where
>>> the rt task goes out of control.
>>
>> IIRC, if we disable rt_bandwidth, the rt task is always running, which may
>> cause cfs task starvation and hung_task warnings. This may be the reason why
>> rt_bandwidth is enabled by default (rt_runtime is 950000, and rt_period is
>> 1000000).
> 
> That's what stalld is for.  Some rt applications don't like giving up 5% of
> the cpu time when they don't really need to.
> 
> 
> Cheers,
> Phil
> 
>