linux-kernel - Re: [PATCH RESEND] sched/nohz: Add HRTICK_BW for using cfs bandwidth with nohz_full


Message-ID: <20230518143718.GC110197@lorien.usersys.redhat.com>
Date:   Thu, 18 May 2023 10:37:18 -0400
From:   Phil Auld <pauld@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, Juri Lelli <juri.lelli@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Valentin Schneider <vschneid@...hat.com>,
        Ben Segall <bsegall@...gle.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mel Gorman <mgorman@...e.de>
Subject: Re: [PATCH RESEND] sched/nohz: Add HRTICK_BW for using cfs bandwidth
 with nohz_full

On Thu, May 18, 2023 at 03:47:46PM +0200 Peter Zijlstra wrote:
> On Thu, May 18, 2023 at 09:20:38AM -0400, Phil Auld wrote:
> > CFS bandwidth limits and NOHZ full don't play well together.  Tasks
> > can easily run well past their quotas before a remote tick does
> > accounting.  This leads to long, multi-period stalls before such
> > tasks can run again.  Use the hrtick mechanism to set a sched
> > tick to fire at remaining_runtime in the future if we are on
> > a nohz full cpu, if the task has quota and if we are likely to
> > disable the tick (nr_running == 1).  This allows for bandwidth
> > accounting before tasks go too far over quota.
> > 
> > A number of container workloads use a dynamic number of real
> > nohz tasks but also have other work that is limited which ends
> > up running on the "spare" nohz cpus.  This is an artifact of
> > having to specify nohz_full cpus at boot. Adding this hrtick
> > resolves the issue of long stalls on these tasks.
> > 
> > Add the sched_feat HRTICK_BW off by default to allow users to
> > enable this only when needed.
> 
> OMG; so because NOHZ_FULL configuration sucks, we add hacks on?
>

I suppose one could make that argument. The HRTICK mechanism is already
in place and used similarly for DL (and that also benefits nohz workloads).

I don't see NOHZ_FULL configuration getting better anytime soon, although
I think efforts are being made in that direction. 

This seemed to be a sane way to handle what are effectively conflicting
requirements.  Stalling a task to the point the host gets rebooted is
pretty painful.  Maybe failing the tick_stop test in this case would
work, but that would keep all the ticks, whereas this tries to respect
the request for nohz as much as possible.
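
To make the condition from the patch description concrete, here is a
minimal userspace sketch of the decision it describes: arm a one-shot
tick at remaining_runtime only when the CPU is nohz_full, the task has
quota, and nr_running == 1.  This is not the patch or any kernel code.
The struct, its fields, and the function name are hypothetical stand-ins,
and the handling of an already exhausted quota is an assumption of the
sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of the state the patch description
 * refers to. These are NOT the kernel's struct rq / cfs_rq fields.
 */
struct model_rq {
    bool     cpu_is_nohz_full;     /* CPU was listed in nohz_full= at boot */
    unsigned nr_running;           /* runnable tasks on this CPU */
    bool     task_has_quota;       /* task is under a CFS bandwidth limit */
    int64_t  runtime_remaining_ns; /* quota left in the current period */
};

/*
 * Returns the delay (in ns) at which a one-shot tick would be armed,
 * or -1 when no tick should be armed.
 */
static int64_t hrtick_bw_delay_ns(const struct model_rq *rq)
{
    /* Only nohz_full CPUs can run long stretches without a local tick. */
    if (!rq->cpu_is_nohz_full)
        return -1;

    /* Tasks without quota need no bandwidth accounting. */
    if (!rq->task_has_quota)
        return -1;

    /* The tick is only likely to be stopped with a single runnable task. */
    if (rq->nr_running != 1)
        return -1;

    /*
     * Assumption of this sketch: with no runtime left there is nothing
     * to wait for, so no timer is armed. Throttling is not modeled here.
     */
    if (rq->runtime_remaining_ns <= 0)
        return -1;

    /* Fire when the remaining quota would be used up. */
    return rq->runtime_remaining_ns;
}
```

With 5 ms of quota left on a nohz_full CPU running one task, the sketch
returns 5000000; with two runnable tasks it returns -1, since the regular
tick stays on in that case anyway.  Failing the tick_stop test instead
would correspond to keeping the periodic tick whenever task_has_quota is
set, regardless of how much runtime remains.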

Thanks for taking a look :)

Cheers,
Phil
--