lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y2BDFNpkSawKnE9S@slm.duckdns.org>
Date:   Mon, 31 Oct 2022 11:50:12 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Josh Don <joshdon@...gle.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        linux-kernel@...r.kernel.org,
        Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [PATCH v2] sched: async unthrottling for cfs bandwidth

Hello,

On Mon, Oct 31, 2022 at 02:22:42PM -0700, Josh Don wrote:
> > So, TJ has been complaining about us throttling in kernel-space, causing
> > grief when we also happen to hold a mutex or some other resource and has
> > been prodding us to only throttle at the return-to-user boundary.
> 
> Yea, we've been having similar priority inversion issues. It isn't
> limited to CFS bandwidth though, such problems are also pretty easy to
> hit with configurations of shares, cpumasks, and SCHED_IDLE. I've

We need to distinguish between work-conserving and non-work-conserving
control schemes. Work-conserving ones - such as shares and idle - shouldn't
affect the aggregate amount of work the system can perform. There may be
local and temporary priority inversions but they shouldn't affect the
throughput of the system and the scheduler should be able to make the
eventual resource distribution conform to the configured targtes.

CPU affinity and bw control are not work conserving and thus cause a
different class of problems. While it is possible to slow down a system with
overly restrictive CPU affinities, it's a lot harder to do so severely
compared to BW control because no matter what you do, there's still at least
one CPU which can make full forward progress. BW control, it's really easy
to stall the entire system almost completely because we're giving userspace
the ability to stall tasks for an arbitrary amount of time at random places
in the kernel. This is what cgroup1 freezer did which had exactly the same
problems.

> chatted with the folks working on the proxy execution patch series,
> and it seems like that could be a better generic solution to these
> types of issues.

Care to elaborate?

> Throttle at return-to-user seems only mildly beneficial, and then only
> really with preemptive kernels. Still pretty easy to get inversion
> issues, e.g. a thread holding a kernel mutex wake back up into a
> hierarchy that is currently throttled, or a thread holding a kernel
> mutex exists in the hierarchy being throttled but is currently waiting
> to run.

I don't follow. If you only throttle at predefined safe spots, the easiest
place being the kernel-user boundary, you cannot get system-wide stalls from
BW restrictions, which is something the kernel shouldn't allow userspace to
cause. In your example, a thread holding a kernel mutex waking back up into
a hierarchy that is currently throttled should keep running in the kernel
until it encounters such safe throttling point where it would have released
the kernel mutex and then throttle.

Thanks.

-- 
tejun

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ