linux-kernel - Re: [PATCH] sched/fair : prevent unlimited runtime on throttled group

Message-ID: <xm267e1koecd.fsf@bsegall-linux.svl.corp.google.com>
Date:   Tue, 21 Jan 2020 10:26:42 -0800
From:   bsegall@...gle.com
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     bsegall@...gle.com, Vincent Guittot <vincent.guittot@...aro.org>,
        mingo@...hat.com, juri.lelli@...hat.com, dietmar.eggemann@....com,
        rostedt@...dmis.org, mgorman@...e.de, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/fair : prevent unlimited runtime on throttled group

Peter Zijlstra <peterz@...radead.org> writes:

> On Tue, Jan 14, 2020 at 10:29:43AM -0800, bsegall@...gle.com wrote:
>> Vincent Guittot <vincent.guittot@...aro.org> writes:
>> 
>> > When a running task is moved to a throttled task group and there is no
>> > other task enqueued on the CPU, the task can keep running at 100% CPU
>> > regardless of the bandwidth allocated to the group, even though its cfs_rq
>> > is throttled. Furthermore, the group entity of the cfs_rq and its parents
>> > are not enqueued but only set as curr on their respective cfs_rqs.
>> >
>> > We have the following sequence:
>> >
>> > sched_move_task
>> >   -dequeue_task: dequeue task and group_entities.
>> >   -put_prev_task: put task and group entities.
>> >   -sched_change_group: move task to new group.
>> >   -enqueue_task: enqueue only task but not group entities because cfs_rq is
>> >     throttled.
>> >   -set_next_task : set task and group_entities as current sched_entity of
>> >     their cfs_rq.
>> >
>> > Another impact is that the runnable_load_avg of the root cfs_rq stays
>> > null because the group_entities are not enqueued. This situation will stay
>> > the same until an "external" event triggers a reschedule. Let's trigger it
>> > immediately instead.
>> 
>> Sounds reasonable to me, "moved group" being an explicit resched check
>> doesn't sound like a problem in general.
>
> Do I read that as an Ack from you Ben? :-)

Yeah,

Acked-by: Ben Segall <bsegall@...gle.com>

The only question I see is whether we care about avoiding the overhead
in the non-cfsb (no CFS bandwidth control) case, but cgroup attach is
already slow enough that it's probably not a real problem, and it's
reasonable to check in general whether it's still right to run this task.
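
For readers following the thread, the sequence quoted above can be
sketched as a small user-space toy model. This is only an illustration
under stated assumptions: every name below (toy_group, toy_rq,
toy_move_running_task, and so on) is hypothetical and is not a kernel
structure or function, and the resched_after_move flag merely stands in
for the unconditional reschedule the patch description proposes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are the kernel's real structures. */
struct toy_group {
    bool throttled;               /* group has used up its bandwidth */
};

struct toy_rq {
    struct toy_group *curr_group; /* group of the currently running task */
    bool group_enqueued;          /* are the group entities enqueued? */
    bool need_resched;            /* has a reschedule been requested? */
};

/* Models enqueue_task: group entities are skipped when throttled. */
static void toy_enqueue(struct toy_rq *rq, const struct toy_group *grp)
{
    rq->group_enqueued = !grp->throttled;
}

/* Models the quoted sched_move_task sequence for a running task. */
static void toy_move_running_task(struct toy_rq *rq, struct toy_group *dst,
                                  bool resched_after_move)
{
    rq->group_enqueued = false;   /* dequeue_task + put_prev_task */
    rq->curr_group = dst;         /* sched_change_group */
    toy_enqueue(rq, dst);         /* enqueue_task */
    /* set_next_task: the task stays current even if dst is throttled. */
    if (resched_after_move)
        rq->need_resched = true;  /* assumed fix: always ask for a resched */
}

/* True if the task would keep running on a throttled group unchecked. */
static bool toy_runs_unchecked(const struct toy_rq *rq)
{
    return rq->curr_group != NULL && rq->curr_group->throttled &&
           !rq->need_resched;
}
```

In this model, moving a running task to a throttled group without the
extra reschedule leaves toy_runs_unchecked() true until something else
sets need_resched, which mirrors the "external event" in the
description. With the flag set, the check happens on every move, cfsb
or not, which is the overhead question raised above.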