Message-ID: <xm26r3bxvzep.fsf@bsegall-linux.mtv.corp.google.com>
Date: Thu, 16 Jun 2016 10:03:58 -0700
From: bsegall@...gle.com
To: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH] sched/fair: initialize throttle_count for new task-groups lazily
Konstantin Khlebnikov <khlebnikov@...dex-team.ru> writes:
> A cgroup created inside a throttled group must inherit the current
> throttle_count. A stale throttle_count allows throttled entries to be
> nominated as the next buddy, which later leads to a NULL pointer
> dereference in pick_next_task_fair().
>
> This patch initializes cfs_rq->throttle_count at first enqueue:
> laziness allows us to skip locking all runqueues at group creation.
> The lazy approach also allows skipping a full sub-tree scan when
> throttling a hierarchy (not in this patch).
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
> Cc: Stable <stable@...r.kernel.org> # v3.2+
> ---
> kernel/sched/fair.c | 19 +++++++++++++++++++
> kernel/sched/sched.h | 2 +-
> 2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 218f8e83db73..fe809fe169d2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4185,6 +4185,25 @@ static void check_enqueue_throttle(struct cfs_rq *cfs_rq)
> if (!cfs_bandwidth_used())
> return;
>
> + /* synchronize hierarchical throttle counter */
> + if (unlikely(!cfs_rq->throttle_uptodate)) {
> + struct rq *rq = rq_of(cfs_rq);
> + struct cfs_rq *pcfs_rq;
> + struct task_group *tg;
> +
> + cfs_rq->throttle_uptodate = 1;
> + /* get closest uptodate node because leaves goes first */
> + for (tg = cfs_rq->tg->parent; tg; tg = tg->parent) {
> + pcfs_rq = tg->cfs_rq[cpu_of(rq)];
> + if (pcfs_rq->throttle_uptodate)
> + break;
> + }
> + if (tg) {
> + cfs_rq->throttle_count = pcfs_rq->throttle_count;
> + cfs_rq->throttled_clock_task = rq_clock_task(rq);
> + }
> + }
> +
Checking just in enqueue is not sufficient - throttled_lb_pair() can
check against a cfs_rq that has never been enqueued (and possibly other
paths).

It might also make sense to go ahead and initialize all the cfs_rqs we
skipped over, to avoid some n^2 pathological behavior. You could also
use throttle_count == -1 or < 0 rather than a separate flag. (We had
our own version of this patch that I guess we forgot to push?)
> /* an active group must be handled by the update_curr()->put() path */
> if (!cfs_rq->runtime_enabled || cfs_rq->curr)
> return;
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 72f1f3087b04..7cbeb92a1cb9 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -437,7 +437,7 @@ struct cfs_rq {
>
> u64 throttled_clock, throttled_clock_task;
> u64 throttled_clock_task_time;
> - int throttled, throttle_count;
> + int throttled, throttle_count, throttle_uptodate;
> struct list_head throttled_list;
> #endif /* CONFIG_CFS_BANDWIDTH */
> #endif /* CONFIG_FAIR_GROUP_SCHED */