Message-ID: <xm26ilx86gmp.fsf@google.com>
Date: Wed, 03 Nov 2021 15:03:58 -0700
From: Benjamin Segall <bsegall@...gle.com>
To: Mathias Krause <minipli@...ecurity.net>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Michal Koutný <mkoutny@...e.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <valentin.schneider@....com>,
linux-kernel@...r.kernel.org, Odin Ugedal <odin@...d.al>,
Kevin Tanguy <kevin.tanguy@...p.ovh.com>,
Brad Spengler <spender@...ecurity.net>
Subject: Re: [PATCH] sched/fair: Prevent dead task groups from regaining
 cfs_rq's

Mathias Krause <minipli@...ecurity.net> writes:

> Kevin is reporting crashes which point to a use-after-free of a cfs_rq
> in update_blocked_averages(). Initial debugging revealed that we have
> live cfs_rq's (on_list=1) in an about-to-be-kfree()'d task group in
> free_fair_sched_group(). However, it was unclear how that can happen.
> [...]
> Fixes: a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
> Cc: Odin Ugedal <odin@...d.al>
> Cc: Michal Koutný <mkoutny@...e.com>
> Reported-by: Kevin Tanguy <kevin.tanguy@...p.ovh.com>
> Suggested-by: Brad Spengler <spender@...ecurity.net>
> Signed-off-by: Mathias Krause <minipli@...ecurity.net>
> ---
> kernel/sched/core.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 978460f891a1..60125a6c9d1b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9506,13 +9506,25 @@ void sched_offline_group(struct task_group *tg)
> {
> unsigned long flags;
>
> - /* End participation in shares distribution: */
> - unregister_fair_sched_group(tg);
> -
> + /*
> + * Unlink first, to prevent walk_tg_tree_from() from finding us (via
> + * sched_cfs_period_timer()).
> + */
> spin_lock_irqsave(&task_group_lock, flags);
> list_del_rcu(&tg->list);
> list_del_rcu(&tg->siblings);
> spin_unlock_irqrestore(&task_group_lock, flags);
> +
> + /*
> + * Wait for all pending users of this task group to leave their RCU
> + * critical section to ensure no new user will see our dying task
> + * group any more. Specifically ensure that tg_unthrottle_up() won't
> + * add decayed cfs_rq's to it.
> + */
> + synchronize_rcu();

I was going to suggest that we could just clear all of avg.load_sum
etc., but that breaks the speculative on_list read. Currently the final
avg update just races, but that's not good enough if we want to rely on
it to prevent the UAF. synchronize_rcu() doesn't look so bad if the
alternative is taking every rqlock anyway.
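
For reference, the speculative read I mean is the unlocked on_list check
in unregister_fair_sched_group(), roughly (quoting from memory, so this
may not match the tree exactly):

	void unregister_fair_sched_group(struct task_group *tg)
	{
		unsigned long flags;
		struct rq *rq;
		int cpu;

		for_each_possible_cpu(cpu) {
			if (tg->se[cpu])
				remove_entity_load_avg(tg->se[cpu]);

			/* Speculative read: no rqlock held here. */
			if (!tg->cfs_rq[cpu]->on_list)
				continue;

			rq = cpu_rq(cpu);

			raw_spin_rq_lock_irqsave(rq, flags);
			list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
			raw_spin_rq_unlock_irqrestore(rq, flags);
		}
	}

That unlocked check is what lets us skip taking every rqlock in the
common case, and it is only safe as long as nothing can set on_list
again once the group is dying.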

I do wonder if we can move the relevant part of
unregister_fair_sched_group() into sched_free_group_rcu(). After all,
for_each_leaf_cfs_rq_safe is not _rcu, and update_blocked_averages()
does in fact hold the rqlock (though print_cfs_stats() thinks it is
_rcu and should be updated). Something like the sketch below.
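
Sketch only, not even compile-tested, and hand-waving over the
remove_entity_load_avg() part and over list_del_leaf_cfs_rq() being
static to fair.c:

	static void sched_free_group_rcu(struct rcu_head *rhp)
	{
		struct task_group *tg = container_of(rhp, struct task_group, rcu);
		unsigned long flags;
		int cpu;

		/*
		 * A full grace period has passed, so no RCU walker can
		 * still see tg. The remaining leaf-list users run under
		 * the rqlock, so doing the list_del here, right before
		 * the cfs_rq's get freed, wouldn't need the extra
		 * synchronize_rcu().
		 */
		for_each_possible_cpu(cpu) {
			if (!tg->cfs_rq[cpu]->on_list)
				continue;

			raw_spin_rq_lock_irqsave(cpu_rq(cpu), flags);
			list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
			raw_spin_rq_unlock_irqrestore(cpu_rq(cpu), flags);
		}

		sched_free_group(tg);
	}
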
> +
> + /* End participation in shares distribution: */
> + unregister_fair_sched_group(tg);
> }
>
> static void sched_change_group(struct task_struct *tsk, int type)