Message-ID: <xm26ilx86gmp.fsf@google.com>
Date: Wed, 03 Nov 2021 15:03:58 -0700
From: Benjamin Segall <bsegall@...gle.com>
To: Mathias Krause <minipli@...ecurity.net>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Michal Koutný <mkoutny@...e.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <valentin.schneider@....com>,
linux-kernel@...r.kernel.org, Odin Ugedal <odin@...d.al>,
Kevin Tanguy <kevin.tanguy@...p.ovh.com>,
Brad Spengler <spender@...ecurity.net>
Subject: Re: [PATCH] sched/fair: Prevent dead task groups from regaining
 cfs_rq's

Mathias Krause <minipli@...ecurity.net> writes:

> Kevin is reporting crashes which point to a use-after-free of a cfs_rq
> in update_blocked_averages(). Initial debugging revealed that we have
> live cfs_rq's (on_list=1) in an about-to-be-kfree()'d task group in
> free_fair_sched_group(). However, it was unclear how that can happen.
> [...]
> Fixes: a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
> Cc: Odin Ugedal <odin@...d.al>
> Cc: Michal Koutný <mkoutny@...e.com>
> Reported-by: Kevin Tanguy <kevin.tanguy@...p.ovh.com>
> Suggested-by: Brad Spengler <spender@...ecurity.net>
> Signed-off-by: Mathias Krause <minipli@...ecurity.net>
> ---
> kernel/sched/core.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 978460f891a1..60125a6c9d1b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9506,13 +9506,25 @@ void sched_offline_group(struct task_group *tg)
> {
> unsigned long flags;
>
> - /* End participation in shares distribution: */
> - unregister_fair_sched_group(tg);
> -
> + /*
> + * Unlink first, to prevent walk_tg_tree_from() from finding us (via
> + * sched_cfs_period_timer()).
> + */
> spin_lock_irqsave(&task_group_lock, flags);
> list_del_rcu(&tg->list);
> list_del_rcu(&tg->siblings);
> spin_unlock_irqrestore(&task_group_lock, flags);
> +
> + /*
> + * Wait for all pending users of this task group to leave their RCU
> + * critical section to ensure no new user will see our dying task
> + * group any more. Specifically ensure that tg_unthrottle_up() won't
> + * add decayed cfs_rq's to it.
> + */
> + synchronize_rcu();

I was going to suggest that we could just clear all of avg.load_sum
etc., but that breaks the speculative on_list read. Currently the final
avg update just races, but that's not good enough if we want to rely on
it to prevent the UAF. synchronize_rcu() doesn't look so bad if the
alternative is taking every rqlock anyway.
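
For reference, the speculative read I mean is the unlocked on_list check
in unregister_fair_sched_group(), roughly (quoting from memory, so this
may not match the tree exactly):

	void unregister_fair_sched_group(struct task_group *tg)
	{
		unsigned long flags;
		struct rq *rq;
		int cpu;

		for_each_possible_cpu(cpu) {
			if (tg->se[cpu])
				remove_entity_load_avg(tg->se[cpu]);

			/* Speculative read: no rqlock held here. */
			if (!tg->cfs_rq[cpu]->on_list)
				continue;

			rq = cpu_rq(cpu);

			raw_spin_rq_lock_irqsave(rq, flags);
			list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
			raw_spin_rq_unlock_irqrestore(rq, flags);
		}
	}

That unlocked check is what lets us skip taking every rqlock in the
common case, and it is only safe as long as nothing can set on_list
again once the group is dying.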

I do wonder if we can move the relevant part of
unregister_fair_sched_group() into sched_free_group_rcu(). After all,
for_each_leaf_cfs_rq_safe is not _rcu, and update_blocked_averages()
does in fact hold the rqlock (though print_cfs_stats() thinks it is
_rcu and should be updated). Something like the sketch below.
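
Sketch only, not even compile-tested, and hand-waving over the
remove_entity_load_avg() part and over list_del_leaf_cfs_rq() being
static to fair.c:

	static void sched_free_group_rcu(struct rcu_head *rhp)
	{
		struct task_group *tg = container_of(rhp, struct task_group, rcu);
		unsigned long flags;
		int cpu;

		/*
		 * A full grace period has passed, so no RCU walker can
		 * still see tg. The remaining leaf-list users run under
		 * the rqlock, so doing the list_del here, right before
		 * the cfs_rq's get freed, wouldn't need the extra
		 * synchronize_rcu().
		 */
		for_each_possible_cpu(cpu) {
			if (!tg->cfs_rq[cpu]->on_list)
				continue;

			raw_spin_rq_lock_irqsave(cpu_rq(cpu), flags);
			list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
			raw_spin_rq_unlock_irqrestore(cpu_rq(cpu), flags);
		}

		sched_free_group(tg);
	}
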
> +
> + /* End participation in shares distribution: */
> + unregister_fair_sched_group(tg);
> }
>
> static void sched_change_group(struct task_struct *tsk, int type)