linux-kernel - Re: [PATCH v6 2/2] sched/uclamp: Protect uclamp fast path code with static key

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200630175502.otw4seymlynghje7@e107158-lin.cambridge.arm.com>
Date:   Tue, 30 Jun 2020 18:55:02 +0100
From:   Qais Yousef <qais.yousef@....com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Patrick Bellasi <patrick.bellasi@...bug.net>,
        Chris Redpath <chris.redpath@....com>,
        Lukasz Luba <lukasz.luba@....com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 2/2] sched/uclamp: Protect uclamp fast path code with
 static key

On 06/30/20 19:07, Peter Zijlstra wrote:
> On Tue, Jun 30, 2020 at 12:21:23PM +0100, Qais Yousef wrote:
> > @@ -993,10 +1013,38 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
> >  
> >  	lockdep_assert_held(&rq->lock);
> >  
> > +	/*
> > +	 * If sched_uclamp_used was enabled after task @p was enqueued,
> > +	 * we could end up with unbalanced call to uclamp_rq_dec_id().
> > +	 *
> > +	 * In this case the uc_se->active flag should be false since no uclamp
> > +	 * accounting was performed at enqueue time and we can just return
> > +	 * here.
> > +	 *
> > +	 * Need to be careful of the following enqeueue/dequeue ordering
> > +	 * problem too
> > +	 *
> > +	 *	enqueue(taskA)
> > +	 *	// sched_uclamp_used gets enabled
> > +	 *	enqueue(taskB)
> > +	 *	dequeue(taskA)
> > +	 *	// Must not decrement bukcet->tasks here
> > +	 *	dequeue(taskB)
> > +	 *
> > +	 * where we could end up with stale data in uc_se and
> > +	 * bucket[uc_se->bucket_id].
> > +	 *
> > +	 * The following check here eliminates the possibility of such race.
> > +	 */
> > +	if (unlikely(!uc_se->active))
> > +		return;
> > +
> >  	bucket = &uc_rq->bucket[uc_se->bucket_id];
> > +
> >  	SCHED_WARN_ON(!bucket->tasks);
> >  	if (likely(bucket->tasks))
> >  		bucket->tasks--;
> > +
> >  	uc_se->active = false;
> >  
> >  	/*
> 
> > @@ -1221,6 +1289,8 @@ static void __setscheduler_uclamp(struct task_struct *p,
> >  	if (likely(!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)))
> >  		return;
> >  
> > +	static_branch_enable(&sched_uclamp_used);
> > +
> >  	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) {
> >  		uclamp_se_set(&p->uclamp_req[UCLAMP_MIN],
> >  			      attr->sched_util_min, true);
> > @@ -7387,6 +7457,8 @@ static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
> >  	if (req.ret)
> >  		return req.ret;
> >  
> > +	static_branch_enable(&sched_uclamp_used);
> > +
> >  	mutex_lock(&uclamp_mutex);
> >  	rcu_read_lock();
> >  
> 
> There's a fun race described in 9107c89e269d ("perf: Fix race between
> event install and jump_labels"), are we sure this isn't also susceptible
> to something similar?
> 
> I suspect not, but I just wanted to make sure.

IIUC, the worry is that not all CPUs might have observed the change in the
static key state; hence could not be running the patched
enqueue/dequeue_task(), so we could end up with some CPUs accounting for
uclamp in the enqueue/dequeue path but not others?

I was hoping this synchronization is guaranteed by the static_branch_*() call.

aarch64_insn_patch_text_nosync() in arch/arm64/kernel/insn.c performs
__flush_icache_range() after writing the new instruction.

I need to dig into what __flush_icache_range() do, but I think it'll just
flush the icache of the calling CPU. Need to dig into upper layers too.

It'd be good to know if I got you correctly first.

Thanks

--
Qais Yousef