Date:	Thu, 25 Feb 2010 21:28:28 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Fabio Checconi <fchecconi@...il.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
	Paul Turner <pjt@...gle.com>,
	Dario Faggioli <faggioli@...dalf.sssup.it>,
	Michael Trimarchi <michael@...dence.eu.com>,
	Dhaval Giani <dhaval@...is.sssup.it>,
	Tommaso Cucinotta <t.cucinotta@...up.it>,
	linux-kernel@...r.kernel.org,
	Fabio Checconi <fabio@...dalf.sssup.it>
Subject: Re: [PATCH 2/3] sched: enforce per-cpu utilization limits on
 runtime balancing

On Tue, 2010-02-23 at 19:56 +0100, Fabio Checconi wrote:

>  /*
> + * Reset the balancing machinery, restarting from a safe runtime assignment
> + * on all the cpus/rt_rqs in the system.  There is room for improvements here,
> + * as this iterates through all the rt_rqs in the system; the main problem
> + * is that after the balancing has been running for some time we are not
> + * sure that the fragmentation of the free bandwidth it produced allows new
> + * groups to run where they need to run.  The caller has to make sure that
> + * only one instance of this function is running at any time.
>   */
> +static void __rt_reset_runtime(void)
>  {
> +       struct rq *rq;
> +       struct rt_rq *rt_rq;
> +       struct rt_bandwidth *rt_b;
> +       unsigned long flags;
> +       int i;
> +
> +       for_each_possible_cpu(i) {
> +               rq = cpu_rq(i);
> +
> +               rq->rt_balancing_disabled = 1;
> +               /*
> +                * Make sure that all the new calls to do_balance_runtime()
> +                * see the disable flag and do not migrate anything.  We will
> +                * implicitly wait for the old ones to terminate entering all
> +                * the rt_b->rt_runtime_lock, one by one.  Note that maybe
> +                * iterating over the task_groups first would be a good idea...
> +                */
> +               smp_wmb();
> +
> +               for_each_leaf_rt_rq(rt_rq, rq) {
> +                       rt_b = sched_rt_bandwidth(rt_rq);
> +
> +                       raw_spin_lock_irqsave(&rt_b->rt_runtime_lock, flags);
> +                       raw_spin_lock(&rt_rq->rt_runtime_lock);
> +                       rt_rq->rt_runtime = rt_b->rt_runtime;
> +                       rt_rq->rt_period = rt_b->rt_period;
> +                       rt_rq->rt_time = 0;
> +                       raw_spin_unlock(&rt_rq->rt_runtime_lock);
> +                       raw_spin_unlock_irqrestore(&rt_b->rt_runtime_lock, flags);
> +               }
> +       }
> +}


> +/*
> + * Handle runtime rebalancing: try to push our bandwidth to
> + * runqueues that need it.
> + */
> +static void do_balance_runtime(struct rt_rq *rt_rq)
> +{
> +       struct rq *rq = cpu_rq(smp_processor_id());
> +       struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
> +       struct root_domain *rd = rq->rd;
> +       int i, weight, ret;
> +       u64 rt_period, prev_runtime;
> +       s64 diff;
> +
>         weight = cpumask_weight(rd->span);
>  
>         raw_spin_lock(&rt_b->rt_runtime_lock);
> +       /*
> +        * The raw_spin_lock() acts as an acquire barrier, ensuring
> +        * that rt_balancing_disabled is accessed after taking the lock;
> +        * since rt_reset_runtime() takes rt_runtime_lock after
> +        * setting the disable flag we are sure that no bandwidth
> +        * is migrated while the reset is in progress.
> +        */

Note that LOCK != {RMB,MB}; what you can do is order the WMB with the
UNLOCK+LOCK (== MB).

I'm thinking the WMB above is superfluous: either we are already in
do_balance() and __rt_reset_runtime() will wait for us, or
__rt_reset_runtime() will have done a LOCK+UNLOCK after setting
->rt_balancing_disabled, and we'll have done a LOCK before the read.

So we always have at least store+UNLOCK+LOCK+load, which can never be
reordered.

IOW, look at it as if the store leaks into the rt_b->rt_runtime_lock
section; in that case that lock properly serializes the store and these
loads.
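
To make the store+UNLOCK+LOCK+load argument concrete, here is a minimal
userspace sketch of the pattern (an illustration only: pthread mutexes
stand in for the raw spinlocks, a plain int stands in for
rq->rt_balancing_disabled, and the function names are made up, not the
kernel code). The reset path stores the flag and then does LOCK+UNLOCK on
the same lock that the balance path takes before loading the flag, so the
load cannot observe a stale value even without an explicit WMB:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t runtime_lock = PTHREAD_MUTEX_INITIALIZER;
static int balancing_disabled;       /* stands in for rq->rt_balancing_disabled */

static void *reset_path(void *arg)   /* plays the role of __rt_reset_runtime() */
{
	balancing_disabled = 1;              /* plain store, no barrier      */
	pthread_mutex_lock(&runtime_lock);   /* LOCK ...                     */
	/* ... reset per-rq runtime would happen here ... */
	pthread_mutex_unlock(&runtime_lock); /* ... UNLOCK orders the store  */
	return NULL;
}

static void *balance_path(void *arg) /* plays the role of do_balance_runtime() */
{
	pthread_mutex_lock(&runtime_lock);   /* LOCK acts as the acquire     */
	if (balancing_disabled)              /* load happens after the LOCK  */
		printf("balancing disabled, bailing out\n");
	pthread_mutex_unlock(&runtime_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, reset_path, NULL);
	pthread_create(&b, NULL, balance_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

If the balancer takes the lock first, the resetter simply waits for it to
finish (the first case above); if the resetter gets there first, the
balancer's LOCK guarantees it sees the flag. Either way the explicit WMB
adds nothing.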

> +       if (rq->rt_balancing_disabled)
> +               goto out;

( maybe call that label unlock )

> +
> +       prev_runtime = rt_rq->rt_runtime;
>         rt_period = ktime_to_ns(rt_b->rt_period);
> +
>         for_each_cpu(i, rd->span) {
>                 struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
> +               struct rq *iter_rq = rq_of_rt_rq(iter);
>  
>                 if (iter == rt_rq)
>                         continue;

Idem to the above ordering.

> +               if (iter_rq->rt_balancing_disabled)
> +                       continue;
> +
>                 raw_spin_lock(&iter->rt_runtime_lock);
>                 /*
>                  * Either all rqs have inf runtime and there's nothing to steal 



