Message-ID: <1295288253.30950.280.camel@laptop>
Date: Mon, 17 Jan 2011 19:17:33 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Anton Blanchard <anton@...ba.org>
Cc: xiaoguangrong@...fujitsu.com, mingo@...e.hu, jaxboe@...ionio.com,
npiggin@...il.com, rusty@...tcorp.com.au,
akpm@...ux-foundation.org, torvalds@...ux-foundation.org,
paulmck@...ux.vnet.ibm.com, miltonm@....com,
benh@...nel.crashing.org, linux-kernel@...r.kernel.org
Subject: RE: [PATCH] smp_call_function_many SMP race
On Wed, 2011-01-12 at 15:07 +1100, Anton Blanchard wrote:
> I managed to forget all about this bug, probably because of how much it
> makes my brain hurt.
Agreed.
> I tried to fix it by ordering the read and the write of ->cpumask and
> ->refs. In doing so I missed a critical case but Paul McKenney was able
> to spot my bug, thankfully :) To ensure we aren't viewing previous
> iterations the interrupt handler needs to read ->refs then ->cpumask
> then ->refs _again_.
> ---
>
> Index: linux-2.6/kernel/smp.c
> ===================================================================
> --- linux-2.6.orig/kernel/smp.c 2010-12-22 17:19:11.262835785 +1100
> +++ linux-2.6/kernel/smp.c 2011-01-12 15:03:08.793324402 +1100
> @@ -194,6 +194,31 @@ void generic_smp_call_function_interrupt
> list_for_each_entry_rcu(data, &call_function.queue, csd.list) {
> int refs;
>
> + /*
> + * Since we walk the list without any locks, we might
> + * see an entry that was completed, removed from the
> + * list and is in the process of being reused.
> + *
> + * Just checking data->refs then data->cpumask is not good
> + * enough because we could see a non zero data->refs from a
> + * previous iteration. We need to check data->refs, then
> + * data->cpumask then data->refs again. Talk about
> + * complicated!
> + */
> +
> + if (atomic_read(&data->refs) == 0)
> + continue;
> +
So here we might see the old ref
> + smp_rmb();
> +
> + if (!cpumask_test_cpu(cpu, data->cpumask))
> + continue;
Here we might see the new cpumask
> + smp_rmb();
> +
> + if (atomic_read(&data->refs) == 0)
> + continue;
> +
But then still see a 0 ref, at which point we skip this entry and rely
on the fact that arch_send_call_function_ipi_mask() will simply latch
our IPI line and cause a back-to-back IPI such that we can process the
data on the second go-round?
> if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
> continue;
And finally, once we observe a valid ->refs, do we test the ->cpumask
again so that we cross with the store order (->cpumask first, then ->refs)?
> @@ -458,6 +483,14 @@ void smp_call_function_many(const struct
> data->csd.info = info;
> cpumask_and(data->cpumask, mask, cpu_online_mask);
> cpumask_clear_cpu(this_cpu, data->cpumask);
> +
> + /*
> + * To ensure the interrupt handler gets an up to date view
> + * we order the cpumask and refs writes and order the
> + * read of them in the interrupt handler.
> + */
> + smp_wmb();
> +
> atomic_set(&data->refs, cpumask_weight(data->cpumask));
>
> raw_spin_lock_irqsave(&call_function.lock, flags);
Read side:                      Write side:

  list_for_each_rcu()
    !->refs, continue             ->cpumask =
    rmb                           wmb
    !->cpumask, continue          ->refs =
    rmb                           wmb
    !->refs, continue             list_add_rcu()
    mb
    !->cpumask, continue
Wouldn't something like:
  list_for_each_rcu()
    !->cpumask, continue          ->refs =
    rmb                           wmb
    !->refs, continue             ->cpumask =
    mb                            wmb
    !->cpumask, continue          list_add_rcu()
Suffice? There we can observe the old ->cpumask, the new ->refs and the
new ->cpumask in crossed order, so we filter out the old and cross the
new, with one rmb and one conditional fewer.
Or am I totally missing something here?.. Like I said, this stuff hurts
brains.