Message-ID: <1296508677.26581.84.camel@laptop>
Date: Mon, 31 Jan 2011 22:17:57 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Milton Miller <miltonm@....com>
Cc: akpm@...ux-foundation.org, Anton Blanchard <anton@...ba.org>,
xiaoguangrong@...fujitsu.com, mingo@...e.hu, jaxboe@...ionio.com,
npiggin@...il.com, rusty@...tcorp.com.au,
torvalds@...ux-foundation.org, paulmck@...ux.vnet.ibm.com,
benh@...nel.crashing.org, linux-kernel@...r.kernel.org
Subject: Re: call_function_many: fix list delete vs add race
On Fri, 2011-01-28 at 18:20 -0600, Milton Miller wrote:
> @@ -491,15 +491,17 @@ void smp_call_function_many(const struct
> cpumask_clear_cpu(this_cpu, data->cpumask);
>
> /*
> -	 * To ensure the interrupt handler gets a complete view
> - * we order the cpumask and refs writes and order the read
> - * of them in the interrupt handler. In addition we may
> - * only clear our own cpu bit from the mask.
> + * We reuse the call function data without waiting for any grace
> + * period after some other cpu removes it from the global queue.
> +	 * This means a cpu might find our data block as it is written.
> + * The interrupt handler waits until it sees refs filled out
> + * while its cpu mask bit is set; here we may only clear our
> + * own cpu mask bit, and must wait to set refs until we are sure
> + * previous writes are complete and we have obtained the lock to
> + * add the element to the queue. We use the acquire and release
> + * of the lock as a wmb() -- acquire prevents write moving up and
> + * release requires old writes are visible.
That's wrong:
->foo =
LOCK
UNLOCK
->bar =
can be re-ordered as:
LOCK
->bar =
->foo =
UNLOCK
Only an UNLOCK+LOCK sequence can be considered a full MB.
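Concretely (an illustrative mapping onto this hunk, not the literal code):

	cpumask_clear_cpu(this_cpu, data->cpumask);	/* ->foo = */

	raw_spin_lock_irqsave(&call_function.lock, flags);	/* LOCK */
	list_add_rcu(&data->csd.list, &call_function.queue);
	raw_spin_unlock_irqrestore(&call_function.lock, flags);	/* UNLOCK */

	atomic_set(&data->refs, cpumask_weight(data->cpumask));	/* ->bar = */

Both stores may drift into the critical section and pass each other
there; the lock pair alone is not a wmb() between them.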
However, I think the code is still OK because list_add_rcu() implies a
wmb(); in that respect it's an improvement, since we fix a race and
avoid an extra wmb(). But the comment needs an update.
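For reference, the wmb() comes from rcu_assign_pointer(); from memory,
that era's __list_add_rcu() looks roughly like:

	static inline void __list_add_rcu(struct list_head *new,
			struct list_head *prev, struct list_head *next)
	{
		new->next = next;
		new->prev = prev;
		rcu_assign_pointer(prev->next, new);	/* smp_wmb() before the publish */
		next->prev = new;
	}

That smp_wmb() orders every store before list_add_rcu() (the cpumask
setup) against every store after it (the refs write), which is exactly
the ordering this patch needs.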
> */
> - smp_wmb();
> -
> - atomic_set(&data->refs, cpumask_weight(data->cpumask));
> -
> raw_spin_lock_irqsave(&call_function.lock, flags);
> /*
> * Place entry at the _HEAD_ of the list, so that any cpu still
> @@ -509,6 +511,8 @@ void smp_call_function_many(const struct
> list_add_rcu(&data->csd.list, &call_function.queue);
> raw_spin_unlock_irqrestore(&call_function.lock, flags);
And this wants to grow a comment noting that it relies on the wmb()
implied by list_add_rcu(); something like the sketch below.
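Wording up for grabs, but say:

	/*
	 * list_add_rcu() above implies a wmb() (via rcu_assign_pointer()),
	 * ordering the cpumask stores before the refs store below;
	 * no additional smp_wmb() is needed.
	 */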
> + atomic_set(&data->refs, cpumask_weight(data->cpumask));
> +
> /*
> * Make the list addition visible before sending the ipi.
> * (IPIs must obey or appear to obey normal Linux cache
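For completeness, the read side this pairs with in the interrupt
handler looks roughly like this (paraphrased from memory, not the
exact hunk):

	list_for_each_entry_rcu(data, &call_function.queue, csd.list) {

		if (!cpumask_test_cpu(cpu, data->cpumask))
			continue;

		smp_rmb();	/* pairs with the wmb() in list_add_rcu() */

		/* refs not yet set: the entry is still being (re)written, skip */
		if (!atomic_read(&data->refs))
			continue;

		...
	}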