linux-kernel - Re: [PATCH 1/3] generic-ipi: simplify the barriers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090218002702.GA25256@linux.vnet.ibm.com>
Date:	Tue, 17 Feb 2009 16:27:02 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Nick Piggin <npiggin@...e.de>,
	Jens Axboe <jens.axboe@...cle.com>,
	Ingo Molnar <mingo@...e.hu>,
	Rusty Russell <rusty@...tcorp.com.au>,
	linux-kernel@...r.kernel.org, Oleg Nesterov <oleg@...hat.com>
Subject: Re: [PATCH 1/3] generic-ipi: simplify the barriers

On Tue, Feb 17, 2009 at 10:59:05PM +0100, Peter Zijlstra wrote:
> From: Nick Piggin <npiggin@...e.de>
> 
> Firstly, just unconditionally take the lock and check the list in the
> generic_call_function_single_interrupt IPI handler. As we've just taken
> an IPI here, the chances are fairly high that there will be work on the
> list for us, so do the locking unconditionally. This removes the tricky
> lockless list_empty check and dubious barriers. The change looks bigger
> than it is because it is just removing an outer loop.
> 
> Secondly, clarify architecture specific IPI locking rules. Generic code
> has no tools to impose any sane ordering on IPIs if they go outside
> normal cache coherency, ergo the arch code must make them appear to
> obey cache coherency as a "memory operation" to initiate an IPI, and
> a "memory operation" to receive one. This way at least they can be
> reasoned about in generic code, and smp_mb used to provide ordering.
> 
> The combination of these two changes means that explict barriers can
> be taken out of queue handling for the single case -- shared data is
> explicitly locked, and ipi ordering must conform to that, so no
> barriers needed. An extra barrier is needed in the many handler, so
> as to ensure we load the list element after the IPI is received.
> 
> Does any architecture actually needs barriers? For the initiator I
> could see it, but for the handler I would be surprised. The other
> thing we could do for simplicity is just to require that a full
> barrier is required before generating an IPI, and after receiving an
> IPI. We can't just do that in generic code without auditing
> architectures. There have been subtle hangs here on some archs in
> the past.

While I sympathize with pushing memory barriers down into the
arch-specific functions, you -are- running this by the various
arch maintainers so that they have an opportunity to adjust, right?

							Thanx, Paul

> Signed-off-by: Nick Piggin <npiggin@...e.de>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> ---
>  kernel/smp.c |   83 +++++++++++++++++++++++++++++++----------------------------
>  1 file changed, 44 insertions(+), 39 deletions(-)
> 
> Index: linux-2.6/kernel/smp.c
> ===================================================================
> --- linux-2.6.orig/kernel/smp.c
> +++ linux-2.6/kernel/smp.c
> @@ -74,9 +74,16 @@ static void generic_exec_single(int cpu,
>  	spin_unlock_irqrestore(&dst->lock, flags);
> 
>  	/*
> -	 * Make the list addition visible before sending the ipi.
> +	 * The list addition should be visible before sending the IPI
> +	 * handler locks the list to pull the entry off it because of
> +	 * normal cache coherency rules implied by spinlocks.
> +	 *
> +	 * If IPIs can go out of order to the cache coherency protocol
> +	 * in an architecture, sufficient synchronisation should be added
> +	 * to arch code to make it appear to obey cache coherency WRT
> +	 * locking and barrier primitives. Generic code isn't really equipped
> +	 * to do the right thing...
>  	 */
> -	smp_mb();

While I sympathize with the above, you -are- running this by the various
arch maintainers so that they have an opportunity to adjust, right?

> 
>  	if (ipi)
>  		arch_send_call_function_single_ipi(cpu);
> @@ -104,6 +111,14 @@ void generic_smp_call_function_interrupt
>  	int cpu = get_cpu();
> 
>  	/*
> +	 * Ensure entry is visible on call_function_queue after we have
> +	 * entered the IPI. See comment in smp_call_function_many.
> +	 * If we don't have this, then we may miss an entry on the list
> +	 * and never get another IPI to process it.
> +	 */
> +	smp_mb();
> +
> +	/*
>  	 * It's ok to use list_for_each_rcu() here even though we may delete
>  	 * 'pos', since list_del_rcu() doesn't clear ->next
>  	 */
> @@ -154,49 +169,37 @@ void generic_smp_call_function_single_in
>  {
>  	struct call_single_queue *q = &__get_cpu_var(call_single_queue);
>  	LIST_HEAD(list);
> +	unsigned int data_flags;
> 
> -	/*
> -	 * Need to see other stores to list head for checking whether
> -	 * list is empty without holding q->lock
> -	 */
> -	smp_read_barrier_depends();
> -	while (!list_empty(&q->list)) {
> -		unsigned int data_flags;
> -
> -		spin_lock(&q->lock);
> -		list_replace_init(&q->list, &list);
> -		spin_unlock(&q->lock);
> -
> -		while (!list_empty(&list)) {
> -			struct call_single_data *data;
> -
> -			data = list_entry(list.next, struct call_single_data,
> -						list);
> -			list_del(&data->list);
> +	spin_lock(&q->lock);
> +	list_replace_init(&q->list, &list);
> +	spin_unlock(&q->lock);
> 
> -			/*
> -			 * 'data' can be invalid after this call if
> -			 * flags == 0 (when called through
> -			 * generic_exec_single(), so save them away before
> -			 * making the call.
> -			 */
> -			data_flags = data->flags;
> +	while (!list_empty(&list)) {
> +		struct call_single_data *data;
> 
> -			data->func(data->info);
> +		data = list_entry(list.next, struct call_single_data,
> +					list);
> +		list_del(&data->list);
> 
> -			if (data_flags & CSD_FLAG_WAIT) {
> -				smp_wmb();
> -				data->flags &= ~CSD_FLAG_WAIT;
> -			} else if (data_flags & CSD_FLAG_LOCK) {
> -				smp_wmb();
> -				data->flags &= ~CSD_FLAG_LOCK;
> -			} else if (data_flags & CSD_FLAG_ALLOC)
> -				kfree(data);
> -		}
>  		/*
> -		 * See comment on outer loop
> +		 * 'data' can be invalid after this call if
> +		 * flags == 0 (when called through
> +		 * generic_exec_single(), so save them away before
> +		 * making the call.
>  		 */
> -		smp_read_barrier_depends();
> +		data_flags = data->flags;
> +
> +		data->func(data->info);
> +
> +		if (data_flags & CSD_FLAG_WAIT) {
> +			smp_wmb();
> +			data->flags &= ~CSD_FLAG_WAIT;
> +		} else if (data_flags & CSD_FLAG_LOCK) {
> +			smp_wmb();
> +			data->flags &= ~CSD_FLAG_LOCK;
> +		} else if (data_flags & CSD_FLAG_ALLOC)
> +			kfree(data);
>  	}
>  }
> 
> @@ -375,6 +378,8 @@ void smp_call_function_many(const struct
> 
>  	/*
>  	 * Make the list addition visible before sending the ipi.
> +	 * (IPIs must obey or appear to obey normal Linux cache coherency
> +	 * rules -- see comment in generic_exec_single).
>  	 */
>  	smp_mb();
> 
> 
> -- 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/