Message-ID: <yw1xoaezmjj4.fsf@unicorn.mansr.com>
Date: Thu, 12 Nov 2015 13:31:11 +0000
From: Måns Rullgård <mans@...sr.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: ralf@...ux-mips.org, ddaney@...iumnetworks.com,
linux-kernel@...r.kernel.org,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
Will Deacon <will.deacon@....com>,
torvalds@...ux-foundation.org, boqun.feng@...il.com
Subject: Re: [RFC][PATCH] mips: Fix arch_spin_unlock()
Peter Zijlstra <peterz@...radead.org> writes:
> Hi
>
> I think the MIPS arch_spin_unlock() is broken.
>
> spin_unlock() must have RELEASE semantics; these require that no LOADs
> or STOREs leak out of the critical section.
>
> From what I know, MIPS has a relaxed memory model that allows reads to
> pass stores, and as implemented arch_spin_unlock() only issues a wmb(),
> which doesn't order prior reads against later stores.
This is correct.
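
To make the failure concrete, here is a litmus-style sketch; the
'shared' variable and the two CPU functions are hypothetical, for
illustration only:

/* With only wmb() in the unlock, a LOAD inside the critical section
 * may be satisfied after the store to serving_now becomes visible,
 * so another CPU can enter the critical section while the first
 * CPU's read is still outstanding.
 */
int shared;				/* protected by the lock */

void cpu0(arch_spinlock_t *lock)
{
	int r0;

	arch_spin_lock(lock);
	r0 = shared;			/* LOAD in the critical section */
	arch_spin_unlock(lock);		/* wmb(); store serving_now */
	(void)r0;			/* may observe cpu1's store! */
}

void cpu1(arch_spinlock_t *lock)
{
	arch_spin_lock(lock);		/* can succeed before cpu0's */
	shared = 1;			/* load of 'shared' completes */
	arch_spin_unlock(lock);
}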
> Therefore upgrade the wmb() to smp_mb().
>
> (Also, why the unconditional wmb(), as opposed to smp_wmb()?)
Good question.
The current MIPS asm/barrier.h uses a plain SYNC instruction for all
kinds of barriers (except on Cavium Octeon), which is a bit wasteful.
A MIPS implementation can optionally support partial barriers (load,
store, acquire, release), all of which behave like a full barrier when
not implemented, so those really ought to be used.
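
As an illustrative sketch (this is not the actual asm/barrier.h; the
macro names are mine, but the stype encodings are the ones the
architecture manual defines for the optional SYNC variants):

/* Optional SYNC stypes; a core that does not implement one treats it
 * as SYNC 0 (a full barrier), so using them is always safe.
 */
#define __SYNC_FULL	0x00	/* full barrier (always implemented) */
#define __SYNC_WMB	0x04	/* order store -> store */
#define __SYNC_MB	0x10	/* order load/store -> load/store */
#define __SYNC_ACQUIRE	0x11	/* order load -> load/store */
#define __SYNC_RELEASE	0x12	/* order load/store -> store */
#define __SYNC_RMB	0x13	/* order load -> load */

#define __sync_stype(stype)					\
	__asm__ __volatile__(".set push\n\t"			\
			     ".set mips32r2\n\t"		\
			     "sync\t%0\n\t"			\
			     ".set pop"				\
			     : : "i" (stype) : "memory")

#define smp_mb()	__sync_stype(__SYNC_MB)
#define smp_rmb()	__sync_stype(__SYNC_RMB)
#define smp_wmb()	__sync_stype(__SYNC_WMB)

On a core that implements the stypes, smp_wmb() then only has to order
stores; on one that does not, each of these degrades to a full SYNC and
nothing is lost.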
> Maybe-Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> ---
> diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
> index 40196bebe849..b2ca13f06152 100644
> --- a/arch/mips/include/asm/spinlock.h
> +++ b/arch/mips/include/asm/spinlock.h
> @@ -140,7 +140,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
> static inline void arch_spin_unlock(arch_spinlock_t *lock)
> {
> unsigned int serving_now = lock->h.serving_now + 1;
> - wmb();
> + smp_mb();
> lock->h.serving_now = (u16)serving_now;
> nudge_writes();
> }
All this weirdness was added in commit 500c2e1f:
    MIPS: Optimize spinlocks.

    The current locking mechanism uses an ll/sc sequence to release a
    spinlock.  This is slower than a wmb() followed by a store to unlock.

    The branching forward to .subsection 2 on sc failure slows down the
    contended case.  So we get rid of that part too.

    Since we are now working on naturally aligned u16 values, we can get
    rid of a masking operation as the LHU already does the right thing.
    The ANDI are reversed for better scheduling on multi-issue CPUs.

    On a 12 CPU 750MHz Octeon cn5750 this patch improves ipv4 UDP packet
    forwarding rates from 3.58*10^6 PPS to 3.99*10^6 PPS, or about 11%.

    Signed-off-by: David Daney <ddaney@...iumnetworks.com>
    To: linux-mips@...ux-mips.org
    Patchwork: http://patchwork.linux-mips.org/patch/937/
    Signed-off-by: Ralf Baechle <ralf@...ux-mips.org>
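
Putting the two observations together (the missing ordering Peter
points out, and the optional partial barriers mentioned above), a
release-ordered variant of the store-based unlock might look like the
sketch below.  This assumes the assembler accepts SYNC stypes; it is
not a proposed patch:

static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
	unsigned int serving_now = lock->h.serving_now + 1;

	/* SYNC 0x12 (release): orders all prior LOADs and STOREs
	 * before the store below; degrades to a full SYNC where the
	 * optional stype is not implemented, which is what Peter's
	 * smp_mb() fix does unconditionally.
	 */
	__asm__ __volatile__("sync\t0x12" : : : "memory");
	lock->h.serving_now = (u16)serving_now;
	nudge_writes();
}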
--
Måns Rullgård
mans@...sr.com