Message-ID: <56720D78.9080706@gmail.com>
Date: Wed, 16 Dec 2015 17:18:48 -0800
From: David Daney <ddaney.cavm@...il.com>
To: Will Deacon <will.deacon@....com>
CC: peterz@...radead.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, andrew.pinski@...iumnetworks.com,
dbueso@...e.de
Subject: Re: [PATCH] locking/osq: fix ordering of node initialisation in osq_lock
What is the status of this patch? Is there a good likelihood that it
will make it into v4.4?
If not, we should request that c55a6ffa6285 ("locking/osq: Relax atomic
semantics") be reverted for v4.4.
David Daney
On 12/11/2015 09:46 AM, Will Deacon wrote:
> The Cavium guys reported a soft lockup on their arm64 machine, caused
> by c55a6ffa6285 ("locking/osq: Relax atomic semantics"):
>
> [ 68.909948] [<fffffe00000da03c>] mutex_optimistic_spin+0x9c/0x1d0
> [ 68.909951] [<fffffe00006fe4b8>] __mutex_lock_slowpath+0x44/0x158
> [ 68.909953] [<fffffe00006fe620>] mutex_lock+0x54/0x58
> [ 68.909956] [<fffffe0000265efc>] kernfs_iop_permission+0x38/0x70
> [ 68.909959] [<fffffe00001fbf50>] __inode_permission+0x88/0xd8
> [ 68.909961] [<fffffe00001fbfd0>] inode_permission+0x30/0x6c
> [ 68.909964] [<fffffe00001fe26c>] link_path_walk+0x68/0x4d4
> [ 68.909966] [<fffffe00001ffa14>] path_openat+0xb4/0x2bc
> [ 68.909968] [<fffffe000020123c>] do_filp_open+0x74/0xd0
> [ 68.909971] [<fffffe00001f13e4>] do_sys_open+0x14c/0x228
> [ 68.909973] [<fffffe00001f1544>] SyS_openat+0x3c/0x48
> [ 68.909976] [<fffffe00000851f0>] el0_svc_naked+0x24/0x28
>
> This is because in osq_lock we initialise the node for the current CPU:
>
> node->locked = 0;
> node->next = NULL;
> node->cpu = curr;
>
> and then publish the current CPU in the lock tail:
>
> old = atomic_xchg_acquire(&lock->tail, curr);
>
> Once the update to lock->tail is visible to another CPU, the node is
> then live and can be both read and updated by concurrent lockers.
>
> Unfortunately, the ACQUIRE semantics of the xchg operation mean that
> there is no guarantee the contents of the node will be visible before
> the lock tail is updated. This can lead to lock corruption when, for
> example, a concurrent locker races to set the next field.
>
> Fixes: c55a6ffa6285 ("locking/osq: Relax atomic semantics")
> Reported-by: David Daney <ddaney@...iumnetworks.com>
> Reported-by: Andrew Pinski <andrew.pinski@...iumnetworks.com>
> Tested-by: Andrew Pinski <andrew.pinski@...iumnetworks.com>
> Acked-by: Davidlohr Bueso <dave@...olabs.net>
> Signed-off-by: Will Deacon <will.deacon@....com>
> ---
> kernel/locking/osq_lock.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index d092a0c9c2d4..05a37857ab55 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -93,10 +93,12 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> node->cpu = curr;
>
> /*
> - * ACQUIRE semantics, pairs with corresponding RELEASE
> - * in unlock() uncontended, or fastpath.
> + * We need both ACQUIRE (pairs with corresponding RELEASE in
> + * unlock() uncontended, or fastpath) and RELEASE (to publish
> + * the node fields we just initialised) semantics when updating
> + * the lock tail.
> */
> - old = atomic_xchg_acquire(&lock->tail, curr);
> + old = atomic_xchg(&lock->tail, curr);
> if (old == OSQ_UNLOCKED_VAL)
> return true;
>
>