linux-kernel - Re: [PATCH] locking/osq_lock: fix a data race in osq_wait

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <A90E2B85-77CB-4743-AEC3-90D7836C4D47@lca.pw>
Date:   Wed, 22 Jan 2020 18:54:43 -0500
From:   Qian Cai <cai@....pw>
To:     Marco Elver <elver@...gle.com>
Cc:     Will Deacon <will@...nel.org>, mingo@...hat.com,
        peterz@...radead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] locking/osq_lock: fix a data race in osq_wait_next



> On Jan 22, 2020, at 5:38 PM, Marco Elver <elver@...gle.com> wrote:
> 
> 
> 
> On Wed, 22 Jan 2020, Qian Cai wrote:
> 
>> 
>> 
>>> On Jan 22, 2020, at 11:59 AM, Will Deacon <will@...nel.org> wrote:
>>> 
>>> I don't understand this; 'next' is a local variable.
>>> 
>>> Not keen on the onslaught of random "add a READ_ONCE() to shut the
>>> sanitiser up" patches we're going to get from kcsan :(
>> 
>> My fault. I suspect it is node->next. I’ll do a bit more testing to confirm.
> 
> If possible, decode and get the line numbers. I have observed a data

[  667.817131] Reported by Kernel Concurrency Sanitizer on:
[  667.823200] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W    L    5.5.0-rc7-next-20200121+ #9
[  667.832839] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[  667.842132] ==================================================================
[  672.299421] ==================================================================
[  672.307449] BUG: KCSAN: data-race in osq_lock / osq_lock

[  672.315741] write (marked) to 0xffff8f613013be00 of 8 bytes by task 971 on cpu 59:
[  672.324085]  osq_lock+0x2fb/0x340 kernel/locking/osq_lock.c:200
[  672.328149]  __mutex_lock+0x277/0xd20 kernel/locking/mutex.c:657
[  672.332561]  mutex_lock_nested+0x31/0x40 kernel/locking/mutex.c:1118
[  672.337236]  memcg_create_kmem_cache+0x2e/0x190 mm/slab_common.c:659
[  672.342534]  memcg_kmem_cache_create_func+0x40/0x80
[  672.348177]  process_one_work+0x54c/0xbe0
[  672.352940]  worker_thread+0x80/0x650
[  672.357351]  kthread+0x1e0/0x200
[  672.361324]  ret_from_fork+0x27/0x50

[  672.367875] read to 0xffff8f613013be00 of 8 bytes by task 708 on cpu 50:
[  672.375345]  osq_lock+0x234/0x340 kernel/locking/osq_lock.c:78
[  672.379431]  __mutex_lock+0x277/0xd20 kernel/locking/mutex.c:657
[  672.383862]  mutex_lock_nested+0x31/0x40 kernel/locking/mutex.c:1118
[  672.388537]  memcg_create_kmem_cache+0x2e/0x190 mm/slab_common.c:659
[  672.393824]  memcg_kmem_cache_create_func+0x40/0x80
[  672.399461]  process_one_work+0x54c/0xbe0
[  672.404229]  worker_thread+0x80/0x650
[  672.408640]  kthread+0x1e0/0x200
[  672.412613]  ret_from_fork+0x27/0x50

This?

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 1f7734949ac8..832e87966dcf 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -75,7 +75,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
                 * wait for either @lock to point to us, through its Step-B, or
                 * wait for a new @node->next from its Step-C.
                 */
-               if (node->next) {
+               if (READ_ONCE(node->next)) {
                        next = xchg(&node->next, NULL);
                        if (next)
                                break;

> race in osq_lock before, however, this is the only one I have recently
> seen in osq_lock:
> 
> read to 0xffff88812c12d3d4 of 4 bytes by task 23304 on cpu 0:
>  osq_lock+0x170/0x2f0 kernel/locking/osq_lock.c:143
> 
> 	while (!READ_ONCE(node->locked)) {
> 		/*
> 		 * If we need to reschedule bail... so we can block.
> 		 * Use vcpu_is_preempted() to avoid waiting for a preempted
> 		 * lock holder:
> 		 */
> -->		if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
> 			goto unqueue;
> 
> 		cpu_relax();
> 	}
> 
> where
> 
> 	static inline int node_cpu(struct optimistic_spin_node *node)
> 	{
> -->		return node->cpu - 1;
> 	}
> 
> 
> write to 0xffff88812c12d3d4 of 4 bytes by task 23334 on cpu 1:
> osq_lock+0x89/0x2f0 kernel/locking/osq_lock.c:99
> 
> 	bool osq_lock(struct optimistic_spin_queue *lock)
> 	{
> 		struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
> 		struct optimistic_spin_node *prev, *next;
> 		int curr = encode_cpu(smp_processor_id());
> 		int old;
> 
> 		node->locked = 0;
> 		node->next = NULL;
> -->		node->cpu = curr;
> 
> 
> Thanks,
> -- Marco