linux-kernel - Re: [PATCH] locking/osq_lock: fix a data race in osq_wait_next

Message-Id: <2E13BFD2-A2E5-4CAA-B0D0-0DF2F5529F1B@lca.pw>
Date:   Mon, 27 Jan 2020 22:12:58 -0500
From:   Qian Cai <cai@....pw>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Marco Elver <elver@...gle.com>, Will Deacon <will@...nel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH] locking/osq_lock: fix a data race in osq_wait_next



> On Jan 23, 2020, at 4:36 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> 
> On Wed, Jan 22, 2020 at 11:38:51PM +0100, Marco Elver wrote:
> 
>> If possible, decode and get the line numbers. I have observed a data
>> race in osq_lock before, however, this is the only one I have recently
>> seen in osq_lock:
>> 
>> read to 0xffff88812c12d3d4 of 4 bytes by task 23304 on cpu 0:
>>  osq_lock+0x170/0x2f0 kernel/locking/osq_lock.c:143
>> 
>> 	while (!READ_ONCE(node->locked)) {
>> 		/*
>> 		 * If we need to reschedule bail... so we can block.
>> 		 * Use vcpu_is_preempted() to avoid waiting for a preempted
>> 		 * lock holder:
>> 		 */
>> -->		if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
>> 			goto unqueue;
>> 
>> 		cpu_relax();
>> 	}
>> 
>> where
>> 
>> 	static inline int node_cpu(struct optimistic_spin_node *node)
>> 	{
>> -->		return node->cpu - 1;
>> 	}
>> 
>> 
>> write to 0xffff88812c12d3d4 of 4 bytes by task 23334 on cpu 1:
>> osq_lock+0x89/0x2f0 kernel/locking/osq_lock.c:99
>> 
>> 	bool osq_lock(struct optimistic_spin_queue *lock)
>> 	{
>> 		struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
>> 		struct optimistic_spin_node *prev, *next;
>> 		int curr = encode_cpu(smp_processor_id());
>> 		int old;
>> 
>> 		node->locked = 0;
>> 		node->next = NULL;
>> -->		node->cpu = curr;
>> 
> 
> Yeah, that's impossible. This store happens before the node is
> published, so no matter how the load in node_cpu() is shattered, it must
> observe the right value.

Marco, any thoughts on how to deal with this? The worry is that too
many false positives like this will undermine the tool's usefulness
as a general debug option.
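
For readers following the argument quoted above (the plain store to
node->cpu happens before the node is published, so a later load through
node->prev sees the right value), here is a minimal userspace sketch of
that initialise-then-publish pattern. It is not the kernel code: it
assumes C11 atomics in place of the kernel's xchg(), and struct
spin_node, queue_tail and enqueue() are hypothetical names made up for
illustration. The real lock also stores an encoded CPU number in the
tail rather than a pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct optimistic_spin_node. */
struct spin_node {
	struct spin_node *prev;
	int locked;
	int cpu;	/* encoded as CPU number + 1, so 0 means "no CPU" */
};

/* Hypothetical queue tail (assumption: a pointer, unlike the kernel). */
static _Atomic(struct spin_node *) queue_tail;

static int encode_cpu(int cpu_nr)
{
	return cpu_nr + 1;
}

static int node_cpu(const struct spin_node *node)
{
	return node->cpu - 1;
}

/*
 * Initialise the node with plain stores, then publish it with a
 * sequentially consistent exchange. Another thread can only obtain a
 * pointer to this node from a later exchange on queue_tail, which is
 * ordered after the plain stores below.
 */
static struct spin_node *enqueue(struct spin_node *node, int cpu_nr)
{
	struct spin_node *prev;

	node->locked = 0;
	node->prev = NULL;
	node->cpu = encode_cpu(cpu_nr);	/* plain store, before publication */

	prev = atomic_exchange(&queue_tail, node);	/* publication point */
	node->prev = prev;	/* predecessor, or NULL if the queue was empty */
	return prev;
}
```

A successor that calls node_cpu(node->prev) after enqueue() reads a
field that was written before its predecessor became reachable, which
is the ordering the quoted reply relies on. The sketch does not model
node reuse across lock acquisitions or the unqueue path.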