linux-kernel - Re: [PATCH] locking/pvqspinlock: Use try_cmpxchg() in pv

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <9b76aa09-532f-e2c5-8cf8-c732a079c5eb@loongson.cn>
Date: Fri, 27 Dec 2024 10:00:07 +0800
From: bibo mao <maobibo@...ngson.cn>
To: Waiman Long <llong@...hat.com>, Ingo Molnar <mingo@...hat.com>,
 Will Deacon <will@...nel.org>
Cc: Boqun Feng <boqun.feng@...il.com>, linux-kernel@...r.kernel.org,
 Akira Yokosawa <akiyks@...il.com>
Subject: Re: [PATCH] locking/pvqspinlock: Use try_cmpxchg() in pv_unhash



On 2024/12/26 上午12:04, Waiman Long wrote:
> On 12/23/24 2:47 AM, Bibo Mao wrote:
>> We ported pv spinlock to old linux kernel on LoongArch platform, there is
> What old kernel are you using? Race condition like that should be hard 
> to reproduce. Do you hit this bug only once?
>> error with some stress tests. The error report is something like this for
>> short:
>>   kernel BUG at kernel/locking/qspinlock_paravirt.h:261!
>>   Oops - BUG[#1]:
>>   CPU: 1 PID: 6613 Comm: pidof Not tainted 4.19.190+ #43
>>   Hardware name: Loongson KVM, BIOS 0.0.0 02/06/2015
>>      ra: 9000000000509cfc do_task_stat+0x29c/0xaf0
>>     ERA: 9000000000291308 __pv_queued_spin_unlock_slowpath+0xf8/0x100
>>    CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
>>    PRMD: 00000000 (PPLV0 -PIE -PWE)
>>           ...
>>   Call Trace:
>>   [<9000000000291308>] __pv_queued_spin_unlock_slowpath+0xf8/0x100
>>   [<9000000000509cf8>] do_task_stat+0x298/0xaf0
>>   [<9000000000502570>] proc_single_show+0x60/0xe0
>>
>> The problem is that memory accessing is out of order on LoongArch
>> platform, there is contension between pv_unhash() and pv_hash().
>>
>> CPU0 pv_unhash:                                CPU1 pv_hash:
>>
>>    for_each_hash_entry(he, offset, hash) {        
>> for_each_hash_entry(he, offset, hash) {
>>      if (READ_ONCE(he->lock) == lock) {             struct qspinlock 
>> *old = NULL;
>>        node = READ_ONCE(he->node);
>>        WRITE_ONCE(he->lock, NULL);
>>
>> On LoongArch platform which is out of order, the execution order may be
>> switched like this:
>>>       WRITE_ONCE(he->lock, NULL);
>>                                                      if 
>> (try_cmpxchg(&he->lock, &old, lock)) {
>>                                                        
>> WRITE_ONCE(he->node, node);
>>                                                        return &he->lock;
>>
>> CPU1 pv_hash() is executing and watch that lock is set with NULL. Write
>> he->node with node of new lock.
>>>       node = READ_ONCE(he->node);
>> READ_ONCE(he->node) on CPU0 will return node of new lock rather than 
>> itself.
> The pv_hash_entry structure is supposed to be cacheline aligned. The 
> use  of READ_ONCE/WRITE_ONCE will ensure that compiler won't rearrange 
> the ordering. If the CPU can rearrange read/write ordering on the same 
> cacheline like that, it may be some advance optimization technique that 
> I don't  quite understand. Can you check if the buffer returned by 
> alloc_large_system_hash() is really properly aligned?
Yes, you are right. After communicating with HW guys, WRITE_ONCE will 
not surpass over READ_ONCE, WRITE_ONCE can only surpass WRITE_ONCE.

And buffer allocated by alloc_large_system_hash() is cache aligned also,
the problem may be LoongArch VM specific, I am investigating the 
detailed reason.

>>
>> Here READ_ONCE/WRITE_ONCE is replaced with try_cmpxchg().
>>
>> Signed-off-by: Bibo Mao <maobibo@...ngson.cn>
>> ---
>>   kernel/locking/qspinlock_paravirt.h | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/locking/qspinlock_paravirt.h 
>> b/kernel/locking/qspinlock_paravirt.h
>> index dc1cb90e3644..cc4eabce092d 100644
>> --- a/kernel/locking/qspinlock_paravirt.h
>> +++ b/kernel/locking/qspinlock_paravirt.h
>> @@ -240,9 +240,10 @@ static struct pv_node *pv_unhash(struct qspinlock 
>> *lock)
>>       struct pv_node *node;
>>       for_each_hash_entry(he, offset, hash) {
>> -        if (READ_ONCE(he->lock) == lock) {
>> +        struct qspinlock *old = lock;
>> +
>> +        if (try_cmpxchg(&he->lock, &old, NULL)) {
>>               node = READ_ONCE(he->node);
>> -            WRITE_ONCE(he->lock, NULL);
>>               return node;
>>           }
>>       }
> 
> Anyway, this change isn't quite right as suggested by Akira.
This change is abuse usage about try_cmpxchg(), the origin code maybe 
has no problem, I am testing again.

Sorry for the noise and happy new year holiday :)

Regards
Bibo Mao

> 
> Cheers,
> Longman
> 
>>
>> base-commit: 48f506ad0b683d3e7e794efa60c5785c4fdc86fa