[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <9b76aa09-532f-e2c5-8cf8-c732a079c5eb@loongson.cn>
Date: Fri, 27 Dec 2024 10:00:07 +0800
From: bibo mao <maobibo@...ngson.cn>
To: Waiman Long <llong@...hat.com>, Ingo Molnar <mingo@...hat.com>,
Will Deacon <will@...nel.org>
Cc: Boqun Feng <boqun.feng@...il.com>, linux-kernel@...r.kernel.org,
Akira Yokosawa <akiyks@...il.com>
Subject: Re: [PATCH] locking/pvqspinlock: Use try_cmpxchg() in pv_unhash
On 2024/12/26 上午12:04, Waiman Long wrote:
> On 12/23/24 2:47 AM, Bibo Mao wrote:
>> We ported pv spinlock to old linux kernel on LoongArch platform, there is
> What old kernel are you using? Race condition like that should be hard
> to reproduce. Do you hit this bug only once?
>> error with some stress tests. The error report is something like this for
>> short:
>> kernel BUG at kernel/locking/qspinlock_paravirt.h:261!
>> Oops - BUG[#1]:
>> CPU: 1 PID: 6613 Comm: pidof Not tainted 4.19.190+ #43
>> Hardware name: Loongson KVM, BIOS 0.0.0 02/06/2015
>> ra: 9000000000509cfc do_task_stat+0x29c/0xaf0
>> ERA: 9000000000291308 __pv_queued_spin_unlock_slowpath+0xf8/0x100
>> CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
>> PRMD: 00000000 (PPLV0 -PIE -PWE)
>> ...
>> Call Trace:
>> [<9000000000291308>] __pv_queued_spin_unlock_slowpath+0xf8/0x100
>> [<9000000000509cf8>] do_task_stat+0x298/0xaf0
>> [<9000000000502570>] proc_single_show+0x60/0xe0
>>
>> The problem is that memory accessing is out of order on LoongArch
>> platform, there is contension between pv_unhash() and pv_hash().
>>
>> CPU0 pv_unhash: CPU1 pv_hash:
>>
>> for_each_hash_entry(he, offset, hash) {
>> for_each_hash_entry(he, offset, hash) {
>> if (READ_ONCE(he->lock) == lock) { struct qspinlock
>> *old = NULL;
>> node = READ_ONCE(he->node);
>> WRITE_ONCE(he->lock, NULL);
>>
>> On LoongArch platform which is out of order, the execution order may be
>> switched like this:
>>> WRITE_ONCE(he->lock, NULL);
>> if
>> (try_cmpxchg(&he->lock, &old, lock)) {
>>
>> WRITE_ONCE(he->node, node);
>> return &he->lock;
>>
>> CPU1 pv_hash() is executing and watch that lock is set with NULL. Write
>> he->node with node of new lock.
>>> node = READ_ONCE(he->node);
>> READ_ONCE(he->node) on CPU0 will return node of new lock rather than
>> itself.
> The pv_hash_entry structure is supposed to be cacheline aligned. The
> use of READ_ONCE/WRITE_ONCE will ensure that compiler won't rearrange
> the ordering. If the CPU can rearrange read/write ordering on the same
> cacheline like that, it may be some advance optimization technique that
> I don't quite understand. Can you check if the buffer returned by
> alloc_large_system_hash() is really properly aligned?
Yes, you are right. After communicating with HW guys, WRITE_ONCE will
not surpass over READ_ONCE, WRITE_ONCE can only surpass WRITE_ONCE.
And buffer allocated by alloc_large_system_hash() is cache aligned also,
the problem may be LoongArch VM specific, I am investigating the
detailed reason.
>>
>> Here READ_ONCE/WRITE_ONCE is replaced with try_cmpxchg().
>>
>> Signed-off-by: Bibo Mao <maobibo@...ngson.cn>
>> ---
>> kernel/locking/qspinlock_paravirt.h | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/locking/qspinlock_paravirt.h
>> b/kernel/locking/qspinlock_paravirt.h
>> index dc1cb90e3644..cc4eabce092d 100644
>> --- a/kernel/locking/qspinlock_paravirt.h
>> +++ b/kernel/locking/qspinlock_paravirt.h
>> @@ -240,9 +240,10 @@ static struct pv_node *pv_unhash(struct qspinlock
>> *lock)
>> struct pv_node *node;
>> for_each_hash_entry(he, offset, hash) {
>> - if (READ_ONCE(he->lock) == lock) {
>> + struct qspinlock *old = lock;
>> +
>> + if (try_cmpxchg(&he->lock, &old, NULL)) {
>> node = READ_ONCE(he->node);
>> - WRITE_ONCE(he->lock, NULL);
>> return node;
>> }
>> }
>
> Anyway, this change isn't quite right as suggested by Akira.
This change is abuse usage about try_cmpxchg(), the origin code maybe
has no problem, I am testing again.
Sorry for the noise and happy new year holiday :)
Regards
Bibo Mao
>
> Cheers,
> Longman
>
>>
>> base-commit: 48f506ad0b683d3e7e794efa60c5785c4fdc86fa
Powered by blists - more mailing lists