linux-kernel - Re: [czhong@...hat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a6b10684-39ee-960a-10ab-663746800f85@huawei.com>
Date:   Mon, 18 Sep 2023 09:52:28 +0800
From:   Baokun Li <libaokun1@...wei.com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     Yi Zhang <yi.zhang@...hat.com>, Ming Lei <ming.lei@...hat.com>,
        <mark.rutland@....com>, Christian Brauner <brauner@...nel.org>,
        <linux-fsdevel@...r.kernel.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        <linux-kernel@...r.kernel.org>, <linux-scsi@...r.kernel.org>,
        Changhui Zhong <czhong@...hat.com>,
        yangerkun <yangerkun@...wei.com>,
        "zhangyi (F)" <yi.zhang@...wei.com>,
        Kees Cook <keescook@...omium.org>,
        chengzhihao <chengzhihao1@...wei.com>,
        Baokun Li <libaokun1@...wei.com>
Subject: Re: [czhong@...hat.com: [bug report] WARNING: CPU: 121 PID: 93233 at
 fs/dcache.c:365 __dentry_kill+0x214/0x278]

On 2023/9/17 17:26, Peter Zijlstra wrote:
> On Sun, Sep 17, 2023 at 11:10:32AM +0200, Peter Zijlstra wrote:
>> On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote:
>>> On 2023/9/13 16:59, Yi Zhang wrote:
>>>> The issue still can be reproduced on the latest linux tree[2].
>>>> To reproduce I need to run about 1000 times blktests block/001, and
>>>> bisect shows it was introduced with commit[1], as it was not 100%
>>>> reproduced, not sure if it's the culprit?
>>>>
>>>>
>>>> [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
>>> Hello, everyone！
>>>
>>> We have confirmed that the merge-in of this patch caused hlist_bl_lock
>>> (aka, bit_spin_lock) to fail, which in turn triggered the issue above.
>>> [root@...alhost ~]# insmod mymod.ko
>>> [   37.994787][  T621] >>> a = 725, b = 724
>>> [   37.995313][  T621] ------------[ cut here ]------------
>>> [   37.995951][  T621] kernel BUG at fs/mymod/mymod.c:42!
>>> [r[  oo 3t7@...96o4c61al]h[o s T6t21] ~ ]#Int ernal error: Oops - BUG:
>>> 00000000f2000800 [#1] SMP
>>> [   37.997420][  T621] Modules linked in: mymod(E)
>>> [   37.997891][  T621] CPU: 9 PID: 621 Comm: bl_lock_thread2 Tainted:
>>> G            E      6.4.0-rc2-00034-g9257959a6e5b-dirty #117
>>> [   37.999038][  T621] Hardware name: linux,dummy-virt (DT)
>>> [   37.999571][  T621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS
>>> BTYPE=--)
>>> [   38.000344][  T621] pc : increase_ab+0xcc/0xe70 [mymod]
>>> [   38.000882][  T621] lr : increase_ab+0xcc/0xe70 [mymod]
>>> [   38.001416][  T621] sp : ffff800008b4be40
>>> [   38.001822][  T621] x29: ffff800008b4be40 x28: 0000000000000000 x27:
>>> 0000000000000000
>>> [   38.002605][  T621] x26: 0000000000000000 x25: 0000000000000000 x24:
>>> 0000000000000000
>>> [   38.003385][  T621] x23: ffffd9930c698190 x22: ffff800008a0ba38 x21:
>>> 0000000000000001
>>> [   38.004174][  T621] x20: ffffffffffffefff x19: ffffd9930c69a580 x18:
>>> 0000000000000000
>>> [   38.004955][  T621] x17: 0000000000000000 x16: ffffd9933011bd38 x15:
>>> ffffffffffffffff
>>> [   38.005754][  T621] x14: 0000000000000000 x13: 205d313236542020 x12:
>>> ffffd99332175b80
>>> [   38.006538][  T621] x11: 0000000000000003 x10: 0000000000000001 x9 :
>>> ffffd9933022a9d8
>>> [   38.007325][  T621] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 :
>>> ffffd993320b5b40
>>> [   38.008124][  T621] x5 : ffff0001f7d1c708 x4 : 0000000000000000 x3 :
>>> 0000000000000000
>>> [   38.008912][  T621] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
>>> 0000000000000015
>>> [   38.009709][  T621] Call trace:
>>> [   38.010035][  T621]  increase_ab+0xcc/0xe70 [mymod]
>>> [   38.010539][  T621]  kthread+0xdc/0xf0
>>> [   38.010927][  T621]  ret_from_fork+0x10/0x20
>>> [   38.011370][  T621] Code: 17ffffe0 90000020 91044000 9400000d (d4210000)
>>> [   38.012067][  T621] ---[ end trace 0000000000000000 ]---
>> Is this arm64 or something? You seem to have forgotten to mention what
>> platform you're using.
> Is that an LSE or LLSC arm64 ?

I'm not sure how to distinguish if it's LSE or LLSC, here's some info on 
the cpu:

$ cat /sys/devices/system/cpu/cpu0/regs/identification/midr_el1
0x00000000481fd010

$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  1
Core(s) per socket:  48
Socket(s):           2
NUMA node(s):        4
Vendor ID:           HiSilicon
BIOS Vendor ID:      HiSilicon
Model:               0
Model name:          Kunpeng-920
BIOS Model name:     Kunpeng 920-4826
Stepping:            0x1
BogoMIPS:            200.00
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            49152K
NUMA node0 CPU(s):   0-23
NUMA node1 CPU(s):   24-47
NUMA node2 CPU(s):   48-71
NUMA node3 CPU(s):   72-95
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics 
fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm

> Anyway, it seems that ARM64 shouldn't be using the fallback as it does
> everything itself.
>
> Mark, can you have a look please? At first glance the
> atomic64_fetch_or_acquire() that's being used by generic bitops/lock.h
> seems in order..
>
We also suspect some implicit mechanism change in
raw_atomic64_fetch_or_acquire. You can reproduce the problem with the
above mod that can reproduce the problem to make it easier to locate.
I can help reproduce it and grab some information if you can't reproduce
it on your end.

-- 
With Best Regards,
Baokun Li
.