Message-ID: <72991b83-173e-492e-a4aa-5049304c1bd0@roeck-us.net>
Date: Mon, 25 Nov 2024 11:23:18 -0800
From: Guenter Roeck <linux@...ck-us.net>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: sparclinux@...r.kernel.org, linux-kernel@...r.kernel.org,
Boqun Feng <boqun.feng@...il.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>,
Waiman Long <longman@...hat.com>, Will Deacon <will@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Andreas Larsson <andreas@...sler.com>
Subject: Re: [PATCH] sparc/pci: Make pci_poke_lock a raw_spinlock_t.
On 11/25/24 10:12, Sebastian Andrzej Siewior wrote:
> On 2024-11-25 09:59:09 [-0800], Guenter Roeck wrote:
>> On 11/25/24 09:43, Sebastian Andrzej Siewior wrote:
>>> On 2024-11-25 09:01:33 [-0800], Guenter Roeck wrote:
>>>> Unfortunately it doesn't make a difference.
>>>
>>> stunning. It looks like the exact same error message.
>>>
>>
>> I think it uses
>>
>> #define spin_lock_irqsave(lock, flags) \
>> do { \
>> 	raw_spin_lock_irqsave(spinlock_check(lock), flags); \
>> } while (0)
>>
>> from include/linux/spinlock.h, meaning your patch doesn't really make a difference.
>
> The difference comes from DEFINE_SPINLOCK vs DEFINE_RAW_SPINLOCK. There
> is the .lock_type init which goes from LD_WAIT_CONFIG to LD_WAIT_SPIN.
> And that is all that matters.
>
Ah, now I get it. Thanks for the explanation. And it turns out my log was wrong.
I must have taken it from the old image. Sorry for that.
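
To make sure I understand the wait-type part: if I read the lockdep output right, the second {n:n} pair in the splats below is the wait type (2 = raw spinlock, 3 = spinlock_t, 4 = sleeping locks), and the checker complains whenever a lock with a larger wait type is acquired while a lock with a smaller one is already held. A throwaway userspace model of that idea (not the real lockdep code; the names and numbers are only my reading of the output):

/*
 * Rough model of what CONFIG_PROVE_RAW_LOCK_NESTING checks.
 * Not kernel code; the values just mirror the {n:n} annotations.
 */
#include <stdio.h>

enum wait_type {
	WAIT_SPIN   = 2,	/* raw_spinlock_t: never sleeps          */
	WAIT_CONFIG = 3,	/* spinlock_t: becomes sleeping on RT    */
	WAIT_SLEEP  = 4,	/* mutex/rwsem: always allowed to sleep  */
};

struct model_lock {
	const char *name;
	enum wait_type type;
};

/* innermost wait type currently held; nothing held means "may sleep" */
static enum wait_type curr_inner = WAIT_SLEEP;

static void model_acquire(const struct model_lock *lock)
{
	if (lock->type > curr_inner)
		printf("BUG: Invalid wait context: %s (%d) inside a %d region\n",
		       lock->name, lock->type, curr_inner);
	if (lock->type < curr_inner)
		curr_inner = lock->type;
}

int main(void)
{
	struct model_lock irq_desc = { "irq_desc_lock_class", WAIT_SPIN };	/* raw_spinlock_t */
	struct model_lock cpu_map  = { "cpu_map_lock", WAIT_CONFIG };		/* spinlock_t */

	model_acquire(&irq_desc);	/* fine: 2 <= 4 */
	model_acquire(&cpu_map);	/* complains: 3 > 2, like the splat below */
	return 0;
}

That matches the pattern in the backtraces: a spinlock_t is taken underneath a raw spinlock, which works today but would be a sleeping lock in atomic context on PREEMPT_RT.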
Anyway, that specific backtrace isn't seen anymore, but there is another one:
[ 1.779653] =============================
[ 1.779860] [ BUG: Invalid wait context ]
[ 1.780139] 6.12.0+ #1 Not tainted
[ 1.780394] -----------------------------
[ 1.780600] swapper/0/1 is trying to lock:
[ 1.780824] 0000000001b68888 (cpu_map_lock){....}-{3:3}, at: map_to_cpu+0x10/0x80
[ 1.781393] other info that might help us debug this:
[ 1.781624] context-{5:5}
[ 1.781838] 3 locks held by swapper/0/1:
[ 1.782055] #0: fffff800042b90f8 (&dev->mutex){....}-{4:4}, at: __driver_attach+0x80/0x160
[ 1.782345] #1: fffff800040f2c18 (&desc->request_mutex){+.+.}-{4:4}, at: __setup_irq+0xa0/0x6e0
[ 1.782632] #2: fffff800040f2ab0 (&irq_desc_lock_class){....}-{2:2}, at: __setup_irq+0xc8/0x6e0
[ 1.782912] stack backtrace:
[ 1.783172] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0+ #1
[ 1.783498] Call Trace:
[ 1.783734] [<00000000004e31d0>] __lock_acquire+0xa50/0x3160
[ 1.783971] [<00000000004e63e8>] lock_acquire+0xe8/0x340
[ 1.784191] [<00000000010f0dbc>] _raw_spin_lock_irqsave+0x3c/0x80
[ 1.784417] [<000000000043ed90>] map_to_cpu+0x10/0x80
[ 1.784633] [<000000000042b2b8>] sun4u_irq_enable+0x18/0x80
[ 1.784854] [<00000000004fb6b4>] irq_enable+0x34/0xc0
[ 1.785069] [<00000000004fb7b8>] __irq_startup+0x78/0xe0
[ 1.785287] [<00000000004fb8f0>] irq_startup+0xd0/0x1a0
[ 1.785503] [<00000000004f85b4>] __setup_irq+0x5f4/0x6e0
[ 1.785726] [<00000000004f8754>] request_threaded_irq+0xb4/0x1a0
[ 1.785950] [<0000000000439930>] power_probe+0x70/0xe0
[ 1.786165] [<0000000000c13a68>] platform_probe+0x28/0x80
[ 1.786382] [<0000000000c11178>] really_probe+0xb8/0x340
[ 1.786599] [<0000000000c115a4>] driver_probe_device+0x24/0xe0
[ 1.786820] [<0000000000c117cc>] __driver_attach+0x8c/0x160
[ 1.787039] [<0000000000c0ef74>] bus_for_each_dev+0x54/0xc0
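
For the record, the cpu_map_lock change I tried is roughly the following (typed from memory, not the exact hunk; the spin_lock_irqsave()/spin_unlock_irqrestore() calls on it were switched to the raw_ variants as well):

-static DEFINE_SPINLOCK(cpu_map_lock);
+static DEFINE_RAW_SPINLOCK(cpu_map_lock);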
After replacing cpu_map_lock with a raw spinlock, I get:
[ 2.015140] =============================
[ 2.015247] [ BUG: Invalid wait context ]
[ 2.015419] 6.12.0+ #1 Not tainted
[ 2.015564] -----------------------------
[ 2.015668] swapper/0/1 is trying to lock:
[ 2.015791] fffff80004870610 (&mm->context.lock){....}-{3:3}, at: __schedule+0x410/0x5b0
[ 2.016306] other info that might help us debug this:
[ 2.016451] context-{5:5}
[ 2.016539] 3 locks held by swapper/0/1:
[ 2.016652] #0: 0000000001d11f38 (key_types_sem){++++}-{4:4}, at: __key_create_or_update+0x5c/0x4c0
[ 2.016934] #1: 0000000001d1b850 (asymmetric_key_parsers_sem){++++}-{4:4}, at: asymmetric_key_preparse+0x18/0xa0
[ 2.017197] #2: fffff8001f811a98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xdc/0x5b0
[ 2.017412] stack backtrace:
[ 2.017551] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0+ #1
[ 2.017800] Call Trace:
[ 2.017910] [<00000000004e31d0>] __lock_acquire+0xa50/0x3160
[ 2.018062] [<00000000004e63e8>] lock_acquire+0xe8/0x340
[ 2.018192] [<00000000010f0dbc>] _raw_spin_lock_irqsave+0x3c/0x80
[ 2.018341] [<00000000010e5050>] __schedule+0x410/0x5b0
[ 2.018469] [<00000000010e5ae4>] schedule+0x44/0x1c0
[ 2.018591] [<00000000010f0684>] schedule_timeout+0xa4/0x100
[ 2.018730] [<00000000010e668c>] __wait_for_common+0xac/0x1a0
[ 2.018869] [<00000000010e6878>] wait_for_completion_state+0x18/0x40
[ 2.019022] [<000000000048ad18>] call_usermodehelper_exec+0x138/0x1c0
[ 2.019177] [<000000000052eb40>] __request_module+0x160/0x2e0
[ 2.019316] [<00000000009ba6dc>] crypto_alg_mod_lookup+0x17c/0x280
[ 2.019466] [<00000000009ba990>] crypto_alloc_tfm_node+0x30/0x100
[ 2.019614] [<00000000009dcc5c>] public_key_verify_signature+0xbc/0x260
[ 2.019772] [<00000000009ded8c>] x509_check_for_self_signed+0xac/0x280
[ 2.019928] [<00000000009dddec>] x509_cert_parse+0x14c/0x220
[ 2.020065] [<00000000009dea08>] x509_key_preparse+0x8/0x1e0
The problem here is:

typedef struct {
	spinlock_t		lock;		<--
	unsigned long		sparc64_ctx_val;
	unsigned long		hugetlb_pte_count;
	unsigned long		thp_pte_count;
	struct tsb_config	tsb_block[MM_NUM_TSBS];
	struct hv_tsb_descr	tsb_descr[MM_NUM_TSBS];
	void			*vdso;
	bool			adi;
	tag_storage_desc_t	*tag_store;
	spinlock_t		tag_lock;
} mm_context_t;
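
What I tried there was the obvious hand edit (again approximate, with the spin_lock*() calls on context.lock converted to the raw_ variants):

-	spinlock_t		lock;
+	raw_spinlock_t		lock;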
Replacing that with a raw spinlock just triggers the next one.
[ 2.035384] =============================
[ 2.035490] [ BUG: Invalid wait context ]
[ 2.035660] 6.12.0+ #3 Not tainted
[ 2.035802] -----------------------------
[ 2.035906] kworker/u4:3/48 is trying to lock:
[ 2.036036] 0000000001b6a790 (ctx_alloc_lock){....}-{3:3}, at: get_new_mmu_context+0x14/0x280
[ 2.036558] other info that might help us debug this:
[ 2.036697] context-{5:5}
[ 2.036784] 4 locks held by kworker/u4:3/48:
[ 2.036906] #0: fffff80004838a70 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xc/0x8e0
[ 2.037169] #1: fffff80004838b08 (&sig->exec_update_lock){+.+.}-{4:4}, at: begin_new_exec+0x344/0xbe0
[ 2.037411] #2: fffff800047fc940 (&p->alloc_lock){+.+.}-{3:3}, at: begin_new_exec+0x3a0/0xbe0
[ 2.037639] #3: fffff80004848610 (&mm->context.lock){....}-{2:2}, at: begin_new_exec+0x41c/0xbe0
Fixing that finally gives me a clean run. Nevertheless, that makes me wonder:
should I just disable CONFIG_PROVE_RAW_LOCK_NESTING for sparc runtime tests?
Thanks,
Guenter