linux-kernel - Re: [PATCH 6.10 000/809] 6.10.3-rc3 review

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dba1c62b-add6-45fc-9fc1-cc8755fb7a14@roeck-us.net>
Date: Mon, 5 Aug 2024 18:16:28 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Thomas Gleixner <tglx@...utronix.de>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>, stable@...r.kernel.org
Cc: patches@...ts.linux.dev, linux-kernel@...r.kernel.org,
 torvalds@...ux-foundation.org, akpm@...ux-foundation.org, shuah@...nel.org,
 patches@...nelci.org, lkft-triage@...ts.linaro.org, pavel@...x.de,
 jonathanh@...dia.com, f.fainelli@...il.com, sudipm.mukherjee@...il.com,
 srw@...dewatkins.net, rwarsow@....de, conor@...nel.org,
 allen.lkml@...il.com, broonie@...nel.org,
 "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
 Helge Deller <deller@....de>, Parisc List <linux-parisc@...r.kernel.org>
Subject: Re: [PATCH 6.10 000/809] 6.10.3-rc3 review

On 8/5/24 14:49, Thomas Gleixner wrote:
> On Mon, Aug 05 2024 at 08:02, Guenter Roeck wrote:
>> On 8/5/24 05:51, Thomas Gleixner wrote:
>>> IRQF_COND_ONESHOT has only an effect when
>>>
>>>       1) Interrupt is shared
>>>       2) First interrupt request has IRQF_ONESHOT set
>>>
>>> Neither #1 nor #2 are true, but maybe your current config enables some moar
>>> devices than the one on your website.
>>>
>>
>> No, it is pretty much the same, except for a more recent C compiler, and it
>> requires qemu v9.0. See http://server.roeck-us.net/qemu/parisc64-6.10.3/.
>>
>> Debugging shows pretty much the same for me, and any log message added
>> to request_irq() makes the problem go away (or be different), and if the problem
>> is seen it doesn't even get to the third interrupt request. I copied a more complete
>> log to bad.log.gz in above page.
> 
> At the point where the problem happens is way before the first interrupt
> is requested, so that definitely excludes any side effect of
> COND_ONESHOT.
> 
> It happens right after
> 
> [    0.000000] Memory: 991812K/1048576K available (16752K kernel code, 5220K rwdata, 3044K rodata, 760K init, 1424K bss, 56764K reserved, 0K cma-reserved)
>                 SNIP
> [    0.000000] ** This system shows unhashed kernel memory addresses   **
>                 SNIP
> 
> I.e. the big fat warning about unhashed kernel addresses)
> 
> In the good case the first interrupt is requested here:
> 
> [    0.000000] Memory: 992804K/1048576K available (16532K kernel code, 5096K rwdata, 2984K rodata, 744K init, 1412K bss, 55772K reserved, 0K cma-reserved)
>                 SNIP
> [    0.000000] ** This system shows unhashed kernel memory addresses   **
>                 SNIP
> [    0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> [    0.000000] ODEBUG: selftest passed
> [    0.000000] rcu: Hierarchical RCU implementation.
> [    0.000000] rcu:     RCU event tracing is enabled.
> [    0.000000] rcu:     RCU callback double-/use-after-free debug is enabled.
> [    0.000000] rcu:     RCU debug extended QS entry/exit.
> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> 
> [    0.000000] NR_IRQS: 128              <- This is where the interrupt
>                                              subsystem is initialized
> [    0.000000] genirq: 64: 00215600      <- This is probably the timer interrupt
> 
> Looking at the backtrace the fail happens from within start_kernel(),
> i.e. mm_core_init():
> 
>         slab_err+0x13c/0x190
>         check_slab+0xf4/0x140
>         alloc_debug_processing+0x58/0x248
>         ___slab_alloc+0x22c/0xa38
>         __slab_alloc.isra.0+0x60/0x88
>         kmem_cache_alloc_node_noprof+0x148/0x3c0
>         __kmem_cache_create+0x5d4/0x680
>         create_boot_cache+0xc4/0x128
>         new_kmalloc_cache+0x1ac/0x1d8
>         create_kmalloc_caches+0xac/0x108
>         kmem_cache_init+0x1cc/0x340
>         mm_core_init+0x458/0x560
>         start_kernel+0x9ac/0x11e0
>         start_parisc+0x188/0x1b0
> 
> The interrupt subsystem is initialized way later and request_irq() only
> works after that point.
> 
> I'm 100% sure by now that this commit has absolutely nothing to do with
> the underlying problem. All it does is changing the code size and
> placement of text, data and ....
> 
> So I finally managed to reproduce with gcc-13.3.0 from the k.org cross
> tools (the debian 12.2.2 does not).
> 
> If I add:
> 
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1513,6 +1513,8 @@ static int
>   	unsigned long flags, thread_mask = 0;
>   	int ret, nested, shared = 0;
>   
> +	pr_info("%u: %08x\n", irq, new->flags);
> +
>   	if (!desc)
>   		return -EINVAL;
>   
> it still reproduces. If I change that to:
> 
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1513,6 +1513,8 @@ static int
>   	unsigned long flags, thread_mask = 0;
>   	int ret, nested, shared = 0;
>   
> +	new->flags &= ~IRQF_COND_ONESHOT;
> +
>   	if (!desc)
>   		return -EINVAL;
>   
> that does neither cure it (unsurprisingly).
> 
> Reverting the "offending" commit cures it.
> 
> So I tried your
> 
>       +       pr_info_once("####### First timer interrupt\n");
> 
> which cures it too.
> 
> So now where is the difference between the printk() in __setup_irq(),
> the new->flag mangling, the revert and the printk() in timer interrupt?
> 
> There is ZERO functional change. There is no race either because at that
> point everything is single threaded and interrupts are disabled.
> 
> The only thing which changes is the code and data layout as I can
> observe when looking at System.map of the builds. I stared at the diffs
> for a bit, but nothing stood out.
> 
> Maybe the PARISC people can shed some light on it.
> 
> This reminds me of the x86 32-bit disaster we debugged a few days ago...
> 

Looks like it :-(

Thanks for checking !

Guenter