Message-ID: <0ad9d0db-df2f-4e35-b53c-ed23cb2dc42d@roeck-us.net>
Date: Mon, 5 Aug 2024 10:42:53 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Thomas Gleixner <tglx@...utronix.de>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>, stable@...r.kernel.org
Cc: patches@...ts.linux.dev, linux-kernel@...r.kernel.org,
torvalds@...ux-foundation.org, akpm@...ux-foundation.org, shuah@...nel.org,
patches@...nelci.org, lkft-triage@...ts.linaro.org, pavel@...x.de,
jonathanh@...dia.com, f.fainelli@...il.com, sudipm.mukherjee@...il.com,
srw@...dewatkins.net, rwarsow@....de, conor@...nel.org,
allen.lkml@...il.com, broonie@...nel.org,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Helge Deller <deller@....de>, Parisc List <linux-parisc@...r.kernel.org>
Subject: Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
On 8/5/24 01:56, Thomas Gleixner wrote:
> On Sun, Aug 04 2024 at 20:28, Guenter Roeck wrote:
>> On 8/4/24 11:36, Guenter Roeck wrote:
>>>> Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>>>> genirq: Set IRQF_COND_ONESHOT in request_irq()
>>>>
>>>
>>> With this patch in v6.10.3, all my parisc64 qemu tests get stuck with repeated error messages
>>>
>>> [ 0.000000] =============================================================================
>>> [ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
>>> [ 0.000000] -----------------------------------------------------------------------------
>
> Do you have a full boot log? It's unclear to me at which point of the boot
> process this happens. Is this before or after the secondary CPUs have
> been brought up?
>
>>> This never stops until the emulation aborts.
>
> Do you have a recipe how to reproduce?
>
>>> Reverting this patch fixes the problem for me.
>>>
>>> I noticed a similar problem in the mainline kernel but it is either spurious there
>>> or the problem has been fixed.
>>>
>>
>> As a follow-up, the patch below (on top of v6.10.3) "fixes" the problem for me.
>> I guess that suggests some kind of race condition.
>>
>>
>> @@ -2156,6 +2157,8 @@ int request_threaded_irq(unsigned int irq, irq_handler_t handler,
>>          struct irq_desc *desc;
>>          int retval;
>>
>> +       udelay(1);
>> +
>>          if (irq == IRQ_NOTCONNECTED)
>>                  return -ENOTCONN;
>
> That all makes absolutely no sense to me.
>
Same here, really. I can reproduce the problem with v6.10.3, using my configuration,
but whatever debugging I add makes the problem disappear. I had seen the same problem
on mainline with v6.11-rc1-272-g17712b7ea075. Log is at
https://kerneltests.org/builders/qemu-parisc64-master/builds/168/steps/qemubuildcommand/logs/stdio
However, I can no longer reproduce it there. What makes it even odder is that
I can bisect the problem between v6.10.2 and v6.10.3, and the bisect points to this
commit, but reproducing it outside that chain seems to be all but impossible.
Guenter
> IRQF_COND_ONESHOT has only an effect on shared interrupts, when the
> interrupt was already requested with IRQF_ONESHOT.
>
> If this is really a race then the following must be true:
>
> 1) no delay
>
> CPU0                                     CPU1
> request_irq(IRQF_ONESHOT)
>                                          request_irq(IRQF_COND_ONESHOT)
>
> 2) delay
>
> CPU0                                     CPU1
>                                          request_irq(IRQF_COND_ONESHOT)
> request_irq(IRQF_ONESHOT)
>
> In this case the request on CPU 0 fails with -EBUSY ...
>
> Confused
>
> tglx
>
>
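
For readers following the two orderings quoted above, here is a minimal user-space sketch of the flag compatibility check that they hinge on. It is a simplified model, not the kernel's code: the helper names model_share_irq() and model_two_requests() are hypothetical, the flag values are meant to mirror include/linux/interrupt.h but should be treated as illustrative, and trigger-type checks and locking are left out entirely.

```c
#include <errno.h>

/* Flag bits modeled on include/linux/interrupt.h; values are illustrative. */
#define IRQF_SHARED        0x00000080UL
#define IRQF_ONESHOT       0x00002000UL
#define IRQF_COND_ONESHOT  0x00040000UL

/*
 * Hypothetical helper: decide whether a new request may share a line
 * that already has a handler installed with old_flags.
 * Returns 0 on success (possibly upgrading *new_flags), -EBUSY on mismatch.
 */
int model_share_irq(unsigned long old_flags, unsigned long *new_flags)
{
	/* Both the existing and the new requester must agree to share. */
	if (!((old_flags & *new_flags) & IRQF_SHARED))
		return -EBUSY;

	/*
	 * IRQF_COND_ONESHOT only matters when the line is already oneshot:
	 * the new request is then upgraded to oneshot as well.
	 */
	if ((old_flags & IRQF_ONESHOT) && (*new_flags & IRQF_COND_ONESHOT))
		*new_flags |= IRQF_ONESHOT;
	/* Otherwise the oneshot setting of both requests has to match. */
	else if ((old_flags ^ *new_flags) & IRQF_ONESHOT)
		return -EBUSY;

	return 0;
}

/* Hypothetical helper: issue two requests in order, return the second result. */
int model_two_requests(unsigned long first, unsigned long second)
{
	return model_share_irq(first, &second);
}
```

In this model, ordering 1 (IRQF_ONESHOT first, IRQF_COND_ONESHOT second) succeeds and the second request is upgraded to oneshot. Ordering 2 (IRQF_COND_ONESHOT first, IRQF_ONESHOT second) makes the second request fail with -EBUSY, which matches the failure on CPU 0 described in the quoted analysis.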