Message-ID: <aKi-oXTf0RphLLgn@google.com>
Date: Fri, 22 Aug 2025 12:01:53 -0700
From: Brian Norris <briannorris@...omium.org>
To: Guenter Roeck <linux@...ck-us.net>
Cc: Thomas Gleixner <tglx@...utronix.de>, David Gow <davidgow@...gle.com>,
linux-kernel@...r.kernel.org, kunit-dev@...glegroups.com
Subject: Re: [PATCH 0/6] genirq/test: Platform/architecture fixes
On Fri, Aug 22, 2025 at 11:34:04AM -0700, Guenter Roeck wrote:
> On 8/21/25 12:06, Brian Norris wrote:
> > On Thu, Aug 21, 2025 at 10:02:52AM -0700, Guenter Roeck wrote:
> > > Build results:
> > > total: 162 pass: 162 fail: 0
> > > Qemu test results:
> > > total: 637 pass: 637 fail: 0
> > > Unit test results:
> > > pass: 640616 fail: 13
> > > Failed unit tests:
> > > arm64:imx8mp-evk:irq_cpuhotplug_test
> > > arm64:imx8mp-evk:irq_test_cases
> > > m68k:q800:irq_test_cases
> > > m68k:virt:irq_test_cases
> > >
> > > Individual failures:
> > >
> > > [ 32.613761] # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:210
> > > [ 32.613761] Expected remove_cpu(1) == 0, but
> > > [ 32.613761] remove_cpu(1) == -16 (0xfffffffffffffff0)
> > > [ 32.621522] # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:212
> > > [ 32.621522] Expected add_cpu(1) == 0, but
> > > [ 32.621522] add_cpu(1) == 1 (0x1)
> > > [ 32.630930] # irq_cpuhotplug_test: pass:0 fail:1 skip:0 total:1
> >
> > I managed to get an imx8mp-evk setup running (both little and big
> > endian) and couldn't reproduce. But I'm guessing based on the logs that
> > we're racing with pci_call_probe(), which disables CPU hotplug
> > (cpu_hotplug_disable()) for its duration.
> >
> > I'm not sure how to handle that.
> >
> > 1. I could just SKIP the test on EBUSY. But that'd make for flaky test
> > coverage.
> > 2. Expose some method to block cpu_hotplug_disable() users temporarily.
> > 3. Stop trying to do CPU hotplug in a unit test. (It's bordering on
> > "integration test"; but it's still useful IMO...)
> > 4. Add an EBUSY retry loop? Or some other similar polling (if we had,
> > say, a cpu_hotplug_disabled() API).

Ah, I see that add_cpu() (cpu_subsys_online()) already has an -EBUSY
retry loop, but remove_cpu() doesn't. So #4 seems like a good solution.
It might even make sense to retry in cpu_subsys_offline(), rather than
just in the test.

I'll give this some thought for later though.

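For illustration only, the retry idea from option 4 could look roughly like the userspace sketch below. It is not kernel code and not a proposed patch: fake_remove_cpu() and retry_on_ebusy() are hypothetical names, the fake operation merely simulates a caller that sees -EBUSY a few times before succeeding, and a real in-kernel version would presumably sleep between attempts rather than spin.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for remove_cpu(): returns -EBUSY while the
 * counter is non-zero (simulating hotplug being temporarily disabled),
 * then succeeds. This is NOT a kernel API.
 */
static int busy_calls_left;

static int fake_remove_cpu(unsigned int cpu)
{
	(void)cpu;
	if (busy_calls_left > 0) {
		busy_calls_left--;
		return -EBUSY;
	}
	return 0;
}

/*
 * Call op(cpu) until it returns something other than -EBUSY, or until
 * max_tries attempts have been made. Returns the last result.
 * Assumption: a real implementation would back off between attempts.
 */
static int retry_on_ebusy(int (*op)(unsigned int), unsigned int cpu,
			  int max_tries)
{
	int ret = -EBUSY;
	int i;

	assert(max_tries > 0);
	for (i = 0; i < max_tries && ret == -EBUSY; i++)
		ret = op(cpu);

	return ret;
}
```

A bounded retry count keeps the caller from hanging forever if hotplug stays disabled; in that case the caller still sees -EBUSY and can decide whether to fail or skip.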
> Here is an additional data point: It only happens with big endian tests.
> This always happens in my setup, and it only happens when booting from
> virtio-pci but not when booting from other devices.
>
> I just re-ran the test and it passed this time, so this is apparently
> a flake. I'd suggest to ignore it for now. If I see it again and find
> a clean way to reproduce it we can have another look. The emulated PCIe
> controller for imx8mp-evk isn't exactly stable, so this may just be a side
> effect of emulation problems.

This furthers my suspicion that it's a race with PCIe probing. In the
failure case, the test runs right after some PCI scan logs.

But I'm fine deferring for now, since it's not very reproducible.

Brian