Message-ID: <aKi-oXTf0RphLLgn@google.com>
Date: Fri, 22 Aug 2025 12:01:53 -0700
From: Brian Norris <briannorris@...omium.org>
To: Guenter Roeck <linux@...ck-us.net>
Cc: Thomas Gleixner <tglx@...utronix.de>, David Gow <davidgow@...gle.com>,
	linux-kernel@...r.kernel.org, kunit-dev@...glegroups.com
Subject: Re: [PATCH 0/6] genirq/test: Platform/architecture fixes

On Fri, Aug 22, 2025 at 11:34:04AM -0700, Guenter Roeck wrote:
> On 8/21/25 12:06, Brian Norris wrote:
> > On Thu, Aug 21, 2025 at 10:02:52AM -0700, Guenter Roeck wrote:
> > > Build results:
> > > 	total: 162 pass: 162 fail: 0
> > > Qemu test results:
> > > 	total: 637 pass: 637 fail: 0
> > > Unit test results:
> > > 	pass: 640616 fail: 13
> > > Failed unit tests:
> > > 	arm64:imx8mp-evk:irq_cpuhotplug_test
> > > 	arm64:imx8mp-evk:irq_test_cases
> > > 	m68k:q800:irq_test_cases
> > > 	m68k:virt:irq_test_cases
> > > 
> > > Individual failures:
> > > 
> > > [   32.613761]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:210
> > > [   32.613761]     Expected remove_cpu(1) == 0, but
> > > [   32.613761]         remove_cpu(1) == -16 (0xfffffffffffffff0)
> > > [   32.621522]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:212
> > > [   32.621522]     Expected add_cpu(1) == 0, but
> > > [   32.621522]         add_cpu(1) == 1 (0x1)
> > > [   32.630930]     # irq_cpuhotplug_test: pass:0 fail:1 skip:0 total:1
> > 
> > I managed to get an imx8mp-evk setup running (both little and big
> > endian) and couldn't reproduce. But I'm guessing based on the logs that
> > we're racing with pci_call_probe(), which disables CPU hotplug
> > (cpu_hotplug_disable()) for its duration.
> > 
> > I'm not sure how to handle that.
> > 
> > 1. I could just SKIP the test on EBUSY. But that'd make for flaky test
> >     coverage.
> > 2. Expose some method to block cpu_hotplug_disable() users temporarily.
> > 3. Stop trying to do CPU hotplug in a unit test. (It's bordering on
> >     "integration test"; but it's still useful IMO...)
> > 4. Add an EBUSY retry loop? Or some other similar polling (if we had,
> >     say, a cpu_hotplug_disabled() API).

Ah, I see that add_cpu() (cpu_subsys_online()) already has an -EBUSY
retry loop, but remove_cpu() doesn't. So #4 seems like a good solution.
It might even make sense to retry in cpu_subsys_offline(), rather than
just in the test.
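
A rough user-space sketch of what I mean by option #4, for illustration
only. fake_remove_cpu(), hotplug_disable_depth and remove_cpu_retry()
are hypothetical names made up for this example; the stub just simulates
remove_cpu() returning -EBUSY while hotplug is disabled. A real version
would call remove_cpu() and sleep between attempts:

```c
#include <assert.h>
#include <errno.h>

/*
 * Simulated state (hypothetical): number of upcoming calls that will
 * still observe CPU hotplug as disabled, e.g. while a PCI probe runs.
 */
static int hotplug_disable_depth = 2;

/* Hypothetical stand-in for remove_cpu(); not the kernel function. */
static int fake_remove_cpu(unsigned int cpu)
{
	(void)cpu;
	if (hotplug_disable_depth > 0) {
		hotplug_disable_depth--;
		return -EBUSY;
	}
	return 0;
}

/*
 * Retry only on -EBUSY, up to max_tries attempts. Any other result
 * (success or a different error) is returned immediately.
 */
static int remove_cpu_retry(unsigned int cpu, int max_tries)
{
	int ret = -EBUSY;

	assert(max_tries > 0);

	while (max_tries-- > 0) {
		ret = fake_remove_cpu(cpu);
		if (ret != -EBUSY)
			break;
		/* Kernel code would sleep briefly here before retrying. */
	}

	return ret;
}
```

With the bounded retry count, a persistent -EBUSY still surfaces as a
failure instead of hanging the test.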

I'll give this some thought for later though.

> Here is an additional data point: It only happens with big endian tests.
> This always happens in my setup, and it only happens when booting from
> virtio-pci but not when booting from other devices.
> 
> I just re-ran the test and it passed this time, so this is apparently
> a flake. I'd suggest to ignore it for now. If I see it again and find
> a clean way to reproduce it we can have another look. The emulated PCIe
> controller for imx8mp-evk isn't exactly stable, so this may just be a side
> effect of emulation problems.

This strengthens my suspicion that it's a race with PCIe probing. In the
failure case, the test runs right after some PCI scan logs.

But I'm fine deferring for now, since it's not very reproducible.

Brian