Message-ID: <87a5yuzvzd.ffs@tglx>
Date:   Wed, 26 Apr 2023 14:08:54 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     kernel test robot <yujie.liu@...el.com>,
        Shanker Donthineni <sdonthineni@...dia.com>
Cc:     oe-lkp@...ts.linux.dev, lkp@...el.com,
        linux-kernel@...r.kernel.org, Marc Zyngier <maz@...nel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Michael Walle <michael@...le.cc>,
        Shanker Donthineni <sdonthineni@...dia.com>,
        Vikram Sethi <vsethi@...dia.com>
Subject: Re: [PATCH v3 3/3] genirq: Use the maple tree for IRQ descriptors
 management

On Tue, Apr 25 2023 at 11:16, kernel test robot wrote:
> kernel test robot noticed "WARNING:at_arch/x86/kernel/apic/ipi.c:#default_send_IPI_mask_logical" on:
>
> commit: 13eb5c4e7d2fb860d3dc5f63d910e3acf78dfd28 ("[PATCH v3 3/3] genirq: Use the maple tree for IRQ descriptors management")
> url: https://github.com/intel-lab-lkp/linux/commits/Shanker-Donthineni/genirq-Use-hlist-for-managing-resend-handlers/20230410-235853
> base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 6f3ee0e22b4c62f44b8fa3c8de6e369a4d112a75
> patch link: https://lore.kernel.org/all/20230410155721.3720991-4-sdonthineni@nvidia.com/
> patch subject: [PATCH v3 3/3] genirq: Use the maple tree for IRQ
> descriptors management

This happens during CPU hot-unplug.

[  206.930774][  T228] block/008 => sdb2 (do IO while hotplugging CPUs)            
[  206.935757][ T2086] run blktests block/008 at 2023-04-22 16:27:25
[  207.199359][ T2086] smpboot: CPU 2 is now offline

[  207.468574][   T30] WARNING: CPU: 3 PID: 30 at arch/x86/kernel/apic/ipi.c:299 default_send_IPI_mask_logical+0x40/0x44
[  207.568426][   T30] CPU: 3 PID: 30 Comm: migration/3 Tainted: G S          E      6.2.0-rc4-00051-g13eb5c4e7d2f #1
[  207.588372][   T30] Stopper: multi_cpu_stop+0x0/0xf0 <- stop_machine_cpuslocked+0xf5/0x138
[  207.596649][   T30] EIP: default_send_IPI_mask_logical+0x40/0x44

This warns because fixup_irqs() sends an IPI to an offline CPU, in this
case CPU3, which just cleared its online bit and is about to vanish:

[  207.622147][   T30] EAX: 00000008 EBX: 00000002 ECX: fffffffc EDX: 00000022

EAX contains the target and ECX the inverted online mask. That's
probably the ata2 interrupt as that later detects a timeout:

[  238.826212][  T174] ata2.00: exception Emask 0x0 SAct 0x3c00000 SErr 0x0 action 0x6 frozen
[  238.834522][  T174] ata2.00: failed command: READ FPDMA QUEUED
[  238.840378][  T174] ata2.00: cmd 60/08:b0:90:3e:90/00:00:25:00:00/40 tag 22 ncq dma 4096 in
[  238.840378][  T174]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

Which means that migrating the interrupt away from the outgoing CPU3
failed for reasons which are not yet understood.
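
To make the register reading above easier to follow, here is a minimal
sketch that decodes the two values from the dump. It assumes that both
EAX and ECX can be treated as plain 32-bit per-CPU bitmasks (bit N set
means CPU N), which matches the interpretation given above; the helper
name cpus_in_mask() is made up for illustration and is not kernel code.

```python
def cpus_in_mask(mask: int) -> list[int]:
    """Return the CPU numbers whose bits are set in a CPU bitmask."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


# Values copied from the register dump in the warning.
EAX = 0x00000008  # IPI target (assumed to be a per-CPU bitmask)
ECX = 0xFFFFFFFC  # inverted online mask

# Undo the inversion, keeping only the low 32 bits.
online_mask = ~ECX & 0xFFFFFFFF

targets = cpus_in_mask(EAX)
online = cpus_in_mask(online_mask)

# Any target that is not online is what triggers the warning.
offline_targets = [cpu for cpu in targets if cpu not in online]

print(f"target CPUs:     {targets}")
print(f"online CPUs:     {online}")
print(f"offline targets: {offline_targets}")
```

Under that assumption the only target is CPU3, while only CPU0 and CPU1
are still marked online (CPU2 was taken offline earlier in the log), so
the IPI is aimed at a CPU which is no longer in the online mask.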

The patch in question is changing the interrupt descriptor storage and
with that also the iterator function. But I can't spot anything wrong
right now.

But what I can spot is this:

[    0.000000][    T0] Linux version 6.2.0-rc4-00051-g13eb5c4e7d2f

IOW, that test is based on some random upstream version, which lacks
about 30 commits to maple_tree, 12 of which have 'fix' in the commit
subject.

Can you please retest this on v6.3 and report back if the problem
persists?

Thanks,

        tglx

