Message-ID: <xfzcvv6ezleds24wvha2apkz5kirhcmoydm3on2hnfrxcwuc3g@koj6plovnvbd>
Date: Tue, 14 Oct 2025 08:54:49 +0800
From: Inochi Amaoto <inochiama@...il.com>
To: Genes Lists <lists@...ience.com>, Inochi Amaoto <inochiama@...il.com>, 
	Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-nvme@...ts.infradead.org
Cc: linux-pci@...r.kernel.org
Subject: Re: mainline boot fail nvme/block? [BISECTED]

On Mon, Oct 13, 2025 at 07:45:05AM -0400, Genes Lists wrote:
> On Mon, 2025-10-13 at 16:46 +0800, Inochi Amaoto wrote:
> > On Fri, Oct 10, 2025 at 07:49:34PM -0400, Genes Lists wrote:
> > > On Fri, 2025-10-10 at 08:54 -0600, Jens Axboe wrote:
> > > > On 10/10/25 8:29 AM, Genes Lists wrote:
> > > > > Mainline fails to boot - 6.17.1 works fine.
> > > > > > Same kernel on an older laptop without any nvme works just fine.
> > > > > > 
> > > > > > It seems to get stuck enumerating disks within the initramfs created by dracut.
> > > > > > 
> ...
> 
> > > Bisect landed here. (cc linux-pci@...r.kernel.org)
> > > > Hopefully it is helpful, even though I don't see MSI in lspci output
> > > > (which is provided below).
> > > 
> > > gene
> > > 
> > > 
> > > 54f45a30c0d0153d2be091ba2d683ab6db6d1d5b is the first bad commit
> > > commit 54f45a30c0d0153d2be091ba2d683ab6db6d1d5b (HEAD)
> > > Author: Inochi Amaoto <inochiama@...il.com>
> > > Date:   Thu Aug 14 07:28:32 2025 +0800
> > > 
> > >     PCI/MSI: Add startup/shutdown for per device domains
> > > 
> > > >     As the RISC-V PLIC cannot apply affinity settings without invoking
> > > >     irq_enable(), it will make the interrupt unavailable when used as an
> > > >     underlying interrupt chip for the MSI controller.
> > > > 
> > > >     Implement the irq_startup() and irq_shutdown() callbacks for the PCI MSI
> > > >     and MSI-X templates.
> > > > 
> > > >     For chips that specify MSI_FLAG_PCI_MSI_STARTUP_PARENT, the parent startup
> > > >     and shutdown functions are invoked. That allows the interrupt on the parent
> > > >     chip to be enabled if the interrupt has not been enabled during
> > > >     allocation. This is necessary for MSI controllers which use PLIC as
> > > >     underlying parent interrupt chip.
> > > 
> > >     Suggested-by: Thomas Gleixner <tglx@...utronix.de>
> > >     Signed-off-by: Inochi Amaoto <inochiama@...il.com>
> > >     Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> > >     Tested-by: Chen Wang <unicorn_wang@...look.com> # Pioneerbox
> > >     Reviewed-by: Chen Wang <unicorn_wang@...look.com>
> > >     Acked-by: Bjorn Helgaas <bhelgaas@...gle.com>
> > > >     Link: https://lore.kernel.org/all/20250813232835.43458-3-inochiama@...il.com
> > > 
> > > >  drivers/pci/msi/irqdomain.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  include/linux/msi.h         |  2 ++
> > >  2 files changed, 54 insertions(+)
> > > 
> > > 
> ...
> 
> > 
> > 
> > > I think this is caused by the VMD device, for which I have a temporary
> > > solution here [1]. Since I have no idea how VMD works, I hope someone
> > > can help convert it into a formal fix.
> > 
> > [1]
> > https://lore.kernel.org/all/qs2vydzm6xngul77xuwjli7h757gzfhmb4siiklzo
> > gihz5oplw@...gn75lib6t/
> > 
> > Regards,
> > Inochi
> 
> Thank you Inochi
> 
> I tried this patch over 6.18-rc1.
> 
> It gets further than without the patch, but around the time I get
> prompted for the passphrase for the luks partition
> (root is not encrypted) it crashes.
> 
> I took 2 images of the screen when this happens and
> uploaded them here:
> 
>     https://0x0.st/KSNz.jpg
>     https://0x0.st/KSNi.jpg
> 

These pictures show only a WARNING from perf_get_x86_pmu_capability
and no other information, so I am not sure whether it is caused
by this change. In the original report I have, though, the patch
solved the problem.

By the way, can you test the following change?
https://lore.kernel.org/all/2hyxqqdootjw5yepbimacuuapfsf26c5mmu5w2jsdmamxvsjdq@gnibocldkuz5/

If it is OK, I will send a patch for it.

Regards,
Inochi
