Message-ID: <20231203122935.GA5986@workstation.local>
Date: Sun, 3 Dec 2023 21:29:35 +0900
From: Takashi Sakamoto <o-takashi@...amocchi.jp>
To: Mario Limonciello <mario.limonciello@....com>
Cc: a.mark.broadworth@...il.com, matthias.schrumpf@...enet.de,
LKML <linux-kernel@...r.kernel.org>, aros@....com,
bagasdotme@...il.com,
"open list:PCI SUBSYSTEM" <linux-pci@...r.kernel.org>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Borislav Petkov <bp@...en8.de>
Subject: Re: Regression from dcadfd7f7c74ef9ee415e072a19bdf6c085159eb
Hi Mario,
Thanks for the advice.
Note that in my experiments I use Ubuntu 23.04 amd64 (v6.2 kernel) with a
backported FireWire stack[1]. Apart from that stack, the kernel and software
packages come from the Ubuntu project's repositories.
On Tue, Nov 28, 2023 at 12:09:41AM -0600, Mario Limonciello wrote:
> On 11/27/2023 23:24, Takashi Sakamoto wrote:
> > Hi Mario
> >
> > Following up on our last conversation, I purchase some hardware to
> > attempt to retrieve outputs from serial port. Finally, I bought another
> > mother board in used market which provides serial port from Super I/O
> > chip (ASUS TUF Gaming X570-Plus). However, I have retrieved no helpful
> > outputs yet when encountering the system reboot.
>
> Did you up the loglevel to 8 to make sure you'll get all kernel output on
> the serial port, not just errors?
Even when I pass the 'debug' cmdline option or raise the console loglevel
via sysctl, I receive no useful output on the console when loading the
module, either at boot or afterwards.
```
$ sysctl kernel.printk
kernel.printk = 7 7 1 7
```
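For reference, here is a small sketch of how I read the four fields of
kernel.printk. The field order is taken from the kernel's sysctl
documentation; the helper names (parse_printk, reaches_console) are my own
and purely illustrative. As far as I understand, a message reaches the
console only when its level is numerically lower than the first field, so
with a value of 7 the KERN_DEBUG messages would still be filtered out.

```python
# Sketch: interpret the four fields of kernel.printk.
# Helper names are illustrative, not part of any kernel tool.
from typing import NamedTuple

KERN_DEBUG = 7  # numeric level of KERN_DEBUG messages


class PrintkLevels(NamedTuple):
    console_loglevel: int
    default_message_loglevel: int
    minimum_console_loglevel: int
    default_console_loglevel: int


def parse_printk(text: str) -> PrintkLevels:
    """Parse '7 7 1 7' or the 'kernel.printk = 7 7 1 7' form printed by sysctl."""
    values = text.split("=")[-1].split()
    if len(values) != 4:
        raise ValueError(f"expected four integers, got {text!r}")
    return PrintkLevels(*(int(v) for v in values))


def reaches_console(message_level: int, levels: PrintkLevels) -> bool:
    """A message is printed only if its level is below console_loglevel."""
    return message_level < levels.console_loglevel
```

The same function accepts the content of /proc/sys/kernel/printk, since the
values there are separated by whitespace as well.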
I tried several different cases: enabling/disabling the IOMMU and
enabling/disabling SVM at the motherboard level. None of them made a
difference.
> > As you mentioned, I check whether PCIe AER is enabled or not in the
> > running kernel (Ubuntu 23.04 linux-image-6.2.0-37-generic). It is
> > certainly enabled, however I can see nothing in the output as I noted.
> >
> > I experienced extra troubles relevant to AMD Ryzen machine and the issued
> > PCIe device:
> >
> > * ASRock X570 Phantom Gaming 4 with AMD Ryzen 5 3600X does not detect
> > the card. We can see no corresponding entry in lspci.
> > * After associating the card to vfio-pci, lspci command can reboot the
> > system even if firewire-ohci driver is not loaded. I can regenerate it
> > in both Gigabyte AX370-Gaming 5/ASUS TUF Gaming X570-plus with AMD
> > Ryzen 2400G.
>
> Rather than lspci, is it specifically config space access from sysfs? Does
> the output from the serial port change with IOMMU enabled vs disabled?
In the lspci case, I worked with a debugger and figured out that pread(2) on
the file descriptor for the 'config' node in sysfs causes the unexpected
system reboot. Additionally, I can reproduce it by running hexdump(1) on the
node:
```
$ lspci
...
04:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge [1b21:1080] (rev 03)
05:00.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller [1106:3044] (rev 80)
...
$ hexdump -C /sys/bus/pci/devices/0000\:05\:00.0/config
00000000 06 11 44 30 80 00 10 02 80 10 00 0c 10 20 00 00 |..D0......... ..|
00000010 00 00 90 fc 01 d0 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 06 11 44 30 |..............D0|
00000030 00 00 00 00 50 00 00 00 00 00 00 00 ff 01 00 20 |....P.......... |
00000040
$ lsmod | grep firewire
(no output)
$ sudo -i
# modprobe vfio-pci
# echo 1106 3044 > /sys/bus/pci/drivers/vfio-pci/new_id
# exit
$ hexdump -C /sys/bus/pci/devices/0000\:05\:00.0/config
(reboot)
```
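The access that triggers the reboot can be sketched as follows. This is only
an illustration of the same pread(2) call on the 'config' node, not the code
path of lspci itself; read_config and decode_header are hypothetical helper
names. Please note that on an affected machine read_config() is expected to
reboot the system once the device is bound to vfio-pci, so the decoding part
is kept separate and works on bytes captured beforehand.

```python
# Sketch: read the first bytes of PCI config space via sysfs and decode them.
import os
import struct
from typing import NamedTuple


class PciHeader(NamedTuple):
    vendor_id: int
    device_id: int
    command: int
    status: int
    revision: int
    class_code: int  # base class, subclass, prog-if packed into 24 bits


def decode_header(data: bytes) -> PciHeader:
    """Decode the first 12 bytes of the common config header (little endian)."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes of config space")
    (vendor, device, command, status,
     revision, prog_if, subclass, base_class) = struct.unpack_from("<HHHHBBBB", data, 0)
    class_code = (base_class << 16) | (subclass << 8) | prog_if
    return PciHeader(vendor, device, command, status, revision, class_code)


def read_config(bdf: str, length: int = 64) -> bytes:
    """pread(2) on the sysfs 'config' node, e.g. bdf='0000:05:00.0'.

    WARNING: this is the access that reboots the affected systems.
    """
    fd = os.open(f"/sys/bus/pci/devices/{bdf}/config", os.O_RDONLY)
    try:
        return os.pread(fd, length, 0)
    finally:
        os.close(fd)
```

Feeding the first row of the hexdump above into decode_header() yields vendor
0x1106 and device 0x3044, which matches the lspci output.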
I can suppress it by disabling the IOMMU in the motherboard firmware. In this
respect, the lspci issue is a bit different from the driver issue.
> > I'm pleased to see if you have extra ideas to get helpful output from
> > the system. But I guess that I should start finding some workaround to
> > avoid the issued access to register instead of investigating the reboot
> > mechanism, sigh...
> >
> > Anyway, thanks for your help.
>
> Can you check FCH::PM::S5_RESET_STATUS on next boot after failure has
> occurred? It is available at MMIO FED80300 or through indirect IO access at
> 0xC0.
>
> If MMIO doesn't work, double check FCH::PM_ISACONTROL bit 1 (described on
> page 296) to confirm if your system enables it.
>
> The meanings of the different bits can be found in a recent PPR:
> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/55901_B1_pub_053.zip
>
> Indirect IO is described on PDF page 294.
>
> This will at least give us a hint what's going on in this case.
I'll try the above this week. Thanks.
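A minimal sketch of how I plan to read the register through /dev/mem. It
rests on assumptions that still need to be checked against the PPR: I read
the description above as "PM register block at MMIO 0xFED80300, register at
offset 0xC0 within it", and I assume a 32-bit register. The function names
are my own. It needs root, and kernels with strict /dev/mem restrictions or
lockdown may refuse the mapping. The meaning of each bit is not decoded
here; it has to be looked up in the PPR.

```python
# Sketch: read one 32-bit MMIO register through /dev/mem.
import mmap
import os
import struct

FCH_PM_BASE = 0xFED80300        # MMIO address given in the message above
S5_RESET_STATUS_OFFSET = 0xC0   # ASSUMPTION: register offset inside the block


def split_address(phys_addr: int, granularity: int = mmap.ALLOCATIONGRANULARITY):
    """Split a physical address into an aligned mapping base and an offset."""
    base = phys_addr & ~(granularity - 1)
    return base, phys_addr - base


def set_bits(value: int) -> list:
    """Return the positions of the bits set in a 32-bit value."""
    return [bit for bit in range(32) if (value >> bit) & 1]


def read_mmio32(phys_addr: int, devmem: str = "/dev/mem") -> int:
    """Map the page containing phys_addr and read a little-endian u32.

    Note: Python copies bytes out of the mapping, so a single aligned
    32-bit bus access is not guaranteed; a C tool may be preferable.
    """
    base, offset = split_address(phys_addr)
    fd = os.open(devmem, os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, offset + 4, mmap.MAP_SHARED, mmap.PROT_READ,
                       offset=base) as mem:
            return struct.unpack_from("<I", mem, offset)[0]
    finally:
        os.close(fd)
```

After a failure and the next boot, the idea is to call
read_mmio32(FCH_PM_BASE + S5_RESET_STATUS_OFFSET) as root and pass the
result to set_bits() to see which bit positions to look up.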
[1] https://github.com/takaswie/linux-firewire-dkms/
Takashi Sakamoto