Message-ID: <witv2eh3lxidto53m2twxht4lxhfjbcjwzaggalhtnqf73wpng@tzmwda7eeiaa>
Date: Mon, 4 Nov 2024 12:22:26 +0200
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc: "Michael S. Tsirkin" <mst@...hat.com>,
Jason Wang <jasowang@...hat.com>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
Eugenio Pérez <eperezma@...hat.com>, virtualization@...ts.linux.dev, linux-kernel@...r.kernel.org,
Hongyu Ning <hongyu.ning@...ux.intel.com>
Subject: Re: [PATCH] virtio: Remove virtio devices on device_shutdown()
On Thu, Aug 08, 2024 at 06:28:02PM +0300, Kirill A. Shutemov wrote:
> On Thu, Aug 08, 2024 at 11:03:30AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Aug 08, 2024 at 04:15:25PM +0300, Kirill A. Shutemov wrote:
> > > On Thu, Aug 08, 2024 at 08:10:34AM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Aug 08, 2024 at 10:51:41AM +0300, Kirill A. Shutemov wrote:
> > > > > Hongyu reported a hang on kexec in a VM. QEMU reported invalid memory
> > > > > accesses during the hang.
> > > > >
> > > > > Invalid read at addr 0x102877002, size 2, region '(null)', reason: rejected
> > > > > Invalid write at addr 0x102877A44, size 2, region '(null)', reason: rejected
> > > > > ...
> > > > >
> > > > > It was traced down to virtio-console. Kexec works fine if virtio-console
> > > > > is not in use.
> > > >
> > > > virtio is not doing a lot of 16 bit reads.
> > > > Are these the reads:
> > > >
> > > > virtio_cread(vdev, struct virtio_console_config, cols, &cols);
> > > > virtio_cread(vdev, struct virtio_console_config, rows, &rows);
> > > >
> > > > ?
> > > >
> > > > write is a bit puzzling too. This one?
> > > >
> > > > bool vp_notify(struct virtqueue *vq)
> > > > {
> > > > 	/* we write the queue's selector into the notification register to
> > > > 	 * signal the other end */
> > > > 	iowrite16(vq->index, (void __iomem *)vq->priv);
> > > > 	return true;
> > > > }
> > >
> > > Given that we are talking about console issue, any suggestion on how to
> > > check?
> >
> >
> > If you do lspci -v on the device, we'll know where the BARs are,
> > and can compare to 0x102877002, 0x102877A44.
>
> 00:01.0 Ethernet controller: Red Hat, Inc. Virtio 1.0 network device (rev 01)
> Subsystem: Red Hat, Inc. Device 1100
> Flags: bus master, fast devsel, latency 0, IRQ 21
> Memory at 80005000 (32-bit, non-prefetchable) [size=4K]
> Memory at 380000000000 (64-bit, prefetchable) [size=16K]
> Capabilities: [98] MSI-X: Enable+ Count=4 Masked-
> Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
> Capabilities: [70] Vendor Specific Information: VirtIO: Notify
> Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
> Capabilities: [50] Vendor Specific Information: VirtIO: ISR
> Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
> Kernel driver in use: virtio-pci
>
> 00:02.0 Communication controller: Red Hat, Inc. Virtio 1.0 socket (rev 01)
> Subsystem: Red Hat, Inc. Device 1100
> Flags: bus master, fast devsel, latency 0, IRQ 22
> Memory at 80004000 (32-bit, non-prefetchable) [size=4K]
> Memory at 380000004000 (64-bit, prefetchable) [size=16K]
> Capabilities: [98] MSI-X: Enable- Count=3 Masked-
> Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
> Capabilities: [70] Vendor Specific Information: VirtIO: Notify
> Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
> Capabilities: [50] Vendor Specific Information: VirtIO: ISR
> Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
> Kernel driver in use: virtio-pci
>
> 00:03.0 Communication controller: Red Hat, Inc. Virtio 1.0 console (rev 01)
> Subsystem: Red Hat, Inc. Device 1100
> Flags: bus master, fast devsel, latency 0, IRQ 23
> Memory at 80003000 (32-bit, non-prefetchable) [size=4K]
> Memory at 380000008000 (64-bit, prefetchable) [size=16K]
> Capabilities: [98] MSI-X: Enable+ Count=2 Masked-
> Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
> Capabilities: [70] Vendor Specific Information: VirtIO: Notify
> Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
> Capabilities: [50] Vendor Specific Information: VirtIO: ISR
> Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
> Kernel driver in use: virtio-pci
>
> 00:04.0 SCSI storage controller: Red Hat, Inc. Virtio 1.0 block device (rev 01)
> Subsystem: Red Hat, Inc. Device 1100
> Flags: bus master, fast devsel, latency 0, IRQ 20
> Memory at 80002000 (32-bit, non-prefetchable) [size=4K]
> Memory at 38000000c000 (64-bit, prefetchable) [size=16K]
> Capabilities: [98] MSI-X: Enable+ Count=17 Masked-
> Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
> Capabilities: [70] Vendor Specific Information: VirtIO: Notify
> Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
> Capabilities: [50] Vendor Specific Information: VirtIO: ISR
> Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
> Kernel driver in use: virtio-pci
>
> 00:05.0 SCSI storage controller: Red Hat, Inc. Virtio 1.0 block device (rev 01)
> Subsystem: Red Hat, Inc. Device 1100
> Flags: bus master, fast devsel, latency 0, IRQ 21
> Memory at 80001000 (32-bit, non-prefetchable) [size=4K]
> Memory at 380000010000 (64-bit, prefetchable) [size=16K]
> Capabilities: [98] MSI-X: Enable+ Count=17 Masked-
> Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
> Capabilities: [70] Vendor Specific Information: VirtIO: Notify
> Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
> Capabilities: [50] Vendor Specific Information: VirtIO: ISR
> Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
> Kernel driver in use: virtio-pci
> ....
> Invalid read at addr 0x100C37904, size 2, region '(null)', reason: rejected
> Invalid read at addr 0x1036F9002, size 2, region '(null)', reason: rejected
> Invalid read at addr 0x1036F9002, size 2, region '(null)', reason: rejected
> Invalid write at addr 0x1036F9A44, size 2, region '(null)', reason: rejected
> Invalid read at addr 0x1036F7002, size 2, region '(null)', reason: rejected
> Invalid read at addr 0x1036F7002, size 2, region '(null)', reason: rejected
> Invalid write at addr 0x1036F7A44, size 2, region '(null)', reason: rejected
> ....
>
> Yeah, looks like it is not BARs.
Michael, can we get back to this?

So it looks like it is DMA.
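For anyone who wants to repeat the BAR comparison mechanically, here is a minimal userspace sketch. It assumes the BAR bases and sizes are copied by hand from the lspci -v output quoted above; struct bar_range and addr_in_any_bar() are names made up for this illustration, not kernel or QEMU APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One PCI memory BAR: base address and size, as printed by lspci -v. */
struct bar_range {
	uint64_t base;
	uint64_t size;
};

/* Values copied from the lspci -v output in this thread (4K and 16K BARs). */
static const struct bar_range bars[] = {
	{ 0x80005000ULL, 0x1000 }, { 0x380000000000ULL, 0x4000 }, /* 00:01.0 net */
	{ 0x80004000ULL, 0x1000 }, { 0x380000004000ULL, 0x4000 }, /* 00:02.0 socket */
	{ 0x80003000ULL, 0x1000 }, { 0x380000008000ULL, 0x4000 }, /* 00:03.0 console */
	{ 0x80002000ULL, 0x1000 }, { 0x38000000c000ULL, 0x4000 }, /* 00:04.0 block */
	{ 0x80001000ULL, 0x1000 }, { 0x380000010000ULL, 0x4000 }, /* 00:05.0 block */
};

/* Return true if addr falls inside any of the BARs listed above. */
static bool addr_in_any_bar(uint64_t addr)
{
	for (size_t i = 0; i < sizeof(bars) / sizeof(bars[0]); i++) {
		assert(bars[i].size > 0);
		if (addr >= bars[i].base && addr - bars[i].base < bars[i].size)
			return true;
	}
	return false;
}
```

Calling addr_in_any_bar() on the addresses QEMU complained about (0x102877002, 0x102877A44, 0x1036F9002, ...) returns false for each of them, which matches the "not BARs" conclusion above.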
It is a TDX guest. Some TDX context: it has a concept of private and shared
memory. Private memory is only accessible to the guest. Shared memory is
accessible by both host and guest. It is used for guest/host communication,
including DMA.

By default all memory is private, and the guest kernel converts some memory
to shared. On kexec, we convert all memory back to private, so the next
kernel can start from a known state. This conversion makes DMA impossible.

I think stopping devices before doing this conversion is a reasonable
solution.
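
To make the ordering argument concrete, here is a toy userspace model. It is not kernel code and not how TDX is implemented; every name in it (toy_guest, toy_device_poll(), and so on) is hypothetical. It only encodes the rule described above: a running device can access shared pages, and an access to a private page is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* PAGE_PRIVATE is 0, so a zero-initialized guest starts all-private. */
enum page_state { PAGE_PRIVATE, PAGE_SHARED };

struct toy_guest {
	enum page_state pages[NPAGES];
	bool device_running;	/* device still issuing DMA? */
	size_t ring_page;	/* page the device uses for its ring */
};

/* Guest converts one page to shared so the host/device can reach it. */
static void toy_share_page(struct toy_guest *g, size_t pfn)
{
	assert(pfn < NPAGES);
	g->pages[pfn] = PAGE_SHARED;
}

/* kexec path in this model: everything goes back to private. */
static void toy_kexec_convert_all_private(struct toy_guest *g)
{
	for (size_t i = 0; i < NPAGES; i++)
		g->pages[i] = PAGE_PRIVATE;
}

/* Stop the device so it no longer touches guest memory. */
static void toy_device_stop(struct toy_guest *g)
{
	g->device_running = false;
}

/* One device access to its ring; returns the number of rejected accesses. */
static int toy_device_poll(const struct toy_guest *g)
{
	if (!g->device_running)
		return 0;	/* a stopped device issues no DMA */
	return g->pages[g->ring_page] == PAGE_SHARED ? 0 : 1;
}
```

In this model, converting to private while the device is still running makes toy_device_poll() report a rejected access, while calling toy_device_stop() first leaves nothing to reject.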
--
Kiryl Shutsemau / Kirill A. Shutemov