Date: Wed, 8 May 2024 07:18:48 -0400
From: Catherine Redfield <catherine.redfield@...onical.com>
To: Jason Wang <jasowang@...hat.com>
Cc: Joseph Salisbury <joseph.salisbury@...onical.com>, feliu@...dia.com, parav@...dia.com, 
	jiri@...dia.com, mst@...hat.com, yishaih@...dia.com, 
	alex.williamson@...hat.com, xuanzhuo@...ux.alibaba.com, 
	virtualization@...ts.linux.dev, linux-kernel@...r.kernel.org, 
	Francis Ginther <francis.ginther@...onical.com>, John Cabaj <john.cabaj@...onical.com>, 
	Ankush Pathak <ankush.pathak@...onical.com>, Chloe Smith <chloe.smith@...onical.com>
Subject: Re: [REGRESSION][v6.8-rc1] virtio-pci: Introduce admin virtqueue

On a VM with the GCP kernel (where we first identified the problem), I see:

1. The full kernel log from `journalctl --system > kernlog` is attached.  The
relevant suspend section is here:

May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal systemd[1]: Reached target sleep.target - Sleep.
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal systemd[1]: Starting systemd-suspend.service - System Suspend...
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal systemd-sleep[1413]: Performing sleep operation 'suspend'...
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: PM: suspend entry (deep)
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Filesystems sync: 0.008 seconds
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Freezing user space processes
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Freezing user space processes completed (elapsed 0.001 seconds)
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: OOM killer disabled.
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Freezing remaining freezable tasks
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: printk: Suspending console(s) (use no_console_suspend to debug)
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: port 00:03:0.0: PM: dpm_run_callback(): pm_runtime_force_suspend+0x0/0x130 returns -16
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: port 00:03:0.0: PM: failed to suspend: error -16
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: sd 0:0:1:0: [sda] Synchronizing SCSI cache
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: PM: Some devices failed to suspend, or early wake event detected
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: OOM killer enabled.
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Restarting tasks ... done.
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: random: crng reseeded on system resumption
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: PM: suspend exit
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: PM: suspend entry (s2idle)
-- Boot 61828bc938b44fc68a8aeedc16a23a9d --
May 08 11:09:03 localhost kernel: Linux version 6.8.0-1007-gcp (buildd@...02-amd64-079) (x86_64-linux-gnu-gcc-13 (Ubuntu 13.2.0-23ubuntu4) 13.2.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #7-Ubuntu SMP Sat Apr 20 00:58:31 UTC 2024 (Ubuntu 6.8.0-1007.7-gcp 6.8.1)
May 08 11:09:03 localhost kernel: Command line: BOOT_IMAGE=/vmlinuz-6.8.0-1007-gcp root=PARTUUID=7a949935-6bf2-4cae-b404-803c95163572 ro console=ttyS0,115200 panic=-1
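
For anyone skimming the attached log, here is a rough sketch of how the
"failed to suspend" error codes could be pulled out and named.  The function
name `suspend_errors` is made up for illustration, and the pattern only
assumes the "PM: failed to suspend: error N" wording seen in the excerpt
above.  On Linux, 16 is EBUSY.

```python
import errno
import os
import re

# Matches the kernel PM message wording seen in the excerpt above.
# (Assumption: other kernels may word this differently.)
FAILED_SUSPEND = re.compile(r"PM: failed to suspend: error (-?\d+)")


def suspend_errors(log_text):
    """Return (code, symbolic name, description) for each failed-suspend line."""
    results = []
    for match in FAILED_SUSPEND.finditer(log_text):
        # The kernel reports negative errno values; errno tables use positives.
        code = abs(int(match.group(1)))
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((code, name, os.strerror(code)))
    return results


if __name__ == "__main__":
    sample = "port 00:03:0.0: PM: failed to suspend: error -16"
    for code, name, text in suspend_errors(sample):
        print(code, name, text)
```

The same function can be given the whole `kernlog` file contents, since it
does not depend on how the lines are wrapped.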

2. The features the devices have:

catred@...nel-test-202405080702:~$ cat /sys/bus/virtio/devices/virtio0/features
0110000000000000000000000000010000000000000000000000000000000000
catred@...nel-test-202405080702:~$ cat /sys/bus/virtio/devices/virtio1/features
1110010110011001110000100000010000000000000000000000000000000000
catred@...nel-test-202405080702:~$ cat /sys/bus/virtio/devices/virtio2/features
1110000000000000000000000000000000000000000000000000000000000000
catred@...nel-test-202405080702:~$ cat /sys/bus/virtio/devices/virtio3/features
0000000000000000000000000000000000000000000000000000000000000000
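
In case it helps when reading these strings: as far as I understand, the
sysfs `features` file prints one character per feature bit, with bit 0 as the
leftmost character.  Below is a minimal sketch for turning a string into bit
numbers.  The helper names (`set_bits`, `describe`, `has_feature`) are made up
for illustration, and the small name table only lists a few bits from
include/uapi/linux/virtio_config.h and virtio_ring.h; everything else is
device-specific and is not decoded here.

```python
# A few transport-level feature bits from the kernel UAPI headers.
# Bits below 24 are device-specific and are deliberately not named here.
KNOWN_BITS = {
    28: "VIRTIO_RING_F_INDIRECT_DESC",
    29: "VIRTIO_RING_F_EVENT_IDX",
    32: "VIRTIO_F_VERSION_1",
    41: "VIRTIO_F_ADMIN_VQ",
}


def set_bits(features):
    """Return the indices of the '1' characters (bit 0 is the leftmost)."""
    return [i for i, ch in enumerate(features.strip()) if ch == "1"]


def describe(features):
    """Map each set bit to a known name, or a placeholder if it is not listed."""
    return {
        bit: KNOWN_BITS.get(bit, "device-specific or not listed")
        for bit in set_bits(features)
    }


def has_feature(features, bit):
    """True if the given bit number is set in the sysfs feature string."""
    text = features.strip()
    return bit < len(text) and text[bit] == "1"


if __name__ == "__main__":
    # Synthetic example string, not one of the devices above.
    example = "0" * 29 + "1" + "00" + "1" + "0" * 31
    print(describe(example))
    print(has_feature(example, 41))
```

For example, `has_feature(s, 41)` on the contents of one of the files above
would show whether VIRTIO_F_ADMIN_VQ was negotiated for that device.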

Catherine

On Tue, May 7, 2024 at 11:34 PM Jason Wang <jasowang@...hat.com> wrote:

> On Sat, May 4, 2024 at 2:10 AM Joseph Salisbury
> <joseph.salisbury@...onical.com> wrote:
> >
> > Hi Feng,
> >
> > During testing, a kernel bug was identified with the suspend/resume
> > functionality on instances running in a public cloud [0].  This bug is a
> > regression introduced in v6.8-rc1.  After a kernel bisect, the following
> > commit was identified as the cause of the regression:
> >
> >         fd27ef6b44be  ("virtio-pci: Introduce admin virtqueue")
>
> Have a quick glance at the patch it seems it should not damage the
> freeze/restore as it should behave as in the past.
>
> But I found something interesting:
>
> 1) assumes 1 admin vq which is not what spec said
> 2) special function for admin virtqueue during freeze/restore, but it
> doesn't do anything special than del_vq()
> 3) lack real users but I guess e.g the destroy_avq() needs to be
> synchronized with the one that is using admin virtqueue
>
> >
> > I was hoping to get your feedback, since you are the patch author. Do
> > you think gathering any additional data will help diagnose this issue?
>
> Yes, please show us
>
> 1) the kernel log here.
> 2) the features that the device has like
> /sys/bus/virtio/devices/virtio0/features
>
> > This commit is depended upon by other virtio commits, so a revert test
> > is not really straight forward without reverting all the dependencies.
> > Any ideas you have would be greatly appreciated.
>
> Thanks
>
> >
> >
> > Thanks,
> >
> > Joe
> >
> > http://pad.lv/2063315
> >
>
>

Download attachment "kernlog20240508_0714" of type "application/octet-stream" (545149 bytes)