Message-ID: <CA+cxXhnb2i5O7_BiOfKLth5Zwp5T62d6F6c39vnuT53cUkU_uw@mail.gmail.com>
Date: Wed, 21 Aug 2024 14:30:59 -0700
From: Philip Chen <philipchen@...omium.org>
To: Ira Weiny <ira.weiny@...el.com>
Cc: Dave Jiang <dave.jiang@...el.com>, Pankaj Gupta <pankaj.gupta.linux@...il.com>, 
	Dan Williams <dan.j.williams@...el.com>, Vishal Verma <vishal.l.verma@...el.com>, 
	virtualization@...ts.linux.dev, nvdimm@...ts.linux.dev, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] virtio_pmem: Check device status before requesting flush

Hi,

On Wed, Aug 21, 2024 at 1:37 PM Ira Weiny <ira.weiny@...el.com> wrote:
>
> Philip Chen wrote:
> > Hi,
> >
> > On Tue, Aug 20, 2024 at 1:01 PM Dave Jiang <dave.jiang@...el.com> wrote:
> > >
> > >
> > >
> > > On 8/20/24 10:22 AM, Philip Chen wrote:
> > > > If a pmem device is in a bad status, the driver side could wait for
> > > > host ack forever in virtio_pmem_flush(), causing the system to hang.
> > > >
> > > > So add a status check in the beginning of virtio_pmem_flush() to return
> > > > early if the device is not activated.
> > > >
> > > > Signed-off-by: Philip Chen <philipchen@...omium.org>
> > > > ---
> > > >
> > > > v2:
> > > > - Remove change id from the patch description
> > > > - Add more details to the patch description
> > > >
> > > >  drivers/nvdimm/nd_virtio.c | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > >
> > > > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > > > index 35c8fbbba10e..97addba06539 100644
> > > > --- a/drivers/nvdimm/nd_virtio.c
> > > > +++ b/drivers/nvdimm/nd_virtio.c
> > > > @@ -44,6 +44,15 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
> > > >       unsigned long flags;
> > > >       int err, err1;
> > > >
> > > > +     /*
> > > > +      * Don't bother to submit the request to the device if the device is
> > > > +      * not acticated.
> > >
> > > s/acticated/activated/
> >
> > Thanks for the review.
> > I'll fix this typo in v3.
> >
> > In addition to this typo, does anyone have any other concerns?
>
> I'm not super familiar with the virtio-pmem workings and the needs reset
> flag is barely used.
>
> Did you actually experience this hang?  How was this found?  What is the
> user visible issue and how critical is it?

Yes, I experienced the problem when trying to enable hibernation for a VM.

In the typical hibernation flow, the kernel would try to:
(1) freeze the processes
(2) freeze the devices
(3) take a snapshot of the memory
(4) thaw the devices
(5) write the snapshot to the storage
(6) power off the system (or perform platform-specific action)

In my case, I saw the VMM fail to re-activate the virtio_pmem device in (4).
(The virtio_pmem device side therefore sets VIRTIO_CONFIG_S_NEEDS_RESET.)
As a result, when the kernel tried to power off the virtio_pmem device
in (6), the system hung in virtio_pmem_flush() without this patch.

To fix the root cause of this issue, I sent another patch to add
freeze/restore PM callbacks to the virtio_pmem driver:
https://lore.kernel.org/lkml/20240815004617.2325269-1-philipchen@chromium.org/
(Please also help review that patch.)

However, I think this patch is still helpful since the system
shouldn't hang in virtio_pmem_flush() regardless of the device state.

>
> Thanks,
> Ira
>
> >
> > >
> > > > +      */
> > > > +     if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) {
> > > > +             dev_info(&vdev->dev, "virtio pmem device needs a reset\n");
> > > > +             return -EIO;
> > > > +     }
> > > > +
> > > >       might_sleep();
> > > >       req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
> > > >       if (!req_data)
>
>
