lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALpufv0h-sQ4Qfp-Sxd7wME4onMNAMop_gi-np6Dk2R96sba0Q@mail.gmail.com>
Date: Tue, 23 Jan 2024 11:27:40 +0800
From: yi sun <sunyibuaa@...il.com>
To: Stefan Hajnoczi <stefanha@...hat.com>
Cc: Yi Sun <yi.sun@...soc.com>, axboe@...nel.dk, mst@...hat.com, jasowang@...hat.com, 
	xuanzhuo@...ux.alibaba.com, pbonzini@...hat.com, 
	virtualization@...ts.linux.dev, linux-block@...r.kernel.org, 
	linux-kernel@...r.kernel.org, zhiguo.niu@...soc.com, hongyu.jin@...soc.com
Subject: Re: [PATCH 2/2] virtio-blk: Ensure no requests in virtqueues before
 deleting vqs.

On Mon, Jan 22, 2024 at 11:43 PM Stefan Hajnoczi <stefanha@...hat.com> wrote:
>
> On Mon, Jan 22, 2024 at 07:07:22PM +0800, Yi Sun wrote:
> > Ensure no remaining requests in virtqueues before resetting vdev and
> > deleting virtqueues. Otherwise these requests will never be completed.
> > It may cause the system to become unresponsive. So it is better to place
> > blk_mq_quiesce_queue() in front of virtio_reset_device().
>
> QEMU's virtio-blk device implementation completes all requests during
> device reset. Most device implementations have to do the same to avoid
> leaving dangling requests in flight across device reset.
>
> Which device implementation are you using and why is it safe for the
> device is simply drop requests across device reset?
>
> Stefan

Virtio-blk device implementation completes all requests during device reset, but
this can only ensure that the device has finished using virtqueue. We should
also consider the driver's use of virtqueue.

I caught such an example. Before del_vqs, the request had been processed by
the device, but it had not been processed by the driver. Although I am
using kernel5.4,
I think this problem may also occur with the latest version of kernel.

The debug code I added is as follows:
virtblk_freeze()
{
        vdev reset();
        quiesce queue();
        if (virtqueue->num_free != 1024) //1024 is the init value.
                BUG_ON(1);
        vdev del_vqs();
}

BUG_ON triggered the dump, the analysis is as follows:

There is one request left in request_queue.
crash_arm64> struct request ffffff81f0560000 | grep -e state -e __data_len
  __data_len = 20480,
  state = MQ_RQ_IN_FLIGHT,

crash_arm64> vring_virtqueue.packed,last_used_idx,broken,vq 0xffffff8086f92900 |
grep -e num -e used_wrap_counter -e last_used_idx -e broken -e
num_free -e desc_state -e "desc ="
        num = 1024,
        desc = 0xffffff8085ff8000,
      used_wrap_counter = false,
      desc_state = 0xffffff8085610000,
  last_used_idx = 487,
  broken = false,
    num_free = 1017,

Find desc based on last_used_idx. Through flags, we can know that the request
has been processed by the device, but it is still in flight state
because it has not
had time to run virtblk_done().
crash_arm> vring_packed_desc ffffff8085ff9e70
struct vring_packed_desc {
  addr = 10474619192,
  len = 20481,
  id = 667,
  flags = 2
}

I'm using a closed source virtual machine, so I can't see the source
code for it,
but I'm guessing it's similar to qemu.

After the device completes the request, we must also ensure that the driver can
complete the request in virtblk_done().

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ