Message-ID: <705314501.4326759.1659411022241.JavaMail.zimbra@sjtu.edu.cn>
Date: Tue, 2 Aug 2022 11:30:22 +0800 (CST)
From: Guo Zhi <qtxuning1999@...u.edu.cn>
To: eperezma <eperezma@...hat.com>
Cc: jasowang <jasowang@...hat.com>, sgarzare <sgarzare@...hat.com>,
Michael Tsirkin <mst@...hat.com>,
netdev <netdev@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
kvm list <kvm@...r.kernel.org>,
virtualization <virtualization@...ts.linux-foundation.org>
Subject: Re: [RFC 1/5] vhost: reorder used descriptors in a batch
----- Original Message -----
> From: "eperezma" <eperezma@...hat.com>
> To: "Guo Zhi" <qtxuning1999@...u.edu.cn>
> Cc: "jasowang" <jasowang@...hat.com>, "sgarzare" <sgarzare@...hat.com>, "Michael Tsirkin" <mst@...hat.com>, "netdev"
> <netdev@...r.kernel.org>, "linux-kernel" <linux-kernel@...r.kernel.org>, "kvm list" <kvm@...r.kernel.org>,
> "virtualization" <virtualization@...ts.linux-foundation.org>
> Sent: Friday, July 22, 2022 3:07:17 PM
> Subject: Re: [RFC 1/5] vhost: reorder used descriptors in a batch
> On Thu, Jul 21, 2022 at 10:44 AM Guo Zhi <qtxuning1999@...u.edu.cn> wrote:
>>
>> Devices may not use descriptors in order; for example, NIC and SCSI
>> devices may not call __vhost_add_used_n with buffers in order. It is
>> the task of __vhost_add_used_n to order them. This commit reorders the
>> buffers using vq->heads: only when a batch begins at the expected start
>> point and is contiguous can it be exposed to the driver. Per the
>> VIRTIO 1.1 spec, only a single used ring entry is written out for a
>> batch of descriptors.
>>
>> Signed-off-by: Guo Zhi <qtxuning1999@...u.edu.cn>
>> ---
>> drivers/vhost/vhost.c | 44 +++++++++++++++++++++++++++++++++++++++++--
>> drivers/vhost/vhost.h | 3 +++
>> 2 files changed, 45 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 40097826c..e2e77e29f 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -317,6 +317,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>> vq->used_flags = 0;
>> vq->log_used = false;
>> vq->log_addr = -1ull;
>> + vq->next_used_head_idx = 0;
>> vq->private_data = NULL;
>> vq->acked_features = 0;
>> vq->acked_backend_features = 0;
>> @@ -398,6 +399,8 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
>> GFP_KERNEL);
>> if (!vq->indirect || !vq->log || !vq->heads)
>> goto err_nomem;
>> +
>> + memset(vq->heads, 0, sizeof(*vq->heads) * dev->iov_limit);
>> }
>> return 0;
>>
>> @@ -2374,12 +2377,49 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
>> unsigned count)
>> {
>> vring_used_elem_t __user *used;
>> + struct vring_desc desc;
>> u16 old, new;
>> int start;
>> + int begin, end, i;
>> + int copy_n = count;
>> +
>> + if (vhost_has_feature(vq, VIRTIO_F_IN_ORDER)) {
>> + /* calculate descriptor chain length for each used buffer */
>> + for (i = 0; i < count; i++) {
>> + begin = heads[i].id;
>> + end = begin;
>> + vq->heads[begin].len = 0;
>> + do {
>> + vq->heads[begin].len += 1;
>> + if (unlikely(vhost_get_desc(vq, &desc, end))) {
>> + vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
>> + end, vq->desc + end);
>> + return -EFAULT;
>> + }
>> + } while ((end = next_desc(vq, &desc)) != -1);
>> + }
>> +
>> + count = 0;
>> + /* sort and batch continuous used ring entry */
>> + while (vq->heads[vq->next_used_head_idx].len != 0) {
>> + count++;
>> + i = vq->next_used_head_idx;
>> + vq->next_used_head_idx = (vq->next_used_head_idx +
>> + vq->heads[vq->next_used_head_idx].len) % vq->num;
>> + vq->heads[i].len = 0;
>> + }
>
> You're iterating vq->heads with two different indexes here.
>
> The first loop is working with indexes [0, count), which is fine if
> heads is a "cache" and everything can be overwritten (as it used to be
> before this patch).
>
> The other loop relies on vq->next_used_head_idx, which is saved between calls.
>
> So both uses are going to conflict with each other.
>
The first loop calculates the length of each descriptor chain, and the second
finds the start point of the next batch. The second loop works on the lengths
that the first loop has recorded.
> A proposal for checking this is to push incrementally increasing data
> into the chains at the virtio_test driver, and check that they are
> returned properly. Like, the first buffer in the chain has the value
> N, the second one N+1, and so on.
>
LGTM. I'll try this to enhance the test.
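Something like this, perhaps (a rough userspace sketch only; the buffer
layout and helper names are mine, not the real virtio_test harness's):

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Fill buffer i of an n-buffer chain with the byte value seed + i
     * before exposing the chain to the device. */
    static void fill_chain(uint8_t **bufs, const size_t *lens, int n,
                           uint8_t seed)
    {
            for (int i = 0; i < n; i++)
                    memset(bufs[i], (uint8_t)(seed + i), lens[i]);
    }

    /* After the chain comes back through the used ring, verify every
     * buffer still carries its own value; a mismatch means the chains
     * were reported out of order or mixed up. */
    static int check_chain(uint8_t **bufs, const size_t *lens, int n,
                           uint8_t seed)
    {
            for (int i = 0; i < n; i++)
                    for (size_t j = 0; j < lens[i]; j++)
                            if (bufs[i][j] != (uint8_t)(seed + i))
                                    return -1;
            return 0;
    }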
> Let's split saving chains into its own patch.
>
>
>> + /* only write out a single used ring entry with the id corresponding
>> + * to the head entry of the descriptor chain describing the last buffer
>> + * in the batch.
>> + */
>
> Let's delay the batching for now; we can add it as an optimization on
> top in the case of devices.
>
> My proposal is to define a new struct vring_used_elem_inorder:
>
> struct vring_used_elem_inorder {
>     uint16_t written;
>     uint16_t num;
> };
>
> And create a per vq array of them, with vq->num size. Let's call it
> used_inorder for example.
>
> Every time the device uses a buffer chain of N buffers, with written
> length L and first descriptor id D, it stores vq->used_inorder[D] =
> { .written = L, .num = N }. .num == 0 means the buffer has not been
> used yet.
>
> After storing that information, you have your next_used_head_idx. You
> can check if vq->used_inorder[next_used_head_idx] is used (.num != 0).
> In case it is not, there is no need to perform any action for now.
>
> In case it is, you iterate vq->used_inorder. First you write
> next_used_head_idx as used. After that, next_used_head_idx increments
> by .num, and we need to clear .num. If
> vq->used_inorder[vq->next_used_head_idx] is used too, repeat.
>
> I think we could even squash vq->heads and vq->used_inorder with some
> tricks, because a chain's length would always be greater than or equal
> to the used descriptor count, but storing it in a separate array would
> be clearer.
>
I think this algorithm is the same as the one in the patch. But it would be
clearer to add a struct named vring_used_elem_inorder instead of reusing
vq->heads.
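To check my understanding, a minimal sketch of that bookkeeping (assuming
the struct above, a vq->num-sized vq->used_inorder array, and a made-up
helper name; the actual used ring write is elided):

    /* Record one used chain (first descriptor id d, n descriptors,
     * written bytes w), then flush every chain that is now contiguous
     * from vq->next_used_head_idx. */
    static void mark_used_inorder(struct vhost_virtqueue *vq,
                                  u16 d, u16 n, u16 w)
    {
            vq->used_inorder[d].written = w;
            vq->used_inorder[d].num = n;

            while (vq->used_inorder[vq->next_used_head_idx].num != 0) {
                    u16 head = vq->next_used_head_idx;
                    u16 len = vq->used_inorder[head].num;

                    /* ... write one used ring entry for the chain
                     * starting at head, with .written bytes ... */

                    vq->used_inorder[head].num = 0;
                    vq->next_used_head_idx = (head + len) % vq->num;
            }
    }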
>> + heads[0].id = i;
>> + copy_n = 1;
>
> The device must not write anything to the used ring if the next
> descriptor has not been used. I'm failing to trace how this works when
> the second half of the batch in vhost/test.c is used here.
>
> Thanks!
>
>
Sorry for my mistake, I forgot to add an if (count == 0) check in the patch.
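Concretely, something like this right before the single-entry write-out
(untested sketch against this patch):

    /* No chain is contiguous from next_used_head_idx yet, so the
     * device must not expose anything to the driver on this call. */
    if (count == 0)
            return 0;

    heads[0].id = i;
    copy_n = 1;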
>> + }
>>
>> start = vq->last_used_idx & (vq->num - 1);
>> used = vq->used->ring + start;
>> - if (vhost_put_used(vq, heads, start, count)) {
>> + if (vhost_put_used(vq, heads, start, copy_n)) {
>> vq_err(vq, "Failed to write used");
>> return -EFAULT;
>> }
>> @@ -2410,7 +2450,7 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
>>
>> start = vq->last_used_idx & (vq->num - 1);
>> n = vq->num - start;
>> - if (n < count) {
>> + if (n < count && !vhost_has_feature(vq, VIRTIO_F_IN_ORDER)) {
>> r = __vhost_add_used_n(vq, heads, n);
>> if (r < 0)
>> return r;
>> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
>> index d9109107a..7b2c0fbb5 100644
>> --- a/drivers/vhost/vhost.h
>> +++ b/drivers/vhost/vhost.h
>> @@ -107,6 +107,9 @@ struct vhost_virtqueue {
>> bool log_used;
>> u64 log_addr;
>>
>> + /* Sort heads in order */
>> + u16 next_used_head_idx;
>> +
>> struct iovec iov[UIO_MAXIOV];
>> struct iovec iotlb_iov[64];
>> struct iovec *indirect;
>> --
>> 2.17.1
>>