linux-kernel - Re: [PATCH 2/2] virtio-mmio: Support multiple interrupts per device

Message-ID: <87o7h1dx0n.fsf@cloudflare.com>
Date:   Sat, 14 Oct 2023 12:49:38 +0200
From:   Jakub Sitnicki <jakub@...udflare.com>
To:     Jason Wang <jasowang@...hat.com>
Cc:     virtualization@...ts.linux-foundation.org,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
        linux-kernel@...r.kernel.org, kernel-team@...udflare.com
Subject: Re: [PATCH 2/2] virtio-mmio: Support multiple interrupts per device

Sorry for the delay in my response. I've been away at a conference.

On Tue, Oct 10, 2023 at 02:52 PM +08, Jason Wang wrote:
> On Sat, Sep 30, 2023 at 4:46 AM Jakub Sitnicki <jakub@...udflare.com> wrote:
>>
>> Some virtual devices, such as the virtio network device, can use multiple
>> virtqueues (or multiple pairs of virtqueues in the case of a vNIC). In such a
>> case, when there are multiple vCPUs present, it is possible to process
>> virtqueue events in parallel. Each vCPU can service a subset of all
>> virtqueues when notified that there is work to carry out.
>>
>> However, the current virtio-mmio transport implementation poses a
>> limitation. Only one vCPU can service notifications from any of the
>> virtqueues of a single virtio device. This is because a virtio-mmio device
>> driver supports registering just one interrupt per device. With such a setup
>> we are not able to scale virtqueue event processing among vCPUs.
>>
>> Now, with more than one IRQ resource registered for a virtio-mmio platform
>> device, we can address this limitation.
>>
>> First, we request multiple IRQs when creating virtqueues for a device.
>>
>> Then, map each virtqueue to one of the IRQs assigned to the device. The
>> mapping is done in a device type specific manner. For instance, a network
>> device will want each RX/TX virtqueue pair mapped to a different IRQ
>> line. Other device types might require a different mapping scheme. We
>> currently provide a mapping for the virtio-net device type only.
>>
>> Finally, when handling an interrupt, we service only the virtqueues
>> associated with the IRQ line that triggered the event.
>>
>> Signed-off-by: Jakub Sitnicki <jakub@...udflare.com>
>> ---
>>  drivers/virtio/virtio_mmio.c | 102 +++++++++++++++++++++++++++++++++++--------
>>  1 file changed, 83 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
>> index 06a587b23542..180c51c27704 100644
>> --- a/drivers/virtio/virtio_mmio.c
>> +++ b/drivers/virtio/virtio_mmio.c

[...]

>> @@ -488,6 +511,18 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned int in
>>         return ERR_PTR(err);
>>  }
>>
>> +/* Map virtqueue to zero-based interrupt number */
>> +static unsigned int vq2irq(const struct virtqueue *vq)
>> +{
>> +       switch (vq->vdev->id.device) {
>> +       case VIRTIO_ID_NET:
>> +               /* interrupt shared by rx/tx virtqueue pair */
>> +               return vq->index / 2;
>> +       default:
>> +               return 0;
>> +       }
>
> Transport drivers should have no knowledge of a specific type of device.
>

Makes sense. This breaks layering. I will see how to pull this into the
device driver. Perhaps the mapping can be communicated through the
set_vq_affinity op.

>> +}
>> +
>>  static int vm_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>>                        struct virtqueue *vqs[],
>>                        vq_callback_t *callbacks[],

[...]

>> @@ -519,12 +544,51 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>>                 vqs[i] = vm_setup_vq(vdev, queue_idx++, callbacks[i], names[i],
>>                                      ctx ? ctx[i] : false);
>>                 if (IS_ERR(vqs[i])) {
>> -                       vm_del_vqs(vdev);
>> -                       return PTR_ERR(vqs[i]);
>> +                       err = PTR_ERR(vqs[i]);
>> +                       goto fail_vq;
>>                 }
>>         }
>>
>> +       nirqs = platform_irq_count(vm_dev->pdev);
>> +       if (nirqs < 0) {
>> +               err = nirqs;
>> +               goto fail_vq;
>> +       }
>> +
>> +       for (i = 0; i < nirqs; i++) {
>> +               irq = platform_get_irq(vm_dev->pdev, i);
>> +               if (irq < 0)
>> +                       goto fail_irq;
>> +               if (irq < irq_base)
>> +                       irq_base = irq;
>> +
>> +               err = devm_request_irq(&vdev->dev, irq, vm_interrupt,
>> +                                      IRQF_SHARED, NULL, vm_dev);
>> +               if (err)
>> +                       goto fail_irq;
>> +
>> +               if (of_property_read_bool(vm_dev->pdev->dev.of_node, "wakeup-source"))
>> +                       enable_irq_wake(irq);
>
> Could we simply use the same policy as PCI (vp_find_vqs_msix())?

Reading that routine, the PCI policy is:

1) Best option: one vector for config changes, one per vq.
2) Second best: one vector for config changes, one shared by all vqs.

It would be great to go with option (1), but we have only a limited
number of legacy IRQs to spread among MMIO devices: 48 IRQs at most in a
2 x IOAPIC setup.

Having one IRQ per VQ would mean fewer Rx/Tx queue pairs for a vNIC:
fewer than 24 queue pairs. From our target workload PoV, ideally, we
would like to support at least 32 queue pairs.

Hence the idea to have one IRQ per Rx/Tx VQ pair. Not as ideal as (1),
but a lot better than (2).
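
The IRQ budget argument above can be sketched in a few lines of C. This
is only an illustration: max_queue_pairs() and LEGACY_IRQ_LINES are
hypothetical names, not part of the patch, and the sketch assumes a
single vNIC owns every legacy line, which overstates what is available
in practice.

```c
#include <assert.h>

/* Assumption taken from the text: 48 legacy IRQ lines in a 2 x IOAPIC setup. */
#define LEGACY_IRQ_LINES 48u

/*
 * Upper bound on the number of Rx/Tx queue pairs a single vNIC could get
 * if it owned `irq_lines` interrupt lines and every pair consumed
 * `irqs_per_pair` of them:
 *   2 = one IRQ per virtqueue (PCI-like option 1)
 *   1 = one IRQ shared by the Rx/Tx pair (the scheme proposed here)
 * Hypothetical helper for illustration only.
 */
static unsigned int max_queue_pairs(unsigned int irq_lines,
				    unsigned int irqs_per_pair)
{
	assert(irqs_per_pair == 1 || irqs_per_pair == 2);
	return irq_lines / irqs_per_pair;
}
```

With one IRQ per VQ, max_queue_pairs(LEGACY_IRQ_LINES, 2) gives a ceiling
of 24 pairs, and other devices push the real number below that. With one
IRQ per pair, max_queue_pairs(LEGACY_IRQ_LINES, 1) gives a ceiling of 48,
which leaves room for the 32 pairs we are targeting.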

Comparing this to PCI: virtio-net, with one interrupt per VQ, maps the
two VQs of each Rx/Tx pair to the same CPU.

We could achieve the same VQ-CPU affinity setup for MMIO, but with fewer
interrupt vectors.
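
A minimal sketch of that affinity layout, assuming the virtio-net queue
ordering (rx0, tx0, rx1, tx1, ...) and a simple round-robin spread of
pairs over CPUs. All three helpers are hypothetical and exist only to
illustrate the idea; the round-robin policy in particular is an
assumption, not what the patch or virtio-net implements.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not kernel or patch API. */

/* virtio-net interleaves queues as rx0, tx0, rx1, tx1, ... so both
 * virtqueues of a pair share the same value of index / 2. */
static unsigned int vq_to_pair(unsigned int vq_index)
{
	return vq_index / 2;
}

/* One IRQ per Rx/Tx pair: the zero-based interrupt number is simply
 * the pair number. */
static unsigned int pair_to_irq(unsigned int pair)
{
	return pair;
}

/* Assumed policy: spread pairs round-robin over `ncpus` CPUs. */
static unsigned int pair_to_cpu(unsigned int pair, unsigned int ncpus)
{
	assert(ncpus > 0);
	return pair % ncpus;
}
```

Because rx and tx of a pair resolve to the same IRQ, pinning that single
IRQ to pair_to_cpu(pair, ncpus) places both queues on one CPU, which is
the same end result as PCI reaches with two vectors per pair.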

Thanks for the feedback.