linux-kernel - Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <05C9E588-82A0-469D-9364-9ECF9EBE0531@linux.alibaba.com>
Date:   Fri, 3 Jan 2020 14:50:13 +0800
From:   "Liu, Jiang" <gerry@...ux.alibaba.com>
To:     Jason Wang <jasowang@...hat.com>
Cc:     Zha Bin <zhabin@...ux.alibaba.com>, linux-kernel@...r.kernel.org,
        mst@...hat.com, slp@...hat.com, virtio-dev@...ts.oasis-open.org,
        jing2.liu@...el.com, chao.p.peng@...el.com
Subject: Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio
 specification version 3

On Jan 2, 2020, at 2:28 PM, Jason Wang <jasowang@...hat.com> wrote:
> 
> 
> On 2019/12/26 下午9:16, Liu, Jiang wrote:
>> 
>>> On Dec 26, 2019, at 4:09 PM, Jason Wang <jasowang@...hat.com> wrote:
>>> 
>>> 
>>> On 2019/12/25 下午11:20, Liu, Jiang wrote:
>>>>> On Dec 25, 2019, at 6:20 PM, Jason Wang <jasowang@...hat.com> wrote:
>>>>> 
>>>>> 
>>>>> On 2019/12/25 上午10:50, Zha Bin wrote:
>>>>>> From: Liu Jiang <gerry@...ux.alibaba.com>
>>>>>> 
>>>>>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
>>>>>> virtio over mmio devices as a lightweight machine model for modern
>>>>>> cloud. The standard virtio over MMIO transport layer only supports one
>>>>>> legacy interrupt, which is much heavier than virtio over PCI transport
>>>>>> layer using MSI. Legacy interrupt has long work path and causes specific
>>>>>> VMExits in following cases, which would considerably slow down the
>>>>>> performance:
>>>>>> 
>>>>>> 1) read interrupt status register
>>>>>> 2) update interrupt status register
>>>>>> 3) write IOAPIC EOI register
>>>>>> 
>>>>>> We proposed to update virtio over MMIO to version 3[1] to add the
>>>>>> following new features and enhance the performance.
>>>>>> 
>>>>>> 1) Support Message Signaled Interrupt(MSI), which increases the
>>>>>>    interrupt performance for virtio multi-queue devices
>>>>>> 2) Support per-queue doorbell, so the guest kernel may directly write
>>>>>>    to the doorbells provided by virtio devices.
>>>>>> 
>>>>>> The following is the network tcp_rr performance testing report, tested
>>>>>> with virtio-pci device, vanilla virtio-mmio device and patched
>>>>>> virtio-mmio device (run test 3 times for each case):
>>>>>> 
>>>>>> 	netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024
>>>>>> 
>>>>>> 		Virtio-PCI    Virtio-MMIO   Virtio-MMIO(MSI)
>>>>>> 	trans/s	    9536	6939		9500
>>>>>> 	trans/s	    9734	7029		9749
>>>>>> 	trans/s	    9894	7095		9318
>>>>>> 
>>>>>> [1] https://lkml.org/lkml/2019/12/20/113
>>>>> Thanks for the patch. Two questions after a quick glance:
>>>>> 
>>>>> 1) In PCI we choose to support MSI-X instead of MSI for having extra flexibility like alias, independent data and address (e.g for affinity) . Any reason for not start from MSI-X? E.g having MSI-X table and PBA (both of which looks pretty independent).
>>>> Hi Jason,
>>>> 	Thanks for reviewing patches on Christmas Day:)
>>>> 	The PCI MSI-x has several advantages over PCI MSI, mainly
>>>> 1) support 2048 vectors, much more than 32 vectors supported by MSI.
>>>> 2) dedicated address/data for each vector,
>>>> 3) per vector mask/pending bits.
>>>> The proposed MMIO MSI extension supports both 1) and 2),
>>> 
>>> Aha right, I mis-read the patch. But more questions comes:
>>> 
>>> 1) The association between vq and MSI-X vector is fixed. This means it can't work for a device that have more than 2047 queues. We probably need something similar to virtio-pci to allow a dynamic association.
>> We have considered both the PCI MSI-x like dynamic association design and fix mapping design.
>> The fix mapping design simplifies both the interrupt configuration process and VMM implementations.
> 
> 
> Well, for VMM just an indirection and for guest, it can choose to use fixed mapping, just need to program once during probe.
> 
> 
>> And the virtio mmio transport layer is mainly used by light VMMs to support small scale virtual machines,
> 
> 
> Let's not limit the interface to be used by a specific case :). Eliminating PCIE would be appealing for other scenarios.
> 
> 
>>   2048 vectors should be enough for these usage cases.
>> So the fix mapping design has been used.
>> 
>>> 2) The mask and unmask control is missed
>>> 
>>> 
>>>>  but the extension doesn’t support 3) because
>>>> we noticed that the Linux virtio subsystem doesn’t really make use of interrupt masking/unmasking.
>>> 
>>> Not directly used but masking/unmasking is widely used in irq subsystem which allows lots of optimizations.
>>> 
>>> 
>>>> On the other hand, we want to simplify VMM implementations as simple as possible, and mimicking the PCI MSI-x
>>>> will cause some complexity to VMM implementations.
>>> 
>>> I agree to simplify VMM implementation, but it looks to me introducing masking/pending won't cost too much code in the VMM implementation. Just new type of command for VIRTIO_MMIO_MSI_COMMAND.
>> We want to make VMM implementations as simple as possible:)
>> And based on following observations, we have disabled support of mask/unmask,
>> 1) MSI is edge triggering, which means it won’t be shared with other interrupt sources,
> 
> 
> Is this true? I think the spec does not forbid such usages, e.g using the same MSI address/command for different queues or devices?
Yes, the spec doesn’t forbid this.
We could share the same MSIx vector for multiple queues, rx/tax etc.
But we can’t share a Linux MSI interrupt for different devices/MSIx vectors, this is an implementation constraint of the Linux interrupt subsystem.

> 
> 
>> so masking/unmasking won’t be used for normal interrupt management logic.
>> 2) Linux virtio mmio transport layer doesn’t support  suspend/resume yet, so there’s no need to quiesce the device by masking interrupts.
> 
> 
> Yes, but it's a limitation only for virtio mmio transport. We can add it.
> 
> 
>> 3) The legacy PCI 2.2 devices doesn’t support irq masking/unmasking, so irq masking/unmasking may be optional operations.
> 
> 
> Yes, but as you said, it helps for performance and some other cases. I still prefer to implement that consider it is not complex. If we do MSI without masking/unmasking, I suspect we will implement MSI-X finally somedays then maintaining MSI will become a burden... (still takes virtio-pci as an example, it choose to implement MSI-X not MSI).
> 
> 
>> So we skipped support of irq masking/unmasking. We will recheck whether irq masking/unmasking is mandatory for MMIO devices.
>> On the other hand, we may enhance the spec to define command codes for masking/unmasking, and VMM may optionally support masking/unmasking.
> 
> 
> Yes, thanks.
> 
> 
>> 
>> Thanks,
>> Gerry
>> 
>>> Thanks
>>> 
>>> 
>>>>> 2) It's better to split notify_multiplexer out of MSI support to ease the reviewers (apply to spec patch as well)
>>>> Great suggestion, we will try to split the patch.
>>>> 
>>>> Thanks,
>>>> Gerry
>>>> 
>>>>> Thanks