linux-kernel - Re: [PATCH V2 2/3] vdpa_sim_net: support feature provisioning

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <84027fbd-a340-4d85-e609-f8f1e1466825@redhat.com>
Date:   Tue, 18 Oct 2022 15:45:39 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     Si-Wei Liu <si-wei.liu@...cle.com>
Cc:     mst <mst@...hat.com>, Eli Cohen <elic@...dia.com>,
        Parav Pandit <parav@...dia.com>,
        Wu Zongyong <wuzongyong@...ux.alibaba.com>,
        virtualization <virtualization@...ts.linux-foundation.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        eperezma <eperezma@...hat.com>,
        Zhu Lingshan <lingshan.zhu@...el.com>,
        Gautam Dawar <gdawar@...inx.com>, Cindy Lu <lulu@...hat.com>,
        Yongji Xie <xieyongji@...edance.com>,
        Eli Cohen <elic@...dia.com>
Subject: Re: [PATCH V2 2/3] vdpa_sim_net: support feature provisioning


在 2022/10/18 02:43, Si-Wei Liu 写道:
>
>
> On 10/13/2022 12:10 AM, Jason Wang wrote:
>>
>> 在 2022/10/7 08:35, Si-Wei Liu 写道:
>>>
>>>
>>> On 9/28/2022 9:55 PM, Jason Wang wrote:
>>>> On Tue, Sep 27, 2022 at 5:41 PM Si-Wei Liu <si-wei.liu@...cle.com> 
>>>> wrote:
>>>>>
>>>>>
>>>>> On 9/26/2022 8:59 PM, Jason Wang wrote:
>>>>>
>>>>> On Tue, Sep 27, 2022 at 9:02 AM Si-Wei Liu <si-wei.liu@...cle.com> 
>>>>> wrote:
>>>>>
>>>>>
>>>>> On 9/26/2022 12:11 AM, Jason Wang wrote:
>>>>>
>>>>> On Sat, Sep 24, 2022 at 4:01 AM Si-Wei Liu <si-wei.liu@...cle.com> 
>>>>> wrote:
>>>>>
>>>>>
>>>>> On 9/21/2022 7:43 PM, Jason Wang wrote:
>>>>>
>>>>> This patch implements features provisioning for vdpa_sim_net.
>>>>>
>>>>> 1) validating the provisioned features to be a subset of the parent
>>>>>      features.
>>>>> 2) clearing the features that is not wanted by the userspace
>>>>>
>>>>> For example:
>>>>>
>>>>> # vdpa mgmtdev show
>>>>> vdpasim_net:
>>>>>     supported_classes net
>>>>>     max_supported_vqs 3
>>>>>     dev_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT 
>>>>> VERSION_1 ACCESS_PLATFORM
>>>>>
>>>>> Sighs, not to blame any one and it's perhaps too late, but this
>>>>> "dev_features" attr in "mgmtdev show" command output should have been
>>>>> called "supported_features" in the first place.
>>>>>
>>>>> Not sure I get this, but I guess this is the negotiated features 
>>>>> actually.
>>>>>
>>>>> Actually no, that is why I said the name is a bit confusing and 
>>>>> "supported_features" might sound better.
>>>>>
>>>>> You're right, it's an mgmtdev show actually.
>>>>>
>>>>> This attribute in the parent device (mgmtdev) denotes the real 
>>>>> device capability for what virtio features can be supported by the 
>>>>> parent device. Any unprivileged user can check into this field to 
>>>>> know parent device's capability without having to create a child 
>>>>> vDPA device at all. The features that child vDPA device may 
>>>>> support should be a subset of, or at most up to what the parent 
>>>>> device offers. For e.g. the vdpa device dev1 you created below can 
>>>>> expose less or equal device_features bit than 0x308820028 (MTU MAC 
>>>>> CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1 ACCESS_PLATFORM), but 
>>>>> shouldn't be no more than what the parent device can actually 
>>>>> support.
>>>>>
>>>>> Yes, I didn't see anything wrong with "dev_features",
>>>>>
>>>>> Yep, it didn't appear to me anything wrong either at first sight, 
>>>>> then I gave my R-b on the series introduced this attribute. But 
>>>>> it's not a perfect name, either, on the other hand. Parav later 
>>>>> pointed out that the corresponding enum definition for this 
>>>>> attribute should follow pre-existing naming convention that we 
>>>>> should perhaps do 
>>>>> s/VDPA_ATTR_DEV_SUPPORTED_FEATURES/VDPA_ATTR_MGMTDEV_SUPPORTED_FEATURES/ 
>>>>> to get it renamed, as this is a mgmtdev level attribute, which I 
>>>>> agree. Now that with the upcoming "device_features" attribute 
>>>>> (vdpa dev level) from this series, it's subject to another 
>>>>> confusions between these two similar names, but actually would 
>>>>> represent things at different level. While all other attributes in 
>>>>> "mgmtdev dev show" seem to be aligned with the "supported_" 
>>>>> prefix, e.g. supported_classes, max_supported_vqs, from which I 
>>>>> think the stance of device is already implied through "mgmtdev" in 
>>>>> the command. For the perspective of clarify and easy distinction, 
>>>>> "supported_features" seems to be a better name than "dev_features".
>>>> See another reply, I think I get your point,
>>>>
>>>> 1) VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES (lingshan's series) and
>>>> VDPA_ATTR_VDPA_DEV_FEATURES should be equivalent and unified to a
>>>> single attribute.
>>>> 2) A better name to "supported_features" should be fine, patches 
>>>> are welcomed
>>>>
>>>>>   it aligns to the
>>>>> virtio spec which means the features could be used to create a vdpa
>>>>> device. But if everyone agree on the renaming, I'm fine.
>>>>>
>>>>> Never mind, if it's late don't have to bother.
>>>>>
>>>>>
>>>>>
>>>>> I think Ling Shan is working on reporting both negotiated features
>>>>> with the device features.
>>>>>
>>>>> Does it imply this series is connected to another work in 
>>>>> parallel? Is it possible to add a reference in the cover letter?
>>>>>
>>>>> I'm not sure, I remember Ling Shan did some work to not block the
>>>>> config show in this commit:
>>>>>
>>>>> commit a34bed37fc9d3da319bb75dfbf02a7d3e95e12de
>>>>> Author: Zhu Lingshan <lingshan.zhu@...el.com>
>>>>> Date:   Fri Jul 22 19:53:07 2022 +0800
>>>>>
>>>>>      vDPA: !FEATURES_OK should not block querying device config space
>>>>>
>>>>> We need some changes in the vdpa tool to show device_features
>>>>> unconditionally in the "dev config show" command.
>>>>>
>>>>> That's true, I think I ever pointed it out to Lingshan before, 
>>>>> that it's not needed to bother exposing those config space fields 
>>>>> in "dev config show" output, if the only intent is for live 
>>>>> migration of device features between nodes. For vDPA live 
>>>>> migration, what cares most is those configuration parameters 
>>>>> specified on vdpa creation, and userspace VMM (QEMU) is supposed 
>>>>> to take care of saving and restoring live device states. I think 
>>>>> it's easier to extend "vdpa dev show" output to include 
>>>>> device_features and other config params as well, rather than count 
>>>>> on validity of various config space fields.
>>>> Probably, but for the migration it's more about the ability of the
>>>> mgmtdev instead of the vDPA device itself I think.
>>> If picking the appropriate device for migration is what it is 
>>> concerned, it's subject to the capability that mgmtdev offers. 
>>> That's true, for sure.
>>>
>>> On the other hand, mgmt software would also need to take care of 
>>> reconstructing the destination device with the same configuration as 
>>> that at the source side. For example, a mgmtdev on source supports 
>>> features A, B, C, D,  and destination mgmtdev supports features B, 
>>> C, D, E. When vdpa device on the source is initially created with 
>>> features B and C but without feature D (noted: creation with a 
>>> subset of mgmtdev features was already supported before, and this 
>>> series just makes it more explicit), the mgmt software is supposed 
>>> to equally create a device with features B and C on dest, even 
>>> though the destination may support feature D that the mgmtdev on 
>>> both sides can support. My point is, we should have some flexibility 
>>> on the mgmt software implementation that allows the mgmt software to 
>>> easily tell apart the exact features (i.e. B and C in the above 
>>> example) and the exact configuration a specific vdpa device was 
>>> originally created with, via some simple query command. Having mgmt 
>>> software to remember the vdpa creation args could work, but on the 
>>> other hand it'd be nice to allow lightweight software implementation 
>>> without having to persist the list of vdpa args and make vdpa tool 
>>> more self-contained.
>>>
>>> I will post a patch (series) shortly to demonstrate this idea. 
>>> Basically, there's no actual need to mess around with those config 
>>> space values for live migration. It was not built for that purpose.
>>
>>
>> Ok, let's see.
>>
>>
>>>
>>>>
>>>>> https://lore.kernel.org/virtualization/454bdf1b-daa1-aa67-2b8c-bc15351c1851@oracle.com/ 
>>>>>
>>>>>
>>>>> It's not just insufficient, but sometimes is incorrect to create 
>>>>> vDPA device using the config space fields.  For instance, MAC 
>>>>> address in config space can be changed temporarily (until device 
>>>>> reset) via ctrl_vq VIRTIO_NET_CTRL_MAC_ADDR_SET command. It's 
>>>>> incorrect to create vDPA using the MAC address shown in the config 
>>>>> space.
>>>> I think it's still a must for create the mac with the exact mac 
>>>> address:
>>>>
>>>> 1) VIRTIO_NET_F_CTRL_MAC is not a must
>>>> 2) there's no way for us to know whether or not the mac has been 
>>>> changed
>>> Noted I think here we are still talking about VERSION_1 device which 
>>> is spec conforming. So far as I understand the spec, if the 
>>> VIRTIO_NET_F_CTRL_MAC feature is not negotiated, there's no way for 
>>> driver to change the default MAC address.
>>
>>
>> For 1.0 device yes.
>>
>>
>>>
>>> Even if we want to simulate or support a legacy device model, when 
>>> MAC address is changed by legacy driver somehow, QEMU should be able 
>>> to detect this and issue a vdpa ioctl call to change the MAC address 
>>> filter underneath. I don't see that it ever happens in today's code, 
>>> so I presume the only possibility this may work is that the specific 
>>> vDPA device may have an internal learning bridge that adapts to what 
>>> MAC address the driver actually sends. 
>>
>>
>> This is not true AFAIK. E.g when switchdev is enabled for mlx5 parent.
> Hmmm, I guess you mean switchdev mode with external learning bridge 
> software e.g. Open vSwitch? It's conceptionally the same with device 
> internal learning bridge, no?


I was told by Eli that when switchdev is enabled there will be no 
learning bridge. I might be wrong, cc Eli for more comments.


>
> OK, though the point was that QEMU should anyway notify the backend of 
> the mac address change for vDPA driver to apply the new MAC filter, 
> similar to the way how CTRL_MAC ctrl_vq command is doing.


Yes.


> It should not blindly assume that the every vDPA hardware may have 
> underlying learning bridge construct, being internal or external. 
> Basically it's not a universal assumption on the existence of learning 
> bridge, this won't work for any other vDPA NIC without switchdev or 
> learning bridge.
>
>>
>>
>>> In this case, since the device doesn't care, recreate with the MAC 
>>> in use is not needed, and technically it is even incorrect. In data 
>>> centers or cloud environment, MAC address is usually controlled and 
>>> managed by some central entity/service. If a driver can dominate the 
>>> MAC address in use by deliberately overriding the default MAC and 
>>> bypassing the central rule via live migration, that'd be a more 
>>> severe security issue to address in the first place.
>>
>>
>> There used to be a discussion to allow trust and spoof check as what 
>> SR-IOV did. For safety, we can filter out CTRL_MAC right now. But I 
>> think it's something out of the scope for this discussion.
> Sorry I don't get what you mean here, but I guess we may talk about 
> different thing here (it seems you talked about trusted model that 
> allows any MAC address, but it's orthogonal to programming MAC address 
> filter to the underlying hardware as far as I understand).


Probably, what I meant is that, most SRIOV vendor allow to forbid change 
VF mac addresses and other filter for safety.


>
>>
>> But I still don't get what's wrong with have the same mac address 
>> provisioned in both src and dst.
> It looks like we may have misunderstood each other - that's exactly 
> the point I want to make. The persistent mac address provisioned in 
> dst by the mgmt software should stay the same with that on src, which 
> is the default mac address rather than whatever other (temporary) mac 
> address the VM might be using at the time of migration switchover at 
> the source.


Yes, I think we are at the same page now.


>
>> It is the model used currently (e.g libvirt will guarantee the same 
>> mac in both src and dst). The mgmt software can then guarantee that 
>> the mac was fetched from the centralized manager.
> Right, this is exactly the way our mgmt software works.
>
>> And we can't depend purely on the migration since in some case it 
>> can't work: e.g src MTU 4000 dst MTU 1500, migration will fail and 
>> mgmt stack need to provision an 4000 to work.
> This seems like a bug of libvirt. For our case, we strictly prohibit 
> unequal MTU on src & dst to work. Even migrating from MTU 1500 to 
> 4000, it effectively changes the underlying behavior for packet 
> dropping and network setup at which maximum size the packet should be 
> allowed when entering the VM.


Yes, this means the management layer should guarantee exact the same 
attribute for vDPA provisioned in both source and destination.


>>
>>
>>>
>>>> 3) migration code can restore the mac during restore
>>>>
>>>> So exactly the same mac address is still required. (This is the same
>>>> as we are doing for live migration on software virtio)
>>>>
>>>>>   Another example, if the source vDPA device has MAC address table 
>>>>> size limit of 100, then in the destination we should pick parent 
>>>>> device with size limit no smaller than that, and create vDPA on 
>>>>> remote node matching the exact same size. There's nothing config 
>>>>> space field can assist here.
>>>> Two ways:
>>>>
>>>> 1) mgmtdev should show the mac table size so mgmt layer can provision
>>>> the mac table size
>>>> 2) If the mac table exceeds what is supported in the destination, it
>>>> needs to enable the all uni in this case.
>>> Yep, so no field in the config space can help with these two 
>>> solutions, right? MAC table size is not showing up there. Whether 
>>> the parent device supports ALLUNI via VIRTIO_NET_F_CTRL_RX is not 
>>> there, either. (showing them in the 'vdpa mgmtdev show' output is 
>>> the right thing IMHO).
>>>
>>>>
>>>>> One example further, in the future, if we are going to introduce 
>>>>> mandatory feature (for e.g. VERSION_1, RING_PACKED) that the 
>>>>> device is unable to support the opposite case, the destination 
>>>>> device should be created with equally same mandatory device 
>>>>> features, which only vDPA creation parameters should matter. While 
>>>>> I can't think of a case that the mgmt software or live migration 
>>>>> tool would have to count on config space fields only.
>>>> Yes, in this case we need to introduce new netlink attributes for both
>>>> getting mandatory features from the management device and provisioning
>>>> those manadating features.
>>> True, management device level thing again, not related to anything 
>>> in the config space.
>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>> 1) provision vDPA device with all features that are supported by the
>>>>>      net simulator
>>>>>
>>>>> # vdpa dev add name dev1 mgmtdev vdpasim_net
>>>>> # vdpa dev config show
>>>>> dev1: mac 00:00:00:00:00:00 link up link_announce false mtu 1500
>>>>>     negotiated_features MTU MAC CTRL_VQ CTRL_MAC_ADDR VERSION_1 
>>>>> ACCESS_PLATFORM
>>>>>
>>>>> Maybe not in this patch, but for completeness for the whole series,
>>>>> could we also add device_features to the output?
>>>>>
>>>>> Lingshan, could you please share your thoughts or patch on this?
>>>>>
>>>>> Noted here the device_features argument specified during vdpa 
>>>>> creation is introduced by this series itself, it somehow slightly 
>>>>> changed the original semantics of what device_features used to be.
>>>>>
>>>>> I'm not sure I get this, we don't support device_features in the past
>>>>> and it is used to provision device features to the vDPA device which
>>>>> seems to be fine.
>>>>>
>>>>> Before this change, only look at the dev_features in "mgmtdev 
>>>>> show" and remember creation parameters is sufficient to get to all 
>>>>> needed info for creating vDPA at destination.
>>>> Note that even with the same vendor, mgmtdev may support different 
>>>> features.
>>>>
>>>>> After this change, dev_features in "mgmtdev show" becomes less 
>>>>> relevant, as it would need to remember vdpa creation parameters 
>>>>> plus the device_features attribute. While this series allows cross 
>>>>> vendor live migration, it would complicate the implementation of 
>>>>> mgmt software, on the other hand.
>>>> To allow cross vendor live migration I couldn't find a better way.
>>> The idea itself is great, I think, though the CLI interface may have 
>>> some space for improvement. For example, user has to supply the 
>>> heximal value consisting of each feature bit, which is a bit 
>>> challenging for normal users who are not super familiar with each 
>>> virtio feature. On the other hand, there could be ambiguity against 
>>> other vdpa create option, e.g. users may do "vdpa dev add name vdpa0 
>>> mgmtdev ... mtu 1500 device_features 0x300020000" (no F_MTU feature 
>>> bit in device_features) that needs special validation in the code.
>>
>>
>> We can accept e.g XML in the future I think.
> Regardless of XML being a good interface or not for end users, but I 
> don't see how it relates to the issue here i.e. conflict/ambiguity or 
> extra validation in the (kernel) code.


It's only about choosing a suitable interface between vdpa tool and 
mangment layer instead of solely depending on the command line arguments 
since we may support a lot of different features in the future.


>
>>
>>
>>>
>>> How about we follow the CPU flags model or QEMU virtio-net-pci args 
>>> to define property representing each feature bit? I think the 
>>> current convention for each "vdpa dev add" option implies that the 
>>> corresponding feature bit will be enabled once specified in 
>>> creation. Similarly we can introduce additional option representing 
>>> each feature bit, along with a new features_default property 
>>> denoting the initial value for device_feature bits:
>>>
>>> # vdpa mgmtdev show
>>> vdpasim_net:
>>>   supported_classes net
>>>   max_supported_vqs 3
>>>   dev_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1 
>>> ACCESS_PLATFORM
>>> # vdpa dev add name dev1 mgmtdev vdpasim_net features_default off \
>>>                           csum on guest_csum on mtu 2000 ctrl_vq on 
>>> version1 on access_platform on
>>> # vdpa dev show
>>> dev1: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 3 
>>> max_vq_size 256
>>>    features_default off mtu 2000
>>>    device_features CSUM GUEST_CSUM MTU CTRL_VQ VERSION_1 
>>> ACCESS_PLATFORM
>>>
>>> If the features_default property is left unspecified or with the 
>>> "inherited" value, it would just inherit all of the 
>>> supported_features from mgmtdev (which is already the case of today).
>>>
>>> # vdpa dev add name dev1 mgmtdev vdpasim_net features_default inherited
>>> # vdpa dev show
>>> dev1: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 3 
>>> max_vq_size 256
>>>   features_default inherited
>>>   device_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1 
>>> ACCESS_PLATFORM
>>>
>>> I can definitely help implement this model if you find it fits.
>>
>>
>> I prefer XML since it could be reused and we may exceed 64bit 
>> limitation in the future. But we can hear from others.
> I don't see how this can be re-used by QEMU as QMP is not 
> taking/exporting XML. Are we talking about libvirt here, which happens 
> to be one amongst many others? My personal feeling is that not quite a 
> lot of human end users (rather than management software or script) 
> today would prefer using XML. Instead, like any other iproute utility, 
> json seems to a more popular interface for script and mgmt software to 
> consume, which vdpa tool supports natively already.
>
> I think basically we can support two set of CLI interfaces, one 
> friendly enough for human users, and another scriptable and parseable 
> by management software users. IMO for now we should start to focus on 
> the human oriented CLI design first. Whether XML v.s. json being an 
> ideal interface for managment software would be another discussion 
> that would need more inputs from the broader range of extended 
> community, which is not worth distracting from.


That's fine, is the end user is more about a program but a human, json 
should be better.

Thanks


>
>>
>>>
>>>>
>>>>>
>>>>>
>>>>> When simply look at the "vdpa dev config show" output, I cannot 
>>>>> really
>>>>> tell the actual device_features that was used in vdpa creation. 
>>>>> For e.g.
>>>>> there is a missing feature ANY_LAYOUT from negotiated_features 
>>>>> compared
>>>>> with supported_features in mgmtdev, but the orchestration software
>>>>> couldn't tell if the vdpa device on destination host should be 
>>>>> created
>>>>> with or without the ANY_LAYOUT feature.
>>>>>
>>>>> I think VERSION_1 implies ANY_LAYOUT.
>>>>>
>>>>> Right, ANY_LAYOUT is a bad example. A good example might be that, 
>>>>> I knew the parent mgmtdev on migration source node supports 
>>>>> CTRL_MAC_ADDR, but I don't find it in negotiated_features.
>>>>>
>>>>> I think we should use the features that we got from "mgmtdev show"
>>>>> instead of "negotiated features".
>>>>>
>>>>> That was how it's supposed to work previously, but with this 
>>>>> series, I think the newly introduced device_features will be 
>>>>> needed instead of the one in "mgmtdev show".
>>>> Just to clarify, there won't be a device_features in mgmtdev show
>>>> since it is device specific, each individual device can have its own
>>>> device features which are subset of what is supported by the mgmtdev.
>>> Yep.
>>>>
>>>>>
>>>>> On the migration destination node, the parent device does support 
>>>>> all features as the source offers, including CTRL_MAC_ADDR. What 
>>>>> device features you would expect the mgmt software to create 
>>>>> destination vdpa device with, if not otherwise requiring mgmt 
>>>>> software to remember all the arguments on device creation?
>>>> So the provisioning in the destination should use exactly the same
>>>> device_feautres as what the vdpa device has in the source. But before
>>>> this, management layer should guarantee to provision a vDPA device
>>>> whose device_features can be supported on the destination in order to
>>>> allow live migration.
>>> Exactly.
>>>>
>>>>> So in this example, we need use "dev_features" so we get exact the
>>>>> same features after and operation as either src or dst.
>>>>>
>>>>> If the device_features vDPA created with at the source doesn't 
>>>>> include CTRL_MAC_ADDR even though parent supports it, then the 
>>>>> vDPA to be created at the destination shouldn't come with 
>>>>> CTRL_MAC_ADDR either, regardless of whether or not CTRL_MAC_ADDR 
>>>>> is present in destination "mgmtdev show".
>>>>>
>>>>> However, if just taking look at negotiated_features, some mgmt 
>>>>> software implementations which don't persist the creation 
>>>>> parameters can't get the device features a certain vDPA device at 
>>>>> the source node was created with.
>>>>>
>>>>>
>>>>> SOURCE# vdpa mgmtdev show
>>>>> vdpasim_net:
>>>>>     supported_classes net
>>>>>     max_supported_vqs 3
>>>>>     dev_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT 
>>>>> VERSION_1 ACCESS_PLATFORM
>>>>> SOURCE# vdpa dev config show
>>>>> dev1: mac 00:00:00:00:00:00 link up link_announce false mtu 1500
>>>>>     negotiated_features MTU MAC CTRL_VQ VERSION_1 ACCESS_PLATFORM
>>>>>
>>>>> DESTINATION# vdpa mgmtdev show
>>>>> vdpasim_net:
>>>>>     supported_classes net
>>>>>     max_supported_vqs 3
>>>>>     dev_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT 
>>>>> VERSION_1 ACCESS_PLATFORM
>>>>>
>>>>>   But it should be sufficient to
>>>>> use features_src & feature_dst in this case. Actually, it should work
>>>>> similar as to the cpu flags, the management software should introduce
>>>>> the concept of cluster which means the maximal set of common features
>>>>> is calculated and provisioned during device creation to allow
>>>>> migration among the nodes inside the cluster.
>>>>>
>>>>> Yes, this is one way mgmt software may implement, but I am not 
>>>>> sure if it's the only way. For e.g. for cpu flags, mgmt software 
>>>>> can infer the guest cpus features in use from all qemu command 
>>>>> line arguments and host cpu features/capability, which doesn't 
>>>>> need to remember creation arguments and is easy to recover from 
>>>>> failure without having to make the VM config persistent in data 
>>>>> store. I thought it would be great if vdpa CLI design could offer 
>>>>> the same.
>>>>>
>>>>> One minor difference is that we have cpu model abstraction, so we can
>>>>> have things like:
>>>>>
>>>>> ./qemu-system-x86_64 -cpu EPYC
>>>>>
>>>>> Which implies the cpu features/flags where vDPA doesn't have. But
>>>>> consider it's just a 64bit (or 128 in the future), it doesn't 
>>>>> seems to
>>>>> be too complex for the management to know, we probably need to start
>>>>> from this and then we can try to introduce some generation/model 
>>>>> after
>>>>> it is agreed on most of the vendors.
>>>>>
>>>>> What you refer to is the so-called named model for CPU flags. I 
>>>>> think it's a good addition to have some generation or named model 
>>>>> defined for vDPA. But I don't get the point for how it relates to 
>>>>> exposing the actual value of device features? Are you saying in 
>>>>> this case you'd rather expose the model name than the actual value 
>>>>> of feature bits? Well, I think we can expose both in different 
>>>>> fields when there's really such a need.
>>>> It's something like:
>>>>
>>>> vdpa dev add name dev1 mgmtdev vdpasim_net device_features 
>>>> VDPA_NET_MODEL_1
>>>>
>>>> and VDPA_NET_MODEL_1 implies some feature sets.
>>>
>>> Not sure if this could be very useful for virtio devices, given we 
>>> don't have a determined set of virtio features unlike CPU 
>>> generation/model, but I think even with the features_default 
>>> property we can still achieve that.
>>>
>>> -Siwei
>>
>>
>> Yes.
> Let me get some time to implement and post the relevant patches 
> (mostly in iproute) later. Basically this would supplement those 
> config attributes introduced through the 'vdpa dev show' series below 
> [1] with provisioned device features to reference:
>
> [1] 
> https://lore.kernel.org/virtualization/1665793690-28120-1-git-send-email-si-wei.liu@oracle.com/
>
>
> Thanks,
> -Siwei
>
>>
>> Thanks
>>
>>
>>>
>>>>
>>>>> BTW with regard to the cpu model in mgmt software implementation, 
>>>>> the one implemented in libvirt is a mixed "Host model" [1] with 
>>>>> taking advantage of QEMU named model and exposing additional 
>>>>> individual CPU features that gets close to what host CPU offers.
>>>> So creating vDPA device without "device_features" is somehow the host
>>>> model, it will have all features that is supported by the parent.
>>>>
>>>>> I think this implies that mgmt software should have to understand 
>>>>> what the model name really means in terms of individual CPU 
>>>>> features, so having feature bit value exposed will just do more 
>>>>> help if vDPA goes the same way.
>>>> Exactly.
>>>>
>>>> Thanks
>>>>
>>>>>
>>>>> Regards,
>>>>> -Siwei
>>>>>
>>>>> [1] 
>>>>> https://qemu-project.gitlab.io/qemu/system/qemu-cpu-models.html#two-ways-to-configure-cpu-models-with-qemu-kvm
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Thanks,
>>>>> -Siwei
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Thanks,
>>>>> -Siwei
>>>>>
>>>>>
>>>>> 2) provision vDPA device with a subset of the features
>>>>>
>>>>> # vdpa dev add name dev1 mgmtdev vdpasim_net device_features 
>>>>> 0x300020000
>>>>> # vdpa dev config show
>>>>> dev1: mac 00:00:00:00:00:00 link up link_announce false mtu 1500
>>>>>     negotiated_features CTRL_VQ VERSION_1 ACCESS_PLATFORM
>>>>>
>>>>> Reviewed-by: Eli Cohen <elic@...dia.com>
>>>>> Signed-off-by: Jason Wang <jasowang@...hat.com>
>>>>> ---
>>>>>    drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 11 ++++++++++-
>>>>>    1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c 
>>>>> b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
>>>>> index 886449e88502..a9ba02be378b 100644
>>>>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
>>>>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
>>>>> @@ -254,6 +254,14 @@ static int vdpasim_net_dev_add(struct 
>>>>> vdpa_mgmt_dev *mdev, const char *name,
>>>>>        dev_attr.work_fn = vdpasim_net_work;
>>>>>        dev_attr.buffer_size = PAGE_SIZE;
>>>>>
>>>>> +     if (config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) {
>>>>> +             if (config->device_features &
>>>>> +                 ~dev_attr.supported_features)
>>>>> +                     return -EINVAL;
>>>>> +             dev_attr.supported_features &=
>>>>> +                      config->device_features;
>>>>> +     }
>>>>> +
>>>>>        simdev = vdpasim_create(&dev_attr);
>>>>>        if (IS_ERR(simdev))
>>>>>                return PTR_ERR(simdev);
>>>>> @@ -294,7 +302,8 @@ static struct vdpa_mgmt_dev mgmt_dev = {
>>>>>        .id_table = id_table,
>>>>>        .ops = &vdpasim_net_mgmtdev_ops,
>>>>>        .config_attr_mask = (1 << VDPA_ATTR_DEV_NET_CFG_MACADDR |
>>>>> -                          1 << VDPA_ATTR_DEV_NET_CFG_MTU),
>>>>> +                          1 << VDPA_ATTR_DEV_NET_CFG_MTU |
>>>>> +                          1 << VDPA_ATTR_DEV_FEATURES),
>>>>>        .max_supported_vqs = VDPASIM_NET_VQ_NUM,
>>>>>        .supported_features = VDPASIM_NET_FEATURES,
>>>>>    };
>>>>>
>>>>>
>>>>>
>>>
>>
>