lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ab5d903e-f78f-d304-24fb-de7ed73f1c55@oracle.com>
Date:   Wed, 26 Oct 2022 23:31:11 -0700
From:   Si-Wei Liu <si-wei.liu@...cle.com>
To:     Jason Wang <jasowang@...hat.com>
Cc:     mst@...hat.com, parav@...dia.com,
        virtualization@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org, Cindy Lu <lulu@...hat.com>
Subject: Re: [PATCH v3 3/4] vdpa: show dev config as-is in "vdpa dev show"
 output



On 10/25/2022 9:44 PM, Jason Wang wrote:
>
> 在 2022/10/26 09:10, Si-Wei Liu 写道:
>>
>>
>> On 10/24/2022 7:24 PM, Jason Wang wrote:
>>> On Tue, Oct 25, 2022 at 3:14 AM Si-Wei Liu<si-wei.liu@...cle.com>  
>>> wrote:
>>>> On 10/24/2022 1:40 AM, Jason Wang wrote:
>>>>> On Sat, Oct 22, 2022 at 7:49 AM Si-Wei Liu<si-wei.liu@...cle.com>  
>>>>> wrote:
>>>>>> Live migration of vdpa would typically require re-instate vdpa
>>>>>> device with an idential set of configs on the destination node,
>>>>>> same way as how source node created the device in the first
>>>>>> place. In order to save orchestration software from memorizing
>>>>>> and keeping track of vdpa config, it will be helpful if the vdpa
>>>>>> tool provides the aids for exporting the initial configs as-is,
>>>>>> the way how vdpa device was created. The "vdpa dev show" command
>>>>>> seems to be the right vehicle for that. It is unlike the "vdpa dev
>>>>>> config show" command output which usually goes with the live value
>>>>>> in the device config space, and is not quite reliable subject to
>>>>>> the dynamics of feature negotiation or possible change by the
>>>>>> driver to the config space.
>>>>>>
>>>>>> Examples:
>>>>>>
>>>>>> 1) Create vDPA by default without any config attribute
>>>>>>
>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0
>>>>>> $ vdpa dev show vdpa0
>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 
>>>>>> max_vqs 9 max_vq_size 256
>>>>>> $ vdpa dev -jp show vdpa0
>>>>>> {
>>>>>>       "dev": {
>>>>>>           "vdpa0": {
>>>>>>               "type": "network",
>>>>>>               "mgmtdev": "pci/0000:41:04.2",
>>>>>>               "vendor_id": 5555,
>>>>>>               "max_vqs": 9,
>>>>>>               "max_vq_size": 256,
>>>>>>           }
>>>>>>       }
>>>>>> }
>>>>>>
>>>>>> 2) Create vDPA with config attribute(s) specified
>>>>>>
>>>>>> $ vdpa dev add mgmtdev pci/0000:41:04.2 name vdpa0 \
>>>>>>       mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>>> $ vdpa dev show
>>>>>> vdpa0: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 
>>>>>> max_vqs 9 max_vq_size 256
>>>>>>     initial_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4
>>>>>> $ vdpa dev -jp show
>>>>>> {
>>>>>>       "dev": {
>>>>>>           "vdpa0": {
>>>>>>               "type": "network",
>>>>>>               "mgmtdev": "pci/0000:41:04.2",
>>>>>>               "vendor_id": 5555,
>>>>>>               "max_vqs": 9,
>>>>>>               "max_vq_size": 256,
>>>>>>               "initial_config": {
>>>>>>                   "mac": "e4:11:c6:d3:45:f0",
>>>>>>                   "max_vq_pairs": 4
>>>>>>               }
>>>>>>           }
>>>>>>       }
>>>>>> }
>>>>>>
>>>>>> Signed-off-by: Si-Wei Liu<si-wei.liu@...cle.com>
>>>>>> ---
>>>>>>    drivers/vdpa/vdpa.c | 39 +++++++++++++++++++++++++++++++++++++++
>>>>>>    1 file changed, 39 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>>>>> index bebded6..bfb8f54 100644
>>>>>> --- a/drivers/vdpa/vdpa.c
>>>>>> +++ b/drivers/vdpa/vdpa.c
>>>>>> @@ -677,6 +677,41 @@ static int 
>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct 
>>>>>> genl_info *i
>>>>>>    }
>>>>>>
>>>>>>    static int
>>>>>> +vdpa_dev_initcfg_fill(struct vdpa_device *vdev, struct sk_buff 
>>>>>> *msg, u32 device_id)
>>>>>> +{
>>>>>> +       struct vdpa_dev_set_config *cfg = &vdev->init_cfg;
>>>>>> +       int err = -EMSGSIZE;
>>>>>> +
>>>>>> +       if (!cfg->mask)
>>>>>> +               return 0;
>>>>>> +
>>>>>> +       switch (device_id) {
>>>>>> +       case VIRTIO_ID_NET:
>>>>>> +               if ((cfg->mask & 
>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
>>>>>> +                   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>>>>>> +                           sizeof(cfg->net.mac), cfg->net.mac))
>>>>>> +                       return err;
>>>>>> +               if ((cfg->mask & 
>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) != 0 &&
>>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, 
>>>>>> cfg->net.mtu))
>>>>>> +                       return err;
>>>>>> +               if ((cfg->mask & 
>>>>>> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
>>>>>> +                   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
>>>>>> + cfg->net.max_vq_pairs))
>>>>>> +                       return err;
>>>>>> +               break;
>>>>>> +       default:
>>>>>> +               break;
>>>>>> +       }
>>>>>> +
>>>>>> +       if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) != 0 &&
>>>>>> +           nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
>>>>>> +                             cfg->device_features, VDPA_ATTR_PAD))
>>>>>> +               return err;
>>>>> A question: If any of those above attributes were not provisioned,
>>>>> should we show the ones that are inherited from the parent?
>>>> A simple answer would be yes, but the long answer is that I am not 
>>>> sure
>>>> if there's any for the moment - there's no  default value for mtu, 
>>>> mac,
>>>> and max_vqp that can be inherited from the parent (max_vqp by default
>>>> being 1 is spec defined, not something inherited from the parent).
>>> Note that it is by default from driver level that if _F_MQ is not
>>> negotiated. But I think we are talking about something different that
>>> is out of the spec here, what if:
>>>
>>> vDPA inherit _F_MQ but mqx_vqp is not provisioned via netlink.
>>>
>>> Or is it not allowed?
>> My understanding is that this is not allowed any more since the 
>> introduction of max_vqp attribute. Noted, currently we don't have a 
>> way for vendor driver to report the default value for mqx_vqp, 
>
>
> I think it can be reported in this patch?
Yes, we can add, but I am not sure whether or not this will be 
practically useful, for e.g. the same command without max_vqp specified 
may render different number of queues across different devices, or 
different revisions of the same vendor's devices. Does it complicate the 
mgmt software even more, I'm not sure.... Could we instead mandate 
max_vqp to be 1 from vdpa core level if user doesn't explicitly specify 
the value? That way it is more consistent in terms of the resulting 
number of queue pairs (=1) with the case where parent device does not 
offer the _F_MQ feature.

>
>
>> if not otherwise specified in the CLI. Without getting the default 
>> value reported in 'vdpa mgmtdev show' level, it'd just confuse mgmt 
>> software even more.
>
>
> Yes, this is something that we need to fix. And what's more in order 
> to support dynamic provisioning, we need a way to report the number of 
> available instances that could be used for vDPA device provisioning.
Wouldn't it be possible to achieve that by simply checking how many 
parent mgmtdev instances don't have vdpa device provisioned yet? e.g.

inuse=$(vdpa dev show | grep mgmtdev | wc -l)
total=$(vdpa mgmtdev show  | grep "supported_classes" | wc -l )
echo $((total - inuse))

>
>
>>
>>>    At least some time in the past, mlx5 were
>>> enabled with MQ with 8 queue pairs by default.
>> That was the situation when there's no max_vqp attribute support from 
>> vdpa netlink API level. I think now every driver honors the vdpa core 
>> disposition to get a single queue pair if max_vqp config is missing.
>
> So we have:
>
> int vdpa_register_device(struct vdpa_device *vdev, int nvqs)
>
> This means technically, parent can allocate a multiqueue devices with 
> _F_MQ features if max_vqp and device_features is not provisioned. And 
> what's more, what happens if _F_MQ is provisioned by max_vqp is not 
> specified?
>
> The question is:
>
> When a attribute is not specificed/provisioned via net link, what's 
> the default value? The answer should be consistent: if device_features 
> is determined by the parent, we should do the same for mqx_vqp. 
OK I got your point.

> And it looks to me all of those belongs to the initial config 
> (self-contained)
Right. I wonder if we can have vdpa core define the default value (for 
e.g. max_vqp=1) for those unspecified attribute (esp. when the 
corresponding device feature is offered and provisioned) whenever 
possible. Which I think it'll be more consistent for the same command to 
get to the same result between different vendor drivers. While we still 
keep the possibility for future extension to allow driver override the 
vdpa core disposition if the real use case emerges. What do you think?

>
>
>> And the mlx5_vdpa driver with 8 queue pairs in the wild days is just 
>> irrelevant to be manageable by mgmt software, regardless of live 
>> migration.
>>>> And
>>>> the device_features if inherited is displayed at 'vdpa dev config 
>>>> show'
>>>> output. Can you remind me of a good example for inherited value 
>>>> that we
>>>> may want to show here?
>>> Some other cases:
>>>
>>> 1) MTU: there should be something that the device needs to report if
>>> _F_MTU is negotiated even if it is not provisioned from netlink.
>> I am not sure I understand the ask here. Noted the QEMU argument has 
>> to offer host_mtu=X with the maximum MTU value for guest to use (and 
>> applied as the initial MTU config during virtio-net probing for Linux 
>> driver), 
>
>
> Adding Cindy.
>
> I think it's a known issue that we need to do sanity check to make 
> sure cli parameters matches what is provisioned from netlink.
Right. How's the plan for QEMU to get to the mtu provisioned by netlink, 
via a new vhost-vdpa ioctl call? If so, will  QEMU be able to read it 
directly from kernel when it comes to the vhost-vdpa backend, without 
having user to specify host_mtu from CLI?
>
>
>> and the way to get the parent device MTU and whether that's relevant 
>> to vdpa device's MTU is very vendor specific.
>
>
> So I think the max MTU of parent should be equal to the max MTU of the 
> vDPA.
Noted here the parent might not be necessarily the mgmtdev where vdpa 
gets created over. It may well end up with the MTU on the PF (uplink 
port) which the mgmt software has to concern with. My point is the 
utility and tool chain able to derive the maximal MTU effectively 
allowed for vDPA device may live out of vDPA's realm. It's a rare or 
even invalid configuration to have vDPA configured with a bigger value 
than the MTU on the uplink port or parent device. It's more common when 
MTU config is involved, it has to be consistently configured across all 
the network links along, from parent device (uplink port) down to the 
switchdev representor port, vdpa device, and QEMU virtio-net object.

>
>
>> I think we would need new attribute(s) in the mgmtdev level to 
>> support what you want here?
>
>
> Not sure, but what I want to ask is consider we provision MTU feature 
> but without max MTU value, do we need to report the initial max MTU here?
Yep, maybe. I'm not very sure if this will be very useful to be honest, 
consider it's kinda a rare case to me were to provision MTU feature 
without a specific MTU value. If one cares about MTU, mgmt software 
should configure some mtu through "vdpa dev add ... mtu ...", no?

On the other hand, no mtu value specified may mean "go with what the 
uplink port or parent device has". I think this is a pretty useful case 
if the vendor's NIC supports updating MTU on the fly without having to 
tear down QEMU and reconfigure vdpa. I'm not sure if we end up with 
killing this use case by limiting initial max MTU to a fixed value.

>
>
>>
>>> 2) device_features: if device_features is not provisioned, we should
>>> still report it via netlink here
>> Not the way I expected it, but with Lingshan's series to expose 
>> fields out of FEATURES_OK, the device_features is now reported 
>> through 'vdpa dev config show' regardless being specified or not, if 
>> I am not mistaken?
>
>
> Yes.
Do you want me to relocate to 'vdpa dev show', or it's okay to leave it 
behind there?

>
>
>>
>> Currently we export the config attributes upon vdpa creation under 
>> the "initial_config" key. If we want to expose more default values 
>> inherited from mgmtdev, I think we can wrap up these default values 
>> under another key "inherited_config" to display in 'vdpa dev show' 
>> output. Does it fit what you have in mind?
>
>
> I wonder if it's better to merge those two, or is there any advantages 
> of splitting them?
I think for the most part "initial_config" will be sufficient for those 
config attributes with "vdpa dev add" equivalents, be it user specified, 
vdpa enforced default if missing user input, or default overridden by 
the parent device. "inherited_config" will be useful for the configs 
with no "vdpa dev add" equivalent or live out side of vdpa tool, but 
still important for mgmt software to replicate identical vdpa setup. 
Like max-supported-mtu (for the uplink port or parent device), 
effective-link-speed, effective-link-status et al. Let's see if there's 
more when we get there.

Thanks,
-Siwei

>
>
>>
>>> or do you mean the mgmt can assume it
>>> should be the same as mgmtdev. Anyhow if we don't show device_features
>>> if it is not provisioned, it will complicate the mgmt software.
>> Yes, as I said earlier, since the device_features attr getting added 
>> to the 'vdpa dev config show' command, this divergence started to 
>> complicate mgmt software already.
>>
>> Thanks,
>
>
> Thanks
>
>
>> -Siwei
>>> Thanks
>>>
>>>> Thanks,
>>>> -Siwei
>>>>
>>>>
>>>>> Thanks
>>>>>
>>>>>> +
>>>>>> +       return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int
>>>>>>    vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff *msg, 
>>>>>> u32 portid, u32 seq,
>>>>>>                 int flags, struct netlink_ext_ack *extack)
>>>>>>    {
>>>>>> @@ -715,6 +750,10 @@ static int 
>>>>>> vdpa_nl_cmd_dev_del_set_doit(struct sk_buff *skb, struct 
>>>>>> genl_info *i
>>>>>>           if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, 
>>>>>> min_vq_size))
>>>>>>                   goto msg_err;
>>>>>>
>>>>>> +       err = vdpa_dev_initcfg_fill(vdev, msg, device_id);
>>>>>> +       if (err)
>>>>>> +               goto msg_err;
>>>>>> +
>>>>>>           genlmsg_end(msg, hdr);
>>>>>>           return 0;
>>>>>>
>>>>>> -- 
>>>>>> 1.8.3.1
>>>>>>
>>
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ