[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <22e0236f-b556-c6a8-0043-b39b02928fd6@intel.com>
Date: Tue, 16 Aug 2022 17:08:12 +0800
From: "Zhu, Lingshan" <lingshan.zhu@...el.com>
To: Si-Wei Liu <si-wei.liu@...cle.com>, jasowang@...hat.com,
mst@...hat.com
Cc: virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org,
kvm@...r.kernel.org, parav@...dia.com, xieyongji@...edance.com,
gautam.dawar@....com
Subject: Re: [PATCH 2/2] vDPA: conditionally read fields in virtio-net dev
On 8/16/2022 3:58 PM, Si-Wei Liu wrote:
>
>
> On 8/15/2022 6:58 PM, Zhu, Lingshan wrote:
>>
>>
>> On 8/16/2022 7:32 AM, Si-Wei Liu wrote:
>>>
>>>
>>> On 8/15/2022 2:26 AM, Zhu Lingshan wrote:
>>>> Some fields of virtio-net device config space are
>>>> conditional on the feature bits, the spec says:
>>>>
>>>> "The mac address field always exists
>>>> (though is only valid if VIRTIO_NET_F_MAC is set)"
>>>>
>>>> "max_virtqueue_pairs only exists if VIRTIO_NET_F_MQ
>>>> or VIRTIO_NET_F_RSS is set"
>>>>
>>>> "mtu only exists if VIRTIO_NET_F_MTU is set"
>>>>
>>>> so we should read MTU, MAC and MQ in the device config
>>>> space only when these feature bits are offered.
>>>>
>>>> For MQ, if both VIRTIO_NET_F_MQ and VIRTIO_NET_F_RSS are
>>>> not set, the virtio device should have
>>>> one queue pair as default value, so when userspace querying queue
>>>> pair numbers,
>>>> it should return mq=1 than zero.
>>>>
>>>> For MTU, if VIRTIO_NET_F_MTU is not set, we should not read
>>>> MTU from the device config sapce.
>>>> RFC894 <A Standard for the Transmission of IP Datagrams over
>>>> Ethernet Networks>
>>>> says:"The minimum length of the data field of a packet sent over an
>>>> Ethernet is 1500 octets, thus the maximum length of an IP datagram
>>>> sent over an Ethernet is 1500 octets. Implementations are encouraged
>>>> to support full-length packets"
>>> Noted there's a typo in the above "The *maximum* length of the data
>>> field of a packet sent over an Ethernet is 1500 octets ..." and the
>>> RFC was written 1984.
>> the spec RFC894 says it is 1500, see <a
>> href="https://urldefense.com/v3/__https://www.rfc-editor.org/rfc/rfc894.txt__;!!ACWV5N9M2RV99hQ!MdgxZjw5sp5Qz-GKfwT1IWcw_L4Jo1-UekuJPFz1UrG3YuqirKz7P9ksdJFh1vB6zHJ7z8Q04fpT0-9jWXCtlWM$">https://www.rfc-editor.org/rfc/rfc894.txt</a>
>>>
>>> Apparently that is no longer true with the introduction of Jumbo
>>> size frame later in the 2000s. I'm not sure what is the point of
>>> mention this ancient RFC. It doesn't say default MTU of any Ethernet
>>> NIC/switch should be 1500 in either case.
>> This could be a larger number for sure, we are trying to find out the
>> min value for Ethernet here, to support 1500 octets, MTU should be
>> 1500 at least, so I assume 1500 should be the default value for MTU
>>>
>>>>
>>>> virtio spec says:"The virtio network device is a virtual ethernet
>>>> card",
>>> Right,
>>>> so the default MTU value should be 1500 for virtio-net.
>>> ... but it doesn't say the default is 1500. At least, not in
>>> explicit way. Why it can't be 1492 or even lower? In practice, if
>>> the network backend has a MTU higher than 1500, there's nothing
>>> wrong for guest to configure default MTU more than 1500.
>> same as above
>>>
>>>>
>>>> For MAC, the spec says:"If the VIRTIO_NET_F_MAC feature bit is set,
>>>> the configuration space mac entry indicates the “physical” address
>>>> of the network card, otherwise the driver would typically
>>>> generate a random local MAC address." So there is no
>>>> default MAC address if VIRTIO_NET_F_MAC not set.
>>>>
>>>> This commits introduces functions vdpa_dev_net_mtu_config_fill()
>>>> and vdpa_dev_net_mac_config_fill() to fill MTU and MAC.
>>>> It also fixes vdpa_dev_net_mq_config_fill() to report correct
>>>> MQ when _F_MQ is not present.
>>>>
>>>> These functions should check devices features than driver
>>>> features, and struct vdpa_device is not needed as a parameter
>>>>
>>>> The test & userspace tool output:
>>>>
>>>> Feature bit VIRTIO_NET_F_MTU, VIRTIO_NET_F_RSS, VIRTIO_NET_F_MQ
>>>> and VIRTIO_NET_F_MAC can be mask out by hardcode.
>>>>
>>>> However, it is challenging to "disable" the related fields
>>>> in the HW device config space, so let's just assume the values
>>>> are meaningless if the feature bits are not set.
>>>>
>>>> Before this change, when feature bits for RSS, MQ, MTU and MAC
>>>> are not set, iproute2 output:
>>>> $vdpa vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false mtu
>>>> 1500
>>>> negotiated_features
>>>>
>>>> without this commit, function vdpa_dev_net_config_fill()
>>>> reads all config space fields unconditionally, so let's
>>>> assume the MAC and MTU are meaningless, and it checks
>>>> MQ with driver_features, so we don't see max_vq_pairs.
>>>>
>>>> After applying this commit, when feature bits for
>>>> MQ, RSS, MAC and MTU are not set,iproute2 output:
>>>> $vdpa dev config show vdpa0
>>>> vdpa0: link up link_announce false max_vq_pairs 1 mtu 1500
>>>> negotiated_features
>>>>
>>>> As explained above:
>>>> Here is no MAC, because VIRTIO_NET_F_MAC is not set,
>>>> and there is no default value for MAC. It shows
>>>> max_vq_paris = 1 because even without MQ feature,
>>>> a functional virtio-net must have one queue pair.
>>>> mtu = 1500 is the default value as ethernet
>>>> required.
>>>>
>>>> This commit also add supplementary comments for
>>>> __virtio16_to_cpu(true, xxx) operations in
>>>> vdpa_dev_net_config_fill() and vdpa_fill_stats_rec()
>>>>
>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@...el.com>
>>>> ---
>>>> drivers/vdpa/vdpa.c | 60
>>>> +++++++++++++++++++++++++++++++++++----------
>>>> 1 file changed, 47 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>>> index efb55a06e961..a74660b98979 100644
>>>> --- a/drivers/vdpa/vdpa.c
>>>> +++ b/drivers/vdpa/vdpa.c
>>>> @@ -801,19 +801,44 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct
>>>> sk_buff *msg, struct netlink_callba
>>>> return msg->len;
>>>> }
>>>> -static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev,
>>>> - struct sk_buff *msg, u64 features,
>>>> +static int vdpa_dev_net_mq_config_fill(struct sk_buff *msg, u64
>>>> features,
>>>> const struct virtio_net_config *config)
>>>> {
>>>> u16 val_u16;
>>>> - if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0)
>>>> - return 0;
>>>> + if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0 &&
>>>> + (features & BIT_ULL(VIRTIO_NET_F_RSS)) == 0)
>>>> + val_u16 = 1;
>>>> + else
>>>> + val_u16 = __virtio16_to_cpu(true,
>>>> config->max_virtqueue_pairs);
>>>> - val_u16 = le16_to_cpu(config->max_virtqueue_pairs);
>>>> return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, val_u16);
>>>> }
>>>> +static int vdpa_dev_net_mtu_config_fill(struct sk_buff *msg, u64
>>>> features,
>>>> + const struct virtio_net_config *config)
>>>> +{
>>>> + u16 val_u16;
>>>> +
>>>> + if ((features & BIT_ULL(VIRTIO_NET_F_MTU)) == 0)
>>>> + val_u16 = 1500;
>>> As said, there's no virtio spec defined value for MTU. Please leave
>>> this field out if feature VIRTIO_NET_F_MTU is not negotiated.
>> same as above
>>>> + else
>>>> + val_u16 = __virtio16_to_cpu(true, config->mtu);
>>>> +
>>>> + return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16);
>>>> +}
>>>> +
>>>> +static int vdpa_dev_net_mac_config_fill(struct sk_buff *msg, u64
>>>> features,
>>>> + const struct virtio_net_config *config)
>>>> +{
>>>> + if ((features & BIT_ULL(VIRTIO_NET_F_MAC)) == 0)
>>>> + return 0;
>>>> + else
>>>> + return nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>>>> + sizeof(config->mac), config->mac);
>>>> +}
>>>> +
>>>> +
>>>> static int vdpa_dev_net_config_fill(struct vdpa_device *vdev,
>>>> struct sk_buff *msg)
>>>> {
>>>> struct virtio_net_config config = {};
>>>> @@ -822,18 +847,16 @@ static int vdpa_dev_net_config_fill(struct
>>>> vdpa_device *vdev, struct sk_buff *ms
>>>> vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config));
>>>> - if (nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>>>> sizeof(config.mac),
>>>> - config.mac))
>>>> - return -EMSGSIZE;
>>>> + /*
>>>> + * Assume little endian for now, userspace can tweak this for
>>>> + * legacy guest support.
>>> You can leave it as a TODO for kernel (vdpa core limitation), but
>>> AFAIK there's nothing userspace needs to do to infer the endianness.
>>> IMHO it's the kernel's job to provide an abstraction rather than
>>> rely on userspace guessing it.
>> we have discussed it in another thread, and this comment is suggested
>> by MST.
> Can you provide the context or link? It shouldn't work like this,
> otherwise it is breaking uABI. E.g. how will a legacy/BE supporting
> kernel/device be backward compatible with older vdpa tool (which has
> knowledge of this endianness implication/assumption from day one)?
https://www.spinics.net/lists/netdev/msg837114.html
The challenge is that the status filed is virtio16, not le16, so
le16_to_cpu(xxx) is wrong anyway. However we can not tell whether it is
a LE or BE device from struct vdpa_device, so for most cases, we assume
it is LE, and leave this comment.
Thanks
>
> -Siwei
>
>>>
>>>> + */
>>>> + val_u16 = __virtio16_to_cpu(true, config.status);
>>>> val_u16 = __virtio16_to_cpu(true, config.status);
>>>> if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16))
>>>> return -EMSGSIZE;
>>>> - val_u16 = __virtio16_to_cpu(true, config.mtu);
>>>> - if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16))
>>>> - return -EMSGSIZE;
>>>> -
>>>> features_driver = vdev->config->get_driver_features(vdev);
>>>> if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_NEGOTIATED_FEATURES,
>>>> features_driver,
>>>> VDPA_ATTR_PAD))
>>>> @@ -846,7 +869,13 @@ static int vdpa_dev_net_config_fill(struct
>>>> vdpa_device *vdev, struct sk_buff *ms
>>>> VDPA_ATTR_PAD))
>>>> return -EMSGSIZE;
>>>> - return vdpa_dev_net_mq_config_fill(vdev, msg,
>>>> features_driver, &config);
>>>> + if (vdpa_dev_net_mac_config_fill(msg, features_device, &config))
>>>> + return -EMSGSIZE;
>>>> +
>>>> + if (vdpa_dev_net_mtu_config_fill(msg, features_device, &config))
>>>> + return -EMSGSIZE;
>>>> +
>>>> + return vdpa_dev_net_mq_config_fill(msg, features_device,
>>>> &config);
>>>> }
>>>> static int
>>>> @@ -914,6 +943,11 @@ static int vdpa_fill_stats_rec(struct
>>>> vdpa_device *vdev, struct sk_buff *msg,
>>>> }
>>>> vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config));
>>>> + /*
>>>> + * Assume little endian for now, userspace can tweak this for
>>>> + * legacy guest support.
>>>> + */
>>>> +
>>> Ditto.
>> same as above
>>
>> Thanks
>>>
>>> Thanks,
>>> -Siwei
>>>> max_vqp = __virtio16_to_cpu(true, config.max_virtqueue_pairs);
>>>> if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, max_vqp))
>>>> return -EMSGSIZE;
>>>
>>
>
Powered by blists - more mailing lists