lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Mon, 7 Jun 2021 10:28:56 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     wangyunjian <wangyunjian@...wei.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "jcfaracco@...il.com" <jcfaracco@...il.com>
Cc:     "kuba@...nel.org" <kuba@...nel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "mst@...hat.com" <mst@...hat.com>,
        "virtualization@...ts.linux-foundation.org" 
        <virtualization@...ts.linux-foundation.org>,
        dingxiaoxiong <dingxiaoxiong@...wei.com>
Subject: Re: [PATCH net-next] virtio_net: set link state down when virtqueue
 is broken


在 2021/6/5 下午3:10, wangyunjian 写道:
>> -----Original Message-----
>> From: Jason Wang [mailto:jasowang@...hat.com]
>> Sent: Friday, June 4, 2021 10:38 AM
>> To: wangyunjian <wangyunjian@...wei.com>; netdev@...r.kernel.org
>> Cc: kuba@...nel.org; davem@...emloft.net; mst@...hat.com;
>> virtualization@...ts.linux-foundation.org; dingxiaoxiong
>> <dingxiaoxiong@...wei.com>
>> Subject: Re: [PATCH net-next] virtio_net: set link state down when virtqueue is
>> broken
>>
>>
>> 在 2021/6/3 下午7:34, wangyunjian 写道:
>>>> -----Original Message-----
>>>> From: Jason Wang [mailto:jasowang@...hat.com]
>>>> Sent: Monday, May 31, 2021 11:29 AM
>>>> To: wangyunjian <wangyunjian@...wei.com>; netdev@...r.kernel.org
>>>> Cc: kuba@...nel.org; davem@...emloft.net; mst@...hat.com;
>>>> virtualization@...ts.linux-foundation.org; dingxiaoxiong
>>>> <dingxiaoxiong@...wei.com>
>>>> Subject: Re: [PATCH net-next] virtio_net: set link state down when
>>>> virtqueue is broken
>>>>
>>>>
>>>> 在 2021/5/28 下午6:58, wangyunjian 写道:
>>>>>> -----Original Message-----
>>>>>>> From: Yunjian Wang <wangyunjian@...wei.com>
>>>>>>>
>>>>>>> The NIC can't receive/send packets if a rx/tx virtqueue is broken.
>>>>>>> However, the link state of the NIC is still normal. As a result,
>>>>>>> the user cannot detect the NIC exception.
>>>>>> Doesn't we have:
>>>>>>
>>>>>>            /* This should not happen! */
>>>>>>             if (unlikely(err)) {
>>>>>>                     dev->stats.tx_fifo_errors++;
>>>>>>                     if (net_ratelimit())
>>>>>>                             dev_warn(&dev->dev,
>>>>>>                                      "Unexpected TXQ (%d)
>>>> queue
>>>>>> failure: %d\n",
>>>>>>                                      qnum, err);
>>>>>>                     dev->stats.tx_dropped++;
>>>>>>                     dev_kfree_skb_any(skb);
>>>>>>                     return NETDEV_TX_OK;
>>>>>>             }
>>>>>>
>>>>>> Which should be sufficient?
>>>>> There may be other reasons for this error, e.g -ENOSPC(no free desc).
>>>> This should not happen unless the device or driver is buggy. We
>>>> always reserved sufficient slots:
>>>>
>>>>            if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
>>>>                    netif_stop_subqueue(dev, qnum); ...
>>>>
>>>>
>>>>> And if rx virtqueue is broken, there is no error statistics.
>>>> Feel free to add one if it's necessary.
>>> Currently receiving scenario, it is impossible to distinguish whether
>>> the reason for not receiving packet is virtqueue's broken or no packet.
>>
>> Can we introduce rx_fifo_errors for that?
>>
>>
>>>> Let's leave the policy decision (link down) to userspace.
>>>>
>>>>
>>>>>>> The driver can set the link state down when the virtqueue is broken.
>>>>>>> If the state is down, the user can switch over to another NIC.
>>>>>> Note that, we probably need the watchdog for virtio-net in order to
>>>>>> be a complete solution.
>>>>> Yes, I can think of is that the virtqueue's broken exception is
>>>>> detected on
>>>> watchdog.
>>>>> Is there anything else that needs to be done?
>>>> Basically, it's all about TX stall which watchdog tries to catch.
>>>> Broken vq is only one of the possible reason.
>>> Are there any plans for the watchdog?
>>
>> Somebody posted a prototype 3 or 4 years ago, you can search it and maybe we
>> can start from there.
> I find the patch (https://patchwork.ozlabs.org/project/netdev/patch/20191126200628.22251-3-jcfaracco@gmail.com/)
>
> The patch checks only the scenario where the sending queue is abnormal, but can
> not detect the exception in the receiving queue.


It's almost impossible to detect the abnormal of receiving queue since 
we there's no deterministic time for a receiving packet.


>
> And the patch restores the NIC by reset, which is inappropriate because the broken
> state may be caused by a front-end or back-end bug. We should keep the scene to
> locate bugs.


This could be changed, we can increase the error counters and let 
userspce to decide what to do.

Thanks


>
> Thanks
>
>> Thanks
>>
>>
>>> Thanks
>>>
>>>> Thanks
>>>>
>>>>
>>>>> Thanks
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ