linux-kernel - Re: [PATCH 3/3] mlx5_vdpa: defer clear_virtqueues to until DRIVER


Message-ID: <460e414c-afab-842a-a278-16dbb2eed656@oracle.com>
Date:   Mon, 8 Feb 2021 17:40:03 -0800
From:   Si-Wei Liu <si-wei.liu@...cle.com>
To:     Eli Cohen <elic@...dia.com>
Cc:     mst@...hat.com, jasowang@...hat.com, linux-kernel@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org
Subject: Re: [PATCH 3/3] mlx5_vdpa: defer clear_virtqueues to until DRIVER_OK



On 2/7/2021 9:48 PM, Eli Cohen wrote:
> On Sat, Feb 06, 2021 at 04:29:24AM -0800, Si-Wei Liu wrote:
>> While the virtq is stopped, get_vq_state() is supposed to be called
>> to sync with the latest internal avail_index from the device. The
>> saved avail_index is used to restore the virtq once the device is
>> started. Commit b35ccebe3ef7 introduced the clear_virtqueues()
>> routine to reset the saved avail_index; however, the index gets
>> cleared before get_vq_state() tries to read it. This would cause
>> consistency problems when the virtq is restarted, e.g. through a
>> series of link down and link up events. We could defer the clearing
>> of avail_index until the device is about to be started, i.e. until
>> VIRTIO_CONFIG_S_DRIVER_OK is set again in set_status().
>
> Not sure I understand the scenario. You are talking about reset of the
> device followed by up/down events on the interface. How can you trigger
> this?
Currently it's not possible to trigger link up/down events with upstream
QEMU due to the lack of a config/control interrupt implementation. Live
migration is another scenario that cannot be satisfied because of the
inconsistent queue state. Both share the same root cause, as captured
here.
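
To make the ordering issue concrete, here is a rough userspace sketch. It is
only a toy model, not the driver code: every toy_* name and the index value 42
are made up for illustration, and it assumes that stopping the virtq copies the
device's avail_index into a saved software field which get_vq_state() then
reports.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: all toy_* names are hypothetical, not mlx5_vdpa code. */
struct toy_vq {
	uint16_t hw_avail_idx;    /* index held by the (simulated) device */
	uint16_t saved_avail_idx; /* software copy reported while stopped */
};

/* Assumed behaviour: stopping the virtq syncs the saved copy with the device. */
static void toy_teardown(struct toy_vq *vq)
{
	vq->saved_avail_idx = vq->hw_avail_idx;
}

/* Stand-in for clear_virtqueues(): forget the saved index. */
static void toy_clear(struct toy_vq *vq)
{
	vq->saved_avail_idx = 0;
}

/* Stand-in for get_vq_state() on a stopped virtq. */
static uint16_t toy_get_state(const struct toy_vq *vq)
{
	return vq->saved_avail_idx;
}

/* Returns the index a caller would read between reset and DRIVER_OK. */
static uint16_t toy_index_seen_after_reset(int defer_clear)
{
	struct toy_vq vq = { .hw_avail_idx = 42, .saved_avail_idx = 0 };
	uint16_t seen;

	toy_teardown(&vq);
	if (!defer_clear)
		toy_clear(&vq);        /* current order: cleared during reset */
	seen = toy_get_state(&vq);     /* state is read while the virtq is stopped */
	if (defer_clear)
		toy_clear(&vq);        /* patched order: cleared just before setup */
	return seen;
}

/* Checks both orderings: the early clear loses the index, the deferred one keeps it. */
static void toy_selfcheck(void)
{
	assert(toy_index_seen_after_reset(0) == 0);
	assert(toy_index_seen_after_reset(1) == 42);
}
```

With the clear done at reset time the reader gets 0 back; with the clear
deferred to DRIVER_OK the reader still sees the index saved at teardown.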

-Siwei

>
>> Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map")
>> Signed-off-by: Si-Wei Liu <si-wei.liu@...cle.com>
>> ---
>>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> index aa6f8cd..444ab58 100644
>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> @@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
>>   	if (!status) {
>>   		mlx5_vdpa_info(mvdev, "performing device reset\n");
>>   		teardown_driver(ndev);
>> -		clear_virtqueues(ndev);
>>   		mlx5_vdpa_destroy_mr(&ndev->mvdev);
>>   		ndev->mvdev.status = 0;
>>   		++mvdev->generation;
>> @@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
>>   
>>   	if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
>>   		if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
>> +			clear_virtqueues(ndev);
>>   			err = setup_driver(ndev);
>>   			if (err) {
>>   				mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
>> -- 
>> 1.8.3.1
>>