Date:   Fri, 18 Dec 2020 12:38:16 +0100
From:   Stefano Garzarella <sgarzare@...hat.com>
To:     Jason Wang <jasowang@...hat.com>
Cc:     virtualization@...ts.linux-foundation.org,
        Stefan Hajnoczi <stefanha@...hat.com>,
        Laurent Vivier <lvivier@...hat.com>,
        linux-kernel@...r.kernel.org, Eli Cohen <elic@...dia.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Max Gurtovoy <mgurtovoy@...dia.com>
Subject: Re: [PATCH RFC 00/12] vdpa: generalize vdpa simulator and add block
 device

On Mon, Nov 16, 2020 at 11:37:48AM +0800, Jason Wang wrote:
>
>On 2020/11/13 9:47 PM, Stefano Garzarella wrote:
>>Thanks to Max that started this work!
>>I took his patches, and extended the block simulator a bit.
>>
>>This series moves the network device simulator in a new module
>>(vdpa_sim_net) and leaves the generic functions in the vdpa_sim core
>>module, allowing the possibility to add new vDPA device simulators.
>>Then we added a new vdpa_sim_blk module to simulate a block device.
>>
>>I'm not sure about patch 11 ("vringh: allow vringh_iov_xfer() to skip
>>bytes when ptr is NULL"), maybe we can add a new function instead of
>>modifying vringh_iov_xfer().
>>
>>As Max reported, I'm also seeing errors with vdpa_sim_blk related to
>>iotlb and vringh when there is high load, these are some of the error
>>messages I can see randomly:
>>
>>   vringh: Failed to access avail idx at 00000000e8deb2cc
>>   vringh: Failed to read head: idx 6289 address 00000000e1ad1d50
>>   vringh: Failed to get flags at 000000006635d7a3
>>
>>   virtio_vdpa vdpa0: vringh_iov_push_iotlb() error: -14 offset: 0x2840000 len: 0x20000
>>   virtio_vdpa vdpa0: vringh_iov_pull_iotlb() error: -14 offset: 0x58ee000 len: 0x3000
>>
>>These errors should all be related to the fact that iotlb_translate()
>>fails with -EINVAL, so it seems that we miss some mapping.
>
>
>Is this only reproducible when there are multiple concurrent accesses 
>to the IOTLB? If yes, it's probably a hint that some kind of 
>synchronization is still missing somewhere.
>
>It might be useful to log the dma_map/unmap in both virtio_ring and 
>vringh to see who is missing the map.
>

Just an update about these issues with vdpa-sim-blk.
I've been focusing a little bit on these failures over the last few days 
and have found two issues related to the IOTLB/IOMMU:

1. Some requests coming from the block layer fill the SG list with 
multiple buffers that have the same physical address. This happens, for 
example, while running 'mkfs': at some point multiple sectors are zeroed, 
so multiple SG elements point to the same zeroed physical page.
Since we are using vhost_iotlb_del_range() in vdpasim_unmap_page(), 
unmapping one of them removes all the overlapping ranges. I fixed this 
by removing only a single mapping in vdpasim_unmap_page(), but as an 
alternative we could implement some kind of reference counting.

2. There was a race between dma_map/unmap and the worker thread, since 
both access the IOMMU. Taking the iommu_lock while using the 
vhost_iotlb_* API in the worker thread fixes the "vringh: Failed to *" 
issues.
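
To make the two points above more concrete, here is a minimal userspace 
sketch of the reference-counting alternative from point 1, with every 
lookup and update done under one lock as in point 2. This is not the 
vdpa_sim code and it does not use the vhost_iotlb API: the toy_* names, 
the fixed-size table and the pthread mutex standing in for iommu_lock 
are all assumptions made only for illustration.

```c
/* Userspace model only. All toy_* names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPS 16

struct toy_map {
    uint64_t start;  /* first address covered */
    uint64_t last;   /* last address covered (inclusive) */
    unsigned refs;   /* outstanding map calls; 0 means the slot is free */
};

struct toy_iotlb {
    pthread_mutex_t lock;  /* stands in for the simulator's iommu_lock */
    struct toy_map maps[TOY_MAX_MAPS];
};

static void toy_iotlb_init(struct toy_iotlb *t)
{
    pthread_mutex_init(&t->lock, NULL);
    for (size_t i = 0; i < TOY_MAX_MAPS; i++)
        t->maps[i].refs = 0;
}

/* Map [start, last]; mapping an identical range only adds a reference. */
static bool toy_map_page(struct toy_iotlb *t, uint64_t start, uint64_t last)
{
    struct toy_map *free_slot = NULL;
    bool ok = false;

    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < TOY_MAX_MAPS; i++) {
        struct toy_map *m = &t->maps[i];

        if (m->refs && m->start == start && m->last == last) {
            m->refs++;
            ok = true;
            goto out;
        }
        if (!m->refs && !free_slot)
            free_slot = m;
    }
    if (free_slot) {
        free_slot->start = start;
        free_slot->last = last;
        free_slot->refs = 1;
        ok = true;
    }
out:
    pthread_mutex_unlock(&t->lock);
    return ok;
}

/* Drop one reference; the range goes away only with the last unmap. */
static void toy_unmap_page(struct toy_iotlb *t, uint64_t start, uint64_t last)
{
    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < TOY_MAX_MAPS; i++) {
        struct toy_map *m = &t->maps[i];

        if (m->refs && m->start == start && m->last == last) {
            m->refs--;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
}

/* Lookup as a worker thread would do it: under the same lock. */
static bool toy_translate(struct toy_iotlb *t, uint64_t addr)
{
    bool found = false;

    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < TOY_MAX_MAPS; i++) {
        const struct toy_map *m = &t->maps[i];

        if (m->refs && addr >= m->start && addr <= m->last) {
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return found;
}

/* Two SG elements pointing at the same page, unmapped one at a time. */
static void toy_demo(void)
{
    struct toy_iotlb t;

    toy_iotlb_init(&t);
    assert(toy_map_page(&t, 0x1000, 0x1fff));
    assert(toy_map_page(&t, 0x1000, 0x1fff));
    toy_unmap_page(&t, 0x1000, 0x1fff);
    assert(toy_translate(&t, 0x1800));   /* second user still mapped */
    toy_unmap_page(&t, 0x1000, 0x1fff);
    assert(!toy_translate(&t, 0x1800));  /* last reference dropped */
}
```

In this model the first unmap of a doubly mapped page leaves the 
translation in place, which is the behaviour that was lost when a range 
delete removed every overlapping entry at once.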

With these issues fixed, the vdpa-blk simulator seems to work well.
I'll send the patches next week or after the break.

Thanks,
Stefano
