Message-ID: <6e0faec9-5840-653e-cb43-86545a48e65d@redhat.com>
Date:   Tue, 12 Nov 2019 21:34:32 +0100
From:   Auger Eric <eric.auger@...hat.com>
To:     Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>,
        "eric.auger.pro@...il.com" <eric.auger.pro@...il.com>,
        "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "kvmarm@...ts.cs.columbia.edu" <kvmarm@...ts.cs.columbia.edu>,
        "joro@...tes.org" <joro@...tes.org>,
        "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
        "jacob.jun.pan@...ux.intel.com" <jacob.jun.pan@...ux.intel.com>,
        "yi.l.liu@...el.com" <yi.l.liu@...el.com>,
        "jean-philippe.brucker@....com" <jean-philippe.brucker@....com>,
        "will.deacon@....com" <will.deacon@....com>,
        "robin.murphy@....com" <robin.murphy@....com>
Cc:     "kevin.tian@...el.com" <kevin.tian@...el.com>,
        "vincent.stehle@....com" <vincent.stehle@....com>,
        "ashok.raj@...el.com" <ashok.raj@...el.com>,
        "marc.zyngier@....com" <marc.zyngier@....com>,
        "tina.zhang@...el.com" <tina.zhang@...el.com>,
        Linuxarm <linuxarm@...wei.com>, "xuwei (O)" <xuwei5@...wei.com>
Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)

Hi Shameer,

On 11/12/19 6:56 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Shameerali Kolothum Thodi
>> Sent: 12 November 2019 14:21
>> To: 'Auger Eric' <eric.auger@...hat.com>; eric.auger.pro@...il.com;
>> iommu@...ts.linux-foundation.org; linux-kernel@...r.kernel.org;
>> kvm@...r.kernel.org; kvmarm@...ts.cs.columbia.edu; joro@...tes.org;
>> alex.williamson@...hat.com; jacob.jun.pan@...ux.intel.com;
>> yi.l.liu@...el.com; jean-philippe.brucker@....com; will.deacon@....com;
>> robin.murphy@....com
>> Cc: kevin.tian@...el.com; vincent.stehle@....com; ashok.raj@...el.com;
>> marc.zyngier@....com; tina.zhang@...el.com; Linuxarm
>> <linuxarm@...wei.com>; xuwei (O) <xuwei5@...wei.com>
>> Subject: RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
>>
> [...]
>>>>>> I am trying to get this running on one of our platforms that has
>>>>>> smmuv3 dual stage support. I am seeing some issues with this when an
>>>>>> ixgbe vf dev is made pass-through and is behind a vSMMUv3 in the guest.
>>>>>>
>>>>>> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
>>>>>> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
>>>>>>
>>>>>> And this is my Qemu cmd line,
>>>>>>
>>>>>> ./qemu-system-aarch64 \
>>>>>> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \
>>>>>> -kernel Image \
>>>>>> -drive if=none,file=ubuntu,id=fs \
>>>>>> -device virtio-blk-device,drive=fs \
>>>>>> -device vfio-pci,host=0000:01:10.1 \
>>>>>> -bios QEMU_EFI.fd \
>>>>>> -net none \
>>>>>> -m 4G \
>>>>>> -nographic -D -d -enable-kvm \
>>>>>> -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
>>>>>>
>>>>>> The basic ping from Guest works fine,
>>>>>> root@...ntu:~# ping 10.202.225.185
>>>>>> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
>>>>>> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
>>>>>> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
>>>>>> ...
>>>>>>
>>>>>> But if I increase ping packet size,
>>>>>>
>>>>>> root@...ntu:~# ping -s 1024 10.202.225.185
>>>>>> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
>>>>>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
>>>>>> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
>>>>>> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
>>>>>> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
>>>>>> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
>>>>>> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
>>>>>>
>>>>>> And from Host kernel I get,
>>>>>> [  819.970742] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
>>>>>> [  824.002707] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
>>>>>> [  828.034683] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
>>>>>> [  830.050673] ixgbe 0000:01:00.1 enp1s0f1: 4 Spoofed packets detected
>>>>>> [  832.066659] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
>>>>>> [  834.082640] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
>>>>>>
>>>>>> Also noted that iperf cannot work, as it fails to establish the
>>>>>> connection with the iperf server.
>>>>>>
>>>>>> Please find attached the trace logs (vfio*, smmuv3*) from Qemu for
>>>>>> your reference. I haven't debugged this further yet and thought of
>>>>>> checking with you whether this is something you have seen already or
>>>>>> not. Or maybe I am missing something here?
>>>>>
>>>>> Can you please try modifying hw/vfio/common.c, function
>>>>> vfio_iommu_unmap_notify(), as follows:
>>>>>
>>>>>
>>>>> /*
>>>>>     if (size <= 0x10000) {
>>>>>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
>>>>>         ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
>>>>>         ustruct.info.addr_info.flags = IOMMU_INV_ADDR_FLAGS_ARCHID;
>>>>>         if (iotlb->leaf) {
>>>>>             ustruct.info.addr_info.flags |= IOMMU_INV_ADDR_FLAGS_LEAF;
>>>>>         }
>>>>>         ustruct.info.addr_info.archid = iotlb->arch_id;
>>>>>         ustruct.info.addr_info.addr = start;
>>>>>         ustruct.info.addr_info.granule_size = size;
>>>>>         ustruct.info.addr_info.nb_granules = 1;
>>>>>         trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1,
>>>>>                                         iotlb->leaf);
>>>>>     } else {
>>>>> */
>>>>>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
>>>>>         ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
>>>>>         ustruct.info.pasid_info.archid = iotlb->arch_id;
>>>>>         ustruct.info.pasid_info.flags = IOMMU_INV_PASID_FLAGS_ARCHID;
>>>>>         trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id);
>>>>> //    }
>>>>>
>>>>> This modification invalidates the whole ASID each time we get a
>>>>> guest TLBI, instead of invalidating the single IOVA. On my end, I
>>>>> saw this was the cause of this kind of issue. Please let me know if
>>>>> it fixes your perf issues
>>>>
>>>> Yes, this seems to fix the issue.
>>>>
>>>> root@...ntu:~# iperf -c 10.202.225.185
>>>> ------------------------------------------------------------
>>>> Client connecting to 10.202.225.185, TCP port 5001
>>>> TCP window size: 85.0 KByte (default)
>>>> ------------------------------------------------------------
>>>> [  3] local 10.202.225.169 port 47996 connected with 10.202.225.185 port 5001
>>>> [ ID] Interval       Transfer     Bandwidth
>>>> [  3]  0.0-10.0 sec  2.27 GBytes  1.95 Gbits/sec
>>>> root@...ntu:~#
>>>>
>>>> But the performance seems to be very poor, as this is a 10Gbps
>>>> interface (of course, invalidating the whole ASID may not be very
>>>> helpful). It is interesting why the single IOVA invalidation is not
>>>> working.
>>>>
>>>>> and then we may discuss the test configuration further.
>>>>
>>>> Sure. Please let me know.
>>>
>>> I reported that issue earlier on the ML. I have not been able to find
>>> any integration issue in the kernel/qemu code, but maybe I am too blind
>>> to it by now, as I wrote it ;-) When I get a guest stage-1 TLBI, I
>>> cascade it down to the physical IOMMU. I also pass the LEAF flag.
>>
>> Ok.
>>
>>> As you are an expert on the SMMUv3 PMU, if your implementation has one
>>> and you have cycles to look at this, it would be helpful to run it and
>>> see if something weird gets highlighted.
>>
>> :). Sure. I will give it a try and report back if I see anything suspicious.
> 
> I just noted that CMDQ_OP_TLBI_NH_VA is missing the vmid field, which seems
> to be the cause of the single IOVA TLBI not working properly.
> 
> I had this fix in arm-smmu-v3.c,
> 
> @@ -947,6 +947,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> 		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
> 		break;
> 	case CMDQ_OP_TLBI_NH_VA:
> +		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);

Damn, I did not see that! That's it. ASID invalidation fills this field
indeed. You may post an independent patch for that.

> 		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
> 		cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
> 		cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
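
For reference, a sketch of the resulting case in arm_smmu_cmdq_build_cmd()
with the fix applied (the masks CMDQ_TLBI_0_VMID = GENMASK_ULL(47, 32) and
CMDQ_TLBI_0_ASID = GENMASK_ULL(63, 48) are the existing definitions at the
top of the driver, matching the SMMUv3 CMD_TLBI_NH_VA layout):

	case CMDQ_OP_TLBI_NH_VA:
		/* DWORD 0 carries the VMID and ASID the invalidation
		 * targets. Without the VMID, the command only matches TLB
		 * entries tagged with VMID 0, not the guest's stage-2
		 * VMID, so the cascaded guest TLBI is effectively a no-op. */
		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
		/* DWORD 1: leaf hint plus the VA being invalidated. */
		cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
		cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
		break;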
> 
> 
> With this, your original qemu branch is working. 
> 
> root@...ntu:~# iperf -c 10.202.225.185
> ------------------------------------------------------------
> Client connecting to 10.202.225.185, TCP port 5001
> TCP window size: 85.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 10.202.225.169 port 44894 connected with 10.202.225.185 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  3.21 GBytes  2.76 Gbits/sec
> 
> Could you please check this...
> 
> I also have a rebase of your patches on top of 5.4-rc5. This has some optimizations
> from Will, such as batched TLBI invalidation. Please find it here:
> 
> https://github.com/hisilicon/kernel-dev/tree/private-vSMMUv3-v9-v5.4-rc5
> 
> This gives me a better performance with iperf,
> 
> root@...ntu:~# iperf -c 10.202.225.185
> ------------------------------------------------------------
> Client connecting to 10.202.225.185, TCP port 5001
> TCP window size: 85.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 10.202.225.169 port 55450 connected with 10.202.225.185 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  4.91 GBytes  4.22 Gbits/sec
> root@...ntu:~#
> 
> If possible please check this branch as well.

To be honest, I don't really know what to do with this work. Despite the
efforts, it has suffered from a lack of traction in the community. My
last attempt to explain the use cases, upon Will's request at Plumbers,
has not received any comment (https://lkml.org/lkml/2019/9/20/104).

I think I will post a rebased version with your fix, just to get
a clean snapshot. If you think this work is useful for your projects,
please say so on the ML.

Thank you again!

Eric
> 
> Thanks,
> Shameer
> 
>> Thanks,
>> Shameer
>>
>>
>>> Thanks
>>>
>>> Eric
>>>>
>>>> Cheers,
>>>> Shameer
>>>>
>>>>> Thanks
>>>>>
>>>>> Eric
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Please let me know.
>>>>>>
>>>>>> Thanks,
>>>>>> Shameer
>>>>>>
>>>>>>> Best Regards
>>>>>>>
>>>>>>> Eric
>>>>>>>
>>>>>>> This series can be found at:
>>>>>>> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
>>>>>>>
>>>>>>> The series includes Tina's patch stemming from
>>>>>>> [1] "[RFC PATCH v2 1/3] vfio: Use capability chains to handle device
>>>>>>> specific irq", plus patches originally contributed by Yi.
>>>>>>>
>>>>>>> History:
>>>>>>>
>>>>>>> v8 -> v9:
>>>>>>> - introduce specific irq framework
>>>>>>> - single fault region
>>>>>>> - iommu_unregister_device_fault_handler failure case not handled
>>>>>>>   yet.
>>>>>>>
>>>>>>> v7 -> v8:
>>>>>>> - rebase on top of v5.2-rc1 and especially
>>>>>>>   8be39a1a04c1  iommu/arm-smmu-v3: Add a master->domain pointer
>>>>>>> - dynamic alloc of s1_cfg/s2_cfg
>>>>>>> - __arm_smmu_tlb_inv_asid/s1_range_nosync
>>>>>>> - check there is no HW MSI regions
>>>>>>> - asid invalidation using pasid extended struct (change in the uapi)
>>>>>>> - add s1_live/s2_live checks
>>>>>>> - move check about support of nested stages in domain finalise
>>>>>>> - fixes in error reporting according to the discussion with Robin
>>>>>>> - reordered the patches to have first iommu/smmuv3 patches and then
>>>>>>>   VFIO patches
>>>>>>>
>>>>>>> v6 -> v7:
>>>>>>> - removed device handle from bind/unbind_guest_msi
>>>>>>> - added "iommu/smmuv3: Nested mode single MSI doorbell per domain
>>>>>>>   enforcement"
>>>>>>> - added a few uapi comments as suggested by Jean, Jacob and Alex
>>>>>>>
>>>>>>> v5 -> v6:
>>>>>>> - Fix compilation issue when CONFIG_IOMMU_API is unset
>>>>>>>
>>>>>>> v4 -> v5:
>>>>>>> - fix bug reported by Vincent: fault handler unregistration now
>>>>>>>   happens in vfio_pci_release
>>>>>>> - IOMMU_FAULT_PERM_* moved outside of struct definition + small
>>>>>>>   uapi changes suggested by Jean-Philippe (except fetch_addr)
>>>>>>> - iommu: introduce device fault report API: removed the PRI part.
>>>>>>> - see individual logs for more details
>>>>>>> - reset the ste abort flag on detach
>>>>>>>
>>>>>>> v3 -> v4:
>>>>>>> - took into account Alex, jean-Philippe and Robin's comments on v3
>>>>>>> - rework of the smmuv3 driver integration
>>>>>>> - add tear down ops for msi binding and PASID table binding
>>>>>>> - fix S1 fault propagation
>>>>>>> - put fault reporting patches at the beginning of the series following
>>>>>>>   Jean-Philippe's request
>>>>>>> - update of the cache invalidate and fault API uapis
>>>>>>> - VFIO fault reporting rework with 2 separate regions and one
>>>>>>>   mmappable segment for the fault queue
>>>>>>> - moved to PATCH
>>>>>>>
>>>>>>> v2 -> v3:
>>>>>>> - When registering the S1 MSI binding we now store the device handle.
>>>>>>>   This addresses Robin's comment about discrimination of devices
>>>>>>>   belonging to different S1 groups and using different physical MSI
>>>>>>>   doorbells.
>>>>>>> - Change the fault reporting API: use VFIO_PCI_DMA_FAULT_IRQ_INDEX
>>>>>>>   to set the eventfd and expose the faults through an mmappable
>>>>>>>   fault region
>>>>>>>
>>>>>>> v1 -> v2:
>>>>>>> - Added the fault reporting capability
>>>>>>> - asid properly passed on invalidation (fix assignment of multiple
>>>>>>>   devices)
>>>>>>> - see individual change logs for more info
>>>>>>>
>>>>>>>
>>>>>>> Eric Auger (8):
>>>>>>>   vfio: VFIO_IOMMU_SET_MSI_BINDING
>>>>>>>   vfio/pci: Add VFIO_REGION_TYPE_NESTED region type
>>>>>>>   vfio/pci: Register an iommu fault handler
>>>>>>>   vfio/pci: Allow to mmap the fault queue
>>>>>>>   vfio: Add new IRQ for DMA fault reporting
>>>>>>>   vfio/pci: Add framework for custom interrupt indices
>>>>>>>   vfio/pci: Register and allow DMA FAULT IRQ signaling
>>>>>>>   vfio: Document nested stage control
>>>>>>>
>>>>>>> Liu, Yi L (2):
>>>>>>>   vfio: VFIO_IOMMU_SET_PASID_TABLE
>>>>>>>   vfio: VFIO_IOMMU_CACHE_INVALIDATE
>>>>>>>
>>>>>>> Tina Zhang (1):
>>>>>>>   vfio: Use capability chains to handle device specific irq
>>>>>>>
>>>>>>>  Documentation/vfio.txt              |  77 ++++++++
>>>>>>>  drivers/vfio/pci/vfio_pci.c         | 283 ++++++++++++++++++++++++++--
>>>>>>>  drivers/vfio/pci/vfio_pci_intrs.c   |  62 ++++++
>>>>>>>  drivers/vfio/pci/vfio_pci_private.h |  24 +++
>>>>>>>  drivers/vfio/pci/vfio_pci_rdwr.c    |  45 +++++
>>>>>>>  drivers/vfio/vfio_iommu_type1.c     | 166 ++++++++++++++++
>>>>>>>  include/uapi/linux/vfio.h           | 109 ++++++++++-
>>>>>>>  7 files changed, 747 insertions(+), 19 deletions(-)
>>>>>>>
>>>>>>> --
>>>>>>> 2.20.1
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> kvmarm mailing list
>>>>>>> kvmarm@...ts.cs.columbia.edu
>>>>>>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>>>>
> 
