Message-ID: <20f488f1-8f30-4ae0-8e9c-9910e81e0e1a@nvidia.com>
Date: Fri, 13 Dec 2024 21:41:04 +0100
From: Dragos Tatulea <dtatulea@...dia.com>
To: Niklas Schnelle <schnelle@...ux.ibm.com>,
Alexandra Winter <wintera@...ux.ibm.com>,
Alexander Lobakin <aleksander.lobakin@...el.com>
Cc: Rahul Rameshbabu <rrameshbabu@...dia.com>,
Saeed Mahameed <saeedm@...dia.com>, Tariq Toukan <tariqt@...dia.com>,
Leon Romanovsky <leon@...nel.org>, David Miller <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Eric Dumazet <edumazet@...gle.com>, Andrew Lunn <andrew+netdev@...n.ch>,
Nils Hoppmann <niho@...ux.ibm.com>, netdev@...r.kernel.org,
linux-s390@...r.kernel.org, Heiko Carstens <hca@...ux.ibm.com>,
Vasily Gorbik <gor@...ux.ibm.com>, Alexander Gordeev
<agordeev@...ux.ibm.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>,
Thorsten Winkler <twinkler@...ux.ibm.com>, Simon Horman <horms@...nel.org>,
Jason Gunthorpe <jgg@...dia.com>
Subject: Re: [PATCH net-next] net/mlx5e: Transmit small messages in linear skb
On 11.12.24 18:50, Niklas Schnelle wrote:
> On Wed, 2024-12-11 at 18:28 +0100, Dragos Tatulea wrote:
>>>>>>
>
> ---8<---
>
>>
>>> On 09.12.24 12:36, Tariq Toukan wrote:
>>>> Hi,
>>>>
>>>> Many approaches in the past few years have gone in the opposite direction, trying to avoid copies ("zero-copy").
>>>>
>>>> In many cases, copying up to PAGE_SIZE means copying everything.
>>>> For high NIC speeds this is not realistic.
>>>>
>>>> Anyway, based on past experience, the threshold should not exceed the "max header size" (128/256B).
>>>
>>>>> 1KB is still too large. As Tariq mentioned, the threshold should not
>>>>> exceed 128/256B. I am currently testing this with 256B on x86. So far no
>>>>> regressions, but I need to play with it more.
>> I checked on an x86 system with CX7 and we seem to get ~4% degradation
>> when using this approach. The patch was modified a bit according to the
>> previous discussions (diff at the end of the mail).
>>
>> Here's how I tested:
>> - uperf client side has many queues.
>> - uperf server side has a single queue with interrupts pinned to a single
>> CPU. This is to better isolate CPU behaviour. The idea is to have the
>> CPU on the server side saturated or close to saturation.
>> - Used the uperf 1B read x 1B write scenario with 50 and 100 threads
>> (profile attached).
>> Both the on-cpu and off-cpu cases were checked.
>> - The code change was done only on the server side.
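
(For reference, the general shape of the threshold-based copy discussed
above is sketched below. This is only an illustration with a made-up
constant name, not the actual diff at the end of the mail: small
non-linear skbs get linearized so the whole packet can be sent from a
single mapping instead of DMA-mapping every fragment.)

#include <linux/skbuff.h>

/* Hypothetical name for the 256B limit discussed in this thread. */
#define MLX5E_TX_LINEAR_COPY_THRESH 256

/* Illustrative only: pull a small, fragmented skb into its linear head so
 * the driver can map one contiguous buffer instead of every fragment.
 * Returns true when the skb ends up fully linear.
 */
static bool mlx5e_tx_try_linearize_small(struct sk_buff *skb)
{
	if (!skb_is_nonlinear(skb))
		return true;

	if (skb->len > MLX5E_TX_LINEAR_COPY_THRESH)
		return false;

	/* skb_linearize() copies all fragments into the head buffer. */
	return skb_linearize(skb) == 0;
}
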
>
> I'm assuming this is with the x86 default IOMMU pass-through mode?
It was in a VM with PCI passthrough for the device.
> Would you be able and willing to try with iommu.passthrough=0 and
> amd_iommu=on or intel_iommu=on, respectively? Check
> /sys/bus/pci/devices/<dev>/iommu_group/type for "DMA-FQ" to make sure
> the dma-iommu code is used. This is obviously not a "tuned for all-out
> perf at any cost" configuration, but it is recommended in hardening
> guides, and I believe some ARM64 systems also default to using the IOMMU
> for bare-metal DMA API use. So it's not an unexpected configuration
> either.
>
I got hold of a bare-metal system where I could turn IOMMU passthrough
off and confirm iommu_group/type as being "DMA-FQ". But the results are
inconclusive due to instability. I will look into this again after the
holidays.
Thanks,
Dragos