Message-ID: <a8e529b2-1454-4c3f-aa49-b3d989e1014a@intel.com>
Date: Wed, 4 Dec 2024 15:32:59 +0100
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Alexandra Winter <wintera@...ux.ibm.com>
CC: Rahul Rameshbabu <rrameshbabu@...dia.com>, Saeed Mahameed
<saeedm@...dia.com>, Tariq Toukan <tariqt@...dia.com>, Leon Romanovsky
<leon@...nel.org>, David Miller <davem@...emloft.net>, Jakub Kicinski
<kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Eric Dumazet
<edumazet@...gle.com>, Andrew Lunn <andrew+netdev@...n.ch>, Nils Hoppmann
<niho@...ux.ibm.com>, <netdev@...r.kernel.org>, <linux-s390@...r.kernel.org>,
Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>, Christian Borntraeger
<borntraeger@...ux.ibm.com>, Sven Schnelle <svens@...ux.ibm.com>, "Thorsten
Winkler" <twinkler@...ux.ibm.com>, Simon Horman <horms@...nel.org>
Subject: Re: [PATCH net-next] net/mlx5e: Transmit small messages in linear skb
From: Alexandra Winter <wintera@...ux.ibm.com>
Date: Wed, 4 Dec 2024 15:02:30 +0100
> Linearize the skb if the device uses an IOMMU and the data buffer can
> fit into one page, so that messages can be sent to the card in one
> transfer instead of two.
I'd expect this to be done at the generic level rather than duplicated
across drivers, no?
Not sure about PAGE_SIZE, but I've never seen a NIC/driver/platform
where copying, let's say, 256 bytes would be slower than two dma_map
calls (even with direct DMA).
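Something along these lines in the core xmit path is what I have in
mind. Just a sketch: SKB_LINEAR_COPY_THRESHOLD is a made-up name for a
benchmark-derived constant, and the exact hook point is up for
discussion:

	/* Copy small non-GSO skbs into one linear buffer, so that the
	 * driver only has to map a single segment. Best effort: on
	 * failure, we just keep the original non-linear skb.
	 */
	if (skb_is_nonlinear(skb) && !skb_is_gso(skb) &&
	    skb->len <= SKB_LINEAR_COPY_THRESHOLD)
		skb_linearize(skb);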
>
> Performance issue:
> ------------------
> Since commit 472c2e07eef0 ("tcp: add one skb cache for tx"),
> TCP skbs are always non-linear. Especially on platforms with an IOMMU,
> mapping and unmapping two pages instead of one per transfer can make a
> noticeable difference. On s390 we saw a 13% degradation in throughput
> when running uperf with a request-response pattern with 1k payload and
> 250 parallel connections. See [0] for a discussion.
>
> This patch mitigates these effects using a work-around in the mlx5 driver.
>
> Notes on implementation:
> ------------------------
> TCP skbs never contain any tailroom, so skb_linearize() will allocate a
> new data buffer.
> There is no need to handle the return code of skb_linearize(): if it
> fails, we simply continue with the unchanged skb.
>
> As mentioned in the discussion, an alternative, but more invasive,
> approach would be to premap a coherent piece of memory into which
> small skbs can be copied.
Yes, that one would be better.
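Very roughly along these lines (a sketch only: the bounce fields,
BOUNCE_SZ, and the descriptor setup are made up, and lkey, ring
wrap-around, and completion handling are all glossed over):

	/* at SQ init: one premapped, coherent bounce slot per WQE */
	sq->bounce = dma_alloc_coherent(sq->pdev, sq->wq_sz * BOUNCE_SZ,
					&sq->bounce_dma, GFP_KERNEL);

	/* in the xmit path: copy small skbs into the premapped area
	 * instead of dma_map'ing each fragment separately
	 */
	if (skb->len <= BOUNCE_SZ) {
		void *buf = sq->bounce + pi * BOUNCE_SZ;

		skb_copy_bits(skb, 0, buf, skb->len);
		dseg->addr       = cpu_to_be64(sq->bounce_dma + pi * BOUNCE_SZ);
		dseg->byte_count = cpu_to_be32(skb->len);
	}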
[...]
> @@ -269,6 +270,10 @@ static void mlx5e_sq_xmit_prepare(struct mlx5e_txqsq *sq, struct sk_buff *skb,
> {
> 	struct mlx5e_sq_stats *stats = sq->stats;
>
> +	/* Don't require 2 IOMMU TLB entries if one is sufficient */
> +	if (use_dma_iommu(sq->pdev) && skb->truesize <= PAGE_SIZE)
1. What about direct DMA? I believe it would benefit, too?
2. Why truesize, and not something like

	if (skb->len <= some_sane_value_maybe_1k)

3. As Eric mentioned, PAGE_SIZE can be up to 256 KB; I don't think
   it's a good idea to rely on it.
   A test-based hardcoded value would be enough (i.e. the threshold
   at which DMA mapping starts performing better than copying).
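Taking the points above together, I'd rather expect something like
this, with SOME_MEASURED_THRESHOLD being a hypothetical constant
derived from benchmarks, not tied to PAGE_SIZE or to the IOMMU:

	/* Copying a tiny skb is cheaper than mapping two fragments,
	 * with direct DMA just as with an IOMMU.
	 */
	if (skb->len <= SOME_MEASURED_THRESHOLD)
		skb_linearize(skb);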
> +		skb_linearize(skb);
> +
> 	if (skb_is_gso(skb)) {
BTW, can't there be a case where the skb is GSO, but its truesize is
<= PAGE_SIZE, so that linearizing it would be way too slow? (Not sure
it's possible, just guessing.)
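If it is possible, an explicit guard in the new check would avoid it,
e.g. (with the same hypothetical threshold as above):

	if (!skb_is_gso(skb) && skb->len <= SOME_MEASURED_THRESHOLD)
		skb_linearize(skb);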
> 		int hopbyhop;
> 		u16 ihs = mlx5e_tx_get_gso_ihs(sq, skb, &hopbyhop);
Thanks,
Olek