lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250805191436.GY8494@horms.kernel.org>
Date: Tue, 5 Aug 2025 20:14:36 +0100
From: Simon Horman <horms@...nel.org>
To: Mingming Cao <mmc@...ux.ibm.com>
Cc: netdev@...r.kernel.org, bjking1@...ux.ibm.com, haren@...ux.ibm.com,
	ricklind@...ux.ibm.com, kuba@...nel.org, edumazet@...gle.com,
	pabeni@...hat.com, linuxppc-dev@...ts.ozlabs.org,
	maddy@...ux.ibm.com, mpe@...erman.id.au
Subject: Re: [PATCH v3 net-next] ibmvnic: Increase max subcrq indirect
 entries with fallback

On Mon, Aug 04, 2025 at 04:17:04PM -0700, Mingming Cao wrote:
> POWER8 support a maximum of 16 subcrq indirect descriptor entries per
>  H_SEND_SUB_CRQ_INDIRECT call, while POWER9 and newer hypervisors
>  support up to 128 entries. Increasing the max number of indirect
> descriptor entries improves batching efficiency and reduces
> hcall overhead, which enhances throughput under large workload on POWER9+.
> 
> Currently, ibmvnic driver always uses a fixed number of max indirect
> descriptor entries (16). send_subcrq_indirect() treats all hypervisor
> errors the same:
>  - Cleanup and Drop the entire batch of descriptors.
>  - Return an error to the caller.
>  - Rely on TCP/IP retransmissions to recover.
>  - If the hypervisor returns H_PARAMETER (e.g., because 128
>    entries are not supported on POWER8), the driver will continue
>    to drop batches, resulting in unnecessary packet loss.
> 
> In this patch:
> Raise the default maximum indirect entries to 128 to improve ibmvnic
> batching on morden platform. But also gracefully fall back to
> 16 entries for Power 8 systems.
> 
> Since there is no VIO interface to query the hypervisor’s supported
> limit, vnic handles send_subcrq_indirect() H_PARAMETER errors:
>  - On first H_PARAMETER failure, log the failure context
>  - Reduce max_indirect_entries to 16 and allow the single batch to drop.
>  - Subsequent calls automatically use the correct lower limit,
>     avoiding repeated drops.
> 
> The goal is to  optimizes performance on modern systems while handles
> falling back for older POWER8 hypervisors.
> 
> Performance shows 40% improvements with MTU (1500) on largework load.
> 
> --------------------------------------
> Changes since v2:
> link to v2: https://www.spinics.net/lists/netdev/msg1104669.html
> 
> -- was Patch 4 from a patch series v2. v2 introduced a module parameter
> for backward compatibility. Based on review feedback, This patch handles
> older systems fall back case without adding a module parameter.
> 
> Signed-off-by: Mingming Cao <mmc@...ux.ibm.com>
> Reviewed-by: Brian King <bjking1@...ux.ibm.com>
> Reviewed-by: Haren Myneni <haren@...ux.ibm.com>
> ---
>  drivers/net/ethernet/ibm/ibmvnic.c | 56 ++++++++++++++++++++++++++----
>  drivers/net/ethernet/ibm/ibmvnic.h |  6 ++--
>  2 files changed, 53 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c

...

> @@ -862,6 +862,19 @@ static void replenish_rx_pool(struct ibmvnic_adapter *adapter,
>  failure:
>  	if (lpar_rc != H_PARAMETER && lpar_rc != H_CLOSED)
>  		dev_err_ratelimited(dev, "rx: replenish packet buffer failed\n");
> +
> +	/* Detect platform limit H_PARAMETER */
> +	if (lpar_rc == H_PARAMETER &&
> +	    adapter->cur_max_ind_descs > IBMVNIC_MAX_IND_DESC_MIN) {
> +		netdev_info(adapter->netdev,
> +			    "H_PARAMETER, set ind desc to safe limit %u\n",
> +			    IBMVNIC_MAX_IND_DESC_MIN);
> +		adapter->cur_max_ind_descs = IBMVNIC_MAX_IND_DESC_MIN;
> +	}

Hi Mingming, all,

The logic above seems to appear twice in this patch.
I think it would be good to consolidate it somehow.
E.g. in a helper function.

> +
> +	/* for all error case, temporarily drop only this batch
> +	 * Rely on TCP/IP retransmissions to retry and recover
> +	 */

Thanks for adding this comment.
Although perhaps 'for' -> 'For'.

Likewise below.

>  	for (i = ind_bufp->index - 1; i >= 0; --i) {
>  		struct ibmvnic_rx_buff *rx_buff;
>  
> @@ -2381,16 +2394,33 @@ static int ibmvnic_tx_scrq_flush(struct ibmvnic_adapter *adapter,
>  		rc = send_subcrq_direct(adapter, handle,
>  					(u64 *)ind_bufp->indir_arr);
>  
> -	if (rc)
> +	if (rc) {
> +		dev_err_ratelimited(&adapter->vdev->dev,
> +				    "tx_flush failed, rc=%u (%llu entries dma=%pad handle=%llx)\n",
> +				    rc, entries, &dma_addr, handle);
> +		/* Detect platform limit H_PARAMETER */
> +		if (rc == H_PARAMETER &&
> +		    adapter->cur_max_ind_descs > IBMVNIC_MAX_IND_DESC_MIN) {
> +			netdev_info(adapter->netdev,
> +				    "H_PARAMETER, set ind descs to safe limit %u\n",
> +				    IBMVNIC_MAX_IND_DESC_MIN);
> +			adapter->cur_max_ind_descs = IBMVNIC_MAX_IND_DESC_MIN;
> +		}
> +
> +		/* for all error case, temporarily drop only this batch
> +		 * Rely on TCP/IP retransmissions to retry and recover
> +		 */
>  		ibmvnic_tx_scrq_clean_buffer(adapter, tx_scrq);
> -	else
> +	} else {
>  		ind_bufp->index = 0;
> +	}
>  	return rc;
>  }

...

> @@ -6369,6 +6399,17 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter, bool reset)
>  			rc = reset_sub_crq_queues(adapter);
>  		}
>  	} else {
> +		if (adapter->reset_reason == VNIC_RESET_MOBILITY) {
> +			/* post migrtione reset the max
> +			 * indirect descriptors per hcall to be default max
> +			 * (e.g p8 ->p10)
> +			 * if the destination is on the platform supports
> +			 * do not support max (e.g. p10->p8) the threshold
> +			 * will be reduced to safe min limit for p8 later
> +			 */

nits: Post migration, reset.

      The line breaking seems uneven.

      And if p8 and p10 are POWER8 and POWER10 then I think it would
      be worth spelling that out.
 
> +			adapter->cur_max_ind_descs = IBMVNIC_MAX_IND_DESC_MAX;
> +		}
> +
>  		rc = init_sub_crqs(adapter);
>  	}
>  

...

> diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
> index 246ddce753f9..829a16116812 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.h
> +++ b/drivers/net/ethernet/ibm/ibmvnic.h
> @@ -29,8 +29,9 @@
>  #define IBMVNIC_BUFFS_PER_POOL	100
>  #define IBMVNIC_MAX_QUEUES	16
>  #define IBMVNIC_MAX_QUEUE_SZ   4096
> -#define IBMVNIC_MAX_IND_DESCS  16
> -#define IBMVNIC_IND_ARR_SZ	(IBMVNIC_MAX_IND_DESCS * 32)
> +#define IBMVNIC_MAX_IND_DESC_MAX 128
> +#define IBMVNIC_MAX_IND_DESC_MIN 16

...MAX...{MAX,MIN} seems like an unfortunate name.
But I don't feel particularly strongly about this one.

> +#define IBMVNIC_IND_MAX_ARR_SZ (IBMVNIC_MAX_IND_DESC_MAX * 32)
>  
>  #define IBMVNIC_TSO_BUF_SZ	65536
>  #define IBMVNIC_TSO_BUFS	64

...

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ