[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250805191436.GY8494@horms.kernel.org>
Date: Tue, 5 Aug 2025 20:14:36 +0100
From: Simon Horman <horms@...nel.org>
To: Mingming Cao <mmc@...ux.ibm.com>
Cc: netdev@...r.kernel.org, bjking1@...ux.ibm.com, haren@...ux.ibm.com,
ricklind@...ux.ibm.com, kuba@...nel.org, edumazet@...gle.com,
pabeni@...hat.com, linuxppc-dev@...ts.ozlabs.org,
maddy@...ux.ibm.com, mpe@...erman.id.au
Subject: Re: [PATCH v3 net-next] ibmvnic: Increase max subcrq indirect
entries with fallback
On Mon, Aug 04, 2025 at 04:17:04PM -0700, Mingming Cao wrote:
> POWER8 support a maximum of 16 subcrq indirect descriptor entries per
> H_SEND_SUB_CRQ_INDIRECT call, while POWER9 and newer hypervisors
> support up to 128 entries. Increasing the max number of indirect
> descriptor entries improves batching efficiency and reduces
> hcall overhead, which enhances throughput under large workload on POWER9+.
>
> Currently, ibmvnic driver always uses a fixed number of max indirect
> descriptor entries (16). send_subcrq_indirect() treats all hypervisor
> errors the same:
> - Cleanup and Drop the entire batch of descriptors.
> - Return an error to the caller.
> - Rely on TCP/IP retransmissions to recover.
> - If the hypervisor returns H_PARAMETER (e.g., because 128
> entries are not supported on POWER8), the driver will continue
> to drop batches, resulting in unnecessary packet loss.
>
> In this patch:
> Raise the default maximum indirect entries to 128 to improve ibmvnic
> batching on morden platform. But also gracefully fall back to
> 16 entries for Power 8 systems.
>
> Since there is no VIO interface to query the hypervisor’s supported
> limit, vnic handles send_subcrq_indirect() H_PARAMETER errors:
> - On first H_PARAMETER failure, log the failure context
> - Reduce max_indirect_entries to 16 and allow the single batch to drop.
> - Subsequent calls automatically use the correct lower limit,
> avoiding repeated drops.
>
> The goal is to optimizes performance on modern systems while handles
> falling back for older POWER8 hypervisors.
>
> Performance shows 40% improvements with MTU (1500) on largework load.
>
> --------------------------------------
> Changes since v2:
> link to v2: https://www.spinics.net/lists/netdev/msg1104669.html
>
> -- was Patch 4 from a patch series v2. v2 introduced a module parameter
> for backward compatibility. Based on review feedback, This patch handles
> older systems fall back case without adding a module parameter.
>
> Signed-off-by: Mingming Cao <mmc@...ux.ibm.com>
> Reviewed-by: Brian King <bjking1@...ux.ibm.com>
> Reviewed-by: Haren Myneni <haren@...ux.ibm.com>
> ---
> drivers/net/ethernet/ibm/ibmvnic.c | 56 ++++++++++++++++++++++++++----
> drivers/net/ethernet/ibm/ibmvnic.h | 6 ++--
> 2 files changed, 53 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
...
> @@ -862,6 +862,19 @@ static void replenish_rx_pool(struct ibmvnic_adapter *adapter,
> failure:
> if (lpar_rc != H_PARAMETER && lpar_rc != H_CLOSED)
> dev_err_ratelimited(dev, "rx: replenish packet buffer failed\n");
> +
> + /* Detect platform limit H_PARAMETER */
> + if (lpar_rc == H_PARAMETER &&
> + adapter->cur_max_ind_descs > IBMVNIC_MAX_IND_DESC_MIN) {
> + netdev_info(adapter->netdev,
> + "H_PARAMETER, set ind desc to safe limit %u\n",
> + IBMVNIC_MAX_IND_DESC_MIN);
> + adapter->cur_max_ind_descs = IBMVNIC_MAX_IND_DESC_MIN;
> + }
Hi Mingming, all,
The logic above seems to appear twice in this patch.
I think it would be good to consolidate it somehow.
E.g. in a helper function.
> +
> + /* for all error case, temporarily drop only this batch
> + * Rely on TCP/IP retransmissions to retry and recover
> + */
Thanks for adding this comment.
Although perhaps 'for' -> 'For'.
Likewise below.
> for (i = ind_bufp->index - 1; i >= 0; --i) {
> struct ibmvnic_rx_buff *rx_buff;
>
> @@ -2381,16 +2394,33 @@ static int ibmvnic_tx_scrq_flush(struct ibmvnic_adapter *adapter,
> rc = send_subcrq_direct(adapter, handle,
> (u64 *)ind_bufp->indir_arr);
>
> - if (rc)
> + if (rc) {
> + dev_err_ratelimited(&adapter->vdev->dev,
> + "tx_flush failed, rc=%u (%llu entries dma=%pad handle=%llx)\n",
> + rc, entries, &dma_addr, handle);
> + /* Detect platform limit H_PARAMETER */
> + if (rc == H_PARAMETER &&
> + adapter->cur_max_ind_descs > IBMVNIC_MAX_IND_DESC_MIN) {
> + netdev_info(adapter->netdev,
> + "H_PARAMETER, set ind descs to safe limit %u\n",
> + IBMVNIC_MAX_IND_DESC_MIN);
> + adapter->cur_max_ind_descs = IBMVNIC_MAX_IND_DESC_MIN;
> + }
> +
> + /* for all error case, temporarily drop only this batch
> + * Rely on TCP/IP retransmissions to retry and recover
> + */
> ibmvnic_tx_scrq_clean_buffer(adapter, tx_scrq);
> - else
> + } else {
> ind_bufp->index = 0;
> + }
> return rc;
> }
...
> @@ -6369,6 +6399,17 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter, bool reset)
> rc = reset_sub_crq_queues(adapter);
> }
> } else {
> + if (adapter->reset_reason == VNIC_RESET_MOBILITY) {
> + /* post migrtione reset the max
> + * indirect descriptors per hcall to be default max
> + * (e.g p8 ->p10)
> + * if the destination is on the platform supports
> + * do not support max (e.g. p10->p8) the threshold
> + * will be reduced to safe min limit for p8 later
> + */
nits: Post migration, reset.
The line breaking seems uneven.
And if p8 and p10 are POWER8 and POWER10 then I think it would
be worth spelling that out.
> + adapter->cur_max_ind_descs = IBMVNIC_MAX_IND_DESC_MAX;
> + }
> +
> rc = init_sub_crqs(adapter);
> }
>
...
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
> index 246ddce753f9..829a16116812 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.h
> +++ b/drivers/net/ethernet/ibm/ibmvnic.h
> @@ -29,8 +29,9 @@
> #define IBMVNIC_BUFFS_PER_POOL 100
> #define IBMVNIC_MAX_QUEUES 16
> #define IBMVNIC_MAX_QUEUE_SZ 4096
> -#define IBMVNIC_MAX_IND_DESCS 16
> -#define IBMVNIC_IND_ARR_SZ (IBMVNIC_MAX_IND_DESCS * 32)
> +#define IBMVNIC_MAX_IND_DESC_MAX 128
> +#define IBMVNIC_MAX_IND_DESC_MIN 16
...MAX...{MAX,MIN} seems like an unfortunate name.
But I don't feel particularly strongly about this one.
> +#define IBMVNIC_IND_MAX_ARR_SZ (IBMVNIC_MAX_IND_DESC_MAX * 32)
>
> #define IBMVNIC_TSO_BUF_SZ 65536
> #define IBMVNIC_TSO_BUFS 64
...
Powered by blists - more mailing lists