Message-ID: <CANn89iKUYMb_4vJ5GAE0-BUmM7JNuHo_p8oHbfJfatYKBX8ouw@mail.gmail.com>
Date: Wed, 7 Dec 2022 13:53:35 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Tariq Toukan <ttoukan.linux@...il.com>
Cc: "David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Tariq Toukan <tariqt@...dia.com>, Wei Wang <weiwan@...gle.com>,
netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH net-next 2/3] net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends
on MAX_SKB_FRAGS
On Wed, Dec 7, 2022 at 1:40 PM Tariq Toukan <ttoukan.linux@...il.com> wrote:
>
>
>
> On 12/6/2022 7:50 AM, Eric Dumazet wrote:
> > Google production kernel has increased MAX_SKB_FRAGS to 45
> > for BIG-TCP rollout.
> >
> > Unfortunately mlx4 TX bounce buffer is not big enough whenever
> > an skb has up to 45 page fragments.
> >
> > This can happen often with TCP TX zero copy, as one frag usually
> > holds 4096 bytes of payload (order-0 page).
> >
> > Tested:
> > Kernel built with MAX_SKB_FRAGS=45
> > ip link set dev eth0 gso_max_size 185000
> > netperf -t TCP_SENDFILE
> >
> > I made sure that "ethtool -G eth0 tx 64" was properly working,
> > ring->full_size being set to 16.
> >
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > Reported-by: Wei Wang <weiwan@...gle.com>
> > Cc: Tariq Toukan <tariqt@...dia.com>
> > ---
> > drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 16 ++++++++++++----
> > 1 file changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> > index 7cc288db2a64f75ffe64882e3c25b90715e68855..120b8c361e91d443f83f100a1afabcabc776a92a 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> > +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> > @@ -89,8 +89,18 @@
> > #define MLX4_EN_FILTER_HASH_SHIFT 4
> > #define MLX4_EN_FILTER_EXPIRY_QUOTA 60
> >
> > -/* Typical TSO descriptor with 16 gather entries is 352 bytes... */
> > -#define MLX4_TX_BOUNCE_BUFFER_SIZE 512
> > +#define CTRL_SIZE sizeof(struct mlx4_wqe_ctrl_seg)
> > +#define DS_SIZE sizeof(struct mlx4_wqe_data_seg)
> > +
> > +/* Maximal size of the bounce buffer:
> > + * 256 bytes for LSO headers.
> > + * CTRL_SIZE for control desc.
> > + * DS_SIZE if skb->head contains some payload.
> > + * MAX_SKB_FRAGS frags.
> > + */
> > +#define MLX4_TX_BOUNCE_BUFFER_SIZE (256 + CTRL_SIZE + DS_SIZE + \
> > + MAX_SKB_FRAGS * DS_SIZE)
> > +
> > #define MLX4_MAX_DESC_TXBBS (MLX4_TX_BOUNCE_BUFFER_SIZE / TXBB_SIZE)
> >
>
> Now that MLX4_TX_BOUNCE_BUFFER_SIZE might not be a multiple of TXBB_SIZE,
> simple integer division no longer gives the maximum number of TXBBs.
> A roundup is needed.
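> Something like the following, for instance (illustrative only, using
> the kernel's existing DIV_ROUND_UP() helper):
>
> #define MLX4_MAX_DESC_TXBBS \
> 	DIV_ROUND_UP(MLX4_TX_BOUNCE_BUFFER_SIZE, TXBB_SIZE)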
I do not see why a roundup is needed. This seems like obfuscation to me.
A divide by TXBB_SIZE always "works".
A round-up is already done in mlx4_en_xmit():
/* Align descriptor to TXBB size */
desc_size = ALIGN(real_size, TXBB_SIZE);
nr_txbb = desc_size >> LOG_TXBB_SIZE;
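For example (purely illustrative numbers, assuming TXBB_SIZE == 64 and
LOG_TXBB_SIZE == 6): real_size = 1000 is aligned to desc_size = 1024,
hence nr_txbb = 16.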
Then the check is:
if (unlikely(nr_txbb > MLX4_MAX_DESC_TXBBS)) {
	if (netif_msg_tx_err(priv))
		en_warn(priv, "Oversized header or SG list\n");
	goto tx_drop_count;
}
If we allocate X extra bytes (when MLX4_TX_BOUNCE_BUFFER_SIZE %
TXBB_SIZE == X), we are not going to use them anyway.
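To make the arithmetic concrete (a sketch, assuming CTRL_SIZE ==
DS_SIZE == 16 bytes, TXBB_SIZE == 64 and MAX_SKB_FRAGS == 45):

MLX4_TX_BOUNCE_BUFFER_SIZE = 256 + 16 + 16 + 45 * 16 = 1008
MLX4_MAX_DESC_TXBBS        = 1008 / 64               = 15

Any descriptor with real_size > 15 * 64 = 960 gets ALIGNed up to 16
TXBBs and dropped by the check above, so the trailing 1008 % 64 = 48
bytes of the bounce buffer can never be written to.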