Message-ID: <474c1f71-3a5c-4fe5-a01e-80f2ba95fd7e@bytedance.com>
Date: Mon, 3 Nov 2025 11:13:03 -0800
From: Zijian Zhang <zijianzhang@...edance.com>
To: Alexander Duyck <alexander.duyck@...il.com>,
Tariq Toukan <ttoukan.linux@...il.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net, kuba@...nel.org,
pabeni@...hat.com, edumazet@...gle.com, andrew+netdev@...n.ch,
saeedm@...dia.com, gal@...dia.com, leonro@...dia.com, witu@...dia.com,
parav@...dia.com, tariqt@...dia.com, hkelam@...vell.com,
Alexander Lobakin <aleksander.lobakin@...el.com>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Toke Høiland-Jørgensen <toke@...hat.com>,
Lorenzo Bianconi <lorenzo@...nel.org>,
Jesse Brandeburg <jbrandeburg@...udflare.com>,
Salil Mehta <salil.mehta@...wei.com>
Subject: Re: [PATCH net-next v2] net/mlx5e: Modify mlx5e_xdp_xmit sq selection
Thanks for the info and explanation, that makes a lot of sense :)
Modulo here is too costly.
On 11/2/25 4:13 PM, Alexander Duyck wrote:
> On Sun, Nov 2, 2025 at 5:02 AM Tariq Toukan <ttoukan.linux@...il.com> wrote:
>> On 01/11/2025 1:10, Zijian Zhang wrote:
>
> ...
>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>>> index 5d51600935a6..6225734b256a 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>>> @@ -855,13 +855,10 @@ int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
>>>  	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
>>>  		return -EINVAL;
>>>
>>> -	sq_num = smp_processor_id();
>>> -
>>> -	if (unlikely(sq_num >= priv->channels.num))
>>> -		return -ENXIO;
>>> -
>>> +	sq_num = smp_processor_id() % priv->channels.num;
>>
>> Modulo is a costly operation.
>> A while loop with subtraction would likely converge faster.
>
> I agree. The modulo is optimizing for the worst exception case, and
> heavily penalizing the case where it does nothing. A while loop in
> most cases will likely just be a test and short jump which would be
> two or three cycles whereas this would cost you somewhere in the 10s
> of cycles for most processors as I recall.
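Below is a minimal sketch of the subtraction-loop SQ selection suggested in the review, shown next to the modulo form from the v2 patch. It is a hypothetical illustration, not the code that was eventually merged. The helper names `xdp_sq_select_sub()` and `xdp_sq_select_mod()` are invented for this example. The sketch assumes `num_channels` is nonzero; with a zero count the loop would never terminate.

```c
/* Hypothetical helpers illustrating the review discussion; not mlx5 code. */

/* Subtraction-loop selection (reviewer's suggestion).
 * Assumes num_channels > 0, otherwise the loop never exits.
 * When cpu < num_channels the body never runs, so the cost is a single
 * compare plus a not-taken branch. */
static inline unsigned int xdp_sq_select_sub(unsigned int cpu,
					     unsigned int num_channels)
{
	while (cpu >= num_channels)
		cpu -= num_channels;
	return cpu;
}

/* Modulo selection as in the v2 patch, for comparison.
 * Because the divisor is only known at runtime, this typically compiles
 * to a hardware divide, which is paid even when cpu < num_channels. */
static inline unsigned int xdp_sq_select_mod(unsigned int cpu,
					     unsigned int num_channels)
{
	return cpu % num_channels;
}
```

Both helpers return the same SQ index for every input. The loop only iterates when there are more CPUs than channels, and then runs roughly `cpu / num_channels` times. This matches the trade-off described above: the common case stays almost free, and the exceptional case pays a little more.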