Message-ID:
<SA3PR21MB3867B98BBA96FF3BA7F42F3FCA8DA@SA3PR21MB3867.namprd21.prod.outlook.com>
Date: Fri, 16 Jan 2026 16:44:33 +0000
From: Haiyang Zhang <haiyangz@...rosoft.com>
To: Jakub Kicinski <kuba@...nel.org>
CC: Haiyang Zhang <haiyangz@...ux.microsoft.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>, KY Srinivasan
<kys@...rosoft.com>, Wei Liu <wei.liu@...nel.org>, Dexuan Cui
<DECUI@...rosoft.com>, Long Li <longli@...rosoft.com>, Andrew Lunn
<andrew+netdev@...n.ch>, "David S. Miller" <davem@...emloft.net>, Eric
Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, Konstantin
Taranov <kotaranov@...rosoft.com>, Simon Horman <horms@...nel.org>, Erni Sri
Satya Vennela <ernis@...ux.microsoft.com>, Shradha Gupta
<shradhagupta@...ux.microsoft.com>, Saurabh Sengar
<ssengar@...ux.microsoft.com>, Aditya Garg <gargaditya@...ux.microsoft.com>,
Dipayaan Roy <dipayanroy@...ux.microsoft.com>, Shiraz Saleem
<shirazsaleem@...rosoft.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-rdma@...r.kernel.org"
<linux-rdma@...r.kernel.org>, Paul Rosswurm <paulros@...rosoft.com>
Subject: RE: [EXTERNAL] Re: [PATCH V2,net-next, 1/2] net: mana: Add support
for coalesced RX packets on CQE
> -----Original Message-----
> From: Jakub Kicinski <kuba@...nel.org>
> Sent: Thursday, January 15, 2026 9:15 PM
> To: Haiyang Zhang <haiyangz@...rosoft.com>
> Cc: Haiyang Zhang <haiyangz@...ux.microsoft.com>; linux-
> hyperv@...r.kernel.org; netdev@...r.kernel.org; KY Srinivasan
> <kys@...rosoft.com>; Wei Liu <wei.liu@...nel.org>; Dexuan Cui
> <DECUI@...rosoft.com>; Long Li <longli@...rosoft.com>; Andrew Lunn
> <andrew+netdev@...n.ch>; David S. Miller <davem@...emloft.net>; Eric
> Dumazet <edumazet@...gle.com>; Paolo Abeni <pabeni@...hat.com>; Konstantin
> Taranov <kotaranov@...rosoft.com>; Simon Horman <horms@...nel.org>; Erni
> Sri Satya Vennela <ernis@...ux.microsoft.com>; Shradha Gupta
> <shradhagupta@...ux.microsoft.com>; Saurabh Sengar
> <ssengar@...ux.microsoft.com>; Aditya Garg
> <gargaditya@...ux.microsoft.com>; Dipayaan Roy
> <dipayanroy@...ux.microsoft.com>; Shiraz Saleem
> <shirazsaleem@...rosoft.com>; linux-kernel@...r.kernel.org; linux-
> rdma@...r.kernel.org; Paul Rosswurm <paulros@...rosoft.com>
> Subject: Re: [EXTERNAL] Re: [PATCH V2,net-next, 1/2] net: mana: Add
> support for coalesced RX packets on CQE
>
> On Thu, 15 Jan 2026 19:57:44 +0000 Haiyang Zhang wrote:
> > > > When coalescing is enabled, the device waits for packets which can
> > > > have the CQE coalesced with previous packet(s). That coalescing process
> > > > is finished (and a CQE written to the appropriate CQ) when the CQE is
> > > > filled with 4 pkts, or time expired, or other device specific logic is
> > > > satisfied.
> > >
> > > See, what I'm afraid is happening here is that you are enabling
> > > completion coalescing (how long the device keeps the CQE pending).
> > > Which is _not_ what rx_max_coalesced_frames controls for most NICs.
> > > For most NICs rx_max_coalesced_frames controls IRQ generation logic.
> > >
> > > The NIC first buffers up CQEs for typically single digit usecs, and
> > > then once CQE timer expired and writeback happened it starts an IRQ
> > > coalescing timer. Once the IRQ coalescing timer expires IRQ is
> > > triggered, which schedules NAPI. (broad strokes, obviously many
> > > differences and optimizations exist)
> > >
> > > Is my guess correct? Are you controlling CQE coalescing?
> > >
> > > Can you control the timeout instead of the frame count?
> >
> > Our NIC's timeout value cannot be controlled by driver. Also, the
> > timeout may be changed in future NIC HW.
> >
> > So, I use the ethtool/rx-frames, which is either 1 or 4 on our
> > NIC, to switch the CQE coalescing feature on/off.
>
> I feel like this is not the first time I'm having a conversation with
> you where you are not answering my direct questions, not just one
> sliver. IDK why you're doing this, but being able to participate
> in an email exchange is a bare minimum for participating upstream.
> Please consider this a warning.

Sure, let me try to reply again -- does this (see below) answer all
your questions? And, feel free to ask any further questions, we are
willing to collaborate with you and other upstream people at any time :)

> The NIC first buffers up CQEs for typically single digit usecs, and
> then once CQE timer expired and writeback happened it starts an IRQ
> coalescing timer. Once the IRQ coalescing timer expires IRQ is
> triggered, which schedules NAPI. (broad strokes, obviously many
> differences and optimizations exist)
> Is my guess correct? Are you controlling CQE coalescing?

Yes, it's correct. And we are controlling "CQE coalescing".

>
> If I interpret your reply correctly you are indeed coalescing writeback.

Yes, we are coalescing CQE writeback.

> You need to add a new param to the uAPI.

Since this feature is not common to other NICs, can we use an
ethtool private flag instead?
When the flag is set, the CQE coalescing will be enabled and put
up to 4 pkts in a CQE.

> Please add both size and
> timeout. Expose the timeout as read only if your device doesn't support
> controlling it per queue.

Does the "size" mean the max pkts per CQE (1 or 4)?
The timeout value is not even exposed to the driver, and is subject to
change in the future. Also the HW mechanism is proprietary... So, can we
not "expose" the timeout value in "ethtool -c" outputs, because it's not
available at driver level?

Thanks,
- Haiyang
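
For readers following the thread, the CQE writeback coalescing discussed
above can be sketched as a small standalone model. This is only an
illustration, not driver or firmware code: the struct and function names
are hypothetical, the timeout value is an assumed parameter (the thread
states the real one is not exposed to the driver), and the "other device
specific logic" mentioned in the thread is not modeled. It shows a pending
CQE being written back either when it holds the maximum number of packets
(1 with coalescing off, 4 with it on) or when the assumed timer expires.

```c
#include <assert.h>
#include <stdint.h>

#define CQE_MAX_PKTS 4 /* per the thread: a coalesced CQE holds up to 4 pkts */

/* Hypothetical model of one pending (not yet written back) CQE. */
struct cqe_model {
	unsigned int max_pkts;  /* 1 = coalescing off, 4 = coalescing on */
	uint64_t timeout_us;    /* assumed value, not known to the real driver */
	unsigned int npkts;     /* packets accumulated in the pending CQE */
	uint64_t first_pkt_us;  /* arrival time of the first pending packet */
};

static void cqe_model_init(struct cqe_model *c, unsigned int max_pkts,
			   uint64_t timeout_us)
{
	/* The thread describes only two settings: 1 or 4 pkts per CQE. */
	assert(max_pkts == 1 || max_pkts == CQE_MAX_PKTS);
	c->max_pkts = max_pkts;
	c->timeout_us = timeout_us;
	c->npkts = 0;
	c->first_pkt_us = 0;
}

/* Write back the pending CQE; returns how many packets it described. */
static unsigned int cqe_model_flush(struct cqe_model *c)
{
	unsigned int n = c->npkts;

	c->npkts = 0;
	return n;
}

/*
 * A packet arrives at time now_us. Returns the number of packets in the
 * CQE written back as a result, or 0 if the CQE is still pending.
 */
static unsigned int cqe_model_rx(struct cqe_model *c, uint64_t now_us)
{
	if (c->npkts == 0)
		c->first_pkt_us = now_us;
	c->npkts++;

	if (c->npkts >= c->max_pkts)
		return cqe_model_flush(c);
	return 0;
}

/*
 * Timer check at time now_us. Writes back a partially filled CQE once the
 * assumed timeout has elapsed since its first packet; returns the packet
 * count of that CQE, or 0 if nothing was written back.
 */
static unsigned int cqe_model_tick(struct cqe_model *c, uint64_t now_us)
{
	if (c->npkts > 0 && now_us - c->first_pkt_us >= c->timeout_us)
		return cqe_model_flush(c);
	return 0;
}
```

In this model, setting max_pkts to 1 makes every packet produce its own
CQE immediately, which corresponds to the "coalescing off" setting in the
thread. IRQ coalescing, which starts only after CQE writeback, is a
separate stage and is deliberately left out.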