Message-ID: <ilh6xgancwvjyeoqmekaemqodbwtr6qfl7npyey5tnw5jb5qt2@oqce6b5jajl2>
Date: Thu, 28 Aug 2025 11:03:41 +0200
From: Jiri Pirko <jiri@...nulli.us>
To: Shay Drory <shayd@...dia.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, horms@...nel.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, ozsh@...dia.com, mbloch@...dia.com, tariqt@...dia.com,
saeedm@...dia.com
Subject: Re: [RFC net-next] net: devlink: add port function attr for vport ↔ eswitch metadata forwarding
Thu, Aug 28, 2025 at 08:52:29AM +0200, shayd@...dia.com wrote:
>In some product architectures, the eswitch manager and the exception
>handler run as separate user space processes. The eswitch manager uses
>the physical uplink device, while the slow path handler uses a virtual
>device.
>
>In these architectures, the eswitch manager application programs the HW
>to send the exception packets to a specific vport, and the exception
>application runs on top of this vport's virtual device and handles these
>packets.
>
>Currently, when packets are forwarded between the eswitch and a vport,
>no per-packet metadata is preserved. As a result, the slow path handler
>cannot implement features that require visibility into the packet's
>hardware context.
A vendor-specific slow path. Basically you give the user a way to pass
a binary blob to the HW along with every TX'ed packet, and vice versa.
That looks quite odd tbh. I mean, isn't this horribly breaking the
socket abstraction? Also, isn't this horribly breaking the forwarding
offload model, where the HW should just mimic the behaviour of the
kernel?
>
>This RFC introduces two optional devlink port-function attributes. When
>these two capabilities are enabled for a function of the port, the
>device makes the necessary preparations for the function to exchange
>metadata with the eswitch.
>
>rx_metadata
>When enabled, packets received by the vport from the eswitch will be
>prepended with a device-specific metadata header. This allows the slow
>path application to receive the full context of the packet as seen by
>the hardware.
>
>tx_metadata
>When enabled, the vport can send a packet prepended with a metadata
>header. The eswitch hardware consumes this metadata to steer the packet.
>
>Together they allow the said app to process slow-path events in
>user-space at line rate while still leaving the common fast-path in
>hardware.
>
>User-space interface
>Enable / disable is done with existing devlink port-function syntax:
>
>$ devlink port function set pci/0000:06:00.0/3 rx_metadata enable
>$ devlink port function set pci/0000:06:00.0/3 tx_metadata enable
>
>Querying the state shows the new knobs:
>
>$ devlink port function show pci/0000:06:00.0/3
> pci/0000:06:00.0/3:
> roce enabled rx_metadata enabled tx_metadata enabled
>
>Disabling is symmetrical:
>
>$ devlink port function set pci/0000:06:00.0/3 rx_metadata disable
>$ devlink port function set pci/0000:06:00.0/3 tx_metadata disable
>
>Signed-off-by: Shay Drory <shayd@...dia.com>
>
>
>--
>2.38.1
>
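For illustration only, here is a rough sketch of what the rx_metadata and
tx_metadata descriptions above would mean for the slow path application.
The RFC only says the header is device-specific, so the 16-byte layout,
the field names and both helper functions below are assumptions made up
for this example, not anything taken from the patch or from a real device.

```python
import struct

# Hypothetical layout, NOT taken from the RFC: the real header is
# device-specific and its format is not given in this thread.
#   u32 source vport | u32 flow tag | u64 opaque hardware context
META_FMT = "!IIQ"
META_LEN = struct.calcsize(META_FMT)  # 16 bytes with this assumed layout


def strip_rx_metadata(frame: bytes):
    """Split a received frame into (metadata dict, original packet)."""
    if len(frame) < META_LEN:
        raise ValueError("frame shorter than metadata header")
    vport, tag, ctx = struct.unpack_from(META_FMT, frame)
    return {"vport": vport, "tag": tag, "ctx": ctx}, frame[META_LEN:]


def prepend_tx_metadata(packet: bytes, vport: int, tag: int, ctx: int = 0) -> bytes:
    """Build a frame with a leading header for the eswitch to consume."""
    return struct.pack(META_FMT, vport, tag, ctx) + packet
```

With rx_metadata enabled the application would call something like
strip_rx_metadata() on every received frame, and with tx_metadata enabled
it would build outgoing frames with something like prepend_tx_metadata().
The point of the sketch is that the blob is opaque to the kernel stack in
both directions.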