Message-ID:
<SEZPR06MB5763AD0FC90DD6AF334555DA9051A@SEZPR06MB5763.apcprd06.prod.outlook.com>
Date: Thu, 17 Jul 2025 03:07:01 +0000
From: YH Chung <yh_chung@...eedtech.com>
To: Jeremy Kerr <jk@...econstruct.com.au>, "matt@...econstruct.com.au"
<matt@...econstruct.com.au>, "andrew+netdev@...n.ch" <andrew+netdev@...n.ch>,
"davem@...emloft.net" <davem@...emloft.net>, "edumazet@...gle.com"
<edumazet@...gle.com>, "kuba@...nel.org" <kuba@...nel.org>,
"pabeni@...hat.com" <pabeni@...hat.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, BMC-SW <BMC-SW@...eedtech.com>
CC: Khang D Nguyen <khangng@...eremail.onmicrosoft.com>
Subject: RE: [PATCH] net: mctp: Add MCTP PCIe VDM transport driver
Hi Jeremy,
>> > Do we really need an abstraction for MCTP VDM drivers? How many are
>> > you expecting? Can you point us to a client of the VDM abstraction?
>> >
>> > There is some value in keeping consistency for the MCTP lladdr
>> > formats across PCIe transports, but I'm not convinced we need a
>> > whole abstraction layer for this.
>> >
>> We plan to follow the existing upstream MCTP transports, such as I²C,
>> I³C, and USB, by abstracting the hardware-specific details behind a
>> common interface and focusing on the transport binding protocol in
>> this patch. This driver has been tested with our AST2600 and AST2700
>> MCTP driver.
>
>Is that one driver (for both 2600 and 2700) or two?
>
It's a single driver in one file so the common functions can be shared; the behavior differs slightly depending on whether it runs on the AST2600 or the AST2700.
>I'm still not convinced you need an abstraction layer specifically for VDM
>transports, especially as you're forcing a specific driver model with the deferral
>of TX to a separate thread.
>
We followed the implementation pattern of mctp-i2c and mctp-i3c, both of which abstract the hardware layer behind the existing i2c/i3c device interface and use a kernel thread for TX.
That said, once the routing-table handling is removed, I think it's reasonable to drop the kernel thread and transmit the packet directly from the netdev TX path; a rough sketch of that approach is below.
Could you share your thoughts on which approach is preferable?
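As a minimal sketch of the direct-TX idea (assuming the hardware FIFO write does not need to sleep; mctp_pcie_vdm_dev and vdm_hw_tx() are placeholders for our driver's internals, not existing symbols):

/* Hypothetical direct-TX path: transmit from ndo_start_xmit instead of
 * deferring to a ptr_ring plus kernel thread. Assumes vdm_hw_tx() can
 * run in softirq context without sleeping.
 */
static netdev_tx_t mctp_pcie_vdm_start_xmit(struct sk_buff *skb,
					    struct net_device *ndev)
{
	struct mctp_pcie_vdm_dev *vdm = netdev_priv(ndev);

	if (vdm_hw_tx(vdm, skb->data, skb->len)) {
		ndev->stats.tx_dropped++;
		kfree_skb(skb);
		return NETDEV_TX_OK;
	}

	ndev->stats.tx_packets++;
	ndev->stats.tx_bytes += skb->len;
	dev_consume_skb_any(skb);

	return NETDEV_TX_OK;
}

If the hardware write can block, we would keep the kthread as mctp-i2c does.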
>Even if this abstraction layer is a valid approach, it would not be merged until
>you also have an in-kernel user of it.
>
We have an MCTP controller driver (the one mentioned above, which we used to test this driver) that uses this abstraction for transmission, and we plan to upstream it in the future.
REF Link: https://github.com/AspeedTech-BMC/linux/blob/aspeed-master-v6.6/drivers/soc/aspeed/aspeed-mctp.c
Since the PCIe VDM driver is implemented as an abstraction layer, our current plan is to submit it separately, as we believe each driver can then be reviewed independently.
Would you recommend submitting both drivers together in the same patch series, or is it preferable to keep them separate?
>> > > TX path uses a dedicated kernel thread and ptr_ring: skbs queued
>> > > by the MCTP stack are enqueued on the ring and processed in-thread
>context.
>> >
>> > Is this somehow more suitable than the existing netdev queues?
>> >
>> Our current implementation has two operations that take time: 1)
>> configuring the PCIe VDM routing type as DSP0238 requires when we are
>> sending certain control message command codes, such as a Discovery
>> Notify request or an Endpoint Discovery response, and 2) updating the
>> BDF/EID routing table.
>
>More on this below, but: you don't need to handle either of those in a transport
>driver.
>
>> > > +struct mctp_pcie_vdm_route_info {
>> > > + u8 eid;
>> > > + u8 dirty;
>> > > + u16 bdf_addr;
>> > > + struct hlist_node hnode;
>> > > +};
>> >
>> > Why are you keeping your own routing table in the transport driver?
>> > We already have the route and neighbour tables in the MCTP core code.
>> >
>> > Your assumption that you can intercept MCTP control messages to keep
>> > a separate routing table will not work.
>> >
>> We maintain a routing table in the transport driver to record the
>> mapping between BDFs and EIDs, as the BDF is only present in the PCIe
>> VDM header of received Endpoint Discovery Responses. This information
>> is not forwarded to the MCTP core in the MCTP payload. We update the
>> table with this mapping before forwarding the MCTP message to the
>> core.
>
>There is already support for this in the MCTP core - the neighbour table
>maintains mappings between EID and link-layer addresses. In the case of a PCIe
>VDM transport, those link-layer addresses contain the bdf data.
>
>The transport driver only needs to be involved in packet transmit and receive.
>Your bdf data is provided to the driver through the
>header_ops->create() op.
>
>Any management of the neighbour table is performed by userspace, which has
>visibility of the link-layer addresses of incoming skbs - assuming your drivers are
>properly setting the cb->haddr data on receive.
>
Agreed, we'll remove the table.
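For the receive side, we expect something along these lines instead of the private table (a sketch only; mctp_pcie_vdm_dev, vdm->ndev and vdm_get_src_bdf() are placeholders, and the 2-byte big-endian BDF lladdr layout is an assumption on our part):

/* Sketch: report the source BDF via the skb cb so the core and
 * userspace can maintain the neighbour table themselves.
 */
static void mctp_pcie_vdm_rx(struct mctp_pcie_vdm_dev *vdm,
			     struct sk_buff *skb)
{
	struct mctp_skb_cb *cb;
	u16 bdf = vdm_get_src_bdf(skb);	/* taken from the PCIe VDM header */

	skb->dev = vdm->ndev;
	skb->protocol = htons(ETH_P_MCTP);

	cb = __mctp_cb(skb);
	cb->halen = 2;
	put_unaligned_be16(bdf, cb->haddr);

	netif_rx(skb);
}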
>This has already been established through the existing transports that consume
>lladdr data (i2c and i3c). You should not be handling the lladdr-to-EID mapping
>*at all* in the transport driver.
>
>> Additionally, if the MCTP Bus Owner operates in the Endpoint (EP) role
>> on the PCIe bus, it cannot obtain the physical addresses of other
>> devices from the PCIe bus.
>
>Sure it can, there are mechanisms for discovery. However, that's entirely
>handled by userspace, which can update the existing neighbour table.
>
>> Agreed. In our implementation, we always fill in the "Route By ID"
>> type when the core asks us to create the header, since we don't know
>> the correct type at that point, and we later update the routing type
>> based on the control message code at TX time. It would be nice to
>> have a uniform address format so that the actual routing type can be
>> taken from the passed-in lladdr when creating the header.
>
>OK, so we'd include the routing type in the lladdr data then.
>
Could you share whether there's a preliminary prototype or idea for the lladdr format the core plans to use, particularly how the routing type should be encoded and parsed?
For example, should we expect the first byte of the daddr parameter passed to the header create op to indicate the routing type? One possible encoding is sketched below.
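To make the question concrete, one encoding we could imagine (purely illustrative, not something we assume the core has decided; mctp_pcie_vdm_lladdr and vdm_push_hdr() are hypothetical) would be a 3-byte lladdr carrying the routing type followed by the 16-bit BDF:

struct mctp_pcie_vdm_lladdr {
	u8 route_type;	/* PCIe VDM message routing, e.g. route by ID */
	__be16 bdf;	/* target bus/device/function */
} __packed;

static int mctp_pcie_vdm_header_create(struct sk_buff *skb,
				       struct net_device *dev,
				       unsigned short type, const void *daddr,
				       const void *saddr, unsigned int len)
{
	const struct mctp_pcie_vdm_lladdr *dst = daddr;

	/* Both the routing type and the BDF are known here, so no
	 * TX-time rewrite of the header would be needed.
	 */
	return vdm_push_hdr(skb, dst->route_type, be16_to_cpu(dst->bdf));
}

If something like this matches your plans, we'll drop the TX-time rewrite entirely.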
>Cheers,
>
>
>Jeremy