Message-ID: <42643918b686206c97076cf9fd2f02718e85b108.camel@codeconstruct.com.au>
Date: Wed, 13 Aug 2025 12:46:12 +0800
From: Jeremy Kerr <jk@...econstruct.com.au>
To: admiyo@...amperecomputing.com, Matt Johnston
<matt@...econstruct.com.au>, Andrew Lunn <andrew+netdev@...n.ch>, "David
S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub
Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org, Sudeep Holla
<sudeep.holla@....com>, Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Huisong Li <lihuisong@...wei.com>
Subject: Re: [PATCH v24 1/1] mctp pcc: Implement MCTP over PCC Transport
Hi Adam,
From the MCTP device side, mostly looks okay, but there are some things
that may need work on the mailbox interface. As I have mentioned
earlier, I'm not super familiar with that, so some of these may be more
clarifications on my side rather than changes...
I see that this is failing to build on net-next, as the mailbox changes
are not present there. You may need to coordinate for the merge.
> +#define MCTP_PAYLOAD_LENGTH 256
> +#define MCTP_CMD_LENGTH 4
> +#define MCTP_PCC_VERSION 0x1 /* DSP0292 a single version: 1 */
Regarding the comment: I don't think DSP0292 does define a version for
the transport data format? There's the DSP0236 header version, but
that's distinct from the transport implementation. You don't seem to
use these three now anyway, perhaps drop.
And just confirming: the pcc header format is now all host-endian - or
is that a firmware-specified endianness? (what happens if the OS can
boot either big- or little-endian? Is that at all possible for any
PCC-capable hardware, or are we otherwise guaranteed that we're the
same endianness as the consumer?)
> +struct mctp_pcc_mailbox {
> + u32 index;
> + struct pcc_mbox_chan *chan;
> + struct mbox_client client;
> + struct sk_buff_head packets;
If you're ever able to stash a context pointer in mailbox requests, that
would give you a great opportunity to remove the complexity of
maintaining this list.
No change necessarily required now, just something to track for later
simplification.
> +};
> +
> +/* The netdev structure. One of these per PCC adapter. */
> +struct mctp_pcc_ndev {
> + /* spinlock to serialize access to PCC outbox buffer and registers
> + * Note that what PCC calls registers are memory locations, not CPU
> + * Registers. They include the fields used to synchronize access
> + * between the OS and remote endpoints.
> + *
> + * Only the Outbox needs a spinlock, to prevent multiple
> + * sent packets triggering multiple attempts to over write
> + * the outbox. The Inbox buffer is controlled by the remote
> + * service and a spinlock would have no effect.
> + */
> + spinlock_t lock;
You don't use this anywhere, but I think you do need synchronisation
still. See my comment on tx_done.
> + struct net_device *ndev;
> + struct acpi_device *acpi_device;
> + struct mctp_pcc_mailbox inbox;
> + struct mctp_pcc_mailbox outbox;
> +};
> +
> +static void *mctp_pcc_rx_alloc(struct mbox_client *c, int size)
> +{
> + struct mctp_pcc_ndev *mctp_pcc_ndev =
> + container_of(c, struct mctp_pcc_ndev, inbox.client);
> + struct mctp_pcc_mailbox *box = &mctp_pcc_ndev->inbox;
> + struct sk_buff *skb;
> +
> + if (size > mctp_pcc_ndev->ndev->mtu)
> + return NULL;
> + skb = netdev_alloc_skb(mctp_pcc_ndev->ndev, size);
> + if (!skb)
> + return NULL;
> + skb_put(skb, size);
> + skb->protocol = htons(ETH_P_MCTP);
> +
> + skb_queue_head(&box->packets, skb);
> +
> + return skb->data;
> +}
> +
> +static void mctp_pcc_client_rx_callback(struct mbox_client *c, void *buffer)
> +{
> + struct mctp_pcc_ndev *mctp_pcc_ndev;
> + struct pcc_header pcc_header;
> + struct mctp_skb_cb *cb;
> + struct sk_buff *skb;
> +
> + mctp_pcc_ndev = container_of(c, struct mctp_pcc_ndev, inbox.client);
> + if (!buffer) {
> + dev_dstats_rx_dropped(mctp_pcc_ndev->ndev);
> + return;
> + }
> +
> + skb_queue_walk(&mctp_pcc_ndev->inbox.packets, skb) {
> + if (skb->data != buffer)
> + continue;
> + skb_unlink(skb, &mctp_pcc_ndev->inbox.packets);
> + dev_dstats_rx_add(mctp_pcc_ndev->ndev, skb->len);
> + skb_reset_mac_header(skb);
> + skb_pull(skb, sizeof(pcc_header));
> + skb_reset_network_header(skb);
> + cb = __mctp_cb(skb);
> + cb->halen = 0;
> + netif_rx(skb);
> + return;
> + }
> + pr_warn("Unmatched packet in mctp-pcc inbox packet list");
Minor: would be helpful to include the device instance info here.
Perhaps netdev_warn()?
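ie., something along the lines of:

```c
netdev_warn(mctp_pcc_ndev->ndev,
	    "unmatched packet in inbox packet list\n");
```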
> +}
> +
> +static void mctp_pcc_tx_done(struct mbox_client *c, void *mssg, int r)
> +{
> + struct mctp_pcc_mailbox *box;
> + struct sk_buff *skb;
> +
> + box = container_of(c, struct mctp_pcc_mailbox, client);
> + skb_queue_walk(&box->packets, skb) {
> + if (skb->data == mssg) {
> + skb_unlink(skb, &box->packets);
> + dev_consume_skb_any(skb);
> + break;
> + }
> + }
How are updates to this queue synchronised against concurrent start_xmit?
Similarly, for the RX path: what about drain_packets() vs. RX
alloc/completion - is there any guarantee that we have quiesced the rx
channel before mctp_cleanup_netdev() runs? It seems like this currently
depends on ordering of the devm cleanup actions.
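For the TX side, one option is to hold the queue lock across the
lookup-and-unlink, eg (sketch only, based on the code above; note that
skb_queue_head() and skb_unlink() take the list lock internally, but
skb_queue_walk() does not, so the walk can race with a concurrent
enqueue):

```c
static void mctp_pcc_tx_done(struct mbox_client *c, void *mssg, int r)
{
	struct mctp_pcc_mailbox *box;
	struct sk_buff *skb, *tmp;
	unsigned long flags;

	box = container_of(c, struct mctp_pcc_mailbox, client);

	/* walk and unlink under the queue's own lock */
	spin_lock_irqsave(&box->packets.lock, flags);
	skb_queue_walk_safe(&box->packets, skb, tmp) {
		if (skb->data == mssg) {
			__skb_unlink(skb, &box->packets);
			dev_consume_skb_any(skb);
			break;
		}
	}
	spin_unlock_irqrestore(&box->packets.lock, flags);
}
```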
> +}
> +
> +static netdev_tx_t mctp_pcc_tx(struct sk_buff *skb, struct net_device *ndev)
> +{
> + struct mctp_pcc_ndev *mpnd = netdev_priv(ndev);
> + struct pcc_header *pcc_header;
> + int len = skb->len;
> + int rc;
> +
> + rc = skb_cow_head(skb, sizeof(*pcc_header));
> + if (rc) {
> + dev_dstats_tx_dropped(ndev);
> + kfree_skb(skb);
> + return NETDEV_TX_OK;
> + }
> +
> + pcc_header = skb_push(skb, sizeof(*pcc_header));
> + pcc_header->signature = PCC_SIGNATURE | mpnd->outbox.index;
> + pcc_header->flags = PCC_CMD_COMPLETION_NOTIFY;
> + memcpy(&pcc_header->command, MCTP_SIGNATURE, MCTP_SIGNATURE_LENGTH);
> + pcc_header->length = len + MCTP_SIGNATURE_LENGTH;
> + skb_queue_head(&mpnd->outbox.packets, skb);
> +
> + rc = mbox_send_message(mpnd->outbox.chan->mchan, skb->data);
> +
> + if (rc < 0) {
> + skb_unlink(skb, &mpnd->outbox.packets);
> + return NETDEV_TX_BUSY;
> + }
Is the lack of flow-control here an issue? Is there any facility for the
mailbox API to give you an indication of tx availability, or should you be
tracking that in the number of pending TX messages?
Or, in general: what are your expectations on retransmit for this
NETDEV_TX_BUSY case?
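If the mailbox layer gives you nothing there, a single-outstanding-
message scheme using the netdev queue state may be enough, eg (sketch
only, assuming at most one message in flight per outbox channel):

```c
/* in mctp_pcc_tx(), after a successful mbox_send_message():
 * stop the queue until the completion arrives */
netif_stop_queue(ndev);

/* in mctp_pcc_tx_done(), once the completed skb is consumed;
 * tx_done is only registered on the outbox client here */
struct mctp_pcc_ndev *mpnd =
	container_of(box, struct mctp_pcc_ndev, outbox);

netif_wake_queue(mpnd->ndev);
```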
> +
> + dev_dstats_tx_add(ndev, len);
> + return NETDEV_TX_OK;
> +}
> +
> +static const struct net_device_ops mctp_pcc_netdev_ops = {
> + .ndo_start_xmit = mctp_pcc_tx,
> +};
> +
> +static const struct mctp_netdev_ops mctp_netdev_ops = {
> + NULL
> +};
Minor: mctp_register_netdev() allows passing NULL for the ops argument,
you don't need this empty ops struct.
> +static void drain_packets(struct sk_buff_head *list)
> +{
> + struct sk_buff *skb;
> +
> + while (!skb_queue_empty(list)) {
> + skb = skb_dequeue(list);
> + dev_consume_skb_any(skb);
> + }
> +}
> +
> +static void mctp_cleanup_netdev(void *data)
> +{
> + struct mctp_pcc_ndev *mctp_pcc_ndev;
> + struct net_device *ndev = data;
> +
> + mctp_pcc_ndev = netdev_priv(ndev);
> + drain_packets(&mctp_pcc_ndev->outbox.packets);
> + drain_packets(&mctp_pcc_ndev->inbox.packets);
You're draining these queues on device removal. I suspect you may want
to do this on ndo_stop instead.
This would prevent stale skbs being transferred over a change in network
namespace, which is probably a good thing, but may depend on details
of the PCC channel behaviour.
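ie, something along these lines (sketch only, reusing names from the
patch):

```c
static int mctp_pcc_ndo_stop(struct net_device *ndev)
{
	struct mctp_pcc_ndev *mpnd = netdev_priv(ndev);

	/* drop anything still queued once the interface goes down */
	drain_packets(&mpnd->outbox.packets);
	drain_packets(&mpnd->inbox.packets);

	return 0;
}

static const struct net_device_ops mctp_pcc_netdev_ops = {
	.ndo_start_xmit	= mctp_pcc_tx,
	.ndo_stop	= mctp_pcc_ndo_stop,
};
```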
Cheers,
Jeremy