linux-kernel - Re: [PATCH net-next v27 1/1] mctp pcc: Implement MCTP over PCC Transport

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e28eeb4f-98a4-4db6-af96-c576019d3af1@amperemail.onmicrosoft.com>
Date: Fri, 29 Aug 2025 11:59:03 -0400
From: Adam Young <admiyo@...eremail.onmicrosoft.com>
To: Jeremy Kerr <jk@...econstruct.com.au>,
 Adam Young <admiyo@...amperecomputing.com>,
 Matt Johnston <matt@...econstruct.com.au>,
 Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
 <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
 Sudeep Holla <sudeep.holla@....com>,
 Jonathan Cameron <Jonathan.Cameron@...wei.com>,
 Huisong Li <lihuisong@...wei.com>
Subject: Re: [PATCH net-next v27 1/1] mctp pcc: Implement MCTP over PCC
 Transport


On 8/29/25 05:16, Jeremy Kerr wrote:
> Hi Adam,
>
> Still some issues with skb management / synchronisation in the cleanup
> path; comments inline (as well as some minor things, but mostly optional
> on addressing those).
>
> You seem to be sending follow-up patches fairly quickly, rather than
> responding to queries about the code, or discussing the approach. If you
> have no points to discuss, that's fine, but please do feel free to chat
> about options, or ask for clarifications, before jumping into the next
> patch revision. Even letting us know why you may have rejected a
> suggestion is helpful for me in the review process too.

I thought I was  addressing all issues.  Perhaps I do need to slow 
down....different work styles: I am used to responding to code reviews 
as quickly as possible.

more comments in-line.



>
>> +static void mctp_pcc_tx_done(struct mbox_client *c, void *mssg, int r)
>> +{
>> +       struct mctp_pcc_ndev *mctp_pcc_ndev;
>> +       struct mctp_pcc_mailbox *outbox;
>> +       struct sk_buff *skb = NULL;
>> +       struct sk_buff *curr_skb;
>> +
>> +       mctp_pcc_ndev = container_of(c, struct mctp_pcc_ndev, outbox.client);
>> +       outbox = container_of(c, struct mctp_pcc_mailbox, client);
>> +       spin_lock(&outbox->packets.lock);
>> +       skb_queue_walk(&outbox->packets, curr_skb) {
>> +               if (curr_skb->data == mssg) {
>> +                       skb = curr_skb;
>> +                       __skb_unlink(skb, &outbox->packets);
>> +                       break;
>> +               }
>> +       }
>> +       spin_unlock(&outbox->packets.lock);
>> +       if (skb)
>> +               dev_consume_skb_any(skb);
>> +       netif_wake_queue(mctp_pcc_ndev->ndev);
>> +}
>> +
>> +static netdev_tx_t mctp_pcc_tx(struct sk_buff *skb, struct net_device *ndev)
>> +{
>> +       struct mctp_pcc_ndev *mpnd = netdev_priv(ndev);
>> +       struct pcc_header *pcc_header;
>> +       int len = skb->len;
>> +       int rc;
>> +
>> +       rc = skb_cow_head(skb, sizeof(*pcc_header));
>> +       if (rc) {
>> +               dev_dstats_tx_dropped(ndev);
>> +               kfree_skb(skb);
>> +               return NETDEV_TX_OK;
>> +       }
>> +
>> +       pcc_header = skb_push(skb, sizeof(*pcc_header));
>> +       pcc_header->signature = PCC_SIGNATURE | mpnd->outbox.index;
>> +       pcc_header->flags = PCC_CMD_COMPLETION_NOTIFY;
>> +       memcpy(&pcc_header->command, MCTP_SIGNATURE, MCTP_SIGNATURE_LENGTH);
>> +       pcc_header->length = len + MCTP_SIGNATURE_LENGTH;
>> +
>> +       skb_queue_head(&mpnd->outbox.packets, skb);
>> +
>> +       rc = mbox_send_message(mpnd->outbox.chan->mchan, skb->data);
>> +
>> +       if (rc < 0) {
>> +               netif_stop_queue(ndev);
> Nice!

Yeah.  Had a bit of an internal discussion about this one. Ideally, we 
would stop the queue one packet earlier, and not drop anything.  The 
mbox_send_message only returns the index of the next slot in the ring 
buffer and since it is circular, that does not give us a sense  of the 
space left.



>
>> +               skb_unlink(skb, &mpnd->outbox.packets);
>> +               return NETDEV_TX_BUSY;
>> +       }
>> +
>> +       dev_dstats_tx_add(ndev, len);
>> +       return NETDEV_TX_OK;
>> +}
>> +
>> +static void drain_packets(struct sk_buff_head *list)
>> +{
>> +       struct sk_buff *skb;
>> +
>> +       while (!skb_queue_empty(list)) {
>> +               skb = skb_dequeue(list);
>> +               dev_consume_skb_any(skb);
>> +       }
>> +}
>> +
>> +static int mctp_pcc_ndo_stop(struct net_device *ndev)
>> +{
>> +       struct mctp_pcc_ndev *mctp_pcc_ndev =
>> +           netdev_priv(ndev);
> Minor: Unneeded wrapping here, and it seems to be suppressing the
> warning about a blank line after declarations.

The Reverse XMasstree format checker I am using seems overly strict.  I 
will try to unwrap all of these.  Is it better to do a separate variable 
initialization?  Seems a bad coding practice for just a format decision 
that is purely aesthetic. But this one is simple to fix.


>
>> +       drain_packets(&mctp_pcc_ndev->outbox.packets);
>> +       drain_packets(&mctp_pcc_ndev->inbox.packets);
> Now that you're no longer doing the pcc_mbox_free_channel() in ndo_stop,
> nothing has quiesced the pcc channels at this point, right? In which
> case you now have a race between the channels' accesses to skb->data and
> freeing the skbs here.

(I have written and rewritten this section multiple times, so apoliges 
if soemthing is unclear or awkward...it might reflect and earlier 
thought...)

OK, I think I do need to call pcc_mbox_free_channel here, which means I 
need to allocate them in the pairing function. The ring buffer will 
still have pointers to the sk_buffs, but they will never be looked at 
again: the ring buffer will ger reinitialized if another client binds to 
it. Which means that I need the .start function to create the client 
again, and all the other changes that come along with that. The removal 
was to deal with the setting of the MTU, which requires a channel to 
read the size of the shared buffer.

        mctp_pcc_mtu = mctp_pcc_ndev->outbox.chan->shmem_size -
                sizeof(struct pcc_header);

I could create a channel, read  the value, and release it.  The Value I 
need is in the ACPI PCC-table but I don't have direct access to it. 
Perhaps it would be better to initialize the value to -1 and use that to 
optionally reset it to the max value on ndo open.


Check  me here: The channel has a value ring msg_count that keeps track 
of the number of entires in the ring buffer.  This needs to be set to0 
in order for it to think the buffer is empty.  It is initialized in  
__mbox_bind_client, called from mbox_bind_client which is in turn called 
from mbox_request_channel

The networking infra calls stop_ndo, so it must stop sending packets to 
it first.  I can netif_stop_queue(ndev) of course, but that seems 
counterintuitive? Assume i don't need to do that, but can't find the 
calling code.


>
> Is there a mbox facility to (synchronously) stop processing the inbound
> channel, and completing the outbound channel?

There is mbox_free_channel which calls shutdown, and that removed the 
IRQ handler, so no more  messages will be processed.  That should be 
sufficient.

>
>> +       return 0;
>> +}
>> +
>> +static const struct net_device_ops mctp_pcc_netdev_ops = {
>> +       .ndo_stop = mctp_pcc_ndo_stop,
>> +       .ndo_start_xmit = mctp_pcc_tx,
>> +};
>> +
>> +static void mctp_pcc_setup(struct net_device *ndev)
>> +{
>> +       ndev->type = ARPHRD_MCTP;
>> +       ndev->hard_header_len = 0;
>> +       ndev->tx_queue_len = 0;
>> +       ndev->flags = IFF_NOARP;
>> +       ndev->netdev_ops = &mctp_pcc_netdev_ops;
>> +       ndev->needs_free_netdev = true;
>> +       ndev->pcpu_stat_type = NETDEV_PCPU_STAT_DSTATS;
>> +}
>> +
>> +struct mctp_pcc_lookup_context {
>> +       int index;
>> +       u32 inbox_index;
>> +       u32 outbox_index;
>> +};
>> +
>> +static acpi_status lookup_pcct_indices(struct acpi_resource *ares,
>> +                                      void *context)
>> +{
>> +       struct mctp_pcc_lookup_context *luc = context;
>> +       struct acpi_resource_address32 *addr;
>> +
>> +       if (ares->type != PCC_DWORD_TYPE)
>> +               return AE_OK;
>> +
>> +       addr = ACPI_CAST_PTR(struct acpi_resource_address32, &ares->data);
>> +       switch (luc->index) {
>> +       case 0:
>> +               luc->outbox_index = addr[0].address.minimum;
>> +               break;
>> +       case 1:
>> +               luc->inbox_index = addr[0].address.minimum;
>> +               break;
>> +       }
>> +       luc->index++;
>> +       return AE_OK;
>> +}
>> +
>> +static void mctp_cleanup_channel(void *data)
>> +{
>> +       struct pcc_mbox_chan *chan = data;
>> +
>> +       pcc_mbox_free_channel(chan);
>> +}
>> +
>> +static int mctp_pcc_initialize_mailbox(struct device *dev,
>> +                                      struct mctp_pcc_mailbox *box, u32 index)
>> +{
>> +       box->index = index;
>> +       skb_queue_head_init(&box->packets);
>> +       box->client.dev = dev;
>> +       box->chan = pcc_mbox_request_channel(&box->client, index);
>> +       if (IS_ERR(box->chan))
>> +               return PTR_ERR(box->chan);
>> +       return devm_add_action_or_reset(dev, mctp_cleanup_channel, box->chan);
>> +}
>> +
>> +static void mctp_cleanup_netdev(void *data)
>> +{
>> +       struct net_device *ndev = data;
>> +
>> +       mctp_unregister_netdev(ndev);
>> +}
>> +
>> +static int mctp_pcc_driver_add(struct acpi_device *acpi_dev)
>> +{
>> +       struct mctp_pcc_lookup_context context = {0};
>> +       struct mctp_pcc_ndev *mctp_pcc_ndev;
>> +       struct device *dev = &acpi_dev->dev;
>> +       struct net_device *ndev;
>> +       acpi_handle dev_handle;
>> +       acpi_status status;
>> +       int mctp_pcc_mtu;
>> +       char name[32];
>> +       int rc;
>> +
>> +       dev_dbg(dev, "Adding mctp_pcc device for HID %s\n",
>> +               acpi_device_hid(acpi_dev));
>> +       dev_handle = acpi_device_handle(acpi_dev);
>> +       status = acpi_walk_resources(dev_handle, "_CRS", lookup_pcct_indices,
>> +                                    &context);
>> +       if (!ACPI_SUCCESS(status)) {
>> +               dev_err(dev, "FAILURE to lookup PCC indexes from CRS\n");
>> +               return -EINVAL;
>> +       }
>> +
>> +       snprintf(name, sizeof(name), "mctppcc%d", context.inbox_index);
>> +       ndev = alloc_netdev(sizeof(*mctp_pcc_ndev), name, NET_NAME_PREDICTABLE,
>> +                           mctp_pcc_setup);
>> +       if (!ndev)
>> +               return -ENOMEM;
>> +
>> +       mctp_pcc_ndev = netdev_priv(ndev);
>> +
>> +       /* inbox initialization */
>> +       rc = mctp_pcc_initialize_mailbox(dev, &mctp_pcc_ndev->inbox,
>> +                                        context.inbox_index);
>> +       if (rc)
>> +               goto free_netdev;
>> +
>> +       mctp_pcc_ndev->inbox.client.rx_callback = mctp_pcc_client_rx_callback;
>> +       mctp_pcc_ndev->inbox.chan->rx_alloc = mctp_pcc_rx_alloc;
>> +
>> +       /* outbox initialization */
>> +       rc = mctp_pcc_initialize_mailbox(dev, &mctp_pcc_ndev->outbox,
>> +                                        context.outbox_index);
>> +       if (rc)
>> +               goto free_netdev;
>> +
>> +       mctp_pcc_ndev->outbox.client.tx_done = mctp_pcc_tx_done;
>> +       mctp_pcc_ndev->acpi_device = acpi_dev;
>> +       mctp_pcc_ndev->ndev = ndev;
>> +       acpi_dev->driver_data = mctp_pcc_ndev;
>> +
>> +       mctp_pcc_ndev->outbox.chan->manage_writes = true;
>> +
>> +       /* initialize MTU values */
>> +       mctp_pcc_mtu = mctp_pcc_ndev->outbox.chan->shmem_size
>> +               - sizeof(struct pcc_header);
> Minor: no need for this temporary var.
>
> (Or the comment really - we can see this is initialising MTU values from
> the fact that it's initialising values with mtu in their name :) )

I think I am going to set this to -1 here, and then initialize in ndo 
open if it is still -1.


>
>> +       ndev->mtu = MCTP_MIN_MTU;
>> +       ndev->max_mtu = mctp_pcc_mtu;
>> +       ndev->min_mtu = MCTP_MIN_MTU;
>> +
>> +       rc = mctp_register_netdev(ndev, NULL, MCTP_PHYS_BINDING_PCC);
>> +       if (rc)
>> +               goto free_netdev;
>> +
>> +       return devm_add_action_or_reset(dev, mctp_cleanup_netdev, ndev);
> As has been mentioned elsewhere, using the devm cleanup mechanism is a
> bit unconventional here. You have the device remove callback available,
> which lets you do the same, and that way you can demonstrate symmetry
> between the add and remove implementations.

This has gone through a few  iterations and I thought I had it clear.

I was trying  to make use of automated cleanup as much as possible.


>
> I'm okay with the approach, but you may want to consider the remove
> callback if that gives you an implementation that reads more clearly.
>
> Are you doing much testing here? I'd recommend running with KASAN and
> device removal & link status change under active transfers.
Will do.  I have not done that yet. I was unaware it existed.
>
> Cheers,

THanks so much for the detailed review and explanations.


>
>
> Jeremy