netdev - Re: [PATCH net-next v27 1/1] mctp pcc: Implement MCTP over PCC Transport

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3d30c216-e49e-4d85-8f1b-581306033909@amperemail.onmicrosoft.com>
Date: Tue, 2 Sep 2025 18:45:10 -0400
From: Adam Young <admiyo@...eremail.onmicrosoft.com>
To: Jeremy Kerr <jk@...econstruct.com.au>,
 Adam Young <admiyo@...amperecomputing.com>,
 Matt Johnston <matt@...econstruct.com.au>,
 Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
 <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
 Sudeep Holla <sudeep.holla@....com>,
 Jonathan Cameron <Jonathan.Cameron@...wei.com>,
 Huisong Li <lihuisong@...wei.com>
Subject: Re: [PATCH net-next v27 1/1] mctp pcc: Implement MCTP over PCC
 Transport


On 8/31/25 22:31, Jeremy Kerr wrote:
> Hi Adam,
>
>>> Nice!
>> Yeah.  Had a bit of an internal discussion about this one. Ideally, we
>> would stop the queue one packet earlier, and not drop anything.  The
>> mbox_send_message only returns the index of the next slot in the ring
>> buffer and since it is circular, that does not give us a sense  of the
>> space left.
> I think that's okay as-is. If you have access to the tail pointer of the
> ring buffer too, you may be able to calculate space, but that's
> 1) optional, and 2) best left to a later change.
>
>>>> +static int mctp_pcc_ndo_stop(struct net_device *ndev)
>>>> +{
>>>> +       struct mctp_pcc_ndev *mctp_pcc_ndev =
>>>> +           netdev_priv(ndev);
>>> Minor: Unneeded wrapping here, and it seems to be suppressing the
>>> warning about a blank line after declarations.
>> The Reverse XMasstree format checker I am using seems overly strict.  I
>> will try to unwrap all of these.  Is it better to do a separate variable
>> initialization?  Seems a bad coding practice for just a format decision
>> that is purely aesthetic. But this one is simple to fix.
> That shouldn't be tripping any RCT checks here, as it's just one
> variable init?
>
> 	mctp_pcc_ndev *mctp_pcc_ndev = netdev_priv(ndev);
>
> Keep it in one if possible (as you have done).
>
>>>> +       drain_packets(&mctp_pcc_ndev->outbox.packets);
>>>> +       drain_packets(&mctp_pcc_ndev->inbox.packets);
>>> Now that you're no longer doing the pcc_mbox_free_channel() in ndo_stop,
>>> nothing has quiesced the pcc channels at this point, right? In which
>>> case you now have a race between the channels' accesses to skb->data and
>>> freeing the skbs here.
>> (I have written and rewritten this section multiple times, so apoliges
>> if soemthing is unclear or awkward...it might reflect and earlier
>> thought...)
>>
>> OK, I think I do need to call pcc_mbox_free_channel here,
> You should ensure that the packet processing has stopped on
> ndo_stop (and has not started before ndo_open). Without doing that, you
> have two issues:
>
>   - the RX packets will continue while the interface is down; and
>   - you cannot free the lists
>
> If there's a way to keep the channel allocated, but suspend the
> processing of messages, that would seem like a good option (and sounds
> like it would solve the MTU complexity).
>
> However, on a cursory look through the pcc/mailbox infrastructure, it
> seems like the pcc_mbox_request_channel starts processing immediately -
> so it looks like you can not have the channel in an allocated-but-idle
> state, since the request_channel does the bind_client, which does a
> pcc_startup.
>
> So, I figure you have two options:
>
>   - only request the channel until the interface is up; or
>
>   - implement your own quiescing in the callbacks - keeping the channels
>     allocated, but check if the netdev is operational (ie, ndo_open has
>     been called) before processing an RX message

This is how I went:  during Add, I quickly request the outbound channel, 
get the MTU, and free it.  There should be no reason to get any messages 
during this window.  If we do, it is spurious an likely a Firmware error.

The channels are requested  during ndo_open and freed during ndo_stop.  
After the free, the packets get drained.  A restart of the device drops 
all messages in the ring buffer.


>
>> which means I need to allocate them in the pairing function. The ring
>> buffer will still have pointers to the sk_buffs, but they will never
>> be looked at again: the ring buffer will ger reinitialized if another
>> client binds to it.
> OK, but the skbs will remain allocated between those operations, which
> has other side-effects (eg, socket mem accounting). You'll want to drain
> the queue (as you are doing) if the queue is no longer making progress.
I think this approach will minimize the lifespan of the sk_buffs: they 
will either be waiting to be sent by the mailbox, or they will get freed 
immediately upon driver down.
>
>> The removal was to deal with the setting of the MTU, which requires a
>> channel to read the size of the shared buffer.
>>
>>          mctp_pcc_mtu = mctp_pcc_ndev->outbox.chan->shmem_size -
>>                  sizeof(struct pcc_header);
>>
>> I could create a channel, read  the value, and release it.  The Value I
>> need is in the ACPI PCC-table but I don't have direct access to it.
>> Perhaps it would be better to initialize the value to -1 and use that to
>> optionally reset it to the max value on ndo open.
> If you cannot create the channel until ndo_open, I'd you will also need
> to check the current mtu value on open, and failing if it exceeds the
> limit of the channel.
>
> If you have some way to extract this from the ACPI tables, that may be
> preferable. With the -1 approach, you'll also need to ensure that the
> *current* MTU is not larger than the channel max, on ndo_open. For
> example:
>
>    $ mctp link set mctppcc0 mtu 1000
>       # sets current mtu, no max currently specified
>    $ mctp link set mctppcc0 up
>       # driver discovers max is (say) 500, now we have an invalid mtu

Note that I tested this and it now works as expected.

>
> So, if you're able to parse the max from ACPI on initial bind, then you
> can detect the error at the time of occurrence (the `link set mtu`)
> rather than later (the `link set up`, and then ndo_open failing).
Yep.  Makes sense.  Setting it on driver add is essential.
>
>> Check  me here: The channel has a value ring msg_count that keeps track
>> of the number of entires in the ring buffer.  This needs to be set to0
>> in order for it to think the buffer is empty.  It is initialized in
>> __mbox_bind_client, called from mbox_bind_client which is in turn called
>> from mbox_request_channel
>>
>> The networking infra calls stop_ndo, so it must stop sending packets to
>> it first.  I can netif_stop_queue(ndev) of course, but that seems
>> counterintuitive? Assume i don't need to do that, but can't find the
>> calling code.
> You won't have any further ->ndo_start_xmit calls at the point that
> ->ndo_stop is called.
>
>>> Is there a mbox facility to (synchronously) stop processing the inbound
>>> channel, and completing the outbound channel?
>> There is mbox_free_channel which calls shutdown, and that removed the
>> IRQ handler, so no more  messages will be processed.  That should be
>> sufficient.
> OK, as above, this will depend on the approach you on allocating and
> releasing the channels.
>
>>>> +       ndev->mtu = MCTP_MIN_MTU;
>>>> +       ndev->max_mtu = mctp_pcc_mtu;
>>>> +       ndev->min_mtu = MCTP_MIN_MTU;
>>>> +
>>>> +       rc = mctp_register_netdev(ndev, NULL, MCTP_PHYS_BINDING_PCC);
>>>> +       if (rc)
>>>> +               goto free_netdev;
>>>> +
>>>> +       return devm_add_action_or_reset(dev, mctp_cleanup_netdev, ndev);
>>> As has been mentioned elsewhere, using the devm cleanup mechanism is a
>>> bit unconventional here. You have the device remove callback available,
>>> which lets you do the same, and that way you can demonstrate symmetry
>>> between the add and remove implementations.
>> This has gone through a few  iterations and I thought I had it clear.
>>
>> I was trying  to make use of automated cleanup as much as possible.
> OK, your call there. Using ->remove allows you to be explicit about the
> matching ordering, which would be my approach, but that's certainly not
> the only correct one.

The last of the devm_add_action_or_reset will now go away: I don't have 
to free the channel when I clean up the device, as it will be done when 
stop is called.


>
> Cheers,
>
>
> Jeremy