lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1a38b394-30ae-42c6-b363-9f3a00166259@amperemail.onmicrosoft.com>
Date: Mon, 3 Jun 2024 13:53:29 -0400
From: Adam Young <admiyo@...eremail.onmicrosoft.com>
To: Jeremy Kerr <jk@...econstruct.com.au>, admiyo@...amperecomputing.com,
 Matt Johnston <matt@...econstruct.com.au>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 3/3] mctp pcc: Implement MCTP over PCC Transport

Pretty sure I have all these corrections in the next one.  Some inline 
comments.



On 5/28/24 23:02, Jeremy Kerr wrote:
> Hi Adam,
>
> Thanks for the v2! Progress looks good, this seems simpler now too. Some
> comments inline.
>
>> From: Adam Young <admiyo@...erecomputing.com>
>>
>> Implementation of DMTF DSP:0292
>> Management Control Transport Protocol(MCTP)  over
>> Platform Communication Channel(PCC)
>>
>> MCTP devices are specified by entries in DSDT/SDST and
>> reference channels specified in the PCCT.
>>
>> Communication with other devices use the PCC based
>> doorbell mechanism.
> Signed-off-by?
>
> And can you include a brief summary of changes since the prior version
> you have sent? (see
> https://lore.kernel.org/netdev/20240220081053.1439104-1-jk@codeconstruct.com.au/
> for an example, the marker lines means that the changes don't get
> included in the commit log; Jakub may also have other preferences around
> this...)

They are all in the header patch.

>
>> diff --git a/drivers/net/mctp/Kconfig b/drivers/net/mctp/Kconfig
>> index ce9d2d2ccf3b..ff4effd8e99c 100644
>> --- a/drivers/net/mctp/Kconfig
>> +++ b/drivers/net/mctp/Kconfig
>> @@ -42,6 +42,19 @@ config MCTP_TRANSPORT_I3C
>>            A MCTP protocol network device is created for each I3C bus
>>            having a "mctp-controller" devicetree property.
>>   
>> +config MCTP_TRANSPORT_PCC
>> +       tristate "MCTP  PCC transport"
> Super minor: you have two spaces between "MCTP" and "PCC"
>
>> +       select ACPI
>> +       help
>> +         Provides a driver to access MCTP devices over PCC transport,
>> +         A MCTP protocol network device is created via ACPI for each
>> +         entry in the DST/SDST that matches the identifier. The Platform
>> +         commuinucation channels are selected from the corresponding
>> +         entries in the PCCT.
>> +
>> +         Say y here if you need to connect to MCTP endpoints over PCC. To
>> +         compile as a module, use m; the module will be called mctp-pcc.
>> +
>>   endmenu
>>   
>>   endif
>> diff --git a/drivers/net/mctp/Makefile b/drivers/net/mctp/Makefile
>> index e1cb99ced54a..492a9e47638f 100644
>> --- a/drivers/net/mctp/Makefile
>> +++ b/drivers/net/mctp/Makefile
>> @@ -1,3 +1,4 @@
>> +obj-$(CONFIG_MCTP_TRANSPORT_PCC) += mctp-pcc.o
>>   obj-$(CONFIG_MCTP_SERIAL) += mctp-serial.o
>>   obj-$(CONFIG_MCTP_TRANSPORT_I2C) += mctp-i2c.o
>>   obj-$(CONFIG_MCTP_TRANSPORT_I3C) += mctp-i3c.o
>> diff --git a/drivers/net/mctp/mctp-pcc.c b/drivers/net/mctp/mctp-pcc.c
>> new file mode 100644
>> index 000000000000..d97f40789fd8
>> --- /dev/null
>> +++ b/drivers/net/mctp/mctp-pcc.c
>> @@ -0,0 +1,361 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * mctp-pcc.c - Driver for MCTP over PCC.
>> + * Copyright (c) 2024, Ampere Computing LLC
>> + */
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/if_arp.h>
>> +#include <linux/init.h>
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/netdevice.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/string.h>
>> +
>> +#include <acpi/acpi_bus.h>
>> +#include <acpi/acpi_drivers.h>
>> +#include <acpi/acrestyp.h>
>> +#include <acpi/actbl.h>
>> +#include <net/mctp.h>
>> +#include <net/mctpdevice.h>
>> +#include <acpi/pcc.h>
>> +#include <net/pkt_sched.h>
>> +
>> +#define SPDM_VERSION_OFFSET 1
>> +#define SPDM_REQ_RESP_OFFSET 2
>> +#define MCTP_PAYLOAD_LENGTH 256
>> +#define MCTP_CMD_LENGTH 4
>> +#define MCTP_PCC_VERSION     0x1 /* DSP0253 defines a single version: 1 */
>> +#define MCTP_SIGNATURE "MCTP"
>> +#define SIGNATURE_LENGTH 4
>> +#define MCTP_HEADER_LENGTH 12
>> +#define MCTP_MIN_MTU 68
>> +#define PCC_MAGIC 0x50434300
>> +#define PCC_DWORD_TYPE 0x0c
>> +
>> +struct mctp_pcc_hdr {
>> +       u32 signature;
>> +       u32 flags;
>> +       u32 length;
>> +       char mctp_signature[4];
>> +};
> The usage of this struct isn't really consistent; you'll at least want
> endian annotations on these members. More on that below though.




>
>> +
>> +struct mctp_pcc_hw_addr {
>> +       u32 inbox_index;
>> +       u32 outbox_index;
>> +};
>> +
>> +/* The netdev structure. One of these per PCC adapter. */
>> +struct mctp_pcc_ndev {
>> +       struct list_head next;
>> +       /* spinlock to serialize access to pcc buffer and registers*/
>> +       spinlock_t lock;
>> +       struct mctp_dev mdev;
>> +       struct acpi_device *acpi_device;
>> +       struct pcc_mbox_chan *in_chan;
>> +       struct pcc_mbox_chan *out_chan;
>> +       struct mbox_client outbox_client;
>> +       struct mbox_client inbox_client;
>> +       void __iomem *pcc_comm_inbox_addr;
>> +       void __iomem *pcc_comm_outbox_addr;
>> +       struct mctp_pcc_hw_addr hw_addr;
>> +       void (*cleanup_channel)(struct pcc_mbox_chan *in_chan);
> Why this as a callback? There's only one possible function it can be.

Vestige of code designed to work with multiple mailbox implementations. 
Will remove.


>
>> +};
>> +
>> +static struct list_head mctp_pcc_ndevs;
> I'm not clear on what this list is doing; it seems to be for freeing
> devices on module unload (or device remove).
>
> However, the module will be refcounted while there are devices bound, so
> module unload shouldn't be possible in that state. So the only time
> you'll be iterating this list to free everything will be when it's
> empty.
>
> You could replace this with the mctp_pcc_driver_remove() just removing the
> device passed in the argument, rather than doing any list iteration.
>
> ... unless I've missed something?

There is no requirement that all the devices  be unloaded in order for 
the module to get unloaded.

It someone wants to disable the MCTP devices, they can unload the 
module, and it gets cleaned up.

With ACPI, the devices never go away, they are defined in a table read 
at start up and stay there.  So without this change there is no way to 
unload the module.  Maybe it is just a convenience for development, but 
I think most modules behave this way.


>
>
>> +
>> +static void mctp_pcc_client_rx_callback(struct mbox_client *c, void *buffer)
>> +{
>> +       struct mctp_pcc_ndev *mctp_pcc_dev;
>> +       struct mctp_skb_cb *cb;
>> +       struct sk_buff *skb;
>> +       u32 length_offset;
>> +       u32 flags_offset;
>> +       void *skb_buf;
>> +       u32 data_len;
>> +       u32 flags;
>> +
>> +       mctp_pcc_dev = container_of(c, struct mctp_pcc_ndev, inbox_client);
>> +       length_offset = offsetof(struct mctp_pcc_hdr, length);
>> +       data_len = readl(mctp_pcc_dev->pcc_comm_inbox_addr + length_offset) +
>> +                  MCTP_HEADER_LENGTH;
> Doing this using offsetof with separate readl()s is a bit clunky. Can
> you memcpy_fromio the whole header, and use the appropriate endian
> accessors?
I was overthinking this.  You are right.
>
> (this would match the behaviour in the tx path)
>
> Also, maybe check that data_len is sensible before allocating.
Good idea
>
>> +
>> +       skb = netdev_alloc_skb(mctp_pcc_dev->mdev.dev, data_len);
>> +       if (!skb) {
>> +               mctp_pcc_dev->mdev.dev->stats.rx_dropped++;
>> +               return;
>> +       }
>> +       mctp_pcc_dev->mdev.dev->stats.rx_packets++;
>> +       mctp_pcc_dev->mdev.dev->stats.rx_bytes += data_len;
>> +       skb->protocol = htons(ETH_P_MCTP);
>> +       skb_buf = skb_put(skb, data_len);
>> +       memcpy_fromio(skb_buf, mctp_pcc_dev->pcc_comm_inbox_addr, data_len);
>> +       skb_reset_mac_header(skb);
>> +       skb_pull(skb, sizeof(struct mctp_pcc_hdr));
> Any benefit in including the pcc_hdr in the skb?
>
> (not necessarily an issue, just asking...)
It shows up in  tracing of the packet.  Useful for debugging.
>
>> +       skb_reset_network_header(skb);
>> +       cb = __mctp_cb(skb);
>> +       cb->halen = 0;
>> +       skb->dev =  mctp_pcc_dev->mdev.dev;
>> +       netif_rx(skb);
>> +
>> +       flags_offset = offsetof(struct mctp_pcc_hdr, flags);
>> +       flags = readl(mctp_pcc_dev->pcc_comm_inbox_addr + flags_offset);
>> +       mctp_pcc_dev->in_chan->ack_rx = (flags & 1) > 0;
> Might be best to define what the flags bits mean, rather than
> magic-numbering this.
I'll make a constant for the mask.
>
> Does anything need to tell the mailbox driver to do that ack after
> setting ack_rx?

Yes.  It is in the previous patch, in the pcc_mailbox code.  I 
originally had it as a follow on, but reordered to make it a pre-req.  
That allows me to inline this logic, making the driver easier to review 
(I hope).

>
>> +static netdev_tx_t mctp_pcc_tx(struct sk_buff *skb, struct net_device *ndev)
>> +{
>> +       struct mctp_pcc_hdr pcc_header;
>> +       struct mctp_pcc_ndev *mpnd;
>> +       void __iomem *buffer;
>> +       unsigned long flags;
>> +       int rc;
>> +
>> +       netif_stop_queue(ndev);
> Do you need to stop and restart the queue? Your handling is atomic.
I guess not.  This was just from following the examples of others. Will 
remove.
>
>> +       ndev->stats.tx_bytes += skb->len;
>> +       ndev->stats.tx_packets++;
>> +       mpnd = (struct mctp_pcc_ndev *)netdev_priv(ndev);
> no need for this cast, netdev_priv() returns void *
>
>> +
>> +       spin_lock_irqsave(&mpnd->lock, flags);
>> +       buffer = mpnd->pcc_comm_outbox_addr;
>> +       pcc_header.signature = PCC_MAGIC;
>> +       pcc_header.flags = 0x1;
> Magic numbers for flags here too
>
>> +       memcpy(pcc_header.mctp_signature, MCTP_SIGNATURE, SIGNATURE_LENGTH);
>> +       pcc_header.length = skb->len + SIGNATURE_LENGTH;
>> +       memcpy_toio(buffer, &pcc_header, sizeof(struct mctp_pcc_hdr));
>> +       memcpy_toio(buffer + sizeof(struct mctp_pcc_hdr), skb->data, skb->len);
>> +       rc = mpnd->out_chan->mchan->mbox->ops->send_data(mpnd->out_chan->mchan,
>> +                                                        NULL);
>> +       spin_unlock_irqrestore(&mpnd->lock, flags);
>> +
>> +       dev_consume_skb_any(skb);
>> +       netif_start_queue(ndev);
>> +       if (!rc)
>> +               return NETDEV_TX_OK;
>> +       return NETDEV_TX_BUSY;
> I think you want to return NETDEV_TX_OK unconditionally here, or at
> least you need to change the queue handling; see the comment for the
> ndo_start_xmit callback.
Will Do. Thanks for pointing out that comment, would never have seen it.
>
>> +}
>> +
>> +static const struct net_device_ops mctp_pcc_netdev_ops = {
>> +       .ndo_start_xmit = mctp_pcc_tx,
>> +       .ndo_uninit = NULL
> No need for this assignment.
>
>> +};
>> +
>> +static void  mctp_pcc_setup(struct net_device *ndev)
>> +{
>> +       ndev->type = ARPHRD_MCTP;
>> +       ndev->hard_header_len = 0;
>> +       ndev->addr_len = 0;
>> +       ndev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;
>> +       ndev->flags = IFF_NOARP;
>> +       ndev->netdev_ops = &mctp_pcc_netdev_ops;
>> +       ndev->needs_free_netdev = true;
>> +}
>> +
>> +static int create_mctp_pcc_netdev(struct acpi_device *acpi_dev,
>> +                                 struct device *dev, int inbox_index,
>> +                                 int outbox_index)
>> +{
>> +       struct mctp_pcc_ndev *mctp_pcc_dev;
>> +       struct net_device *ndev;
>> +       int mctp_pcc_mtu;
>> +       char name[32];
>> +       int rc;
>> +
>> +       snprintf(name, sizeof(name), "mctpipcc%d", inbox_index);
>> +       ndev = alloc_netdev(sizeof(struct mctp_pcc_ndev), name, NET_NAME_ENUM,
>> +                           mctp_pcc_setup);
>> +       if (!ndev)
>> +               return -ENOMEM;
>> +       mctp_pcc_dev = (struct mctp_pcc_ndev *)netdev_priv(ndev);
>> +       INIT_LIST_HEAD(&mctp_pcc_dev->next);
>> +       spin_lock_init(&mctp_pcc_dev->lock);
>> +
>> +       mctp_pcc_dev->hw_addr.inbox_index = inbox_index;
>> +       mctp_pcc_dev->hw_addr.outbox_index = outbox_index;
>> +       mctp_pcc_dev->inbox_client.rx_callback = mctp_pcc_client_rx_callback;
>> +       mctp_pcc_dev->cleanup_channel = pcc_mbox_free_channel;
>> +       mctp_pcc_dev->out_chan =
>> +               pcc_mbox_request_channel(&mctp_pcc_dev->outbox_client,
>> +                                        outbox_index);
>> +       if (IS_ERR(mctp_pcc_dev->out_chan)) {
>> +               rc = PTR_ERR(mctp_pcc_dev->out_chan);
>> +               goto free_netdev;
>> +       }
>> +       mctp_pcc_dev->in_chan =
>> +               pcc_mbox_request_channel(&mctp_pcc_dev->inbox_client,
>> +                                        inbox_index);
>> +       if (IS_ERR(mctp_pcc_dev->in_chan)) {
>> +               rc = PTR_ERR(mctp_pcc_dev->in_chan);
>> +               goto cleanup_out_channel;
>> +       }
>> +       mctp_pcc_dev->pcc_comm_inbox_addr =
>> +               devm_ioremap(dev, mctp_pcc_dev->in_chan->shmem_base_addr,
>> +                            mctp_pcc_dev->in_chan->shmem_size);
>> +       if (!mctp_pcc_dev->pcc_comm_inbox_addr) {
>> +               rc = -EINVAL;
>> +               goto cleanup_in_channel;
>> +       }
>> +       mctp_pcc_dev->pcc_comm_outbox_addr =
>> +               devm_ioremap(dev, mctp_pcc_dev->out_chan->shmem_base_addr,
>> +                            mctp_pcc_dev->out_chan->shmem_size);
>> +       if (!mctp_pcc_dev->pcc_comm_outbox_addr) {
>> +               rc = -EINVAL;
>> +               goto cleanup_in_channel;
>> +       }
>> +       mctp_pcc_dev->acpi_device = acpi_dev;
> You probably want the link back too:
>
>            acpi_dev->driver_data = mctp_pcc_dev;
>
>
>> +       mctp_pcc_dev->inbox_client.dev = dev;
>> +       mctp_pcc_dev->outbox_client.dev = dev;
>> +       mctp_pcc_dev->mdev.dev = ndev;
>> +
>> +/* There is no clean way to pass the MTU to the callback function
>> + * used for registration, so set the values ahead of time.
>> + */
> Super minor, but keep this aligned with the code indent.
>
>> +       mctp_pcc_mtu = mctp_pcc_dev->out_chan->shmem_size -
>> +               sizeof(struct mctp_pcc_hdr);
>> +       ndev->mtu = mctp_pcc_mtu;
>> +       ndev->max_mtu = mctp_pcc_mtu;
>> +       ndev->min_mtu = MCTP_MIN_MTU;
> Same as last review: I'd recommend setting the MTU to the minimum, and
> leaving it up to userspace to do a `mctp link mctppcc0 mtu <whatever>`.
> This means you don't break things if/when you ever encounter remote
> endpoints that don't support the max mtu.
>
> If the driver could reliably detect the remote max MTU, then what you
> have is fine (and the driver would then set the appropriate MTU
> here).
>
> However, if that probing *cannot* be done at the driver level, or needs
> any userspace interaction to do so (say, a higher level protocol), then
> you've set a MTU that could be unusable in this initial state. In that
> case, setting it low to start with means that the link is usable by
> default.
>
> Userspace can then hard-code a higher MTU if you like, and that can be
> adjusted later as appropriate.
>
> (this is a lesson learnt from the i2c transport...)

Ah...makes sense.  I was not sure if our impl,enetation could handle 
that yet, but it appears to,

If it can't, it is a bug we have to fix.


>
>> +/* pass in adev=NULL to remove all devices
>> + */
>> +static void mctp_pcc_driver_remove(struct acpi_device *adev)
>> +{
>> +       struct mctp_pcc_ndev *mctp_pcc_dev = NULL;
>> +       struct list_head *ptr;
>> +       struct list_head *tmp;
>> +
>> +       list_for_each_safe(ptr, tmp, &mctp_pcc_ndevs) {
>> +               struct net_device *ndev;
>> +
>> +               mctp_pcc_dev = list_entry(ptr, struct mctp_pcc_ndev, next);
>> +               if (adev && mctp_pcc_dev->acpi_device == adev)
>> +                       continue;
>> +
>> +               mctp_pcc_dev->cleanup_channel(mctp_pcc_dev->out_chan);
>> +               mctp_pcc_dev->cleanup_channel(mctp_pcc_dev->in_chan);
>> +               ndev = mctp_pcc_dev->mdev.dev;
>> +               if (ndev)
>> +                       mctp_unregister_netdev(ndev);
>> +               list_del(ptr);
>> +               if (adev)
>> +                       break;
>> +       }
>> +};
> Assuming we don't need the free-everything case, how about something
> like:
>
>      static void mctp_pcc_driver_remove(struct acpi_device *adev)
>      {
>            struct mctp_pcc_ndev *mctp_pcc_dev = acpi_driver_data(adev);
>
>            mctp_pcc_dev->cleanup_channel(mctp_pcc_dev->out_chan);
>            mctp_pcc_dev->cleanup_channel(mctp_pcc_dev->in_chan);
>            mctp_unregister_netdev(mctp_pcc_dev->mdev.dev);
>       }
>
> If you do need the list iteration: you're currently doing the cleanup on
> every adev *except* the one you want:
>
>> +               if (adev && mctp_pcc_dev->acpi_device == adev)
>> +                       continue;
> I think you meant '!=' instead of '=='?
Yes.  Yes I did.  This is code that has to be there for completeness, 
but I don't really have a way to test, except for the "delete all" case.
>
>> +
>> +static const struct acpi_device_id mctp_pcc_device_ids[] = {
>> +       { "DMT0001", 0},
>> +       { "", 0},
>> +};
>> +
>> +static struct acpi_driver mctp_pcc_driver = {
>> +       .name = "mctp_pcc",
>> +       .class = "Unknown",
>> +       .ids = mctp_pcc_device_ids,
>> +       .ops = {
>> +               .add = mctp_pcc_driver_add,
>> +               .remove = mctp_pcc_driver_remove,
>> +               .notify = NULL,
> Minor: don't need the zero assignment here.
>
>> +       },
>> +       .owner = THIS_MODULE,
>> +
>> +};
>> +
> [...]
>
>> +static int __init mctp_pcc_mod_init(void)
>> +{
>> +       int rc;
>> +
>> +       pr_debug("Initializing MCTP over PCC transport driver\n");
>> +       INIT_LIST_HEAD(&mctp_pcc_ndevs);
>> +       rc = acpi_bus_register_driver(&mctp_pcc_driver);
>> +       if (rc < 0)
>> +               ACPI_DEBUG_PRINT((ACPI_DB_ERROR, "Error registering driver\n"));
>> +       return rc;
>> +}
>> +
>> +static __exit void mctp_pcc_mod_exit(void)
>> +{
>> +       pr_debug("Removing MCTP over PCC transport driver\n");
>> +       mctp_pcc_driver_remove(NULL);
>> +       acpi_bus_unregister_driver(&mctp_pcc_driver);
>> +}
>> +
>> +module_init(mctp_pcc_mod_init);
>> +module_exit(mctp_pcc_mod_exit);
> If you end up removing the mctp_pcc_ndevs list, these can all be
> replaced with module_acpi_driver(mctp_pcc_driver);

Yeah, I can't get away with that.  The ACPI devices may still be there 
when some one calls rmmod, and so we need to clean up the ndevs.  It is 
the only case  where the module will be removed, as the ACPI devices 
never go away.


>
> Cheers,
>
>
> Jeremy

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ