linux-kernel - Re: [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b02f2e6e-5ad2-2e7b-86a5-644f44ecdb6d@amd.com>
Date: Wed, 18 Sep 2024 18:31:27 +0100
From: Alejandro Lucero Palau <alucerop@....com>
To: Wei Huang <wei.huang2@....com>, linux-pci@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
 netdev@...r.kernel.org
Cc: Jonathan.Cameron@...wei.com, helgaas@...nel.org, corbet@....net,
 davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
 pabeni@...hat.com, alex.williamson@...hat.com, gospo@...adcom.com,
 michael.chan@...adcom.com, ajit.khaparde@...adcom.com,
 somnath.kotur@...adcom.com, andrew.gospodarek@...adcom.com,
 manoj.panicker2@....com, Eric.VanTassell@....com, vadim.fedorenko@...ux.dev,
 horms@...nel.org, bagasdotme@...il.com, bhelgaas@...gle.com,
 lukas@...ner.de, paul.e.luse@...el.com, jing2.liu@...el.com
Subject: Re: [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver


On 9/16/24 19:55, Wei Huang wrote:
>
>
> On 9/11/24 10:37 AM, Alejandro Lucero Palau wrote:
>>
>> On 8/22/24 21:41, Wei Huang wrote:
>>> From: Manoj Panicker <manoj.panicker2@....com>
>>>
>>> Implement TPH support in Broadcom BNXT device driver. The driver uses
>>> TPH functions to retrieve and configure the device's Steering Tags when
>>> its interrupt affinity is being changed.
>>>
>>> Co-developed-by: Wei Huang <wei.huang2@....com>
>>> Signed-off-by: Wei Huang <wei.huang2@....com>
>>> Signed-off-by: Manoj Panicker <manoj.panicker2@....com>
>>> Reviewed-by: Ajit Khaparde <ajit.khaparde@...adcom.com>
>>> Reviewed-by: Somnath Kotur <somnath.kotur@...adcom.com>
>>> Reviewed-by: Andy Gospodarek <andrew.gospodarek@...adcom.com>
>>> ---
>>>    drivers/net/ethernet/broadcom/bnxt/bnxt.c | 78 
>>> +++++++++++++++++++++++
>>>    drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 ++
>>>    2 files changed, 82 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
>>> b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> index ffa74c26ee53..5903cd36b54d 100644
>>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> @@ -55,6 +55,7 @@
>>>    #include <net/page_pool/helpers.h>
>>>    #include <linux/align.h>
>>>    #include <net/netdev_queues.h>
>>> +#include <linux/pci-tph.h>
>>>       #include "bnxt_hsi.h"
>>>    #include "bnxt.h"
>>> @@ -10821,6 +10822,58 @@ int bnxt_reserve_rings(struct bnxt *bp, 
>>> bool irq_re_init)
>>>        return 0;
>>>    }
>>>    +static void __bnxt_irq_affinity_notify(struct 
>>> irq_affinity_notify *notify,
>>> +                       const cpumask_t *mask)
>>> +{
>>> +    struct bnxt_irq *irq;
>>> +    u16 tag;
>>> +
>>> +    irq = container_of(notify, struct bnxt_irq, affinity_notify);
>>> +    cpumask_copy(irq->cpu_mask, mask);
>>> +
>>> +    if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
>>> +                cpumask_first(irq->cpu_mask), &tag))
>>
>>
>> I understand just one cpu from the mask has to be used, but I wonder if
>> some check should be done for ensuring the mask is not mad.
>>
>> This is control path and the related queue is going to be restarted, so
>> maybe a sanity check for ensuring all the cpus in the mask are from the
>> same CCX complex?
>
> I don't think this is always true and we shouldn't warn when this 
> happens. There is only one ST can be supported, so the driver need to 
> make a good judgement on which ST to be used. But no matter what, ST 
> is just a hint - it shouldn't cause any correctness issues in HW, even 
> when it is not the optimal target CPU. So warning is unnecessary.
>

1) You can use a "mad" mask for avoiding a specific interrupt to disturb 
a specific execution is those cores not part of the mask. But I argue 
the ST hint should not be set then.


2) Someone, maybe an automatic script, could try to get the best 
performance possible, and a "mad" mask could preclude such outcome 
inadvertently.


I agree a warning could not be a good idea because 1, but I would say 
adding some way of traceability here could be interesting. A tracepoint 
or a new ST field for last hint set for that interrupt/queue.


>>
>> That would be an iteration checking the tag is the same one for all of
>> them. If not, at least a warning stating the tag/CCX/cpu used.
>>
>>
>>> +        return;
>>> +
>>> +    if (pcie_tph_set_st_entry(irq->bp->pdev, irq->msix_nr, tag))
>>> +        return;
>>> +
>>> +    if (netif_running(irq->bp->dev)) {
>>> +        rtnl_lock();
>>> +        bnxt_close_nic(irq->bp, false, false);
>>> +        bnxt_open_nic(irq->bp, false, false);
>>> +        rtnl_unlock();
>>> +    }
>>> +}
>>> +
>>> +static void __bnxt_irq_affinity_release(struct kref __always_unused 
>>> *ref)
>>> +{
>>> +}
>>> +
>>> +static void bnxt_release_irq_notifier(struct bnxt_irq *irq)
>>> +{
>>> +    irq_set_affinity_notifier(irq->vector, NULL);
>>> +}
>>> +
>>> +static void bnxt_register_irq_notifier(struct bnxt *bp, struct 
>>> bnxt_irq *irq)
>>> +{
>>> +    struct irq_affinity_notify *notify;
>>> +
>>> +    /* Nothing to do if TPH is not enabled */
>>> +    if (!pcie_tph_enabled(bp->pdev))
>>> +        return;
>>> +
>>> +    irq->bp = bp;
>>> +
>>> +    /* Register IRQ affinility notifier */
>>> +    notify = &irq->affinity_notify;
>>> +    notify->irq = irq->vector;
>>> +    notify->notify = __bnxt_irq_affinity_notify;
>>> +    notify->release = __bnxt_irq_affinity_release;
>>> +
>>> +    irq_set_affinity_notifier(irq->vector, notify);
>>> +}
>>> +
>>>    static void bnxt_free_irq(struct bnxt *bp)
>>>    {
>>>        struct bnxt_irq *irq;
>>> @@ -10843,11 +10896,17 @@ static void bnxt_free_irq(struct bnxt *bp)
>>>                    free_cpumask_var(irq->cpu_mask);
>>>                    irq->have_cpumask = 0;
>>>                }
>>> +
>>> +            bnxt_release_irq_notifier(irq);
>>> +
>>>                free_irq(irq->vector, bp->bnapi[i]);
>>>            }
>>>               irq->requested = 0;
>>>        }
>>> +
>>> +    /* Disable TPH support */
>>> +    pcie_disable_tph(bp->pdev);
>>>    }
>>>       static int bnxt_request_irq(struct bnxt *bp)
>>> @@ -10870,6 +10929,13 @@ static int bnxt_request_irq(struct bnxt *bp)
>>>        if (!(bp->flags & BNXT_FLAG_USING_MSIX))
>>>            flags = IRQF_SHARED;
>>>    +    /* Enable TPH support as part of IRQ request */
>>> +    if (pcie_tph_modes(bp->pdev) & PCI_TPH_CAP_INT_VEC) {
>>> +        rc = pcie_enable_tph(bp->pdev, PCI_TPH_CAP_INT_VEC);
>>> +        if (rc)
>>> +            netdev_warn(bp->dev, "failed enabling TPH support\n");
>>> +    }
>>> +
>>>        for (i = 0, j = 0; i < bp->cp_nr_rings; i++) {
>>>            int map_idx = bnxt_cp_num_to_irq_num(bp, i);
>>>            struct bnxt_irq *irq = &bp->irq_tbl[map_idx];
>>> @@ -10893,8 +10959,10 @@ static int bnxt_request_irq(struct bnxt *bp)
>>>               if (zalloc_cpumask_var(&irq->cpu_mask, GFP_KERNEL)) {
>>>                int numa_node = dev_to_node(&bp->pdev->dev);
>>> +            u16 tag;
>>>                   irq->have_cpumask = 1;
>>> +            irq->msix_nr = map_idx;
>>>                cpumask_set_cpu(cpumask_local_spread(i, numa_node),
>>>                        irq->cpu_mask);
>>>                rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
>>> @@ -10904,6 +10972,16 @@ static int bnxt_request_irq(struct bnxt *bp)
>>>                            irq->vector);
>>>                    break;
>>>                }
>>> +
>>> +            bnxt_register_irq_notifier(bp, irq);
>>> +
>>> +            /* Init ST table entry */
>>> +            if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
>>> +                        cpumask_first(irq->cpu_mask),
>>> +                        &tag))
>>> +                break;
>>> +
>>> +            pcie_tph_set_st_entry(irq->bp->pdev, irq->msix_nr, tag);
>>>            }
>>>        }
>>>        return rc;
>>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h 
>>> b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>> index 6bbdc718c3a7..ae1abcc1bddf 100644
>>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>> @@ -1224,6 +1224,10 @@ struct bnxt_irq {
>>>        u8        have_cpumask:1;
>>>        char        name[IFNAMSIZ + 2];
>>>        cpumask_var_t    cpu_mask;
>>> +
>>> +    struct bnxt    *bp;
>>> +    int        msix_nr;
>>> +    struct irq_affinity_notify affinity_notify;
>>>    };
>>>       #define HWRM_RING_ALLOC_TX    0x1