netdev - Re: [PATCH] net: bcmgenet: Return not supported if we don't have a WoL IRQ

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <de377891-c220-64f8-a0c2-69976d0c8513@gmail.com>
Date:   Fri, 4 Mar 2022 12:12:42 -0800
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Jeremy Linton <jeremy.linton@....com>,
        Javier Martinez Canillas <javierm@...hat.com>,
        Jakub Kicinski <kuba@...nel.org>
Cc:     Peter Robinson <pbrobinson@...il.com>,
        Doug Berger <opendmb@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        bcm-kernel-feedback-list@...adcom.com, netdev@...r.kernel.org
Subject: Re: [PATCH] net: bcmgenet: Return not supported if we don't have a
 WoL IRQ



On 3/4/2022 9:33 AM, Jeremy Linton wrote:
> Hi,
> 
> On 3/3/22 14:04, Javier Martinez Canillas wrote:
>> Hello Jeremy,
>>
>> On 3/3/22 21:00, Jeremy Linton wrote:
>>> Hi,
>>>
>>> On 2/23/22 16:48, Jakub Kicinski wrote:
>>>> On Wed, 23 Feb 2022 09:54:26 -0800 Florian Fainelli wrote:
>>>>>> I have no problems working with you to improve the driver, the 
>>>>>> problem
>>>>>> I have is this is currently a regression in 5.17 so I would like to
>>>>>> see something land, whether it's reverting the other patch, landing
>>>>>> thing one or another straight forward fix and then maybe revisit as
>>>>>> whole in 5.18.
>>>>>
>>>>> Understood and I won't require you or me to complete this 
>>>>> investigating
>>>>> before fixing the regression, this is just so we understand where it
>>>>> stemmed from and possibly fix the IRQ layer if need be. Given what I
>>>>> just wrote, do you think you can sprinkle debug prints throughout the
>>>>> kernel to figure out whether enable_irq_wake() somehow messes up the
>>>>> interrupt descriptor of interrupt and test that theory? We can do that
>>>>> offline if you want.
>>>>
>>>> Let me mark v2 as Deferred for now, then. I'm not really sure if that's
>>>> what's intended but we have 3 weeks or so until 5.17 is cut so we can
>>>> afford a few days of investigating.
>>>>
>>>> I'm likely missing the point but sounds like the IRQ subsystem treats
>>>> IRQ numbers as unsigned so if we pass a negative value "fun" is sort
>>>> of expected. Isn't the problem that device somehow comes with wakeup
>>>> capable being set already? Isn't it better to make sure device is not
>>>> wake capable if there's no WoL irq instead of adding second check?
>>>>
>>>> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
>>>> b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>>>> index cfe09117fe6c..7dea44803beb 100644
>>>> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>>>> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>>>> @@ -4020,12 +4020,12 @@ static int bcmgenet_probe(struct 
>>>> platform_device *pdev)
>>>>        /* Request the WOL interrupt and advertise suspend if 
>>>> available */
>>>>        priv->wol_irq_disabled = true;
>>>> -    if (priv->wol_irq > 0) {
>>>> +    if (priv->wol_irq > 0)
>>>>            err = devm_request_irq(&pdev->dev, priv->wol_irq,
>>>>                           bcmgenet_wol_isr, 0, dev->name, priv);
>>>> -        if (!err)
>>>> -            device_set_wakeup_capable(&pdev->dev, 1);
>>>> -    }
>>>> +    else
>>>> +        err = -ENOENT;
>>>> +    device_set_wakeup_capable(&pdev->dev, !err);
>>>>        /* Set the needed headroom to account for any possible
>>>>         * features enabling/disabling at runtime
>>>>
>>>
>>>
>>> I duplicated the problem on rpi4/ACPI by moving to gcc12, so I have a/b
>>> config that is close as I can achieve using gcc11 vs 12 and the one
>>> built with gcc12 fails pretty consistently while the gcc11 works.
>>>
>>
>> Did Peter's patch instead of this one help ?
>>
> 
> No, it seems to be the same problem. The second irq is registered, but 
> never seems to fire. There are a couple odd compiler warnings about 
> infinite recursion in memcpy()/etc I was looking at, but nothing really 
> pops out. Its like the adapter never gets the command submissions 
> (although link/up/down appear to be working/etc).

There are two "main" interrupt lines which are required and an optional 
third interrupt line which is the side band Wake-on-LAN interrupt from 
the second level interrupt controller that aggregates all wake-up sources.

The first interrupt line collects the the default RX/TX queue interrupts 
(ring 16) as well as the MAC link up/down and other interrupts that we 
are not using. The second interrupt line is only for the TX queues 
(rings 0 through 3) transmit done completion signaling. Because the 
driver is multi-queue aware and enabled, the network stack will chose 
any of those 5 queues before transmitting packets based upon a hash, so 
if you want to reliably prove/disprove that the second interrupt line is 
non-functional, you would need to force a given type of packet(s) to use 
that queue specifically. There is an example on how to do that here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/multiqueue.rst#n47

With that said, please try the following debug patch so we can get more 
understanding of how we managed to prevent the second interrupt line 
from getting its interrupt handler serviced. Thanks
-- 
Florian
View attachment "debug.diff" of type "text/plain" (3982 bytes)