[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4da6b03a-603f-c5e8-2356-e7ecd9756508@arm.com>
Date: Mon, 7 Mar 2022 12:27:24 -0600
From: Jeremy Linton <jeremy.linton@....com>
To: Florian Fainelli <f.fainelli@...il.com>,
Javier Martinez Canillas <javierm@...hat.com>,
Jakub Kicinski <kuba@...nel.org>
Cc: Peter Robinson <pbrobinson@...il.com>,
Doug Berger <opendmb@...il.com>,
"David S. Miller" <davem@...emloft.net>,
bcm-kernel-feedback-list@...adcom.com, netdev@...r.kernel.org
Subject: Re: [PATCH] net: bcmgenet: Return not supported if we don't have a
WoL IRQ
Hi,
Sorry about the delay, i'm flipping between a couple different things here.
On 3/4/22 14:12, Florian Fainelli wrote:
>
>
> On 3/4/2022 9:33 AM, Jeremy Linton wrote:
>> Hi,
>>
>> On 3/3/22 14:04, Javier Martinez Canillas wrote:
>>> Hello Jeremy,
>>>
>>> On 3/3/22 21:00, Jeremy Linton wrote:
>>>> Hi,
>>>>
>>>> On 2/23/22 16:48, Jakub Kicinski wrote:
>>>>> On Wed, 23 Feb 2022 09:54:26 -0800 Florian Fainelli wrote:
>>>>>>> I have no problems working with you to improve the driver, the
>>>>>>> problem
>>>>>>> I have is this is currently a regression in 5.17 so I would like to
>>>>>>> see something land, whether it's reverting the other patch, landing
>>>>>>> thing one or another straight forward fix and then maybe revisit as
>>>>>>> whole in 5.18.
>>>>>>
>>>>>> Understood and I won't require you or me to complete this
>>>>>> investigating
>>>>>> before fixing the regression, this is just so we understand where it
>>>>>> stemmed from and possibly fix the IRQ layer if need be. Given what I
>>>>>> just wrote, do you think you can sprinkle debug prints throughout the
>>>>>> kernel to figure out whether enable_irq_wake() somehow messes up the
>>>>>> interrupt descriptor of interrupt and test that theory? We can do
>>>>>> that
>>>>>> offline if you want.
>>>>>
>>>>> Let me mark v2 as Deferred for now, then. I'm not really sure if
>>>>> that's
>>>>> what's intended but we have 3 weeks or so until 5.17 is cut so we can
>>>>> afford a few days of investigating.
>>>>>
>>>>> I'm likely missing the point but sounds like the IRQ subsystem treats
>>>>> IRQ numbers as unsigned so if we pass a negative value "fun" is sort
>>>>> of expected. Isn't the problem that device somehow comes with wakeup
>>>>> capable being set already? Isn't it better to make sure device is not
>>>>> wake capable if there's no WoL irq instead of adding second check?
>>>>>
>>>>> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>>>>> b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>>>>> index cfe09117fe6c..7dea44803beb 100644
>>>>> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>>>>> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>>>>> @@ -4020,12 +4020,12 @@ static int bcmgenet_probe(struct
>>>>> platform_device *pdev)
>>>>> /* Request the WOL interrupt and advertise suspend if
>>>>> available */
>>>>> priv->wol_irq_disabled = true;
>>>>> - if (priv->wol_irq > 0) {
>>>>> + if (priv->wol_irq > 0)
>>>>> err = devm_request_irq(&pdev->dev, priv->wol_irq,
>>>>> bcmgenet_wol_isr, 0, dev->name, priv);
>>>>> - if (!err)
>>>>> - device_set_wakeup_capable(&pdev->dev, 1);
>>>>> - }
>>>>> + else
>>>>> + err = -ENOENT;
>>>>> + device_set_wakeup_capable(&pdev->dev, !err);
>>>>> /* Set the needed headroom to account for any possible
>>>>> * features enabling/disabling at runtime
>>>>>
>>>>
>>>>
>>>> I duplicated the problem on rpi4/ACPI by moving to gcc12, so I have a/b
>>>> config that is close as I can achieve using gcc11 vs 12 and the one
>>>> built with gcc12 fails pretty consistently while the gcc11 works.
>>>>
>>>
>>> Did Peter's patch instead of this one help ?
>>>
>>
>> No, it seems to be the same problem. The second irq is registered, but
>> never seems to fire. There are a couple odd compiler warnings about
>> infinite recursion in memcpy()/etc I was looking at, but nothing
>> really pops out. Its like the adapter never gets the command
>> submissions (although link/up/down appear to be working/etc).
>
> There are two "main" interrupt lines which are required and an optional
> third interrupt line which is the side band Wake-on-LAN interrupt from
> the second level interrupt controller that aggregates all wake-up sources.
>
> The first interrupt line collects the the default RX/TX queue interrupts
> (ring 16) as well as the MAC link up/down and other interrupts that we
> are not using. The second interrupt line is only for the TX queues
> (rings 0 through 3) transmit done completion signaling. Because the
> driver is multi-queue aware and enabled, the network stack will chose
> any of those 5 queues before transmitting packets based upon a hash, so
> if you want to reliably prove/disprove that the second interrupt line is
> non-functional, you would need to force a given type of packet(s) to use
> that queue specifically. There is an example on how to do that here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/multiqueue.rst#n47
>
>
> With that said, please try the following debug patch so we can get more
> understanding of how we managed to prevent the second interrupt line
> from getting its interrupt handler serviced. Thanks
Right, I applied your patch to rc7, and it prints the following
(trimming uninterresting bits)
[ 7.044681] bcmgenet BCM6E4E:00: IRQ0: 28 (29), IRQ1: -6 (28), Wol
IRQ: 29 (4294967290)
[ 7.064731] bcmgenet BCM6E4E:00: GENET 5.0 EPHY: 0x0000
[ 8.533639] bcmgenet BCM6E4E:00 enabcm6e4ei0: renamed from eth0
[ 56.803894] bcmgenet BCM6E4E:00: configuring instance for external
RGMII (RX delay)
[ 56.896851] bcmgenet BCM6E4E:00 enabcm6e4ei0: Link is Down
[ 60.045071] bcmgenet BCM6E4E:00 enabcm6e4ei0: Link is Up - 1Gbps/Full
- flow control off
[ 60.055872] IPv6: ADDRCONF(NETDEV_CHANGE): enabcm6e4ei0: link becomes
ready
[ 62.283525] ------------[ cut here ]------------
[ 62.290811] NETDEV WATCHDOG: enabcm6e4ei0 (bcmgenet): transmit queue
2 timed out
[ 62.301080] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:529
dev_watchdog+0x234/0x240
[ 62.312220] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6
nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat ns
[ 62.370353] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.17.0-rc7G12+ #58
[ 62.380052] Hardware name: Raspberry Pi Foundation Raspberry Pi 4
Model B/Raspberry Pi 4 Model B, BIOS EDK2-DEV 02/08/2022
[ 62.394151] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[ 62.404211] pc : dev_watchdog+0x234/0x240
[ 62.411304] lr : dev_watchdog+0x234/0x240
[ 62.418371] sp : ffff8000080b3a40
[ 62.424715] x29: ffff8000080b3a40 x28: ffffdd4ed64c7000 x27:
ffff8000080b3b20
[ 62.434899] x26: ffffdd4ed5f40000 x25: 0000000000000000 x24:
ffffdd4ed64cec08
[ 62.445095] x23: 0000000000000100 x22: ffffdd4ed64c7000 x21:
ffff1bd254e58000
[ 62.455259] x20: 0000000000000002 x19: ffff1bd254e584c8 x18:
ffffffffffffffff
[ 62.465439] x17: 64656d6974203220 x16: 0000000000000001 x15:
6d736e617274203a
[ 62.475615] x14: 2974656e65676d63 x13: ffffdd4ed51700d8 x12:
ffffdd4ed65bd5f0
[ 62.485787] x11: 00000000ffffffff x10: ffffdd4ed65bd5f0 x9 :
ffffdd4ed420c0fc
[ 62.495978] x8 : 00000000ffffdfff x7 : ffffdd4ed65bd5f0 x6 :
0000000000000001
[ 62.506173] x5 : 0000000000000000 x4 : ffff1bd2fb7b1408 x3 :
ffff1bd2fb7bddb0
[ 62.516334] x2 : ffff1bd2fb7b1408 x1 : ffff3e842586e000 x0 :
0000000000000044
[ 62.526520] Call trace:
[ 62.531969] dev_watchdog+0x234/0x240
[ 62.538671] call_timer_fn+0x3c/0x15c
[ 62.545331] __run_timers.part.0+0x288/0x310
[ 62.552579] run_timer_softirq+0x48/0x80
[ 62.559466] __do_softirq+0x128/0x360
[ 62.566055] __irq_exit_rcu+0x138/0x140
[ 62.572823] irq_exit_rcu+0x1c/0x30
[ 62.580799] el1_interrupt+0x38/0x54
[ 62.580817] el1h_64_irq_handler+0x18/0x24
[ 62.580822] el1h_64_irq+0x7c/0x80
[ 62.580827] arch_cpu_idle+0x18/0x2c
[ 62.580832] default_idle_call+0x4c/0x140
[ 62.580836] cpuidle_idle_call+0x14c/0x1a0
[ 62.580844] do_idle+0xb0/0x100
[ 62.580849] cpu_startup_entry+0x30/0x8c
[ 62.580854] secondary_start_kernel+0xe4/0x110
[ 62.580862] __secondary_switched+0x94/0x98
[ 62.580871] ---[ end trace 0000000000000000 ]---
It should be noted that the irq0/1/2 numbers are a bit messed up in the
patch, but the general idea should be visible here.
The full log is here https://pastebin.centos.org/view/22c2aede
The WOL paths aren't required to trigger this, which is why I questioned
the other patch.
Powered by blists - more mailing lists