Date:   Tue, 8 Feb 2022 10:24:01 -0600
From:   Tom Lendacky <thomas.lendacky@....com>
To:     Thomas Kupper <thomas@...per.org>,
        Shyam Sundar S K <Shyam-sundar.S-k@....com>
Cc:     netdev@...r.kernel.org
Subject: Re: AMD XGBE "phy irq request failed" kernel v5.17-rc2 on V1500B
 based board

On 2/7/22 11:59, Thomas Kupper wrote:
> 
> On 07.02.22 at 16:19, Shyam Sundar S K wrote:
>>
>> On 2/7/2022 8:02 PM, Tom Lendacky wrote:
>>> On 2/5/22 12:14, Thomas Kupper wrote:
>>>> On 05.02.22 at 16:51, Tom Lendacky wrote:
>>>>> On 2/5/22 04:06, Thomas Kupper wrote:
>>>>> Reload the module and specify the dyndbg option to get some
>>>>> additional debug output.
>>>>>
>>>>> I'm adding Shyam to the thread, too, as I'm not familiar with the
>>>>> configuration for this chip.
>>>>>
>>>> Right after boot:
>>>>
>>>> [    5.352977] amd-xgbe 0000:06:00.1 eth0: net device enabled
>>>> [    5.354198] amd-xgbe 0000:06:00.2 eth1: net device enabled
>>>> ...
>>>> [    5.382185] amd-xgbe 0000:06:00.1 enp6s0f1: renamed from eth0
>>>> [    5.426931] amd-xgbe 0000:06:00.2 enp6s0f2: renamed from eth1
>>>> ...
>>>> [    9.701637] amd-xgbe 0000:06:00.2 enp6s0f2: phy powered off
>>>> [    9.701679] amd-xgbe 0000:06:00.2 enp6s0f2: CL73 AN disabled
>>>> [    9.701715] amd-xgbe 0000:06:00.2 enp6s0f2: CL37 AN disabled
>>>> [    9.738191] amd-xgbe 0000:06:00.2 enp6s0f2: starting PHY
>>>> [    9.738219] amd-xgbe 0000:06:00.2 enp6s0f2: starting I2C
>>>> ...
>>>> [   10.742622] amd-xgbe 0000:06:00.2 enp6s0f2: firmware mailbox command did not complete
>>>> [   10.742710] amd-xgbe 0000:06:00.2 enp6s0f2: firmware mailbox reset performed
>>>> [   10.750813] amd-xgbe 0000:06:00.2 enp6s0f2: 10GbE SFI mode set
>>>> [   10.768366] amd-xgbe 0000:06:00.2 enp6s0f2: 10GbE SFI mode set
>>>> [   10.768371] amd-xgbe 0000:06:00.2 enp6s0f2: fixed PHY configuration
>>>>
>>>> Then after 'ifconfig enp6s0f2 up':
>>>>
>>>> [  189.184928] amd-xgbe 0000:06:00.2 enp6s0f2: phy powered off
>>>> [  189.191828] amd-xgbe 0000:06:00.2 enp6s0f2: 10GbE SFI mode set
>>>> [  189.191863] amd-xgbe 0000:06:00.2 enp6s0f2: CL73 AN disabled
>>>> [  189.191894] amd-xgbe 0000:06:00.2 enp6s0f2: CL37 AN disabled
>>>> [  189.196338] amd-xgbe 0000:06:00.2 enp6s0f2: starting PHY
>>>> [  189.198792] amd-xgbe 0000:06:00.2 enp6s0f2: 10GbE SFI mode set
>>>> [  189.212036] genirq: Flags mismatch irq 69. 00000000 (enp6s0f2-pcs) vs. 00000000 (enp6s0f2-pcs)
>>>> [  189.221700] amd-xgbe 0000:06:00.2 enp6s0f2: phy irq request failed
>>>> [  189.231051] amd-xgbe 0000:06:00.2 enp6s0f2: phy powered off
>>>> [  189.231054] amd-xgbe 0000:06:00.2 enp6s0f2: stopping I2C
>>>>
>>> Please ensure that the ethtool msglvl is on for drv and probe. I was
>>> expecting to see some additional debug messages that I don't see here.
>>>
>>> Also, if you can provide the lspci output for the device (using -nn and
>>> -vv) that might be helpful as well.
>>>
>>> Shyam will be the best one to understand what is going on here.
>> We have seen similar problems reported on some other platforms. A fix
>> has been sent for validation.
>>
>> The root cause is that removing the xgbe driver causes an interrupt
>> storm on the MP2 device (Sensor Fusion Hub).
>>
>> I will submit the fix upstream once validation is done. You may then
>> give it a try and see if it helps.
>>
>> Thanks,
>> Shyam
>>
>>> Thanks,
>>> Tom
> 
> Shyam, I will check the git logs for the relevant commit from time to
> time.
> Looking at the code diff between OPNsense and the latest Linux kernel, I
> assumed that there would be much more to do than fix an IRQ storm (but I
> have no idea about the inner workings of the kernel).
>
> Never mind: after setting 'msglvl 0x3' with ethtool, the following info
> can be found in dmesg:
> 
> Running : $ ifconfig enp6s0f2 up
> SIOCSIFFLAGS: Invalid argument
> 
> ... and 'dmesg':
> 
> [   55.177447] amd-xgbe 0000:06:00.2 enp6s0f2: channel-0: cpu=0, node=0
> [   55.177456] amd-xgbe 0000:06:00.2 enp6s0f2: channel-0: dma_regs=00000000d11bf3f1, dma_irq=74, tx=00000000dd57b5c4, rx=00000000d73e70f8
> [   55.177464] amd-xgbe 0000:06:00.2 enp6s0f2: channel-1: cpu=1, node=0
> [   55.177467] amd-xgbe 0000:06:00.2 enp6s0f2: channel-1: dma_regs=000000000d972dd7, dma_irq=75, tx=00000000573bcff8, rx=000000003d9a6f65
> [   55.177473] amd-xgbe 0000:06:00.2 enp6s0f2: channel-2: cpu=2, node=0
> [   55.177476] amd-xgbe 0000:06:00.2 enp6s0f2: channel-2: dma_regs=0000000046f71179, dma_irq=76, tx=00000000897116c9, rx=0000000004ba17e7
> [   55.177480] amd-xgbe 0000:06:00.2 enp6s0f2: channel-0 - Tx ring:
> [   55.177502] amd-xgbe 0000:06:00.2 enp6s0f2: rdesc=00000000794657ba, rdesc_dma=0x000000010fad8000, rdata=0000000008ace7d8, node=0
> [   55.177507] amd-xgbe 0000:06:00.2 enp6s0f2: channel-0 - Rx ring:
> [   55.177523] amd-xgbe 0000:06:00.2 enp6s0f2: rdesc=000000009313d9b3, rdesc_dma=0x0000000114538000, rdata=00000000510e3b77, node=0
> [   55.177527] amd-xgbe 0000:06:00.2 enp6s0f2: channel-1 - Tx ring:
> [   55.177543] amd-xgbe 0000:06:00.2 enp6s0f2: rdesc=00000000d26d9194, rdesc_dma=0x000000010a774000, rdata=00000000b9419829, node=0
> [   55.177547] amd-xgbe 0000:06:00.2 enp6s0f2: channel-1 - Rx ring:
> [   55.177564] amd-xgbe 0000:06:00.2 enp6s0f2: rdesc=0000000007bf60dd, rdesc_dma=0x000000010fb84000, rdata=00000000aa48e8c0, node=0
> [   55.177568] amd-xgbe 0000:06:00.2 enp6s0f2: channel-2 - Tx ring:
> [   55.177584] amd-xgbe 0000:06:00.2 enp6s0f2: rdesc=00000000e7e6c52e, rdesc_dma=0x000000010fa2a000, rdata=0000000017b5d85c, node=0
> [   55.177587] amd-xgbe 0000:06:00.2 enp6s0f2: channel-2 - Rx ring:
> [   55.177603] amd-xgbe 0000:06:00.2 enp6s0f2: rdesc=000000000898fbf4, rdesc_dma=0x0000000101f08000, rdata=00000000aded7d4c, node=0
> [   55.182366] amd-xgbe 0000:06:00.2 enp6s0f2: TXq0 mapped to TC0
> [   55.182381] amd-xgbe 0000:06:00.2 enp6s0f2: TXq1 mapped to TC1
> [   55.182388] amd-xgbe 0000:06:00.2 enp6s0f2: TXq2 mapped to TC2
> [   55.182395] amd-xgbe 0000:06:00.2 enp6s0f2: PRIO0 mapped to RXq0
> [   55.182400] amd-xgbe 0000:06:00.2 enp6s0f2: PRIO1 mapped to RXq0
> [   55.182405] amd-xgbe 0000:06:00.2 enp6s0f2: PRIO2 mapped to RXq0
> [   55.182410] amd-xgbe 0000:06:00.2 enp6s0f2: PRIO3 mapped to RXq1
> [   55.182414] amd-xgbe 0000:06:00.2 enp6s0f2: PRIO4 mapped to RXq1
> [   55.182418] amd-xgbe 0000:06:00.2 enp6s0f2: PRIO5 mapped to RXq1
> [   55.182423] amd-xgbe 0000:06:00.2 enp6s0f2: PRIO6 mapped to RXq2
> [   55.182427] amd-xgbe 0000:06:00.2 enp6s0f2: PRIO7 mapped to RXq2
> [   55.182473] amd-xgbe 0000:06:00.2 enp6s0f2: 3 Tx hardware queues, 21760 byte fifo per queue
> [   55.182501] amd-xgbe 0000:06:00.2 enp6s0f2: 3 Rx hardware queues, 21760 byte fifo per queue
> [   55.182544] amd-xgbe 0000:06:00.2 enp6s0f2: flow control enabled for RXq0
> [   55.182550] amd-xgbe 0000:06:00.2 enp6s0f2: flow control enabled for RXq1
> [   55.182556] amd-xgbe 0000:06:00.2 enp6s0f2: flow control enabled for RXq2
> [   56.178946] amd-xgbe 0000:06:00.2 enp6s0f2: SFP detected:
> [   56.178954] amd-xgbe 0000:06:00.2 enp6s0f2:   vendor: MikroTik
> [   56.178958] amd-xgbe 0000:06:00.2 enp6s0f2:   part number: S+AO0005
> [   56.178961] amd-xgbe 0000:06:00.2 enp6s0f2:   revision level: 1.0
> [   56.178963] amd-xgbe 0000:06:00.2 enp6s0f2:   serial number: STST050B1900001
> 

Ah, it's been a while since I've had to use the debug support. Could you
also set the module debug parameter to 0x37 (debug=0x37) when loading the
module? That will capture some of the debug messages that are issued on
driver load. Sorry about that...

Thanks,
Tom
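
For reference, the two values mentioned in this thread (0x3 for the
ethtool msglvl and 0x37 for the module debug parameter) can be read as
masks of the kernel's NETIF_MSG_* bits. The following is only a rough
sketch: the bit values are the ones in include/linux/netdevice.h and the
short names are the ones ethtool uses, but decode_msglvl() is a
hypothetical helper, and how the driver actually interprets its debug
parameter is an assumption that is not confirmed anywhere in this thread.

```python
# NETIF_MSG_* bit values as defined in include/linux/netdevice.h,
# keyed by the short names that ethtool uses for msglvl.
NETIF_MSG_BITS = {
    "drv": 0x0001,
    "probe": 0x0002,
    "link": 0x0004,
    "timer": 0x0008,
    "ifdown": 0x0010,
    "ifup": 0x0020,
    "rx_err": 0x0040,
    "tx_err": 0x0080,
    "tx_queued": 0x0100,
    "intr": 0x0200,
    "tx_done": 0x0400,
    "rx_status": 0x0800,
    "pktdata": 0x1000,
    "hw": 0x2000,
    "wol": 0x4000,
}


def decode_msglvl(value):
    """Return the names of the message classes set in a msglvl bitmask.

    Hypothetical helper for illustration; assumes `value` is a plain
    NETIF_MSG_* bitmask.
    """
    return [name for name, bit in NETIF_MSG_BITS.items() if value & bit]


if __name__ == "__main__":
    # The two values used in this thread.
    for value in (0x3, 0x37):
        print(hex(value), "->", ", ".join(decode_msglvl(value)))
```

Read this way, 0x3 covers drv and probe, matching the earlier request
that msglvl be on for those two classes, and 0x37 additionally covers
link, ifdown and ifup.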
