linux-kernel - Re: lost interrupts when running sabrelite images (v4.15+) in qemu

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <9ee66efa-397c-898a-5cad-64eabc7752b9@roeck-us.net>
Date:   Tue, 6 Mar 2018 06:25:15 -0800
From:   Guenter Roeck <linux@...ck-us.net>
To:     Troy Kisky <troy.kisky@...ndarydevices.com>
Cc:     Fugang Duan <fugang.duan@....com>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: lost interrupts when running sabrelite images (v4.15+) in qemu

On 03/05/2018 09:30 AM, Troy Kisky wrote:
> On 3/3/2018 1:12 PM, Guenter Roeck wrote:
>> On 03/03/2018 12:48 PM, Guenter Roeck wrote:
>>> On 03/03/2018 11:07 AM, Troy Kisky wrote:
>>>> On 3/3/2018 8:32 AM, Guenter Roeck wrote:
>>>>> Hi,
>>>>>
>>>>> since v4.15, I get the following runtime warning when running sabrelite images
>>>>> in qemu.
>>>>>
>>>>> irq 65: nobody cared (try booting with the "irqpoll" option)
>>>>> ...
>>>>> handlers:
>>>>> [<26292474>] fec_pps_interrupt
>>>>> Disabling IRQ #65
>>>>> fec 2188000.ethernet (unnamed net_device) (uninitialized): MDIO read timeout
>>>>>
>>>>> Bisect points to commit 4ad1ceec05e491 ("net: fec: Let fec_ptp have its
>>>>> own interrupt routine"). Analysis shows that platform_irq_count()
>>>>> returns 2, which is reduced to 1 by fec_enet_get_irq_cnt().
>>>>> If I let fec_enet_get_irq_cnt() return 2, the problem is gone.
>>>>> Reverting commit 4ad1ceec05e491 also fixes the problem.
>>>>>
>>>>> Bisect log is attached.
>>>>>
>>>>
>>>> Sounds like you found a bug with qemu. I just booted sabrelite over nfs fine.
>>>> My interrupts look like this.
>>>>
>>>>
>>>>    64:      98767          0          0          0     GIC-0 150 Level     2188000.ethernet
>>>>    65:          0          0          0          0     GIC-0 151 Level     2188000.ethernet
>>>> ___________
>>>> IRQ 65 is only for ptp interrupts now. If qemu is signaling a tx/rx frame interrupt on 65,
>>>> then qemu is wrong. Of course, I've never used qemu so feel free to ignore me if I make no sense.
>>>>
>>>
>>> Thanks for checking with real hardware.
>>>
>>> This is what I see (with your patch reverted):
>>>
>>>    64:          0     GIC-0 150 Level     2188000.ethernet
>>>    65:         64     GIC-0 151 Level     2188000.ethernet
>>>
>>> Looking into the qemu source, I see:
>>>
>>> #define FSL_IMX6_ENET_MAC_1588_IRQ 118
>>> #define FSL_IMX6_ENET_MAC_IRQ 119
>>>
>>> FSL_IMX6_ENET_MAC_IRQ is then connected to fec interrupt index 0, and FSL_IMX6_ENET_MAC_1588_IRQ
>>> is connected to fec interrupt index 1.
>>>
>>> This may suggest that the defines are reversed. I'll see what happens if I swap them.
>>>
>>
>> Confirmed. If I swap the above defines, everything works fine. At the same time,
>> the modified qemu works with older kernels.
>>
>> Thanks a lot for the hint, and sorry for the noise.
>>
>> Guenter
>>
> It definitely was not noise. I bet it helps people searching the mailing list in the future.
> Thanks for posting the resolution.
> 

It turns out that "works", as I stated above, is not entirely accurate. With the modified qemu:

- v4.13 and later work
- In v4.12 and earlier, the Ethernet interface fails to instantiate with
	fec 2188000.ethernet (unnamed net_device) (uninitialized): MDIO read timeout
	fec: probe of 2188000.ethernet failed with error -5
   I have not found the reason yet. With unmodified qemu, these kernels work fine.
- v4.1 and earlier crash. The crash is fixed by commit 32cba57ba74be ("net: fec:
   introduce fec_ptp_stop and use in probe fail path")

There is also a matching bug report on Launchpad:

https://bugs.launchpad.net/qemu/+bug/1753309

Guenter