linux-kernel - Re: [PATCH] irqchip: omap-intc: fix spurious irq handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <56605EBD.9050001@ti.com>
Date:	Thu, 3 Dec 2015 20:54:45 +0530
From:	Sekhar Nori <nsekhar@...com>
To:	Tony Lindgren <tony@...mide.com>
CC:	John Ogness <john.ogness@...utronix.de>,
	Thomas Gleixner <tglx@...utronix.de>,
	Jason Cooper <jason@...edaemon.net>,
	Marc Zyngier <marc.zyngier@....com>,
	Felipe Balbi <balbi@...com>,
	Linux OMAP Mailing List <linux-omap@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] irqchip: omap-intc: fix spurious irq handling

On Thursday 03 December 2015 08:32 PM, Tony Lindgren wrote:
> * Sekhar Nori <nsekhar@...com> [151203 03:29]:
>> On Tuesday 20 October 2015 08:22 PM, Tony Lindgren wrote:
>>>
>>> OK thanks for testing. My guess from the above list would be EDMA
>>> or CPSW missing a flush of posted write. Maybe try adding a readback
>>> of the related device revision register after acking the interrupt into
>>> TPCC interrupt handler and CPSW interrupt handler(s)?
>>
>> I could get back to debugging this only now. I have converted
>> __raw_writel to writel() and also added readback from the same register
>> in both EDMA and CPSW drivers. But I am still able to reproduce the
>> spurious irq reports.
>>
>>> The timer2 and uart0 seem to be false positives here naturally.
>>
>> I also added readback in 8250 driver. I haven't touched the timer
>> driver, but I guess if that driver had an issue, it should have come out
>> much earlier.
>>
>> I also saw that sometimes previous irq was the TI LCDC interrupt. Added
>> readback there too. Did not help.
> 
> OK strange, so far all the ones we've seen have been fixable that way.
> 
>>> I would not yet rule out the "previous interrupt" theory until you have
>>> tried that. We really want to know the root cause of the issue, just
>>> printing out spurious interrupt does not fix the problem :)
>>
>> While we cannot rule out a software issue completely, the description in
>> TRM around spurious interrupts suggests it can happen even with no role
>> of software.
> 
> Yes maybe we more than one reason for them.
> 
>> May I suggest we go ahead and add this patch to the kernel after
>> addressing Thomas's comment? At least it will prevent kernel from
>> locking up with flood of prints when a spurious irq happens and allows
>> easier debug by others too.
> 
> Yes we should naturally fix up the kernel locking.

Alright. Thanks!

> 
> Please also add something like "enable debug for more information"
> to the warning. And then print out the current and previous interrupt

So I am unconvinced (based on the debug above) that the previous
interrupt information is actually giving any more useful information
than what can be gleaned from observing /proc/interrupts. It seems
previous interrupt noted can be any interrupt you would expect to occur
during the test case anyway.

> if DEBUG is enabled. And in the comments mention that often the spurious
> interrupts has been fixed by adding a flush of the posted write to the
> previous interrupt handler in the device driver.

I can add the comment, no problem.

> Also, do you have a reproducable test case with mainline kernel I
> could add to my collection of shell scripts?

The way I reproduce this is to run the serial port at 3Mbaud in internal
loopback mode with DMA enabled. The test program I use[1] compares the
data sent and received byte-for-byte. With current mainline, that can
mismatch pretty soon. The test will likely end before you see any
spurious irq. There are some patches John Ogness is working on
(currently included in TI's v4.1 kernel) which helps sustain the test
for long and then actually expose the spurious irq issue.

Thanks,
Sekhar

[1] https://git.breakpoint.cc/cgit/bigeasy/serialcheck.git
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/