linux-kernel - Re: [PATCH v2] serial: 8250_dw: Avoid "too much work" from bogus rx timeout interrupt


Message-ID: <19bcb511-1417-5d37-7fce-47b66c78d17e@schinagl.nl>
Date:   Wed, 29 Mar 2017 11:45:41 +0200
From:   Olliver Schinagl <o.schinagl@...imaker.com>
To:     Andy Shevchenko <andy.shevchenko@...il.com>,
        Douglas Anderson <dianders@...omium.org>,
        Cal Sullivan <california.l.sullivan@...el.com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        linux-rockchip@...ts.infradead.org,
        "linux-serial@...r.kernel.org" <linux-serial@...r.kernel.org>,
        guennadi.liakhovetski@...el.com, jslaby@...e.com,
        Jeffy Chen <jeffy.chen@...k-chips.com>,
        eric.gao@...k-chips.com, briannorris@...omium.org,
        dev@...ux-sunxi.org, linux-rockchip@...ts.infradead.org,
        wangkefeng.wang@...wei.com, noamc@...hip.com,
        heikki.krogerus@...ux.intel.com, jason.uy@...adcom.com,
        ed.blake@...tec.com,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        andriy.shevchenko@...ux.intel.com, guennadi.liakhovetski@...el.com
Subject: Re: [PATCH v2] serial: 8250_dw: Avoid "too much work" from bogus rx
 timeout interrupt

Hey Andy,

On 29-03-17 11:11, Andy Shevchenko wrote:
> On Wed, Mar 29, 2017 at 10:58 AM, Olliver Schinagl <oliver@...inagl.nl> wrote:
>> On 07-02-17 00:30, Douglas Anderson wrote:
>
> First of all I didn't get why people from Cc list are suddenly
> disappeared. Check your mail client settings.
> Returning back some of them.
Apologies, I replied via gmane's news feed to Douglas's initial post, as 
I did not have the original post, and I failed to check the other 
recipients. My fault. Sorry. I've added the other original recipients as well.

>
>>> It appears that somehow we have a RX Timeout interrupt but there is no
>>> actual data present to receive.  When we're in this state the UART
>>> driver claims that it handled the interrupt but it actually doesn't
>>> really do anything.  This means that we keep getting the interrupt
>>> over and over again.
>
>> I may be running into the same thing on an A20 SoC, but still in the stage
>> of figuring out what is going on, as we get this error very occasionally. Do
>> you have a way to externally induce this behavior other than suspend/resume?
>> As we get it during uart-use and do not have (or I have never tried)
>> suspend/resume on our platform.
>
> On Intel platforms with this IP I can see similar when run loopback
> test on high speeds.
> California may correct me since he did a lot of investigation of the
> issue on x86.
>
>>>  static int dw8250_handle_irq(struct uart_port *p)
>>>  {
>>> +       struct uart_8250_port *up = up_to_u8250p(p);
>>>         struct dw8250_data *d = p->private_data;
>>>         unsigned int iir = p->serial_in(p, UART_IIR);
>>> +       unsigned int status;
>>> +       unsigned long flags;
>>> +
>>> +       /*
>>> +        * There are ways to get Designware-based UARTs into a state where
>>> +        * they are asserting UART_IIR_RX_TIMEOUT but there is no actual
>>> +        * data available.  If we see such a case then we'll do a bogus
>>> +        * read.  If we don't do this then the "RX TIMEOUT" interrupt will
>>> +        * fire forever.
>>
>> I think what you are saying is 'do a bogus read as that is the only way to
>> clear the interrupt, otherwise it will keep firing forever.'?
>
> No, we don't know if this _the only way_. It looks like no one from us
> can tell you a root cause, except may be Synopsys guys.
Has anybody tried to contact Synopsys (DesignWare) about this issue at all?

True, it may not be the only way (though it is the only one we know of 
for now), but it is 'the' way currently.
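
To check my own understanding of the workaround, here is a minimal 
user-space sketch of the logic. It is not the driver code: struct 
fake_uart, fake_read_rx(), quirk_bogus_rx_timeout() and IIR_ID_MASK are 
hypothetical names for illustration, the "read of RX clears the timeout" 
behaviour is only modelled, and the four-bit interrupt ID mask is my 
assumption. Locking is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register bit values as defined in the kernel's serial_reg.h. */
#define UART_IIR_RX_TIMEOUT 0x0c /* IIR: character timeout pending */
#define UART_LSR_DR         0x01 /* LSR: receive data ready */
#define UART_LSR_BI         0x10 /* LSR: break interrupt */

/* Assumption: the interrupt ID sits in the low four bits of IIR. */
#define IIR_ID_MASK         0x0f

/* Hypothetical stand-in for the hardware; not a kernel structure. */
struct fake_uart {
	uint8_t iir;           /* interrupt identification register */
	uint8_t lsr;           /* line status register */
	unsigned int rx_reads; /* how often the RX register was read */
};

/* Modelled behaviour: a read of RX clears a pending timeout interrupt. */
static uint8_t fake_read_rx(struct fake_uart *u)
{
	u->rx_reads++;
	if ((u->iir & IIR_ID_MASK) == UART_IIR_RX_TIMEOUT)
		u->iir = 0x01; /* bit 0 set: no interrupt pending */
	u->lsr &= (uint8_t)~UART_LSR_DR;
	return 0;
}

/*
 * If a timeout interrupt is pending but the line status reports neither
 * data nor a break, issue one dummy read so the interrupt goes away.
 * Returns true if the dummy read was issued.
 */
static bool quirk_bogus_rx_timeout(struct fake_uart *u)
{
	if ((u->iir & IIR_ID_MASK) != UART_IIR_RX_TIMEOUT)
		return false;
	if (u->lsr & (UART_LSR_DR | UART_LSR_BI))
		return false; /* real data or a break: leave it to the normal path */
	(void)fake_read_rx(u);
	return true;
}
```

With iir = 0xcc and lsr = 0x60 (timeout pending, nothing in the FIFO) the 
sketch does exactly one dummy read; with lsr = 0x61 it leaves the 
register alone so the regular receive path can pick up the byte.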
>
>>> +               spin_lock_irqsave(&p->lock, flags);
>>
>> this is a bit above my knowledge of drivers etc., but I don't see any spinlocks in
>> the 8250 handle_irq glue drivers, except in the OMAP's case where they are
>> handling a DMA IRQ. So I ask, because I don't know, why is it needed here?
>
> They serialize IO accessors.
>
> Regarding to the rest comments, the patch is already in upstream, if
> you feel that something should be changed, send an incremental fix.
Ah, I thought I had checked, but I didn't see it. I probably 
forgot to fetch. I'll send a patch for the small mask fix.
>
>> Once I found a way to reproduce the problem (without suspend) I will test
>> this to see if it fixes it for us too.
>
> It would be appreciated, but better to get know the root cause and
> what _hardware_ guys think about solutions.
>
I read through the docs of the IP block (dw_apb_uart, from 2006; I know 
a little FPGA programming) but have found nothing yet that would warn 
about this behavior. I suppose hardware/FPGA guys could potentially give 
more background here, but it may also simply be an IP bug?

Olliver