Date: Mon, 20 Jul 2020 01:48:33 +0000
From: Yoshihiro Shimoda <yoshihiro.shimoda.uh@...esas.com>
To: Sergei Shtylyov <sergei.shtylyov@...il.com>,
Sergei Shtylyov <sergei.shtylyov@...entembedded.com>
CC: "REE dirk.behme@...bosch.com" <dirk.behme@...bosch.com>,
"Shashikant.Suguni@...bosch.com" <Shashikant.Suguni@...bosch.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-renesas-soc@...r.kernel.org"
<linux-renesas-soc@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"kuba@...nel.org" <kuba@...nel.org>
Subject: RE: [PATCH/RFC] net: ethernet: ravb: Try to wake subqueue instead of
stop on timeout
Hello!
> From: Sergei Shtylyov, Sent: Monday, July 20, 2020 4:20 AM
>
> Hello!
>
> Sorry about another late reply, was having h/w issues on the new work...
No problem! :) Thank you for your reply!
> On 07/06/2020 12:25 PM, Yoshihiro Shimoda wrote:
<snip>
> >> Maybe we should just fix those blind assumptions?
> >
> > Maybe I should have described some facts instead of assumptions like below?
> > If so, I should modify the code too.
> >
> > After ravb_stop_dma() is called, the driver assumes all transfers are
> > stopped. However, the current ravb_tx_timeout_work() doesn't check
> > whether ravb_stop_dma() succeeded, so we should fix that.
>
> Yes. Better a stuck TX queue (with a chance to recover) than kernel oops...
I got it.
<snip>
> >> Well, I was thinking of polling TCCR and CSR like the current
> >> ravb_stop_dma() does, but if that works...
> >
> > I'm not sure whether polling TCCR and CSR is enough.
> > Instead of polling those registers, maybe we should check whether
> > ravb_stop_dma() succeeded?
>
> Yes, if by polling you mean just checking the result of it. :-)
Yes, I intend to check the result of it :)
> > In particular, the result of ravb_config() is the key to whether the
> > hardware has really stopped. So, I'm thinking that just checking the
> > result of ravb_stop_dma() in ravb_tx_timeout_work() is better than the
> > per-queue tear-down and re-init for now. But what do you think?
>
> I don't think it's better since we're now supposed to handle a per-queue
> TX timeout (still not sure it's possible with this h/w). But of course, it's
> better as it's simple enough for a bug fix.
Thank you for your comment. Yes, I agree it would be better to handle a per-queue
TX timeout. However, I think that needs refactoring, so I'd like to fix the bug
with simple code for now.
> >>> < Prepare new descriptors and start again >
> >>> 4. Prepare new descriptors.
> >>
> >> That's where the cause for using the workqueue lies -- the descriptors are
> >> allocated with GFP_KERNEL, not GFP_ATOMIC...
> >
> > IIUC, we can avoid using the workqueue if re-allocation is not really necessary.
> >
> >> if you have time/desire to
> >> untangle all this, I'd appreciate it; else I'd have to work on this in my
> >> copious free time... :-)
> >
> > If we don't need refactoring, I think I can do it :)
>
> Let's go forward with the simple fix (assuming it fixes the original oops).
Thank you for your comment! I'll do that.
Best regards,
Yoshihiro Shimoda
> [...]
>
> MBR, Sergei