Date: Mon, 20 Jul 2020 01:48:33 +0000
From: Yoshihiro Shimoda <yoshihiro.shimoda.uh@...esas.com>
To: Sergei Shtylyov <sergei.shtylyov@...il.com>,
Sergei Shtylyov <sergei.shtylyov@...entembedded.com>
CC: "REE dirk.behme@...bosch.com" <dirk.behme@...bosch.com>,
"Shashikant.Suguni@...bosch.com" <Shashikant.Suguni@...bosch.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-renesas-soc@...r.kernel.org"
<linux-renesas-soc@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"kuba@...nel.org" <kuba@...nel.org>
Subject: RE: [PATCH/RFC] net: ethernet: ravb: Try to wake subqueue instead of
stop on timeout
Hello!
> From: Sergei Shtylyov, Sent: Monday, July 20, 2020 4:20 AM
>
> Hello!
>
> Sorry about another late reply, was having h/w issues on the new work...
No problem! :) Thank you for your reply!
> On 07/06/2020 12:25 PM, Yoshihiro Shimoda wrote:
<snip>
> >> Maybe we should just fix those blind assumptions?
> >
> > Maybe I should have described some facts instead of assumptions like below?
> > If so, I should modify the code too.
> >
> > After ravb_stop_dma() is called, the driver assumes all transfers are
> > stopped. However, the current ravb_tx_timeout_work() doesn't check
> > whether ravb_stop_dma() succeeded, so we should fix that.
>
> Yes. Better a stuck TX queue (with a chance to recover) than kernel oops...
I got it.
<snip>
> >> Well, I was thinking of polling TCCR and CSR like the current
> >> ravb_stop_dma() does, but if that works...
> >
> > I'm not sure whether polling TCCR and CSR is enough.
> > Instead of polling those registers, maybe we should check whether
> > ravb_stop_dma() succeeded?
>
> Yes, if by polling you mean just checking the result of it. :-)
Yes, I intend to check the result of it :)
> > In particular, the result of ravb_config() is the key to whether the
> > hardware has really stopped. So, I'm thinking that just checking the
> > result of ravb_stop_dma() in ravb_tx_timeout_work() is better than the
> > per-queue tear-down and re-init for now. But what do you think?
>
> I don't think it's better since we're now supposed to handle a per-queue
> TX timeout (still not sure it's possible with this h/w). But of course, it's
> better as it's simple enough for a bug fix.
Thank you for your comment. Yes, I agree it would be better to handle a per-queue
TX timeout. However, I think that needs refactoring, so I'd like to fix the bug
with simple code for now.
> >>> < Prepare new descriptors and start again >
> >>> 4. Prepare new descriptors.
> >>
> >> That's where the cause for using the workqueue lies -- the descriptors are
> >> allocated with GFP_KERNEL, not GFP_ATOMIC...
> >
> > IIUC, we can avoid using the workqueue if re-allocation is not really necessary.
> >
> >> if you have time/desire to
> >> untangle all this, I'd appreciate it; else I'd have to work on this in my
> >> copious free time... :-)
> >
> > If we don't need refactoring, I think I can do it :)
>
> Let's go forward with the simple fix (assuming it fixes the original oops).
Thank you for your comment! I'll do that.
Best regards,
Yoshihiro Shimoda
> [...]
>
> MBR, Sergei