Message-ID: <244d34f9e9fd2b948d822e1dffd9dc2b0c8b336c.camel@calian.com>
Date: Wed, 23 Mar 2022 16:42:46 +0000
From: Robert Hancock <robert.hancock@...ian.com>
To: "kuba@...nel.org" <kuba@...nel.org>,
"tomas.melin@...sala.com" <tomas.melin@...sala.com>
CC: "Nicolas.Ferre@...rochip.com" <Nicolas.Ferre@...rochip.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"claudiu.beznea@...rochip.com" <claudiu.beznea@...rochip.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH v3] net: macb: restart tx after tx used bit read
On Wed, 2022-03-23 at 08:43 -0700, Jakub Kicinski wrote:
> On Wed, 23 Mar 2022 10:08:20 +0200 Tomas Melin wrote:
> > > From: <Claudiu.Beznea@...rochip.com>
> > > To: <Nicolas.Ferre@...rochip.com>, <davem@...emloft.net>
> > > Cc: <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
> > > <Claudiu.Beznea@...rochip.com>
> > > Subject: [PATCH v3] net: macb: restart tx after tx used bit read
> > > Date: Mon, 17 Dec 2018 10:02:42 +0000
> > > Message-ID: <1545040937-6583-1-git-send-email-claudiu.beznea@...rochip.com>
> > >
> > > From: Claudiu Beznea <claudiu.beznea@...rochip.com>
> > >
> > > On some platforms (currently detected only on SAMA5D4) TX might get stuck
> > > even though packets are still present in DMA memory and TX start was
> > > issued for them. This happens due to a race condition between the MACB
> > > driver updating the next TX buffer descriptor to be used and the IP reading
> > > the same descriptor. In such a case, the "TX USED BIT READ" interrupt is
> > > asserted. The GEM/MACB user guide specifies that if a "TX USED BIT READ"
> > > interrupt is asserted, TX must be restarted. Restart TX if the used bit is
> > > read and packets are present in the software TX queue. Packets are removed
> > > from the software TX queue if TX was successful for them (see
> > > macb_tx_interrupt()).
> > >
> > > Signed-off-by: Claudiu Beznea <claudiu.beznea@...rochip.com>
> >
> > On Xilinx Zynq the above change can cause an infinite interrupt loop leading
> > to a CPU stall. The timing/load seems to need to be just right for this to
> > happen; currently, with 1G Ethernet, it can normally be triggered within
> > minutes when running stress tests on the network interface.
> >
> > The events leading up to the interrupt looping are similar to the issue
> > described in the commit message. However, in our case, restarting TX does
> > not help at all. Instead, the controller is stuck on the queue end
> > descriptor, generating endless TX_USED interrupts and never breaking out of
> > the interrupt routine.
> >
> > Any chance you remember more details about the situation in which
> > restarting TX helped for your use case? Was tx_qbar at the end of a frame
> > or stopped in the middle of a frame?
>
> Which kernel version are you using? Robert has been working on macb +
> Zynq recently, adding him to CC.
We have been working with ZynqMP and haven't seen such issues in the past, but
I'm not sure we've tried the same type of stress test on those interfaces. If
by Zynq, Tomas means the Zynq-7000 series, that might also be a different
version/revision of the IP core than the one we have.

I haven't looked at the TX ring descriptor and register setup on this core in
that much detail, but the fact that the controller gets into this "TX used bit
read" state in the first place seems unusual. I'm wondering if something is
being done in the wrong order, or if we are missing a memory barrier somewhere?
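
To make the ordering concern concrete, here is a minimal user-space sketch of
the hand-off I have in mind. It is not the macb driver code: the struct
layout, the ring size, the length mask and the names ring_init() and
ring_queue() are all made up for illustration, and C11 release/acquire
atomics stand in for whatever barriers (wmb(), dma_wmb()) the kernel code
would use. The point is only that the buffer address has to be visible before
the ownership ("used") bit is cleared, otherwise the controller may read a
half-updated descriptor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout and sizes, assumed for this sketch only. */
#define TX_USED     (1u << 31)  /* set: descriptor owned by software */
#define TX_WRAP     (1u << 30)  /* set: last descriptor in the ring */
#define TX_LEN_MASK 0x3fffu     /* hypothetical frame length field */
#define RING_SIZE   4u

struct tx_desc {
    uint32_t addr;              /* DMA address of the frame buffer */
    _Atomic uint32_t ctrl;      /* ownership, wrap and length bits */
};

struct tx_ring {
    struct tx_desc desc[RING_SIZE];
    unsigned int head;          /* next descriptor software will fill */
};

/* Give every descriptor to software and mark the last one as the wrap point. */
void ring_init(struct tx_ring *r)
{
    for (unsigned int i = 0; i < RING_SIZE; i++) {
        r->desc[i].addr = 0;
        atomic_init(&r->desc[i].ctrl,
                    TX_USED | (i == RING_SIZE - 1u ? TX_WRAP : 0u));
    }
    r->head = 0;
}

/* Queue one frame; returns false if the head descriptor is still in flight. */
bool ring_queue(struct tx_ring *r, uint32_t dma_addr, uint32_t len)
{
    assert(r->head < RING_SIZE);
    struct tx_desc *d = &r->desc[r->head];
    uint32_t ctrl = atomic_load_explicit(&d->ctrl, memory_order_acquire);

    if (!(ctrl & TX_USED))
        return false;           /* hardware still owns it: ring is full */

    d->addr = dma_addr;         /* step 1: fill in the buffer address */

    /*
     * Step 2: clear the used bit with release ordering, so the address
     * written above is visible before ownership passes to the consumer.
     */
    atomic_store_explicit(&d->ctrl,
                          (ctrl & TX_WRAP) | (len & TX_LEN_MASK),
                          memory_order_release);

    r->head = (r->head + 1u) % RING_SIZE;
    return true;
}
```

If the two steps in ring_queue() were swapped, or the release ordering were
dropped, a consumer polling the descriptor could observe the cleared used bit
together with a stale address. Whether something like that is what actually
happens on Zynq is only a guess on my part.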
--
Robert Hancock
Senior Hardware Designer, Calian Advanced Technologies
www.calian.com