netdev - RE: [PATCH v2] net: macb: Restart tx only if queue pointer is lagging

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BL3PR02MB818774AF335BBB8542F22072C9E99@BL3PR02MB8187.namprd02.prod.outlook.com>
Date:   Fri, 8 Apr 2022 08:47:12 +0000
From:   Harini Katakam <harinik@...inx.com>
To:     "Claudiu.Beznea@...rochip.com" <Claudiu.Beznea@...rochip.com>,
        "tomas.melin@...sala.com" <tomas.melin@...sala.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC:     "Nicolas.Ferre@...rochip.com" <Nicolas.Ferre@...rochip.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "pabeni@...hat.com" <pabeni@...hat.com>,
        Shubhrajyoti Datta <shubhraj@...inx.com>,
        Michal Simek <michals@...inx.com>,
        "pthombar@...ence.com" <pthombar@...ence.com>,
        "mparab@...ence.com" <mparab@...ence.com>,
        "rafalo@...ence.com" <rafalo@...ence.com>
Subject: RE: [PATCH v2] net: macb: Restart tx only if queue pointer is lagging

Hi Claudiu, Tomas,

> -----Original Message-----
> From: Claudiu.Beznea@...rochip.com <Claudiu.Beznea@...rochip.com>
> Sent: Friday, April 8, 2022 1:13 PM
> To: tomas.melin@...sala.com; netdev@...r.kernel.org
> Cc: Nicolas.Ferre@...rochip.com; davem@...emloft.net; kuba@...nel.org;
> pabeni@...hat.com; Harini Katakam <harinik@...inx.com>; Shubhrajyoti
> Datta <shubhraj@...inx.com>; Michal Simek <michals@...inx.com>;
> pthombar@...ence.com; mparab@...ence.com; rafalo@...ence.com
> Subject: Re: [PATCH v2] net: macb: Restart tx only if queue pointer is lagging
> 
> Hi, Tomas,
> 
> I'm returning to this new thread.
> 
> Sorry for the long delay. I looked though my emails for the steps to
> reproduce the bug that introduces macb_tx_restart() but haven't found
> them.
> Though the code in this patch should not affect at all SAMA5D4.
> 
> I have tested anyway SAMA5D4 with and without your code and saw no
> issues.
> In case Dave, Jakub want to merge it you can add my
> Tested-by: Claudiu Beznea <claudiu.beznea@...rochip.com>
> Reviewed-by: Claudiu Beznea <claudiu.beznea@...rochip.com>
> 
> The only thing with this patch, as mention earlier, is that freeing of packet N
> may depend on sending packet N+1 and if packet N+1 blocks again the HW
> then the freeing of packets N, N+1 may depend on packet N+2 etc. But from
> your investigation it seems hardware has some bugs.
> 
> FYI, I looked though Xilinx github repository and saw no patches on macb that
> may be related to this issue.
> 
> Anyway, it would be good if there would be some replies from Xilinx or at
> least Cadence people on this (previous thread at [1]).

Sorry for the delayed response.
I saw the condition you described and I'm not able to reproduce it.
But I agree with your assessment that restarting TX will not help in this case.
Also, the original patch restarting TX was also not reproduced on Zynq board
easily. We've had some users report the issue after > 1hr of traffic but that was
on a 4.xx kernel and I'm afraid I don’t have a case where I can reproduce the
original issue Claudiu described on any 5.xx kernel.

Based on the thread, there is one possibility for a HW bug that controller fails to
generate TCOMP when a TXUBR and restart conditions occur because these interrupts
are edge triggered on Zynq.

I'm going to check the errata and let you know if I find anything relevant and also
request Cadence folks to comment.
I'm sorry ask but is this condition reproducible on any later variants of the IP in Xilinx or
non-Xilinx devices?
Zynq US+ MPSoC has the r1p07 while Zynq has the older version IP r1p23 (old versioning)

Regards,
Harini

> 
> Thank you,
> Claudiu Beznea
> 
> [1]
> https://lore.kernel.org/netdev/82276bf7-72a5-6a2e-ff33-
> f8fe0c5e4a90@...rochip.com/T/#m644c84a8709a65c40b8fc15a589e83b24e4
> 8ccfd
> 
> On 07.04.2022 19:16, Tomas Melin wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know
> > the content is safe
> >
> > commit 4298388574da ("net: macb: restart tx after tx used bit read")
> > added support for restarting transmission. Restarting tx does not work
> > in case controller asserts TXUBR interrupt and TQBP is already at the
> > end of the tx queue. In that situation, restarting tx will immediately
> > cause assertion of another TXUBR interrupt. The driver will end up in
> > an infinite interrupt loop which it cannot break out of.
> >
> > For cases where TQBP is at the end of the tx queue, instead only clear
> > TX_USED interrupt. As more data gets pushed to the queue, transmission
> > will resume.
> >
> > This issue was observed on a Xilinx Zynq-7000 based board.
> > During stress test of the network interface, driver would get stuck on
> > interrupt loop within seconds or minutes causing CPU to stall.
> >
> > Signed-off-by: Tomas Melin <tomas.melin@...sala.com>
> > ---
> > Changes v2:
> > - change referenced commit to use original commit ID instead of stable
> > branch ID
> >
> >  drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/cadence/macb_main.c
> > b/drivers/net/ethernet/cadence/macb_main.c
> > index 800d5ced5800..e475be29845c 100644
> > --- a/drivers/net/ethernet/cadence/macb_main.c
> > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > @@ -1658,6 +1658,7 @@ static void macb_tx_restart(struct macb_queue
> *queue)
> >         unsigned int head = queue->tx_head;
> >         unsigned int tail = queue->tx_tail;
> >         struct macb *bp = queue->bp;
> > +       unsigned int head_idx, tbqp;
> >
> >         if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> >                 queue_writel(queue, ISR, MACB_BIT(TXUBR)); @@ -1665,6
> > +1666,13 @@ static void macb_tx_restart(struct macb_queue *queue)
> >         if (head == tail)
> >                 return;
> >
> > +       tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> > +       tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> > +       head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp,
> > + head));
> > +
> > +       if (tbqp == head_idx)
> > +               return;
> > +
> >         macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> > }
> >
> > --
> > 2.35.1
> >