lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <DB8PR04MB67959938BC55D9ADA00488EAE6239@DB8PR04MB6795.eurprd04.prod.outlook.com>
Date:   Thu, 27 May 2021 13:13:06 +0000
From:   Joakim Zhang <qiangqing.zhang@....com>
To:     Jon Hunter <jonathanh@...dia.com>,
        "f.fainelli@...il.com" <f.fainelli@...il.com>,
        "peppe.cavallaro@...com" <peppe.cavallaro@...com>,
        "alexandre.torgue@...s.st.com" <alexandre.torgue@...s.st.com>,
        "joabreu@...opsys.com" <joabreu@...opsys.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "mcoquelin.stm32@...il.com" <mcoquelin.stm32@...il.com>,
        "andrew@...n.ch" <andrew@...n.ch>,
        "treding@...dia.com" <treding@...dia.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        dl-linux-imx <linux-imx@....com>
Subject: RE: [PATCH V1 net-next] net: stmmac: should not modify RX descriptor
 when STMMAC resume


Hi Joh,

> -----Original Message-----
> From: Jon Hunter <jonathanh@...dia.com>
> Sent: 2021年5月27日 20:43
> To: Joakim Zhang <qiangqing.zhang@....com>; f.fainelli@...il.com;
> peppe.cavallaro@...com; alexandre.torgue@...s.st.com;
> joabreu@...opsys.com; davem@...emloft.net; kuba@...nel.org;
> mcoquelin.stm32@...il.com; andrew@...n.ch; treding@...dia.com
> Cc: netdev@...r.kernel.org; dl-linux-imx <linux-imx@....com>
> Subject: Re: [PATCH V1 net-next] net: stmmac: should not modify RX descriptor
> when STMMAC resume
> 
> 
> On 27/05/2021 09:49, Joakim Zhang wrote:
> > When system resume back, STMMAC will clear RX descriptors:
> > stmmac_resume()
> > 	->stmmac_clear_descriptors()
> > 		->stmmac_clear_rx_descriptors()
> > 			->stmmac_init_rx_desc()
> > 				->dwmac4_set_rx_owner()
> > 				//p->des3 |= cpu_to_le32(RDES3_OWN |
> RDES3_BUFFER1_VALID_ADDR); It
> > only asserts OWN and BUF1V bits in desc3 field, doesn't clear desc0/1/2
> fields.
> >
> > Let's take a case into account, when system suspend, it is possible
> > that there are packets have not received yet, so the RX descriptors
> > are wrote back by DMA, e.g.
> > 008 [0x00000000c4310080]: 0x0 0x40 0x0 0x34010040
> >
> > When system resume back, after above process, it became a broken
> > descriptor:
> > 008 [0x00000000c4310080]: 0x0 0x40 0x0 0xb5010040
> >
> > The issue is that it only changes the owner of this descriptor, but do
> > nothing about desc0/1/2 fields. The descriptor of STMMAC a bit
> > special, applicaton prepares RX descriptors for DMA, after DMA recevie
> > the packets, it will write back the descriptors, so the same field of
> > a descriptor have different meanings to application and DMA. It should
> > be a software bug there, and may not easy to reproduce, but there is a
> > certain probability that it will occur.
> >
> > i.MX8MP STMMAC DMA width is 34 bits, so desc0/desc1 indicates the
> > buffer address, after system resume, the buffer address changes to
> > 0x40_00000000. And the correct rx descriptor is 008 [0x00000000c4310080]:
> > 0x6511000 0x1 0x0 0x81000000, the valid buffer address is 0x1_6511000.
> > So when DMA tried to access the invalid address 0x40_00000000 would
> > generate fatal bus error.
> >
> > But for other 32 bits width DMA, DMA still can work when this issue
> > happened, only desc0 indicates buffer address, so the buffer address
> > is 0x00000000 when system resume.
> >
> > There is a NOTE in the Guide:
> > In the Receive Descriptor (Read Format), if the Buffer Address field
> > is all 0s, the module does not transfer data to that buffer and skips
> > to the next buffer or next descriptor.
> >
> > Also a feedback from SYPS:
> > When buffer address field of Rx descriptor is all 0's, DMA skips such
> > descriptor means DMA closes Rx descriptor as Intermediate descriptor
> > with OWN bit set to 0, indicates that the application owns this descriptor.
> >
> > It now appears that this issue seems only can be reproduced on DMA
> > width more than 32 bits, this may be why other SoCs which integrated
> > the same STMMAC IP can't reproduce it.
> >
> > Commit 9c63faaa931e ("net: stmmac: re-init rx buffers when mac resume
> > back") tried to re-init desc0/desc1 (buffer address fields) to fix
> > this issue, but it is not a proper solution, and made regression on Jetson TX2
> boards.
> >
> > It is unreasonable to modify RX descriptors outside of
> > stmmac_rx_refill() function, where it will clear all desc0/desc1/desc2/desc3
> fields together.
> >
> > This patch removes RX descriptors modification when STMMAC resume.
> >
> > Signed-off-by: Joakim Zhang <qiangqing.zhang@....com>
> > ---
> > ChangeLogs:
> > 	V1: remove RFC tag, please come here for RFC discussion:
> >
> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore
> > .kernel.org%2Fnetdev%2Fcec17489-2ef9-7862-94c8-202d31507a0c%40nvidia
> .c
> >
> om%2FT%2F&amp;data=04%7C01%7Cqiangqing.zhang%40nxp.com%7C16be3
> 6b4a2584
> >
> a4c208608d9210d09ef%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%
> 7C63757
> >
> 7162155074221%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJ
> QIjoiV2lu
> >
> MzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=BdsNu6l4%2Bt
> Q6WllA
> > tVn%2BP1jD3sRXfpIH2XErmRh%2BjLA%3D&amp;reserved=0
> > ---
> >  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > index bf9fe25fed69..2570d26286ea 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > @@ -7187,6 +7187,8 @@ static void stmmac_reset_queues_param(struct
> stmmac_priv *priv)
> >  		tx_q->mss = 0;
> >
> >  		netdev_tx_reset_queue(netdev_get_tx_queue(priv->dev, queue));
> > +
> > +		stmmac_clear_tx_descriptors(priv, queue);
> >  	}
> >  }
> >
> > @@ -7251,7 +7253,6 @@ int stmmac_resume(struct device *dev)
> >  	stmmac_reset_queues_param(priv);
> >
> >  	stmmac_free_tx_skbufs(priv);
> > -	stmmac_clear_descriptors(priv);
> >
> >  	stmmac_hw_setup(ndev, false);
> >  	stmmac_init_coalesce(priv);
> >
> 
> 
> So as previously mentioned this still causing a regression when resuming from
> suspend on Jetson TX2 platform. I am not sure why you are still attempting to
> push this patch as-is when it causes a complete failure for another platform. I
> am quite disappointed that you are ignoring the issue we have reported :-(
I first pushed the RFC and discussed about the issue, I think this patch trigger a potential issue at your side. 
IMHO, you may need try to find the root case why this patch make regression on your platform.

> To summarise we do not see any issues with suspend on Jetson TX2 without
> this patch. I have stressed suspend on this board doing 2000 suspend iterations
> and so no issues. However, this patch completely breaks resuming from
> suspend for us. Therefore, I don't see how we can merge this.
If you read the commit message, you should know you can't reproduce this issue if your DMA bit width is 32 bits.
Please take the commit message seriously, do you admit this is a real bug? After the analysis, this may a common issue.

> Given that this fixes a problem, that appears to be specific to your platform,
> why do you not implement this in away such that this is only done for your
> platform?
I really don't know how to take your case into account, let us add a flag in "struct plat_stmmacenet_data" to 
separate different cases? If maintainer agree with this, I can do.

I agree keep this patch doesn't merge until a way also fit your platform. If there is still no good solutions, I will try to add
a specific flag in platform data.

Best Regards,
Joakim Zhang
> Jon
> 
> --
> nvpublic

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ