Date:   Tue, 18 May 2021 07:52:11 +0000
From:   Joakim Zhang <qiangqing.zhang@....com>
To:     Thierry Reding <treding@...dia.com>
CC:     Florian Fainelli <f.fainelli@...il.com>,
        Jon Hunter <jonathanh@...dia.com>,
        Jakub Kicinski <kuba@...nel.org>,
        "peppe.cavallaro@...com" <peppe.cavallaro@...com>,
        "alexandre.torgue@...s.st.com" <alexandre.torgue@...s.st.com>,
        "joabreu@...opsys.com" <joabreu@...opsys.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "mcoquelin.stm32@...il.com" <mcoquelin.stm32@...il.com>,
        "andrew@...n.ch" <andrew@...n.ch>,
        dl-linux-imx <linux-imx@....com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [RFC net-next] net: stmmac: should not modify RX descriptor when
 STMMAC resume


Hi Thierry,

> -----Original Message-----
> From: Thierry Reding <treding@...dia.com>
> Sent: May 17, 2021 15:53
> To: Joakim Zhang <qiangqing.zhang@....com>
> Cc: Florian Fainelli <f.fainelli@...il.com>; Jon Hunter
> <jonathanh@...dia.com>; Jakub Kicinski <kuba@...nel.org>;
> peppe.cavallaro@...com; alexandre.torgue@...s.st.com;
> joabreu@...opsys.com; davem@...emloft.net;
> mcoquelin.stm32@...il.com; andrew@...n.ch; dl-linux-imx
> <linux-imx@....com>; netdev@...r.kernel.org
> Subject: Re: [RFC net-next] net: stmmac: should not modify RX descriptor when
> STMMAC resume
> 
> On Mon, May 10, 2021 at 02:10:21AM +0000, Joakim Zhang wrote:
> >
> > Hi Florian,
> >
> > > -----Original Message-----
> > > From: Florian Fainelli <f.fainelli@...il.com>
> > > Sent: May 8, 2021 23:42
> > > To: Joakim Zhang <qiangqing.zhang@....com>; Jon Hunter
> > > <jonathanh@...dia.com>; Jakub Kicinski <kuba@...nel.org>
> > > Cc: peppe.cavallaro@...com; alexandre.torgue@...s.st.com;
> > > joabreu@...opsys.com; davem@...emloft.net;
> > > mcoquelin.stm32@...il.com; andrew@...n.ch; dl-linux-imx
> > > <linux-imx@....com>; treding@...dia.com; netdev@...r.kernel.org
> > > Subject: Re: [RFC net-next] net: stmmac: should not modify RX
> > > descriptor when STMMAC resume
> > >
> > >
> > >
> > > On 5/8/2021 4:20 AM, Joakim Zhang wrote:
> > > >
> > > > Hi Jakub,
> > > >
> > > >> -----Original Message-----
> > > >> From: Jon Hunter <jonathanh@...dia.com>
> > > >> Sent: May 7, 2021 22:22
> > > >> To: Joakim Zhang <qiangqing.zhang@....com>; Jakub Kicinski
> > > >> <kuba@...nel.org>
> > > >> Cc: peppe.cavallaro@...com; alexandre.torgue@...s.st.com;
> > > >> joabreu@...opsys.com; davem@...emloft.net;
> > > mcoquelin.stm32@...il.com;
> > > >> andrew@...n.ch; f.fainelli@...il.com; dl-linux-imx
> > > >> <linux-imx@....com>; treding@...dia.com; netdev@...r.kernel.org
> > > >> Subject: Re: [RFC net-next] net: stmmac: should not modify RX
> > > >> descriptor when STMMAC resume
> > > >>
> > > >> Hi Joakim,
> > > >>
> > > >> On 06/05/2021 07:33, Joakim Zhang wrote:
> > > >>>
> > > >>>> -----Original Message-----
> > > >>>> From: Jon Hunter <jonathanh@...dia.com>
> > > >>>> Sent: April 23, 2021 21:48
> > > >>>> To: Jakub Kicinski <kuba@...nel.org>; Joakim Zhang
> > > >>>> <qiangqing.zhang@....com>
> > > >>>> Cc: peppe.cavallaro@...com; alexandre.torgue@...s.st.com;
> > > >>>> joabreu@...opsys.com; davem@...emloft.net;
> > > >> mcoquelin.stm32@...il.com;
> > > >>>> andrew@...n.ch; f.fainelli@...il.com; dl-linux-imx
> > > >>>> <linux-imx@....com>; treding@...dia.com; netdev@...r.kernel.org
> > > >>>> Subject: Re: [RFC net-next] net: stmmac: should not modify RX
> > > >>>> descriptor when STMMAC resume
> > > >>>>
> > > >>>>
> > > >>>> On 22/04/2021 16:56, Jakub Kicinski wrote:
> > > >>>>> On Thu, 22 Apr 2021 04:53:08 +0000 Joakim Zhang wrote:
> > > >>>>>> Could you please help review this patch? It's really beyond
> > > >>>>>> my comprehension, why this patch would affect Tegra186 Jetson
> > > >>>>>> TX2
> > > board?
> > > >>>>>
> > > >>>>> Looks okay, please repost as non-RFC.
> > > >>>>
> > > >>>>
> > > >>>> I still have an issue with a board not being able to resume
> > > >>>> from suspend with this patch. Shouldn't we try to resolve that first?
> > > >>>
> > > >>> Hi Jon,
> > > >>>
> > > >>> Any updates about this? Could I repost as non-RFC?
> > > >>
> > > >>
> > > >> Sorry no updates from my end. Again, I don't see how we can post
> > > >> this as it introduces a regression for us. I am sorry that I am
> > > >> not able to help more here, but we have done some extensive
> > > >> testing on the current mainline without your change and I don't
> > > >> see any issues with regard to suspend/resume. Hence, this does
> > > >> not appear to fix any pre-existing issues. It is possible that we are not
> seeing them.
> > > >>
> > > >> At this point I think that we really need someone from Synopsys
> > > >> to help us understand that exact problem that you are
> > > >> experiencing so that we can ensure we have the necessary fix in
> > > >> place and if this is something that is applicable to all devices or not.
> > > >
> > > > This patch only removes modification of Rx descriptors when STMMAC
> > > resume back, IMHO, it should not affect system suspend/resume function.
> > > > Do you have any idea about Joh's issue or any acceptable solution
> > > > to fix the
> > > issue I met? Thanks a lot!
> > >
> > > Joakim, don't you have a support contact at Synopsys who would be
> > > able to help or someone at NXP who was responsible for the MAC
> integration?
> > > We also have Synopsys engineers copied so presumably they could shed
> > > some light.
> >
> > I contacted Synopsys but no substantive help was received, and the
> > integration guys from NXP are unavailable now.
> >
> > But some hints have come out that seem to help a bit. I found that the
> > DMA width is 34 bits on i.MX8MP, which may differ from many existing
> > SoCs that integrate STMMAC.
> >
> > As I described in the commit message:
> > When system suspend, the rx descriptor is:
> >   008 [0x00000000c4310080]: 0x0 0x40 0x0 0x34010040
> > When system resume, the rx descriptor is modified to:
> >   008 [0x00000000c4310080]: 0x0 0x40 0x0 0xb5010040
> > Since the DMA is 34 bits wide, desc0/desc1 together indicate the buffer
> > address, so after system resume the buffer address changes to
> > 0x4000000000.
> > The correct rx descriptor is:
> >   008 [0x00000000c4310080]: 0x6511000 0x1 0x0 0x81000000
> > where the valid buffer address is 0x106511000.
> > So when the DMA tries to access 0x4000000000, which is an invalid
> > address, it generates a fatal bus error.
> 
> Okay, that's interesting. If i.MX8MP supports only 34 address bits but the
> driver tries to set a DMA address of 0x4000000000, that's way out of the valid
> range.
> 
> I suspect what might be happening is that the DMA mask isn't properly set for
> your device. There's in fact some code in the driver that deals with this. If you
> look at the implementation of stmmac_dvr_probe() in
> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c around line 4980,
> there's a comment that actually mentions i.MX8MP and the 34 address bit
> limitation. Can you find out what that priv->plat->addr64 is set to on your
> system?
Yes, I know the STMMAC driver takes the DMA width into account, and you can see that the DMA width is
set to 34 bits in dwmac-imx.c (.addr_width = 34,).

You can also see that the valid DMA address is 0x106511000, which is within 34 bits. But after system
resume the DMA address becomes 0x4000000000, which is outside the 34-bit range. The reason is that we
modify the rx descriptor on system resume.
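
To make the arithmetic concrete, here is a minimal sketch in Python (for illustration only; the helper
names are hypothetical and are not taken from the stmmac driver). It assumes, as described in this
thread, that desc0 holds the low 32 bits of the buffer address and desc1 the high bits when the DMA
width is above 32 bits, and that only desc0 is used otherwise.

```python
# Illustrative only: helper names are hypothetical, not stmmac driver code.

def dma_bit_mask(bits):
    """Highest address representable with a DMA width of `bits` bits."""
    return (1 << bits) - 1

def buffer_address(desc0, desc1, addr_bits):
    """Buffer address from a read-format rx descriptor (assumed layout):
    desc0 = low 32 bits, desc1 = high bits, used only above 32-bit width."""
    if addr_bits <= 32:
        return desc0
    return (desc1 << 32) | desc0

def is_reachable(addr, addr_bits):
    """True if `addr` fits within the given DMA width."""
    return addr <= dma_bit_mask(addr_bits)

# Correct descriptor from the thread: 0x6511000 0x1 0x0 0x81000000
good = buffer_address(0x06511000, 0x1, 34)
# Descriptor seen after resume: 0x0 0x40 0x0 0xb5010040
bad = buffer_address(0x0, 0x40, 34)

print(hex(good), is_reachable(good, 34))
print(hex(bad), is_reachable(bad, 34))
# With a 32-bit DMA the same stale descriptor yields an all-zero address.
print(hex(buffer_address(0x0, 0x40, 32)))
```

Under these assumptions the stale value 0x40 left in desc1 is what pushes the address beyond 34 bits,
while a 32-bit DMA would only ever see the zero in desc0.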

> Or alternatively find out what priv->dma_cap.addr64 ends up being set a few
> lines further down? That value is effectively used to set the DMA mask and if
> that's wrong it might explain why the driver is setting a bad DMA address.
As I described above, the DMA address allocated at initialization is larger than 32 bits and the MAC
works well in the normal case, so a wrong DMA mask should be impossible.

> In fact, maybe that information is already in the kernel log. There's a
> dev_info() there that should print out something like:
> 
> 	Using 34 bits DMA width
Yes, we can see this log:
[    2.376903] imx-dwmac 30bf0000.ethernet: Using 34 bits DMA width

> in your case. If that says something other than 34 in there, it's very likely that
> this needs to be correctly set somewhere. Looking at the code in dwmac-imx.c,
> I see that that's already set to 34, so this looks like it should be setting things
> correctly, but better make sure.
Yes, so now we are sure the DMA mask is correct.

> > But for other 32 bits width DMA, DMA seems still can work when this issue
> happened, only desc0 indicates buffer address, so the buffer address is 0x0
> when system resume.
> > And there is a NOTE in the guide:
> > In the Receive Descriptor (Read Format), if the Buffer Address field
> > is all 0s, the module does not transfer data to that buffer and skips
> > to the next buffer or next descriptor.
> > For this note, I don't know what could IP actually do, when detect all zeros
> buffer address, it will change the descriptor to application own? If not,
> STMMAC driver seems can't handle this case.
> > I will contact Synopsys guys for more details.
> >
> > It now appears that this issue seems only can be reproduced on DMA width
> more than 32 bits, this may be why other SoCs(e.g. i.MX8DXL) which integrated
> the same STMMAC IP can't reproduce it.
> 
> On Tegra186 and later we support up to 40 address bits. The newer
> Tegra194 has a special quirk where bit 39 has special meaning, so we have to
> override the DMA mask as well. I recall that this was causing issues at some
> point, which is why I suspect something like this could be happening in your
> case as well.
I do not quite understand what you mean. Do you mean that our 34-bit DMA width might also
have a special meaning?

Thanks very much, Thierry, for helping to analyze this issue. As I described in the commit message, we
should not _only_ change the rx descriptors to DMA-owned while leaving the other parts of the rx
descriptors not updated. So could you please help check why this RFC causes a regression on your side?
Why can't the system resume?
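
As a rough illustration of that point, the sketch below (Python, illustrative only) compares the desc3
values quoted earlier in this thread. The bit positions for OWN (bit 31) and buffer-1-address-valid
(bit 24) are an assumption based on my reading of the dwmac4 descriptor layout; please check them
against the databook.

```python
# Assumed bit positions (dwmac4-style read-format rx descriptor).
RDES3_OWN = 1 << 31
RDES3_BUFFER1_VALID_ADDR = 1 << 24

desc3_suspend = 0x34010040   # desc3 at suspend, from the thread
desc3_resume = 0xb5010040    # desc3 after resume, from the thread
desc3_fresh = 0x81000000     # desc3 of a correctly prepared descriptor

# Bits that differ between suspend and resume.
changed = desc3_suspend ^ desc3_resume
# Bits left in desc3 after resume that are not the two flag bits.
stale = desc3_resume & ~(RDES3_OWN | RDES3_BUFFER1_VALID_ADDR)

print(hex(changed))   # only the two flag bits were flipped on resume
print(hex(stale))     # everything else is left over from before suspend
print(desc3_fresh == (RDES3_OWN | RDES3_BUFFER1_VALID_ADDR))
```

If the assumed bit layout is right, resume only sets the two flag bits and leaves the remaining desc3
bits, and likewise desc0/desc1, with their old contents.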

Best Regards,
Joakim Zhang
> Thierry
