Message-ID: <YRaFQRyD8fwc6PEz@orome.fritz.box>
Date:   Fri, 13 Aug 2021 16:44:17 +0200
From:   Thierry Reding <thierry.reding@...il.com>
To:     Marc Zyngier <maz@...nel.org>
Cc:     Matteo Croce <mcroce@...ux.microsoft.com>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-riscv@...ts.infradead.org,
        Giuseppe Cavallaro <peppe.cavallaro@...com>,
        Alexandre Torgue <alexandre.torgue@...s.st.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Palmer Dabbelt <palmer@...belt.com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Drew Fustini <drew@...gleboard.org>,
        Emil Renner Berthing <kernel@...il.dk>,
        Jon Hunter <jonathanh@...dia.com>,
        Will Deacon <will@...nel.org>
Subject: Re: [PATCH net-next] stmmac: align RX buffers

On Thu, Aug 12, 2021 at 04:26:41PM +0100, Marc Zyngier wrote:
> On Thu, 12 Aug 2021 15:29:06 +0100,
> Thierry Reding <thierry.reding@...il.com> wrote:
> > 
> > On Wed, Aug 11, 2021 at 02:23:10PM +0100, Marc Zyngier wrote:
> 
> [...]
> 
> > > I love this machine... Did this issue occur with the Denver CPUs
> > > disabled?
> > 
> > Interestingly I've been doing some work on a newer device called Jetson
> > TX2 NX (which is kind of a trimmed-down version of Jetson TX2, in the
> > spirit of the Jetson Nano) and I can't seem to reproduce these failures
> > there (tested on next-20210812).
> > 
> > I'll go dig out my Jetson TX2 to run the same tests there, because I've
> > also been using a development version of the bootloader stack and
> > flashing tools and all that, so it's possible that something was fixed
> > at that level. I don't think I've ever tried disabling the Denver CPUs,
> > but then I've also never seen these issues myself.
> > 
> > Just out of curiosity, what version of the BSP have you been using to
> > flash?
> 
> I've only used the BSP for a few weeks when I got the board last
> year. The only thing I use from it is u-boot to chainload an upstream
> u-boot, and boot Debian from there.

That's interesting... have you ever tried to inject a version of
upstream U-Boot into the BSP and have it flash that instead? That should
allow you to drop the chainloading step.

Not that that's likely to have anything to do with this.

> > One other thing that I ran into: there's a known issue with the PHY
> > configuration. We mark the PHY as "rgmii-id" on most devices, and
> > then the Marvell PHY driver needs to be enabled. Jetson TX2
> > has phy-mode = "rgmii", so it /should/ work okay.
> > 
> > Typically what we're seeing with that misconfiguration is that the
> > device fails to get an IP address, but it might still be worth trying to
> > switch Jetson TX2 to rgmii-id and using the Marvell PHY, to see if that
> > improves anything.
> 
> I never failed to get an IP address. Overall, networking has been
> solid on this machine until this patch. I'll try and mess with this
> when I get time, but that's probably going to be next week now.

So I've hooked up my Jetson TX2 and tried various workloads. I wasn't
able to reproduce this on next-20210813. I've tried both the L4T 32.6.1
release and a local development build.

Perhaps one thing to try would be to upgrade your L4T BSP to something
newer. I know that there have occasionally been bugs in the MTS
firmware, which is what's running on the Denver cores, and newer BSPs
can fix those kinds of issues.

If that doesn't help, perhaps try to read out the SoC version numbers so
that we can compare. I know that some newer Tegra186 chips behave
slightly differently, so that's perhaps a difference that would explain
why it's not happening on all devices.

You can read the version and revision from sysfs using something like:

	# cat /sys/devices/soc0/{major,minor,revision}
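
If it's easier to paste labelled values into a reply, something along
these lines might do. This is only a rough sketch: the print_soc_info
helper and its optional directory argument are made up for illustration,
and it assumes the three attributes above are exposed under
/sys/devices/soc0 on your kernel.

```shell
#!/bin/sh
# Print each SoC identification attribute with a label.
# The optional directory argument is a hypothetical convenience for
# testing; it defaults to the sysfs path mentioned above.
print_soc_info() {
	dir="${1:-/sys/devices/soc0}"
	for attr in major minor revision; do
		if [ -r "$dir/$attr" ]; then
			printf '%s: %s\n' "$attr" "$(cat "$dir/$attr")"
		else
			# Attribute missing or unreadable on this system.
			printf '%s: (not available)\n' "$attr"
		fi
	done
}

print_soc_info "$@"
```

Any attribute that is missing is reported as such rather than silently
skipped, which would already be a useful data point for the comparison.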

> [...]
> 
> > > That'd be pretty annoying. Do you know if the Ethernet is a coherent
> > > device on this machine? Or does it need active cache maintenance?
> > 
> > I don't think Ethernet is a coherent device on Tegra186. I think
> > Tegra194 had various improvements with regard to coherency, but most
> > devices on Tegra186 do need active cache maintenance.
> > 
> > Let me dig through some old patches and mailing list threads. I vaguely
> > recall prototyping a patch that did something special for outer cache
> > flushing, but that may have been Tegra132, not Tegra186. I also don't
> > think we ended up merging that because it turned out to not be needed.
> 
> ARMv8 forbids any sort of *visible* outer cache, so I really hope this
> is not required. We wouldn't be able to support it.

I couldn't find any trace of this anywhere. So I'm possibly
misremembering. It's also more likely that this was on an earlier SoC
generation, otherwise I'd probably remember more clearly.

Thierry
