Message-ID: <CAL_JsqLrErF__GGHfanRFCpfbOh6fvz4-aJv32h8OfDjUeZPSg@mail.gmail.com>
Date:   Fri, 4 Aug 2023 09:52:20 -0600
From:   Rob Herring <robh@...nel.org>
To:     Nick Bowler <nbowler@...conx.ca>
Cc:     linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        netdev@...r.kernel.org, regressions@...ts.linux.dev
Subject: Re: PROBLEM: Broken or delayed ethernet on Xilinx ZCU104 since 5.18 (regression)

On Fri, Aug 4, 2023 at 9:27 AM Nick Bowler <nbowler@...conx.ca> wrote:
>
> Hi,
>
> With recent kernels (5.18 and newer) the ethernet is all wonky on my
> ZCU104 board.
>
> There is some behaviour inconsistency between kernel versions identified
> during bisection, so maybe there is more than one issue with the ethernet?
>
>   6.5-rc4: after 10 seconds, the following message is printed:
>
>     [   10.761808] platform ff0e0000.ethernet: deferred probe pending
>
>   but the network device seemingly never appears (I waited about a minute).
>
>   6.1 and 6.4: after 10 seconds, the device suddenly appears and starts
>   working (but this is way too late).

The 10 seconds is probably the deferred probe timeout. You can shorten
it on the kernel command line (deferred_probe_timeout=<seconds>).
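
To confirm which timeout a given boot actually used, one can inspect the
kernel command line. The following is a minimal sketch, assuming the
parameter is spelled deferred_probe_timeout= and takes a value in seconds
(as the kernel-parameters documentation describes); the helper name and
the sample command line are hypothetical and do not come from this report.

```python
def deferred_probe_timeout(cmdline):
    """Return the deferred_probe_timeout value (in seconds) found in a
    kernel command line string, or None if it is absent or not an integer.
    If the parameter appears more than once, the last occurrence is returned."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "deferred_probe_timeout" and sep:
            try:
                value = int(val)
            except ValueError:
                # Ignore malformed values rather than guessing.
                pass
    return value


# Hypothetical command line, for illustration only.
sample = "console=ttyPS0,115200 deferred_probe_timeout=2 root=/dev/mmcblk0p2"
timeout = deferred_probe_timeout(sample)
```

On a running system, the string to pass in would normally be the contents
of /proc/cmdline.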

>   5.18: the device never appears and no unusual messages are printed
>   (I waited ten minutes).
>
> With 5.17 and earlier versions, the eth0 device appears without any delay.
>
> Unfortunately, as bisection closed on the problematic section, all the
> built kernels became untestable as they appear to crash during early
> boot.  Nevertheless, I manually selected a commit that sounded relevant:
>
>   commit e461bd6f43f4e568f7436a8b6bc21c4ce6914c36
>   Author: Robert Hancock <robert.hancock@...ian.com>
>   Date:   Thu Jan 27 10:37:36 2022 -0600
>
>       arm64: dts: zynqmp: Added GEM reset definitions
>
> Reverting this fixes the problem on 5.18.  Reverting this fixes the
> problem on 6.1.  Reverting this fixes the problem on 6.4.  In all of
> these versions, with this change reverted, the network device appears
> without delay.

With the above change, the kernel is going to be waiting for the reset
driver which either didn't exist or wasn't enabled in your config
(maybe kconfig needs to be tweaked to enable it automatically).
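
One way to check whether a driver is enabled is to look the symbol up in
the kernel's .config. Below is a minimal sketch; the symbol names used in
the example (CONFIG_RESET_CONTROLLER, CONFIG_RESET_ZYNQMP) are assumptions
about which options are relevant on this platform and are not named
anywhere in this thread, and the helper name is hypothetical.

```python
def config_state(config_text, symbol):
    """Return the value assigned to `symbol` in kernel .config text
    (for example 'y' or 'm'), or None if the symbol is not set or absent."""
    prefix = symbol + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    # Covers both "# CONFIG_FOO is not set" lines and missing symbols.
    return None


# Hypothetical .config excerpt, for illustration only.
sample_config = """\
CONFIG_RESET_CONTROLLER=y
# CONFIG_RESET_ZYNQMP is not set
"""
controller = config_state(sample_config, "CONFIG_RESET_CONTROLLER")
zynqmp = config_state(sample_config, "CONFIG_RESET_ZYNQMP")
```

A None result for the reset driver would be consistent with the ethernet
probe being deferred until the timeout expires.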

There's not really a better solution than the probe timeout when the
DT was incomplete and new dependencies get added.

> Unfortunately, it seems this is not sufficient to correct the problem on
> 6.5-rc4 -- there is no apparent change in behaviour, so maybe there is
> a new, different problem?

Probably. You might check what changed with fw_devlink in that period.
(Offhand, I don't recall many changes)

> I guess I can kick off another bisection to find out when this revert
> stops fixing things...

That always helps.

Rob
