Message-ID: <20230926212824.1512665-1-dianders@chromium.org>
Date:   Tue, 26 Sep 2023 14:27:25 -0700
From:   Douglas Anderson <dianders@...omium.org>
To:     Jakub Kicinski <kuba@...nel.org>,
        Hayes Wang <hayeswang@...ltek.com>,
        "David S . Miller" <davem@...emloft.net>
Cc:     linux-usb@...r.kernel.org, Grant Grundler <grundler@...omium.org>,
        Edward Hill <ecgh@...omium.org>,
        Douglas Anderson <dianders@...omium.org>,
        andre.przywara@....com, anton@...it.no, bjorn@...k.no,
        edumazet@...gle.com, gaul@...l.org, horms@...nel.org,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        pabeni@...hat.com
Subject: [PATCH 0/3] r8152: Avoid writing garbage to the adapter's registers

This 3-patch series is the result of a cooperative debug effort
between Realtek and the ChromeOS team. On ChromeOS, we've noticed that
Realtek Ethernet adapters can sometimes get so wedged that even a
reboot of the host can't get them to enumerate again, assuming that
the adapter was on a powered hub and didn't lose power when the host
rebooted. This is sometimes seen in the ChromeOS automated testing
lab. The only way to recover adapters in this state is to manually
power cycle them.

I managed to reproduce one instance of this wedging (unknown if this
is truly related to what the test lab sees) by doing this:
1. Start a flood ping from a host to the device.
2. Drop the device into kdb.
3. Wait 90 seconds.
4. Resume from kdb (the "g" command).
5. Wait another 45 seconds.

Upon analysis, Realtek realized this was happening:

1. The Linux driver was getting a "Tx timeout" after resuming from kdb
   and then trying to reset itself.
2. As part of the reset, the Linux driver was attempting to do a
   read-modify-write of the adapter's registers.
3. The read would fail (due to a timeout) and the driver pretended
   that the register contained all 0xFFs. See commit f53a7ad18959
   ("r8152: Set memory to all 0xFFs on failed reg reads")
4. The driver would take this value of all 0xFFs, modify it, and
   attempt to write it back to the adapter.
5. By this time the USB channel seemed to recover and thus we'd
   successfully write a value that was mostly 0xFFs to the adapter.
6. The adapter didn't like this and would wedge itself.
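
The sequence above can be illustrated with a small user-space sketch.
This is not the r8152 driver code: the fake_adapter struct and the
fake_read(), fake_write(), rmw_unchecked() and rmw_checked() helpers
are hypothetical names made up for this example, and the single
32-bit register is an assumption. It only shows how ignoring a failed
read turns a read-modify-write into a write of mostly 0xFFs, and how
checking the read result avoids that.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the adapter: one 32-bit register plus a
 * flag that makes the next read time out. Not the real driver. */
struct fake_adapter {
	uint32_t reg;
	bool read_times_out;
};

static int fake_read(struct fake_adapter *a, uint32_t *val)
{
	if (a->read_times_out) {
		/* The channel "recovers" after this one failure. */
		a->read_times_out = false;
		/* Mirrors the behavior described in step 3: the
		 * buffer is filled with 0xFF on a failed read. */
		memset(val, 0xff, sizeof(*val));
		return -ETIMEDOUT;
	}
	*val = a->reg;
	return 0;
}

static int fake_write(struct fake_adapter *a, uint32_t val)
{
	a->reg = val;
	return 0;
}

/* Steps 2-5: the read error is ignored, so the all-0xFF value is
 * modified and written back to the adapter. */
static void rmw_unchecked(struct fake_adapter *a, uint32_t clear,
			  uint32_t set)
{
	uint32_t v;

	fake_read(a, &v);
	v = (v & ~clear) | set;
	fake_write(a, v);
}

/* Guarded variant: bail out before the write if the read failed. */
static int rmw_checked(struct fake_adapter *a, uint32_t clear,
		       uint32_t set)
{
	uint32_t v;
	int ret = fake_read(a, &v);

	if (ret < 0)
		return ret;
	v = (v & ~clear) | set;
	return fake_write(a, v);
}
```

With a register that starts at 0x10 and a read that times out,
rmw_unchecked() leaves the register holding 0xfffffffe, while
rmw_checked() returns the error and leaves the register untouched.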

Another engineer also managed to reproduce wedging of the Realtek
Ethernet adapter during a reboot test on an AMD Chromebook. In that
case he was sometimes seeing -EPIPE returned from the control
transfers.

This patch series fixes both issues.
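
Going only by the patch titles in this series, the general idea of
"retry, then block further access" might be sketched as follows. This
is a simplified user-space illustration, not the code in the patches:
the sketch_dev struct, sketch_transfer(), sketch_reg_access() and the
SKETCH_MAX_TRIES value of 3 are all assumptions made for the example.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_MAX_TRIES 3	/* assumed value, not from the patches */

/* Hypothetical device state for illustration only. */
struct sketch_dev {
	bool inaccessible;	/* set once an access fails every retry */
	int fail_count;		/* test hook: upcoming transfers that fail */
	int transfers;		/* test hook: transfers attempted so far */
};

/* Stand-in for a USB control transfer; fails with -EPIPE while
 * fail_count is non-zero. */
static int sketch_transfer(struct sketch_dev *d)
{
	d->transfers++;
	if (d->fail_count > 0) {
		d->fail_count--;
		return -EPIPE;
	}
	return 0;
}

/* Retry a register access a few times. If every attempt fails, mark
 * the device inaccessible so later accesses are refused up front
 * instead of operating on garbage. */
static int sketch_reg_access(struct sketch_dev *d)
{
	int ret = -ENODEV;
	int i;

	if (d->inaccessible)
		return -ENODEV;

	for (i = 0; i < SKETCH_MAX_TRIES; i++) {
		ret = sketch_transfer(d);
		if (ret == 0)
			return 0;
	}

	d->inaccessible = true;
	return ret;
}
```

In this sketch a transient failure (fewer than SKETCH_MAX_TRIES
consecutive errors) is absorbed by the retry loop, while a persistent
failure stops all further register traffic until the flag is cleared
by some recovery path, which is left out here.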


Douglas Anderson (3):
  r8152: Increase USB control msg timeout to 5000ms as per spec
  r8152: Retry register reads/writes
  r8152: Block future register access if register access fails

 drivers/net/usb/r8152.c | 124 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 115 insertions(+), 9 deletions(-)

-- 
2.42.0.515.g380fc7ccd1-goog
