lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <CAD=FV=UKe33R0z-Qu32F2q4eHdwox88oTprgvDf_bMJZcBM+hQ@mail.gmail.com> Date: Wed, 4 Oct 2023 13:12:17 -0700 From: Doug Anderson <dianders@...omium.org> To: Jakub Kicinski <kuba@...nel.org>, Hayes Wang <hayeswang@...ltek.com>, "David S . Miller" <davem@...emloft.net> Cc: linux-usb@...r.kernel.org, Alan Stern <stern@...land.harvard.edu>, Grant Grundler <grundler@...omium.org>, Edward Hill <ecgh@...omium.org>, Bjørn Mork <bjorn@...k.no>, Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, linux-kernel@...r.kernel.org, netdev@...r.kernel.org Subject: Re: [PATCH v2 5/5] r8152: Block future register access if register access fails Hi, On Wed, Oct 4, 2023 at 12:27 PM Douglas Anderson <dianders@...omium.org> wrote: > > Even though the functions to read/write registers can fail, most of > the places in the r8152 driver that read/write register values don't > check error codes. The lack of error code checking is problematic in > at least two ways. > > The first problem is that the r8152 driver often uses code patterns > similar to this: > x = read_register() > x = x | SOME_BIT; > write_register(x); > > ...with the above pattern, if the read_register() fails and returns > garbage then we'll end up trying to write modified garbage back to the > Realtek adapter. If the write_register() succeeds that's bad. Note > that as of commit f53a7ad18959 ("r8152: Set memory to all 0xFFs on > failed reg reads") the "garbage" returned by read_register() will at > least be consistent garbage, but it is still garbage. > > It turns out that this problem is very serious. Writing garbage to > some of the hardware registers on the Ethernet adapter can put the > adapter in such a bad state that it needs to be power cycled (fully > unplugged and plugged in again) before it can enumerate again. > > The second problem is that the r8152 driver generally has functions > that are long sequences of register writes. Assuming everything will > be OK if a random register write fails in the middle isn't a great > assumption. > > One might wonder if the above two problems are real. You could ask if > we would really have a successful write after a failed read. It turns > out that the answer appears to be "yes, this can happen". In fact, > we've seen at least two distinct failure modes where this happens. > > On a sc7180-trogdor Chromebook if you drop into kdb for a while and > then resume, you can see: > 1. We get a "Tx timeout" > 2. The "Tx timeout" queues up a USB reset. > 3. In rtl8152_pre_reset() we try to reinit the hardware. > 4. The first several (2-9) register accesses fail with a timeout, then > things recover. > > The above test case was actually fixed by the patch ("r8152: Increase > USB control msg timeout to 5000ms as per spec") but at least shows > that we really can see successful calls after failed ones. > > On a different (AMD) based Chromebook with a particular adapter, we > found that during reboot tests we'd also sometimes get a transitory > failure. In this case we saw -EPIPE being returned sometimes. Retrying > worked, but retrying is not always safe for all register accesses > since reading/writing some registers might have side effects (like > registers that clear on read). > > Let's fully lock out all register access if a register access fails. > When we do this, we'll try to queue up a USB reset and try to unlock > register access after the reset. This is slightly tricker than it > sounds since the r8152 driver has an optimized reset sequence that > only works reliably after probe happens. In order to handle this, we > avoid the optimized reset if probe didn't finish. > > When locking out access, we'll use the existing infrastructure that > the driver was using when it detected we were unplugged. This keeps us > from getting stuck in delay loops in some parts of the driver. > > - Reset patch no longer based on retry patch, since that was dropped. > - Reset patch should be robust even if failures happen in probe. > - Switched booleans to bits in the "flags" variable. > - Check for -ENODEV instead of "udev->state == USB_STATE_NOTATTACHED" Argh, the above 4 bullet points were supposed to get moved "after the cut" and describe the differences between v1 and v2. Sorry! :( To avoid spamming people, I won't send another version and will wait for feedback. I'm happy to fix and send a new version at any time, though.
Powered by blists - more mailing lists