Message-ID: <20240410114601.0e25a46d@foxbook>
Date: Wed, 10 Apr 2024 11:46:01 +0200
From: Michał Pecio <michal.pecio@...il.com>
To: Mathias Nyman <mathias.nyman@...ux.intel.com>
Cc: Paul Menzel <pmenzel@...gen.mpg.de>, Mathias Nyman
 <mathias.nyman@...el.com>, LKML <linux-kernel@...r.kernel.org>,
 linux-usb@...r.kernel.org, Niklas Neronin <niklas.neronin@...ux.intel.com>
Subject: Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not
 part of current TD ep_index 1 comp_code 1

> Driver can cope with these extra events, but if this is common we
> should probably handle it silently and not concern users with that
> ERROR message.

The error message in itself is harmless: it means the driver got an
event it doesn't know how to handle and ignored it. Further events are
processed normally (in this specific case).

What's problematic is that the controller is apparently still working
on a TD which the driver considers to be finished already. The driver
can overwrite the TD and reuse its data buffer for other transfers,
while the hardware may still need the original TD for proper operation
and, if we are very unlucky, could attempt DMA to/from the data buffer,
causing data corruption or an information leak to a malicious USB dongle.

For all we know, Paul's buggy chipset may not only be confirming the
transfer twice, but really performing it twice for some stupid reason.

> We are actually at the moment looking at improving handle_tx_event()
> with Niklas (cc), and will take this into consideration.

Given the number of bugs so far, maybe it would make sense to count
transfer ring slots of the last completed TD as still "in use" until
the next TD is known to at least begin executing.
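
To make the idea concrete, here is a rough sketch in C of such slot
accounting. It is a toy model, not xhci_hcd code: every name in it
(toy_ring, toy_queue_td, toy_complete_td, toy_next_td_started) is
hypothetical, and it ignores link TRBs, segments and cancellation. It
also assumes that a completion event for a TD is sufficient proof that
the hardware has moved past the previous one.

```c
#include <stdbool.h>

#define TOY_RING_SLOTS 16u  /* arbitrary toy ring size, not an xHCI constant */

/* Hypothetical toy model; none of these names exist in xhci_hcd. */
struct toy_ring {
    unsigned long queued;    /* total slots ever queued by the driver */
    unsigned long completed; /* total slots in TDs considered finished */
    unsigned long released;  /* total slots that may be overwritten again */
};

/* Slots the driver may write now; quarantined slots count as in use. */
static unsigned int toy_free_slots(const struct toy_ring *r)
{
    return TOY_RING_SLOTS - (unsigned int)(r->queued - r->released);
}

/* Queue a TD of ntrbs slots, or refuse if it would touch busy slots. */
static bool toy_queue_td(struct toy_ring *r, unsigned int ntrbs)
{
    if (ntrbs > toy_free_slots(r))
        return false;
    r->queued += ntrbs;
    return true;
}

/* The next TD is known to have begun executing: end the quarantine. */
static void toy_next_td_started(struct toy_ring *r)
{
    r->released = r->completed;
}

/* Completion event for the oldest pending TD, which spans ntrbs slots. */
static void toy_complete_td(struct toy_ring *r, unsigned int ntrbs)
{
    /* Assumption: an event for this TD means the previous TD is done. */
    toy_next_td_started(r);
    /* This TD's own slots stay quarantined until the next one starts. */
    r->completed += ntrbs;
}
```

With two 4-slot TDs queued on this 16-slot ring, completing the first
one leaves 8 slots free rather than 12. Its slots are only returned
once the second TD produces an event.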

Unfortunately, "quarantining" URB data buffers in a similar manner would
be harder AFAIK.

I recently found one more bug of this kind: the Etron EJ168 controller
produces two events for failed single-TRB isochronous IN transfers -
one event indicating the failure, and then a "success". The full extent
of the bug (does it affect OUT or non-isoch, what happens on multi-TRB)
is unknown because the controller is very prone to crashing under my
workloads, which doesn't help debugging.
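
For illustration only, one way a driver could recognize such a trailing
"success" is to remember the TRB range of the last completed TD and
compare late events against it. The sketch below is a toy model in C,
not the handle_tx_event() logic: all names are hypothetical, it assumes
a TD occupies contiguous 16-byte TRBs (no link TRB or segment crossing),
and it only treats completion code 1 (Success) as a harmless duplicate.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_TRB_SIZE     16u /* an xHCI TRB is 16 bytes */
#define TOY_COMP_SUCCESS 1u  /* xHCI completion code 1 means Success */

/* Hypothetical description of the TRBs a TD occupies (contiguous). */
struct toy_td_range {
    uint64_t first_trb_dma; /* DMA address of the TD's first TRB */
    unsigned int ntrbs;     /* number of TRBs in the TD */
    bool valid;             /* false if there is no such TD */
};

enum toy_verdict {
    TOY_EVENT_FOR_CURRENT_TD, /* normal case */
    TOY_EVENT_STALE_IGNORED,  /* duplicate success for the previous TD */
    TOY_EVENT_UNEXPECTED      /* still worth an error message */
};

/* Does the event's TRB pointer fall inside this TD? */
static bool toy_in_range(const struct toy_td_range *td, uint64_t dma)
{
    return td->valid && dma >= td->first_trb_dma &&
           dma < td->first_trb_dma + (uint64_t)td->ntrbs * TOY_TRB_SIZE;
}

/* Classify a transfer event by its TRB pointer and completion code. */
static enum toy_verdict
toy_classify_event(const struct toy_td_range *current,
                   const struct toy_td_range *last_done,
                   uint64_t trb_dma, unsigned int comp_code)
{
    if (toy_in_range(current, trb_dma))
        return TOY_EVENT_FOR_CURRENT_TD;
    if (comp_code == TOY_COMP_SUCCESS && toy_in_range(last_done, trb_dma))
        return TOY_EVENT_STALE_IGNORED;
    return TOY_EVENT_UNEXPECTED;
}
```

Whether ignoring such an event silently is actually safe depends on
what the hardware is doing with the old TD, which this sketch does not
address.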

Regards,
Michal