linux-kernel - Re: [PATCH 1/1] Avoid usb reset crashes by making tty

Message-ID: <55B16EA9.3020609@kynesim.co.uk>
Date:	Thu, 23 Jul 2015 23:46:01 +0100
From:	Richard Watts <rrw@...esim.co.uk>
To:	Greg KH <gregkh@...uxfoundation.org>
CC:	linux-serial@...r.kernel.org, linux-usb@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/1] Avoid usb reset crashes by making tty_io cdevs truly
 dynamic

Hi,

  Sure - sorry, my description was a little .. basic.

  So, I have a client who was having problems with machines hanging in
the field. The hangs were very rare and were associated with a hardware
change that introduced more cores. Kernel dumps implied that the timer
list was getting corrupted.

  In this configuration the SBC sits on a board that communicates with
it (partly) via a USB CDC device, which shows up as /dev/ttyACM0.

  So one of the things we turned on was CONFIG_DEBUG_KOBJECT_RELEASE,
which delays kobject destruction.

  When we did that, we could reproduce the crash by performing a
USB reset on the CDC device, and the logs suggest that this was
happening in the field too.

  When the USB reset happens, we get a bunch of complaints from the
kernel.

  Some of these are to do with races on the kobjects associated with the
sysfs entries for the ttyACM0 device. They turn out not to be fatal,
and have their own patch series ('Attempt to cope with device changes
and delayed kobject deallocation' on linux-kernel).

  The fatal one turns out to be an execution path that goes like this:

   1 The USB device declares itself to be CDC.
   2 The tty driver fires up and allocates a cdev for the relevant tty.
   3 driver->cdevs[0].kobj gets initialised as part of the cdev_alloc().
   4 A USB reset happens, queueing driver->cdevs[0].kobj for release.
   5 The tty driver calls cdev_init(&driver->cdevs[0]), which
     reinitialises driver->cdevs[0].kobj with a refcount of 1.
   6 The tty driver starts using that new cdev, queueing an operation
     on it. This adds a timer entry that includes the kobj.
   7 At this point, the release we scheduled in (4) happens and the
     members of kobj are deallocated.
   8 Someone else allocates the memory just released from one of the
     members of driver->cdevs[0].kobj and overwrites it.
   9 The timer goes off.
  10 Boom.

  My patch (ham-fistedly) fixes this by never reusing a cdev: each tty
gets a freshly allocated one, so we are never fooled into reinitialising
a kobject that is still queued for deletion.

  I'm not all that familiar with how the locking should work here, and
there is a definite argument that without CONFIG_DEBUG_KOBJECT_RELEASE,
the kobject_release() would have happened by step 5, and therefore this
situation should never arise "for real".

  .. but (a) that makes it rather hard to test kernels with
CONFIG_DEBUG_KOBJECT_RELEASE, and (b) my customer's crashes have
(allegedly) now gone away even without CONFIG_DEBUG_KOBJECT_RELEASE
set.

  Does that help at all? I've attached my 0/1, just in case that
got lost somewhere.

Richard.
