linux-kernel - Re: [announce] vfs-scale git tree update

Message-ID: <AANLkTimdV1WtdPXeZ8JO40gkC=2dt27bqKxGORuHVyrn@mail.gmail.com>
Date:	Thu, 6 Jan 2011 17:41:39 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Chris Ball <cjb@...top.org>
Cc:	Nick Piggin <npiggin@...il.com>,
	Jongman Heo <jongman.heo@...il.com>,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [announce] vfs-scale git tree update

On Thu, Jan 6, 2011 at 4:59 PM, Chris Ball <cjb@...top.org> wrote:
>
> In my case, the hang happens when microcode.ko is modprobed and calls
> out for device firmware via request_firmware(), and then udev also calls
> microcode_ctl, which attempts to open(2) /dev/cpu/microcode to write
> microcode into it.  (The request_firmware() interface is the preferred
> one, and opening /dev/cpu/microcode is an older compatibility interface.)

Hmm. That modprobe seems to be hung on 'sysdev_drivers_lock'.

Which in turn seems to be _held_ by the first modprobe, which is
waiting for a request_firmware:

  [  256.980052] modprobe        D 00000000ffff4f88     0   372      1 0x00000000
  [  256.981227]  ffff88022206dc58 0000000000000086 0000000000000292 00000000ffffffff
  [  256.982415]  0000000000013840 0000000000013840 0000000000013840 ffff88022620dc40
  [  256.983692]  0000000000013840 ffff88022206dfd8 0000000000013840 0000000000013840
  [  256.984979] Call Trace:
  [  256.986306]  [<ffffffff81463a41>] schedule_timeout+0x36/0xe3
  [  256.987615]  [<ffffffff8110ad4c>] ? kfree+0xc9/0xd6
  [  256.988893]  [<ffffffff8103d243>] ? need_resched+0x23/0x2d
  [  256.990337]  [<ffffffff81463824>] wait_for_common+0xad/0x102
  [  256.991637]  [<ffffffff8104757f>] ? default_wake_function+0x0/0x14
  [  256.992954]  [<ffffffff81463931>] wait_for_completion+0x1d/0x1f
  [  256.994360]  [<ffffffff812f42df>] _request_firmware+0x2df/0x39a
  [  256.999744]  [<ffffffffa00f6358>] microcode_init_cpu+0xc4/0x115 [microcode]
  [  257.001112]  [<ffffffffa00f6409>] mc_sysdev_add+0x60/0x76 [microcode]
  [  257.002458]  [<ffffffff812e9772>] sysdev_driver_register+0xc0/0x11b

and everybody else is in the open path for the microcode. And that
request_firmware holds the lock, because it's done through the ->add()
function of another sysdev_driver_register().
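
The shape of that hang can be sketched in userspace. This is a minimal
model in Python, not kernel code: a threading.Lock stands in for
sysdev_drivers_lock and a threading.Event stands in for the completion
that _request_firmware() waits on. The `helper` thread and the idea that
it needs the lock before it can complete the firmware request are
assumptions made for illustration; the trace only shows that the lock
holder is waiting and that others are blocked. Timeouts replace the
indefinite kernel waits so that the script terminates.

```python
import threading


def simulate(hold_timeout=0.5, helper_timeout=0.1):
    """Model: a lock holder waits for an event whose setter needs the lock."""
    sysdev_drivers_lock = threading.Lock()   # stand-in for the kernel mutex
    firmware_done = threading.Event()        # stand-in for the completion
    lock_taken = threading.Event()           # test scaffolding: orders the threads
    result = {}

    def first_modprobe():
        # Takes the lock, then waits for the firmware while still holding it.
        with sysdev_drivers_lock:
            lock_taken.set()
            result["firmware_loaded"] = firmware_done.wait(hold_timeout)

    def helper():
        # Hypothetical path that would complete the firmware request,
        # but (by assumption) needs the same lock first.
        lock_taken.wait()
        got = sysdev_drivers_lock.acquire(timeout=helper_timeout)
        result["helper_got_lock"] = got
        if got:
            firmware_done.set()
            sysdev_drivers_lock.release()

    threads = [threading.Thread(target=first_modprobe),
               threading.Thread(target=helper)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(simulate())
```

With the default timeouts neither side makes progress: the helper gives
up on the lock and the holder gives up on the firmware. Without the
timeouts both would wait forever, which is what the D-state modprobe in
the trace looks like from outside.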

I'm wondering if this is a previously existing race condition leading
to a deadlock. One that previously would have been serialized enough
by the dcache lock that you'd never have that happen.

It might be interesting to re-run it with mutex debugging and lockdep
enabled, to see if that reports anything. Although it probably won't,
because it's not about a plain lock dependency, but ends up being
deadlocked on the uevent being finished (but you have the modprobe and
the request_firmware ones waiting on each other).
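
To illustrate why a checker that only looks at lock-versus-lock ordering
could stay silent here, the sketch below compares two graphs. It is a
toy model in Python, not how lockdep is implemented. The node names and
the edge from the firmware completion to a `helper` task are assumptions
chosen to mirror the scenario above; the email does not establish that
exact chain.

```python
def has_cycle(graph):
    """Return True if the directed graph (dict: node -> successors) has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GREY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return True          # back edge: cycle found
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in graph)


# Lock-ordering view: only one lock is involved, so there are no
# "A taken before B" edges at all and nothing to report.
lock_order = {"sysdev_drivers_lock": []}

# Wait-for view: includes the completion as something a task can wait on.
wait_for = {
    "modprobe#1": ["firmware completion"],      # holds the lock, waits for firmware
    "firmware completion": ["helper"],          # assumed: helper must finish the request
    "helper": ["sysdev_drivers_lock"],          # assumed: helper is blocked on the lock
    "sysdev_drivers_lock": ["modprobe#1"],      # lock is held by the first modprobe
}

if __name__ == "__main__":
    print("lock-order cycle:", has_cycle(lock_order))
    print("wait-for cycle:  ", has_cycle(wait_for))
```

The cycle only appears once the completion is treated as a resource in
the graph, which is the sense in which this is not a plain lock
dependency.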

I dunno. I haven't really thought that fully through. But we've had
cases roughly like that before, and yes, they can be exposed by some
independent serialization going away - long-standing potential bugs,
that simply never happened in practice before.

                      Linus