Date:	Sun, 3 Apr 2016 06:54:08 -0500
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Peter Hurley <peter@...leysoftware.com>
Cc:	Ulf Hansson <ulf.hansson@...aro.org>,
	linux-mmc <linux-mmc@...r.kernel.org>,
	Adrian Hunter <adrian.hunter@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Jaehoon Chung <jh80.chung@...sung.com>
Subject: Re: [bisect] Merge tag 'mmc-v4.6' of git://git.linaro.org/people/ulf.hansson/mmc
 (was [GIT PULL] MMC for v.4.6)

On Sat, Apr 2, 2016 at 9:56 PM, Peter Hurley <peter@...leysoftware.com> wrote:
>
> Note how mmc1 => mmcblk0 and mmc0 => mmcblk1.
>
> This produces a failure to boot as the wrong partition is mounted as
> root (/dev/mmcblk0p2 is now on the wrong mmc).

It *looks* very much like somebody is doing asynchronous probing of
the bus, meaning that the devices get probed in random order.

And that "random order" is admittedly probably usually fairly static
on any particular hardware platform, but then something happens to
change timing, and...

This is why you should never probe the actual *bus* asynchronously,
just do the end-point setup async. For example, you'd enumerate ports
(and assign devices to the ports) synchronously, but then after device
assignment the actual device probing can be async.
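
A minimal user-space sketch of that split, not kernel code: the index for
each port is fixed in a synchronous pass, and only the slow per-device setup
runs concurrently. The function names (enumerate_ports, probe_device,
bring_up) and the random sleep standing in for probe latency are
illustrative assumptions, not anything taken from the mmc core.

```python
from concurrent.futures import ThreadPoolExecutor
import random
import time


def enumerate_ports(ports):
    """Synchronous step: assign indices in fixed bus order."""
    return {port: index for index, port in enumerate(ports)}


def probe_device(port, index):
    """Asynchronous step: slow end-point setup; the index is already fixed."""
    time.sleep(random.uniform(0.0, 0.01))  # simulated, variable probe latency
    return f"mmcblk{index}", port


def bring_up(ports):
    """Enumerate synchronously, then probe every device concurrently."""
    assignment = enumerate_ports(ports)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(probe_device, port, index)
                   for port, index in assignment.items()]
        # The name was decided before any probe started, so the order in
        # which the probes finish cannot change the mapping.
        return dict(future.result() for future in futures)
```

Calling bring_up(["mmc0", "mmc1"]) gives the same name-to-port mapping on
every run, however the simulated probe times fall.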

> The bisect tried all the mmc tree patches which were all good.
> I double-checked by cloning the mmc tree and building both mmc-v4.6
> and v4.5-rc6, and both tested good.
>
> I interpret that to mean some change in mmc + some new behavior elsewhere
> for v4.6 is causing this. Any ideas?

Hmm. If it really is just timing, it could have been around forever,
and just hidden by the fact that normally mmc0 gets probed before
mmc1, but then some other probing thing slowed down or the exact
details of the async workqueue scheduling changed, and now mmc1 just
*happens* to get probed first.
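
To make the suspected failure mode concrete, here is a small deterministic
model, again only a sketch: it assumes block-device indices are handed out
in the order the probes finish, which is a guess about the behaviour and not
something verified against the mmc code. The latency numbers are made up.

```python
def name_by_completion(probe_latency):
    """Assign block-device indices in the order the probes finish."""
    finish_order = sorted(probe_latency, key=probe_latency.get)
    return {host: f"mmcblk{i}" for i, host in enumerate(finish_order)}


# mmc0 normally finishes first, so the naming looks stable.
before = name_by_completion({"mmc0": 1.0, "mmc1": 2.0})

# Something unrelated slows down the mmc0 probe, and the names swap.
after = name_by_completion({"mmc0": 3.0, "mmc1": 2.0})
```

In this model, before maps mmc0 to mmcblk0 and mmc1 to mmcblk1, while after
maps mmc1 to mmcblk0 and mmc0 to mmcblk1, which is the swap reported above.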

The thing that changed scheduling order could easily have come from
some non-mmc change.

NOTE! I have nothing to back this up except that (a) we've had
problems like this before and (b) it does look from your dmesg that
mmcX is simply probed in the "wrong" order. I didn't look at exactly
what mmc does or who does the probing.

Maybe Ulf can explain what it is that is _supposed_ to keep the mmc
probe order stable. Ulf?

             Linus
