Message-ID: <1337813770.3013.37.camel@dabdike.int.hansenpartnership.com>
Date:	Wed, 23 May 2012 23:56:10 +0100
From:	James Bottomley <jejbbe@...senpartnership.com>
To:	David Miller <davem@...emloft.net>
Cc:	mroos@...ux.ee, linux-kernel@...r.kernel.org,
	linux-scsi@...r.kernel.org, dan.j.williams@...el.com,
	stern@...land.harvard.edu
Subject: Re: 3.4.0-02580-g72c04af regression on sparc64 - partitions not
 recognized

On Wed, 2012-05-23 at 14:04 -0400, David Miller wrote:
> From: Meelis Roos <mroos@...ux.ee>
> Date: Wed, 23 May 2012 19:46:46 +0300 (EEST)
> 
> CC:'ing interested parties.
> 
> >> > Just tested 3.4.0-02580-g72c04af on about 10 machines. While most of 
> >> > them work (including 3 different sparc64 machines with real scsi disks), 
> >> > Sun Netra X1 with pata_ali and IDE disk consistently fails to boot. sda 
> >> > is recognized but no partitions. 3.3.0 works fine, as did something 
> >> > around 3.4-rc7 (plain 3.4 not tested yet). No other IDE machines tested 
> >> > yet since I have none with remote console at the moment.
> >> 
> >> If 3.4.0-final is OK, start bisecting from v3.4.0 until 72c04af.  One
> >> possibility could be the sparc64 NOBOOTMEM conversion that went into
> >> the merge window.
> > 
> > Bisecting leads to this commit:
> > 
> > a7a20d103994fd760766e6c9d494daa569cbfe06 is the first bad commit
> > commit a7a20d103994fd760766e6c9d494daa569cbfe06
> > Author: Dan Williams <dan.j.williams@...el.com>
> > Date:   Thu Mar 22 17:05:11 2012 -0700
> > 
> >     [SCSI] sd: limit the scope of the async probe domain

My theory is that this is an init problem: a lot of our code assumes
that async_synchronize_full() waits for everything, even the
domain-specific async schedules, which isn't true.

The code in init that makes this assumption is wait_for_device_probe().
There's also a fun async_synchronize_full() in init_post() that assumes
it can free the init memory afterwards, which would fail badly if
anything in init used an async domain.

So either we fix the assumptions or we can't use domain-specific async
schedules.
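
To make the mismatch concrete, here is a minimal userspace sketch. It is
a toy model under stated assumptions, not the kernel implementation:
every toy_* name is hypothetical, each domain is reduced to a counter of
unfinished tasks, and the "synchronize" helpers only report whether a
wait would return instead of actually blocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not kernel code: a domain is just a count of
 * tasks that have been scheduled but have not finished yet. */
struct toy_domain {
	int outstanding;
};

/* Stand-in for the default domain used by plain async scheduling. */
static struct toy_domain toy_default_domain;

/* Schedule one task in a specific domain. */
static void toy_schedule_domain(struct toy_domain *d)
{
	d->outstanding++;
}

/* Schedule one task in the default domain. */
static void toy_schedule(void)
{
	toy_schedule_domain(&toy_default_domain);
}

/* Mark one previously scheduled task in the domain as finished. */
static void toy_complete(struct toy_domain *d)
{
	assert(d->outstanding > 0);
	d->outstanding--;
}

/* Models the behaviour described above: the "full" wait only looks at
 * the default domain, so work in other domains is invisible to it. */
static bool toy_synchronize_full_would_return(void)
{
	return toy_default_domain.outstanding == 0;
}

/* A domain-specific wait only looks at the domain it is given. */
static bool toy_synchronize_domain_would_return(const struct toy_domain *d)
{
	return d->outstanding == 0;
}
```

In this model, if a probe is scheduled with toy_schedule_domain() on its
own struct toy_domain and nothing is left in the default domain,
toy_synchronize_full_would_return() reports true while the probe is
still outstanding. That is the situation a caller would be in if it
went on to mount root or free init memory right after the "full" wait.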

James


