Date:	Wed, 23 May 2012 19:22:20 -0700
From:	Dan Williams <dan.j.williams@...el.com>
To:	James Bottomley <jejbbe@...senpartnership.com>
Cc:	David Miller <davem@...emloft.net>, mroos@...ux.ee,
	linux-kernel@...r.kernel.org, linux-scsi@...r.kernel.org,
	stern@...land.harvard.edu
Subject: Re: 3.4.0-02580-g72c04af regression on sparc64 - partitions not
 recognized

On Wed, 2012-05-23 at 23:56 +0100, James Bottomley wrote:
> On Wed, 2012-05-23 at 14:04 -0400, David Miller wrote:
> > From: Meelis Roos <mroos@...ux.ee>
> > Date: Wed, 23 May 2012 19:46:46 +0300 (EEST)
> > 
> > CC:'ing interested parties.
> > 
> > >> > Just tested 3.4.0-02580-g72c04af on about 10 machines. While most of 
> > >> > them work (including 3 different sparc64 machines with real scsi disks), 
> > >> > Sun Netra X1 with pata_ali and IDE disk consistently fails to boot. sda 
> > >> > is recognized but no partitions. 3.3.0 works fine, as did something 
> > >> > around 3.4-rc7 (plain 3.4 not tested yet). No other IDE machines tested 
> > >> > yet since I have none with remote console at the moment.
> > >> 
> > >> If 3.4.0-final is OK, start bisecting from v3.4.0 until 72c04af.  One
> > >> possibility could be the sparc64 NOBOOTMEM conversion that went into
> > >> the merge window.
> > > 
> > > Bisecting leads to this commit:
> > > 
> > > a7a20d103994fd760766e6c9d494daa569cbfe06 is the first bad commit
> > > commit a7a20d103994fd760766e6c9d494daa569cbfe06
> > > Author: Dan Williams <dan.j.williams@...el.com>
> > > Date:   Thu Mar 22 17:05:11 2012 -0700
> > > 
> > >     [SCSI] sd: limit the scope of the async probe domain
> 
> My theory is that this is an init problem: the assumption in a lot of
> our code is that async_synchronize_full() waits for everything ... even
> the domain-specific async schedules, which isn't true.
> 
> The code in init that makes this assumption is wait_for_device_probe().
> There's also a fun async_synchronize_full() in init_post() that assumes
> it can free the init memory after, which would fail badly if anything in
> init used an async domain.
> 
> So either we fix the assumptions or we can't use domain-specific async
> schedules.
> 

Hm, we already have cases of code not trusting the semantics of
wait_for_device_probe(), especially where async scanning is involved,
as in kernel/power/hibernate.c:

                /*
                 * Some device discovery might still be in progress; we need
                 * to wait for this to finish.
                 */
                wait_for_device_probe();

                if (resume_wait) {
                        while ((swsusp_resume_device = name_to_dev_t(resume_file)) == 0)
                                msleep(10);
                        async_synchronize_full();
                }

                /*
                 * We can't depend on SCSI devices being available after loading
                 * one of their modules until scsi_complete_async_scans() is
                 * called and the resume device usually is a SCSI one.
                 */
                scsi_complete_async_scans();


...so it seems scsi_complete_async_scans() should also flush the sd
probe work queued on scsi_sd_probe_domain. Here is a test patch:

--- snip ---

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 8906557..05a92d3 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -141,13 +141,13 @@ struct async_scan_data {
  * started scanning after this function was called may or may not have
  * finished.
  */
-int scsi_complete_async_scans(void)
+static void __scsi_complete_async_scans(void)
 {
        struct async_scan_data *data;
 
        do {
                if (list_empty(&scanning_hosts))
-                       return 0;
+                       return;
                /* If we can't get memory immediately, that's OK.  Just
                 * sleep a little.  Even if we never get memory, the async
                 * scans will finish eventually.
@@ -181,6 +181,13 @@ int scsi_complete_async_scans(void)
        spin_unlock(&async_scan_lock);
 
        kfree(data);
+}
+
+int scsi_complete_async_scans(void)
+{
+       __scsi_complete_async_scans();
+       async_synchronize_full_domain(&scsi_sd_probe_domain);
+
        return 0;
 }
 

