linux-kernel - Re: [GIT PULL] block fixes for 3.1-rc


Message-ID: <1317221858.19034.29.camel@dabdike.hansenpartnership.com>
Date:	Wed, 28 Sep 2011 14:57:41 +0000
From:	James Bottomley <jbottomley@...allels.com>
To:	Alan Stern <stern@...land.harvard.edu>
CC:	Jens Axboe <axboe@...nel.dk>,
	Rocko Requin <rockorequin@...mail.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Hannes Reinecke <hare@...e.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] block fixes for 3.1-rc

On Wed, 2011-09-28 at 10:52 -0400, Alan Stern wrote:
> On Tue, 27 Sep 2011, Jens Axboe wrote:
> 
> > On 2011-09-27 17:52, Linus Torvalds wrote:
> > > On Wed, Sep 21, 2011 at 5:19 AM, Jens Axboe <axboe@...nel.dk> wrote:
> > >>
> > >> Final round of patches for 3.1.
> > > 
> > > Apparently better not.
> > > 
> > > The "block layer oopses on USB device removal" is still there, it seems.
> > > 
> > > I can even find a patch for it from Alan Stern:
> > > 
> > >    https://lkml.org/lkml/2011/9/18/63
> > > 
> > > and the reason I found that was that my wife's machine just saw what
> > > looks very much like that bug in elv_put_request().
> > > 
> > > The call chain on that particular machine was:
> > > 
> > >  - __blk_put_request
> > >   blk_put_request
> > >   scsi_execute
> > >   scsi_execute_req
> > >   sd_check_events
> > >   disk_events_workfn
> > >   process_one_work
> > > 
> > > in one of the kthread helpers. It sounds like something either
> > > generates disk events after the unplug event (despite a "safely
> > > remove" thing), or doesn't properly wait for the disk events to have
> > > flushed before the elevator is cleared.
> > > 
> > > The "things go oops at USB removal" reports have been with us for a
> > > *loong* time now. Can we please get this fixed already, and have
> > > somebody really look at it?
> > > 
> > > And if you can't figure out why it happens, at least apply Alan's
> > > patch (or ack it).
> > 
> > The whole thing is a bit of a mess, it was introduced by changes meant
> > to clean it up, which didn't get to the root of the problem (and
> > seemingly only made it worse). We need the queue clearly referenced and
> > released, not just pointed to. That would be the more invasive and real
> > fix. I will apply Alan's fix for a happier 3.1.
> 
> You guys should be asking the person who first reported the most recent 
> version of this bug and is able to reproduce it easily.
> 
> Rocko has already tested Hannes's patch in
> 
> 	http://marc.info/?l=linux-scsi&m=131669751909474&w=2
> 
> successfully.  The only difference between it and James's patch in
> 
> 	http://marc.info/?l=linux-kernel&m=131300594629839
> 
> is the assignment to q->queue_lock, which doesn't appear to be
> essential in the SCSI case.  (Furthermore, Hannes's patch makes an
> unnecessary test before doing the assignment, which is inelegant.)

I'm not so sure about the inelegance: the CPU guys have been drilling
into us that if (a != b) a = b; is useful for avoiding dirtied cache
lines (provided a == b is likely to be true for some of the cases).

James
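
For readers following the thread, here is a minimal sketch in C of the check-before-write idiom discussed above. It is an illustration only, not code from either patch: `struct demo_queue` and both helper functions are hypothetical stand-ins, and the cache behaviour described in the comments is the general argument made in the message, not something this snippet measures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a structure that holds a lock pointer.
 * This is NOT the kernel's struct request_queue. */
struct demo_queue {
    void *queue_lock;
};

/* Unconditional store: the write is issued even when the new value
 * equals the old one, so the cache line is written every time. */
static inline void set_lock_always(struct demo_queue *q, void *lock)
{
    q->queue_lock = lock;
}

/* Check-before-write: when the value already matches, only a load is
 * performed and no store is issued. Returns true if a store happened. */
static inline bool set_lock_if_changed(struct demo_queue *q, void *lock)
{
    if (q->queue_lock != lock) {
        q->queue_lock = lock;
        return true;
    }
    return false;
}
```

The extra comparison only pays off when the values are frequently equal already, which is the proviso stated in the message; when they almost always differ, the test is pure overhead.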