linux-kernel - Re: [GIT PULL] block fixes for 3.1-rc

Date:	Wed, 28 Sep 2011 10:52:23 -0400 (EDT)
From:	Alan Stern <stern@...land.harvard.edu>
To:	Jens Axboe <axboe@...nel.dk>,
	Rocko Requin <rockorequin@...mail.com>
cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	James Bottomley <jbottomley@...allels.com>,
	Hannes Reinecke <hare@...e.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] block fixes for 3.1-rc

On Tue, 27 Sep 2011, Jens Axboe wrote:

> On 2011-09-27 17:52, Linus Torvalds wrote:
> > On Wed, Sep 21, 2011 at 5:19 AM, Jens Axboe <axboe@...nel.dk> wrote:
> >>
> >> Final round of patches for 3.1.
> > 
> > Apparently better not.
> > 
> > The "block layer oopses on USB device removal" is still there, it seems.
> > 
> > I can even find a patch for it from Alan Stern:
> > 
> >    https://lkml.org/lkml/2011/9/18/63
> > 
> > and the reason I found that was that my wife's machine just saw what
> > looks very much like that bug in elv_put_request().
> > 
> > The call chain on that particular machine was:
> > 
> >  - __blk_put_request
> >   blk_put_request
> >   scsi_execute
> >   scsi_execute_req
> >   sd_check_events
> >   disk_events_workfn
> >   process_one_work
> > 
> > in one of the kthread helpers. It sounds like something either
> > generates disk events after the unplug event (despite a "safely
> > remove" thing), or doesn't properly wait for the disk events to have
> > flushed before the elevator is cleared.
> > 
> > The "things go oops at USB removal" reports have been with us for a
> > *loong* time now. Can we please get this fixed already, and have
> > somebody really look at it?
> > 
> > And if you can't figure out why it happens, at least apply Alan's
> > patch (or ack it).
> 
> The whole thing is a bit of a mess, it was introduced by changes meant
> to clean it up, which didn't get to the root of the problem (and
> seemingly only made it worse). We need the queue clearly referenced and
> released, not just pointed to. That would be the more invasive and real
> fix. I will apply Alan's fix for a happier 3.1.

You guys should be asking the person who first reported the most recent 
version of this bug and is able to reproduce it easily.

Rocko has already tested Hannes's patch in

	http://marc.info/?l=linux-scsi&m=131669751909474&w=2

successfully.  The only difference between it and James's patch in

	http://marc.info/?l=linux-kernel&m=131300594629839

is the assignment to q->queue_lock, which doesn't appear to be
essential in the SCSI case.  (Furthermore, Hannes's patch makes an
unnecessary test before doing the assignment, which is inelegant.)
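
To illustrate the point about the unnecessary test, here is a minimal
user-space sketch.  It is not the text of either patch: fake_lock,
fake_queue and the three functions are hypothetical stand-ins, and the
exact form of the test is an assumption.  It only shows that comparing
the pointer before assigning it leaves the same final state as
assigning it unconditionally.

```c
#include <stddef.h>

/* Hypothetical stand-in for a lock type; not the kernel's spinlock_t. */
struct fake_lock {
	int unused;
};

/*
 * Hypothetical stand-in for a request queue: an embedded lock plus a
 * pointer that refers either to it or to a driver-supplied lock.
 */
struct fake_queue {
	struct fake_lock __queue_lock;
	struct fake_lock *queue_lock;
};

/* Assumed shape of the tested form: compare first, then assign. */
static void reset_lock_tested(struct fake_queue *q)
{
	if (q->queue_lock != &q->__queue_lock)
		q->queue_lock = &q->__queue_lock;
}

/* Unconditional form: same final state, no branch. */
static void reset_lock_plain(struct fake_queue *q)
{
	q->queue_lock = &q->__queue_lock;
}

/*
 * Returns 1 if both forms leave queue_lock pointing at the embedded
 * lock, whichever lock it pointed to beforehand; 0 otherwise.
 */
static int forms_agree(void)
{
	struct fake_lock driver_lock = { 0 };
	struct fake_queue a = { { 0 }, NULL };
	struct fake_queue b = { { 0 }, NULL };
	int ok = 1;

	/* Case 1: the pointer starts at a driver-supplied lock. */
	a.queue_lock = &driver_lock;
	b.queue_lock = &driver_lock;
	reset_lock_tested(&a);
	reset_lock_plain(&b);
	ok = ok && a.queue_lock == &a.__queue_lock;
	ok = ok && b.queue_lock == &b.__queue_lock;

	/* Case 2: the pointer already refers to the embedded lock. */
	reset_lock_tested(&a);
	reset_lock_plain(&b);
	ok = ok && a.queue_lock == &a.__queue_lock;
	ok = ok && b.queue_lock == &b.__queue_lock;

	return ok;
}
```

In both cases the pointer ends up at the embedded lock, so in this
sketch the comparison adds nothing beyond the plain assignment.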

Alan Stern
