lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1448030316.4067.18.camel@localhost.localdomain>
Date:	Fri, 20 Nov 2015 09:38:36 -0500
From:	Ewan Milne <emilne@...hat.com>
To:	Hannes Reinecke <hare@...e.de>
Cc:	Christoph Hellwig <hch@...radead.org>,
	Michael Ellerman <mpe@...erman.id.au>,
	Mark Salter <msalter@...hat.com>,
	"James E. J. Bottomley" <JBottomley@...n.com>,
	brking <brking@...ibm.com>, linux-scsi@...r.kernel.org,
	linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
	linux-block@...r.kernel.org
Subject: Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

On Thu, 2015-11-19 at 16:35 +0100, Hannes Reinecke wrote:
> On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
> > It's pretty much guaranteed a block layer bug, most likely in the
> > merge bios to request infrastucture where we don't obey the merging
> > limits properly.
> > 
> > Does either of you have a known good and first known bad kernel?
> 
> Well, I have been fighting a similar issue for several months now,
> albeit with multipath enabled. Haven't had much progress with this,
> sadly.
> Seeing that this is our distro kernel it might or might not be
> related; however, as the symptoms are identical there still is a
> chance that this is actually a generic block-layer problem.
> 
> Cheers,
> 
> Hannes

We have seen this also.  (e.g.  req->nr_phys_segments was 3, but
blk_rq_map_sg() returned 4.)  I was suspicious of the patch:

bio: modify __bio_add_page() to accept pages that don't start a new segment

But we put some debugging code in and didn't hit it.  We haven't
found the problem yet, either, though.  We're still looking.

As Christoph said, it would seem to be a problem with the block layer
merging.

The API for this seems defective, in that blk_rq_map_sg() should
never be returning a value indicating that it overwrote past the
end of the supplied SG array and depend on the caller to check it.
(We could get data corruption on another I/O if it used adjacent
memory for a different SG list, for example.)

-Ewan





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ