Message-ID: <20151222034319.GF20661@kmo-pixel>
Date:	Mon, 21 Dec 2015 18:43:19 -0900
From:	Kent Overstreet <kent.overstreet@...il.com>
To:	Tejun Heo <tj@...nel.org>,
	"Artem S. Tashkinov" <t.artem@...lcity.com>
Cc:	Christoph Hellwig <hch@....de>, Ming Lin <ming.l@....samsung.com>,
	Jens Axboe <axboe@...com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Steven Whitehouse <swhiteho@...hat.com>,
	IDE-ML <linux-ide@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Ming Lei <tom.leiming@...il.com>
Subject: Re: IO errors after "block: remove bio_get_nr_vecs()"

I just reproduced it - Artem, I'll let you know when we have a possible fix but
hopefully there won't be any need for you to beat up your hardware any more :)

On Mon, Dec 21, 2015 at 04:08:11PM -0500, Tejun Heo wrote:
> Hello, again.
> 
> On Mon, Dec 21, 2015 at 03:07:21PM -0500, Tejun Heo wrote:
> > Hello, Artem.
> > 
> > Can you please apply the following patch on top and see whether
> > anything changes?  If it does make the issue go away, can you please
> > revert the ".can_queue" part and test again?
> 
> If the patch doesn't change anything, can you please try the
> following and see which one makes a difference?
> 
> 1. Exclude memory above the 4G line with boot param "max_addr=4G".
> 2. Disable highmem with "highmem=0".
> 3. Try booting a 64-bit kernel.
> 
> At the moment, the only thing I can think of which can explain the PAE
> + bio_get_nr_vecs() situation is that the bio split code, which is
> activated by the removal of bio_get_nr_vecs(), somehow messes up 64-bit
> or high addresses on 32-bit kernels.  I scanned for the obvious, but at
> the bio layer memory is represented by struct page, so nothing obvious
> seems broken.
> 
> Note that for now I'm ignoring the debug dumps from the ahci debug
> patch, which indicate that the passed-in addresses are all fine.  It
> is possible that the controller gets confused by certain receiving
> addresses and reports failure on later commands, or maybe there is a
> different sequence of events, one we don't know of yet, which can
> account for both.

It repros pretty easily. I should be able to just write some test code that
sends down single, specific IOs, so no matter what the controller is doing we
know exactly which IO triggered the error.
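
As a rough illustration of the "one specific IO at a time" idea, here is a
minimal userspace sketch in Python. It is not the test code referred to in
this message, which would more likely submit bios from inside the kernel.
The helper name probe_reads and the (offset, length) request format are
assumptions made for this example. The sketch also does not use O_DIRECT,
so on a real block device the reads could be served from the page cache; a
real test would need O_DIRECT with suitably aligned buffers.

```python
import os


def probe_reads(fd, requests):
    """Issue each (offset, length) read on its own and collect the failures.

    Returns a list of (offset, length, errno) tuples. errno is None when the
    read did not raise but returned fewer bytes than requested.
    """
    failures = []
    for offset, length in requests:
        try:
            # One synchronous read per request, so any error maps back to
            # exactly one (offset, length) pair.
            data = os.pread(fd, length, offset)
        except OSError as exc:
            failures.append((offset, length, exc.errno))
            continue
        if len(data) != length:
            failures.append((offset, length, None))
    return failures
```

Because every request is issued and checked individually, a failure is
attributed to a single offset and size instead of to whatever happened to be
in flight at the time.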

I'm checking all the 64-bit/PAE/etc. stuff now.
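
For readers unfamiliar with the class of bug being hypothesized in the quoted
text: on a 32-bit PAE kernel, physical addresses can exceed 32 bits, so any
step that turns a page into a physical address using 32-bit arithmetic
silently drops the high bits. The sketch below only models that arithmetic.
It is an assumption-laden illustration, not code from the kernel and not a
claim about where (or whether) such a truncation occurs here. The function
name phys_addr and the width_bits parameter are invented for the example;
PAGE_SHIFT = 12 corresponds to 4 KiB pages.

```python
PAGE_SHIFT = 12  # 4 KiB pages (assumed, as on x86)


def phys_addr(pfn, offset, width_bits):
    """Physical address of (page frame number, offset within the page),
    computed as if the arithmetic were done in a width_bits-wide integer."""
    mask = (1 << width_bits) - 1
    return ((pfn << PAGE_SHIFT) | offset) & mask


# The first page frame above the 4G line is pfn 0x100000 (4 GiB / 4 KiB).
full = phys_addr(0x100000, 0x200, 64)       # 0x100000200
truncated = phys_addr(0x100000, 0x200, 32)  # 0x200, high bits lost
```

A truncated address like this would point into low memory instead of the
intended page above 4G, which is why excluding memory above 4G or booting a
64-bit kernel are useful ways to test the hypothesis.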