Date:	Mon, 21 Dec 2015 16:08:11 -0500
From:	Tejun Heo <tj@...nel.org>
To:	"Artem S. Tashkinov" <t.artem@...os.com>
Cc:	"Artem S. Tashkinov" <t.artem@...lcity.com>,
	Kent Overstreet <kent.overstreet@...il.com>,
	Christoph Hellwig <hch@....de>,
	Ming Lin <ming.l@....samsung.com>, Jens Axboe <axboe@...com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Steven Whitehouse <swhiteho@...hat.com>,
	IDE-ML <linux-ide@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Ming Lei <tom.leiming@...il.com>
Subject: Re: IO errors after "block: remove bio_get_nr_vecs()"

Hello, again.

On Mon, Dec 21, 2015 at 03:07:21PM -0500, Tejun Heo wrote:
> Hello, Artem.
> 
> Can you please apply the following patch on top and see whether
> anything changes?  If it does make the issue go away, can you please
> revert the ".can_queue" part and test again?

If the patch doesn't change anything, can you please try the
following and see which one makes a difference?

1. Exclude memory above 4G line with boot param "max_addr=4G".
2. Disable highmem with "highmem=0".
3. Try booting a 64bit kernel.

At the moment, the only thing I can think of that can explain the PAE
+ bio_get_nr_vecs() situation is that the bio split code, which is
activated by the bio_get_nr_vecs() removal, somehow messes up 64bit or
high addresses on 32bit kernels.  I scanned for the obvious problems,
but at the bio layer memory is represented by struct page, so nothing
obvious seems broken.

Note that for now I'm ignoring the debug dumps from the ahci debug
patch, which indicate that the passed-in addresses are all fine.  It
is possible that the controller gets confused by certain receive
addresses and reports failure on later commands, or maybe there is a
different sequence of events, which we don't know of yet, that can
explain both.

Thanks.

-- 
tejun