linux-kernel - Re: NULL deref around blkmq in v4.0-rc1


Message-ID: <alpine.LSU.2.20.1504091949370.4218@nerf40.vanv.qr>
Date:	Thu, 9 Apr 2015 20:24:30 +0200 (CEST)
From:	Jan Engelhardt <jengelh@...i.de>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
cc:	"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
	Jens Axboe <axboe@...nel.dk>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: NULL deref around blkmq in v4.0-rc1–rc7


On Thursday 2015-04-09 19:38, Linus Torvalds wrote:
>>
>> I reran bisect just to be sure.
>> It now shows v4.0-rc1~9 is bad, v4.0-rc1~9^1 is ok, and v4.0-rc1~9^2 is
>> ok too. So this means that the two parents of ~9 work badly together
>> when combined.
>
>Ok, that's just _odd_.
>[...]
>So I get the feeling that the oops you are seeing is likely not
>consistent, and may depend on allocation patterns or similar.

It's fairly consistent (reproducible?). Only in about 1 of 15 attempts
(I have not really kept track) does it not die.

With frame pointers:
BUG: unable to handle kernel paging request at 0000000000001000
IP: [<ffffffff812853c9>] scsi_init_cmd_errh+0x2a/0x62
PGD 0 
Oops: 0002 [#1] SMP 
Modules linked in: xfs crc32c_generic libcrc32c dm_crypt xts gf128mul algif_skcipher af_alg sd_mod mptsas scsi_transport_sas mptscsih mptbase dm_mod sg ipv6
CPU: 0 PID: 403 Comm: kworker/u2:1 Not tainted 4.0.0-rc7+ #55
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: ffff88007b686f60 ti: ffff88007bcb4000 task.ti: ffff88007bcb4000
RIP: 0010:[<ffffffff812853c9>]  [<ffffffff812853c9>] scsi_init_cmd_errh+0x2a/0x62
RSP: 0018:ffff88007bcb77a8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88007bf8d800 RCX: 0000000000000018
RDX: ffff88007bf7ab70 RSI: 0000000000000000 RDI: 0000000000001000
RBP: ffff88007bcb77a8 R08: ffff88007beb9c40 R09: 0000000000000000
R10: 0000000000000000 R11: ffffea0001fe17c0 R12: ffff88007bf7ab70
R13: 0000000000000000 R14: ffff88007bf8d800 R15: ffff88007bf7aa00
FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000001000 CR3: 000000007cb0d000 CR4: 00000000000007f0
Stack:
 ffff88007bcb7818 ffffffff81286d59 ffff88007b686f60 ffff88007bc24000
 ffff88007bf7ab78 ffff88007bf8d968 ffff88007be56c00 ffff88007bc24000
 ffff88007cbfb400 ffff88007bcb7850 ffff88007be56c08 ffff88007bf7aa00
Call Trace:
 [<ffffffff81286d59>] scsi_queue_rq+0x2e8/0x3d2
 [<ffffffff8119e64d>] __blk_mq_run_hw_queue+0x19b/0x2a2
 [<ffffffff8119e901>] ? blk_mq_merge_queue_io+0x75/0x147
 [<ffffffffa00fa34a>] ? __xfs_get_blocks+0x2f9/0x2f9 [xfs]
 [<ffffffff8119edeb>] blk_mq_run_hw_queue+0x4f/0x99
 [<ffffffff8119fab9>] blk_sq_make_request+0x163/0x170
 [<ffffffff81196a8b>] generic_make_request+0x97/0xd6
 [<ffffffff81196bd7>] submit_bio+0x10d/0x12c
 [<ffffffff810d5e15>] ? __lru_cache_add+0x1e/0x3f
 [<ffffffff81142af5>] mpage_bio_submit+0x25/0x2c
 [<ffffffff8114387d>] mpage_readpages+0xf8/0x10c
 [<ffffffffa00fa34a>] ? __xfs_get_blocks+0x2f9/0x2f9 [xfs]
 [<ffffffffa00f9c45>] xfs_vm_readpages+0x18/0x1a [xfs]
 [<ffffffff810d4e5c>] __do_page_cache_readahead+0x137/0x1d3
 [<ffffffff810d5102>] ondemand_readahead+0x20a/0x21b
 [<ffffffff810d5262>] page_cache_sync_readahead+0x38/0x3a
 [<ffffffff810cd1c5>] generic_file_read_iter+0x191/0x4fb
 [<ffffffffa010b2cf>] ? xfs_ilock+0x32/0x5d [xfs]
 [<ffffffffa01023c2>] xfs_file_read_iter+0x1c2/0x213 [xfs]
 [<ffffffff81118e63>] new_sync_read+0x74/0x98
 [<ffffffff81119aef>] __vfs_read+0x14/0x3b
 [<ffffffff81119b8a>] vfs_read+0x74/0xc1
 [<ffffffff8111d977>] kernel_read+0x3c/0x4a
 [<ffffffff8111dbd2>] prepare_binprm+0x117/0x11f
 [<ffffffff8111f10d>] do_execveat_common.isra.31+0x3b2/0x5d8
 [<ffffffff8111f35a>] do_execve+0x27/0x29
 [<ffffffff81050e07>] ____call_usermodehelper+0x10a/0x138
 [<ffffffff81050cfd>] ? call_usermodehelper+0x49/0x49
 [<ffffffff8133b3d8>] ret_from_fork+0x58/0x90
 [<ffffffff81050cfd>] ? call_usermodehelper+0x49/0x49
Code: c3 55 48 89 fa 48 c7 87 b0 00 00 00 00 00 00 00 c7 87 f4 00 00 00 00 00 00 00 48 8b bf 10 01 00 00 31 c0 b9 18 00 00 00 48 89 e5 <f3> ab 66 83 ba cc 00 00 00 00 75 2a 48 8b 8a d8 00 00 00 8a 01 
RIP  [<ffffffff812853c9>] scsi_init_cmd_errh+0x2a/0x62
 RSP <ffff88007bcb77a8>
CR2: 0000000000001000
---[ end trace fbec0fe487830b1d ]---
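
For readers following the dump, the "Oops: 0002" line can be decoded with
the standard x86 page-fault error code bits. The sketch below is only an
illustration: the struct and function names are hypothetical and are not
taken from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the x86 page-fault error code printed after "Oops:". */
struct pf_info {
	bool protection;   /* bit 0: 1 = protection violation, 0 = page not present */
	bool write;        /* bit 1: 1 = write access, 0 = read access */
	bool user;         /* bit 2: 1 = fault taken in user mode, 0 = kernel mode */
	bool reserved_bit; /* bit 3: 1 = reserved bit set in a page-table entry */
	bool instr_fetch;  /* bit 4: 1 = fault on instruction fetch */
};

/* Hypothetical helper: split the error code into its documented bits. */
static struct pf_info decode_pf_error(uint32_t code)
{
	struct pf_info info = {
		.protection   = (code & 0x01) != 0,
		.write        = (code & 0x02) != 0,
		.user         = (code & 0x04) != 0,
		.reserved_bit = (code & 0x08) != 0,
		.instr_fetch  = (code & 0x10) != 0,
	};
	return info;
}
```

Applied to the value 0x0002 from this oops, only the write bit is set: a
kernel-mode write to a page that is not present, which agrees with CR2
(0000000000001000) holding the same address as RDI.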



>and %rdi is 0x1000. It seems to be simply
>
>         memset(cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
>
>where 'cmd->sense_buffer' has some insane value ("PAGE_SIZE" or just a
>flipped bit, or whatever)

Since this has been observed on two separate, unrelated systems, I don't
think it is a bit flip induced by broken hardware.
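
As a cross-check of the memset reading, the Code: bytes before the
faulting instruction are "mov 0x110(%rdi),%rdi", "xor %eax,%eax" and
"mov $0x18,%ecx", and the marked <f3> ab is "rep stos %eax,%es:(%rdi)".
The sketch below just redoes that arithmetic in user space. The helper
names are hypothetical; SCSI_SENSE_BUFFERSIZE is redefined locally with
the value from include/scsi/scsi_cmnd.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copy of the kernel constant from include/scsi/scsi_cmnd.h. */
#define SCSI_SENSE_BUFFERSIZE 96

/*
 * Bytes written by "rep stos %eax,%es:(%rdi)" (opcode f3 ab):
 * RCX iterations, each storing one 4-byte doubleword.
 */
static size_t rep_stosl_bytes(uint64_t rcx)
{
	return (size_t)rcx * sizeof(uint32_t);
}

/*
 * Hypothetical check: does the register state at the fault match
 * memset(ptr, 0, SCSI_SENSE_BUFFERSIZE)? EAX is the fill value and
 * RCX is the remaining doubleword count.
 */
static int looks_like_sense_memset(uint64_t rax, uint64_t rcx)
{
	return (uint32_t)rax == 0 &&
	       rep_stosl_bytes(rcx) == SCSI_SENSE_BUFFERSIZE;
}
```

With RAX = 0 and RCX = 0x18 from the dump, the store covers 24 * 4 = 96
bytes, the full sense buffer size. RCX still holding the full count
means the very first store to RDI = 0x1000 faulted.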

Oh yeah, if anybody would like it, I can hand out the VirtualBox image.