linux-kernel - Re: NULL deref around xfs in v4.0-rc1

Message-ID: <CA+55aFwFPnPfwKwaP-vPzF+On5m+6_Fs7zapn=X8C=HG6f4kPA@mail.gmail.com>
Date:	Thu, 9 Apr 2015 10:38:20 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Jan Engelhardt <jengelh@...i.de>,
	"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
	Jens Axboe <axboe@...nel.dk>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: NULL deref around xfs in v4.0-rc1–rc7

On Wed, Apr 8, 2015 at 8:20 AM, Jan Engelhardt <jengelh@...i.de> wrote:
> On Wednesday 2015-04-08 15:41, Jan Engelhardt wrote:
>
>>Starting somewhere around v4.0-rc1 and persisting through commit
>>v4.0-rc7, there is a new NULL dereference apparently happening in
>>conjunction with xfs. This inhibits this machine's booting,
>>as xfs is used for the root filesystem.
>>
>>First bisection points at first-bad commit v4.0-rc1~8, and since that is
>>a merge commit, I'll be investigating some more hand-chosen commits (and
>>then people to Cc) as we speak.
>
> I reran bisect just to be sure.
> It now shows v4.0-rc1~9 is bad, v4.0-rc1~9^1 is ok, and v4.0-rc1~9^2 is
> ok too. So this means that the two parents of ~9 work badly
> together.

Ok, that's just _odd_.

That v4.0-rc1~9 is just the pm+acpi merge, and has absolutely nothing
to do with XFS or the block code. In fact, looking at the diff from
its direct parent, it doesn't even really change any relevant code.

So I get the feeling that the oops you are seeing is likely not
consistent, and may depend on allocation patterns or similar. Because
the bisect doesn't make any sense at all.

It looks much more like a pure block-mq bug, but one that needs some
very special condition to trigger.

Jens, does this look familiar or trigger any ideas:

    BUG: unable to handle kernel paging request at 0000000000001000
    IP: [<ffffffff812718d0>] scsi_init_cmd_errh+0x26/0x5d

(The whole oops is on lkml).

Jan, can you reproduce the oops with frame pointers so that we get a
better call trace? Although it looks fairly normal: the trapping code
is

    rep stos %eax,%es:(%rdi)

and %rdi is 0x1000. It seems to be simply

         memset(cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);

where 'cmd->sense_buffer' has some insane value ("PAGE_SIZE" or just a
flipped bit, or whatever).
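
For illustration only, here is a userspace sketch of why the faulting
address matches %rdi: 'rep stos' stores through the destination pointer,
so a memset() on a pointer holding 0x1000 faults at 0x1000 on the very
first store. None of this is the kernel's code. 'struct demo_cmd',
'demo_init_cmd_errh()', 'sense_ptr_is_suspicious()' and the 64 KiB
threshold are all hypothetical names and values made up for the example;
only the buffer size of 96 is meant to match SCSI_SENSE_BUFFERSIZE.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Intended to match the kernel's SCSI_SENSE_BUFFERSIZE (96 bytes). */
#define DEMO_SENSE_BUFFERSIZE 96

/* Arbitrary cut-off for the heuristic below: an assumption, not a kernel rule. */
#define DEMO_LOW_ADDR_LIMIT 0x10000u

/* Hypothetical, cut-down stand-in for the command structure in the oops. */
struct demo_cmd {
    unsigned char *sense_buffer;
};

/*
 * Heuristic check: NULL or a very small value such as 0x1000 is almost
 * certainly not a real buffer address, so treat it as corrupted.
 */
static bool sense_ptr_is_suspicious(const struct demo_cmd *cmd)
{
    return (uintptr_t)cmd->sense_buffer < DEMO_LOW_ADDR_LIMIT;
}

/*
 * Zeroes the sense buffer the same way the memset() in the message does,
 * but refuses (returns false) when the pointer looks bogus instead of
 * storing through it and faulting.
 */
static bool demo_init_cmd_errh(struct demo_cmd *cmd)
{
    if (sense_ptr_is_suspicious(cmd))
        return false;

    memset(cmd->sense_buffer, 0, DEMO_SENSE_BUFFERSIZE);
    return true;
}
```

Calling demo_init_cmd_errh() on a command whose sense_buffer is
(unsigned char *)0x1000 returns false here; without the check, the
memset() would take the same kind of fault as in the oops. A check like
this would only hide the symptom, of course: the interesting question is
still who wrote the bad value into the pointer.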

Jens?

                              Linus