linux-kernel - Re: [cgroup_rmdir] BUG: unable to handle kernel paging request at ffff880210af6000

Message-ID: <20171107160600.ybrxhemvd4k2w7yb@wfg-t540p.sh.intel.com>
Date:   Wed, 8 Nov 2017 00:06:00 +0800
From:   Fengguang Wu <fengguang.wu@...el.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Tejun Heo <tj@...nel.org>, Zefan Li <lizefan@...wei.com>,
        Roman Gushchin <guro@...com>, Waiman Long <longman@...hat.com>,
        "David S. Miller" <davem@...emloft.net>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [cgroup_rmdir] BUG: unable to handle kernel paging request at
 ffff880210af6000

On Tue, Nov 07, 2017 at 07:46:46AM -0800, Linus Torvalds wrote:
>On Tue, Nov 7, 2017 at 2:26 AM, Fengguang Wu <fengguang.wu@...el.com> wrote:
>>
>> FYI this happens in v4.14-rc8 -- it's not necessarily a new bug.
>
>.. in fact I don't think it's a bug at all. Not in the kernel, that is.
>
>> [  186.238181] BUG: unable to handle kernel paging request at ffff880210af6000
>> [  186.257107] IP: slob_free+0x1c4/0x276
>
>This looks like the same bug we saw earlier, which is due to a gcc bug.
>
>The trapping code disassembles to:
>
>   0: 8b 45 00              mov    0x0(%rbp),%eax
>   3: 41 be 01 00 00 00    mov    $0x1,%r14d
>   9: 48 89 ef              mov    %rbp,%rdi
>   c: 66 85 c0              test   %ax,%ax
>
>and the thing to note is that: "test %ax,%ax".
>
>It's testing a 16-bit value, but it *loads* a 32-bit one.
>
>It is supposed to load a 16-bit value from the last two bytes of the page:
>
>   RBP: ffff880210af5ffe
>
>but because it has turned the 16-bit load into a 32-bit load, it
>faults when accessing the next page.
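
To make the arithmetic concrete, here is a minimal sketch of the page-boundary check behind that explanation. It assumes 4 KiB pages (as on x86-64); the helper names `last_byte`, `load_crosses_page` and `next_page_start` are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86-64. */
#define PAGE_SIZE_BYTES 4096ULL

/* Address of the last byte touched by a load of `size` bytes at `addr`. */
uint64_t last_byte(uint64_t addr, size_t size)
{
    assert(size > 0);
    return addr + size - 1;
}

/* True if a load of `size` bytes at `addr` touches more than one page. */
bool load_crosses_page(uint64_t addr, size_t size)
{
    return (addr / PAGE_SIZE_BYTES) !=
           (last_byte(addr, size) / PAGE_SIZE_BYTES);
}

/* First address of the page that follows the one containing `addr`. */
uint64_t next_page_start(uint64_t addr)
{
    return (addr / PAGE_SIZE_BYTES + 1) * PAGE_SIZE_BYTES;
}
```

With RBP = ffff880210af5ffe, a 2-byte load ends at ffff880210af5fff and stays inside the page, while a 4-byte load ends at ffff880210af6001 and spills into the next page. That next page starts at ffff880210af6000, which is the address in the fault report.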

That's too bad!

>It's hard to trigger, since you need to have the next page unmapped
>due to DEBUG_PAGEALLOC and have just the right allocations etc to make
>this happen, but clearly the 0day has gotten pretty good at triggering
>it.

0day hit this only once, by chance, in thousands of boots.
Such random noise has been troublesome for 0day maintenance.

It's good to know the caveats of old gcc -- now we can get rid of some
of our daily annoyance. :)

>Anyway, for now, I'd suggest 0day either:
>
> - upgrade the compiler (this is known to happen with 4.8 and 4.9 but
>apparently not 5.1)

We cover gcc 4.4 all the way up to 6. (We have yet to add gcc-7 coverage.)
The old gcc versions are kept mainly for test coverage.

So would you suggest that we stop testing gcc 4.x? Or do so selectively,
only for the known-broken combinations?

> - not use SLOB in the kernel configurations it tests

E.g. disable SLOB only for old gcc, or disable SLOB unconditionally?

>Honestly, I'd prefer the former, because apparently you use some
>ancient debian gcc version 4.8.4, and gcc these days is on 7.2.
>
>Apparently the ancient gcc version is causing problems with KASAN too.

Yeah, I have simply disabled KASAN when compiling with gcc < 4.9.
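
The selective option could look roughly like the sketch below. This is not 0day's actual logic, and the function names are hypothetical. The SLOB cutoff is an assumption: the thread only says the miscompile is known with gcc 4.8 and 4.9 and apparently absent in 5.1, so the sketch conservatively treats everything older than 5.1 as suspect. The KASAN cutoff follows the "gcc < 4.9" rule mentioned above.

```c
#include <stdbool.h>

/* True if version major.minor is older than ref_major.ref_minor. */
bool gcc_older_than(int major, int minor, int ref_major, int ref_minor)
{
    if (major != ref_major)
        return major < ref_major;
    return minor < ref_minor;
}

/*
 * Assumption: skip SLOB configs for any gcc older than 5.1.
 * Only 4.8 and 4.9 are reported as affected in this thread.
 */
bool should_test_slob(int gcc_major, int gcc_minor)
{
    return !gcc_older_than(gcc_major, gcc_minor, 5, 1);
}

/* KASAN is disabled when compiling with gcc < 4.9. */
bool should_test_kasan(int gcc_major, int gcc_minor)
{
    return !gcc_older_than(gcc_major, gcc_minor, 4, 9);
}
```

Under these assumptions, gcc 4.8 would skip both SLOB and KASAN configs, gcc 4.9 would skip only SLOB, and gcc 5.1 or later would test both.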

>Anyway, I will be ignoring the slob_free() reports for now, and you
>should too until the gcc version is fixed.

OK. Sorry for the noise, and glad to be rid of it!

Regards,
Fengguang