Message-ID: <20171107160600.ybrxhemvd4k2w7yb@wfg-t540p.sh.intel.com>
Date: Wed, 8 Nov 2017 00:06:00 +0800
From: Fengguang Wu <fengguang.wu@...el.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Tejun Heo <tj@...nel.org>, Zefan Li <lizefan@...wei.com>,
Roman Gushchin <guro@...com>, Waiman Long <longman@...hat.com>,
"David S. Miller" <davem@...emloft.net>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [cgroup_rmdir] BUG: unable to handle kernel paging request at ffff880210af6000
On Tue, Nov 07, 2017 at 07:46:46AM -0800, Linus Torvalds wrote:
>On Tue, Nov 7, 2017 at 2:26 AM, Fengguang Wu <fengguang.wu@...el.com> wrote:
>>
>> FYI this happens in v4.14-rc8 -- it's not necessarily a new bug.
>
>.. in fact I don't think it's a bug at all. Not in the kernel, that is.
>
>> [ 186.238181] BUG: unable to handle kernel paging request at ffff880210af6000
>> [ 186.257107] IP: slob_free+0x1c4/0x276
>
>This looks like the same bug we saw earlier, which is due to a gcc bug.
>
>The trapping code disassembles to:
>
> 0: 8b 45 00 mov 0x0(%rbp),%eax
> 3: 41 be 01 00 00 00 mov $0x1,%r14d
> 9: 48 89 ef mov %rbp,%rdi
> c: 66 85 c0 test %ax,%ax
>
>and the thing to note is that: "test %ax,%ax".
>
>It's testing a 16-bit value, but it *loads* a 32-bit one.
>
>It is supposed to load a 16-bit value from the last two bytes of the page:
>
> RBP: ffff880210af5ffe
>
>but because it has turned the 16-bit load into a 32-bit load, it
>faults when accessing the next page.
That's too bad!
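To make the arithmetic concrete, here is a minimal userspace sketch in C.
It is only an illustration, not kernel code: the helper names and the
4 KiB page size are assumptions. It checks whether a load of a given
width starting at the reported RBP stays inside its page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86-64. */
#define SKETCH_PAGE_SIZE 4096ULL

/* True if an access of `len` bytes starting at `addr` touches more
 * than one page. Hypothetical helper, for illustration only. */
static bool access_crosses_page(uint64_t addr, size_t len)
{
    assert(len > 0);
    uint64_t first_page = addr / SKETCH_PAGE_SIZE;
    uint64_t last_page = (addr + len - 1) / SKETCH_PAGE_SIZE;
    return first_page != last_page;
}

/* Address of the first byte of the page that follows the page
 * containing `addr`. */
static uint64_t next_page_start(uint64_t addr)
{
    return (addr / SKETCH_PAGE_SIZE + 1) * SKETCH_PAGE_SIZE;
}
```

With RBP = 0xffff880210af5ffe, a 2-byte load stays within the page,
while a 4-byte load also touches the following page, whose first byte
is 0xffff880210af6000, the address in the report.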
>It's hard to trigger, since you need to have the next page unmapped
>due to DEBUG_PAGEALLOC and have just the right allocations etc to make
>this happen, but clearly the 0day has gotten pretty good at triggering
>it.
0day hit just a single occurrence, by chance, out of thousands of boots.
Random noise like this has been troublesome for 0day maintenance.
It's good to know the caveats of old gcc -- now we can get rid of some
of our daily annoyance. :)
>Anyway, for now, I'd suggest 0day either:
>
> - upgrade the compiler (this is known to happen with 4.8 and 4.9 but
>apparently not 5.1)
We cover gcc 4.4 all the way up to 6. (Yet to add gcc-7 coverage.)
The old gcc versions are kept mainly for test coverage.
So would you suggest that we stop testing with gcc 4.x altogether, or
only for the known-broken combinations?
> - not use SLOB in the kernel configurations it tests
E.g. disable SLOB only for old gcc, or disable SLOB unconditionally?
>Honestly, I'd prefer the former, because apparently you use some
>ancient debian gcc version 4.8.4, and gcc these days is on 7.2.
>
>Apparently the ancient gcc version is causing problems with KASAN too.
Yeah, I just happily disabled KASAN when compiling with gcc < 4.9.
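The selective approach could be sketched as a simple version gate.
This is a hypothetical illustration in C, not the actual 0day scripts:
the struct and function names are made up, and treating everything
older than gcc 5.1 as unsafe for SLOB is an assumption based only on
the versions mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical representation of a compiler version. */
struct gcc_version {
    int major;
    int minor;
};

/* True if `v` is strictly older than major.minor. */
static bool gcc_older_than(struct gcc_version v, int major, int minor)
{
    assert(v.major >= 0 && v.minor >= 0);
    return v.major < major || (v.major == major && v.minor < minor);
}

/* KASAN is skipped when compiling with gcc < 4.9. */
static bool allow_kasan(struct gcc_version v)
{
    return !gcc_older_than(v, 4, 9);
}

/* The bad load was seen with 4.8 and 4.9 but apparently not 5.1.
 * Assumption: anything older than 5.1 is treated as suspect. */
static bool allow_slob(struct gcc_version v)
{
    return !gcc_older_than(v, 5, 1);
}
```

Under these assumptions gcc 4.8 gets neither KASAN nor SLOB, gcc 4.9
gets KASAN but not SLOB, and gcc 5.1 and later get both.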
>Anyway, I will be ignoring the slob_free() reports for now, and you
>should too until the gcc version is fixed.
OK. Sorry for the noise, and glad to be rid of it!
Regards,
Fengguang