Message-ID: <CAMn1gO5mQ2WPs9B9jN91T90Qxf3k6eK-GeBUhs=YqmkZu4NKFg@mail.gmail.com>
Date: Sat, 3 Apr 2021 13:40:23 -0700
From: Peter Collingbourne <pcc@...gle.com>
To: Marco Elver <elver@...gle.com>
Cc: Dmitry Vyukov <dvyukov@...gle.com>,
Alexander Potapenko <glider@...gle.com>,
Evgenii Stepanov <eugenis@...gle.com>,
Andrey Konovalov <andreyknvl@...il.com>,
Linux Memory Management List <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] kfence: unpoison pool region before use

On Sat, Apr 3, 2021 at 3:03 AM Marco Elver <elver@...gle.com> wrote:
>
> On Sat, 3 Apr 2021 at 07:13, Peter Collingbourne <pcc@...gle.com> wrote:
> > If the memory region allocated by KFENCE had previously been poisoned,
> > any validity checks done using kasan_byte_accessible() will fail. Fix
> > it by unpoisoning the memory before using it as the pool region.
> >
> > Link: https://linux-review.googlesource.com/id/I0af99e9f1c25eaf7e1ec295836b5d148d76940c5
> > Signed-off-by: Peter Collingbourne <pcc@...gle.com>
>
> Thanks, at a high level this seems reasonable, because we always want
> to ensure that KFENCE memory remains unpoisoned with KASAN on. FWIW I
> subjected a config with KFENCE+KASAN (generic, SW_TAGS, and HW_TAGS)
> to syzkaller testing and ran kfence_test:
>
> Tested-by: Marco Elver <elver@...gle.com>
>
>
> However, it is unclear to me under which circumstances we actually
> need this, i.e. something would grab some memblock memory, somehow
> poison it, and then release the memory back during early boot (note,
> kfence_alloc_pool() is called before slab setup). If we can somehow
> understand what actually did this, perhaps it'd help tell us if this
> actually needs fixing in KFENCE or it's the other thing that needs a
> fix.
>
> Given all this is happening during really early boot, I'd expect no or
> very few calls to kasan_poison() until kfence_alloc_pool() is called.
> We can probably debug it more by having kasan_poison() do a "if
> (!__kfence_pool) dump_stack();" somewhere. Can you try this on the
> system where you can repro the problem? I tried this just now on the
> latest mainline kernel, and saw 0 calls until kfence_alloc_pool().

I looked into the issue some more, and it turned out that the memory
wasn't getting poisoned by kasan_poison() but rather by the calls to
kasan_map_populate() in kasan_init_shadow(). Starting with the patch
"kasan: initialize shadow to TAG_INVALID for SW_TAGS",
KASAN_SHADOW_INIT is set to 0xFE rather than 0xFF, so the freshly
populated shadow marks the memory as invalid, which caused the
failure. The Android kernel branch for 5.10 (like the downstream
kernel I was working with) already has this patch, but it isn't in
the mainline kernel yet.
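
To make the mechanism concrete, here is a rough user-space sketch of
it. This is a toy model, not the kernel code: only the tag values 0xFF
and 0xFE come from this thread, while the model_* helpers, the granule
size, the pool size, and the exact form of the accessibility check are
assumptions for illustration. The check is written to reject any byte
whose shadow holds the invalid tag even when the pointer carries the
match-all tag, which is what the "Pointer tag: [ff], memory tag: [fe]"
report suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tag values as discussed in the thread; everything else is a toy model. */
#define TAG_KERNEL   0xFFu  /* match-all pointer tag */
#define TAG_INVALID  0xFEu  /* shadow value meaning "inaccessible" */

#define GRANULE_SIZE 16u    /* assumed bytes covered by one shadow byte */
#define POOL_SIZE    256u   /* hypothetical pool size for the model */

static uint8_t shadow[POOL_SIZE / GRANULE_SIZE];

/* Model of populating the shadow at init time with a given fill value. */
static void model_shadow_init(uint8_t init_value)
{
	memset(shadow, init_value, sizeof(shadow));
}

/* Model of unpoisoning: mark each granule in [off, off + size) accessible. */
static void model_unpoison(size_t off, size_t size)
{
	assert(off + size <= POOL_SIZE);
	size_t end = (off + size + GRANULE_SIZE - 1) / GRANULE_SIZE;

	for (size_t i = off / GRANULE_SIZE; i < end; i++)
		shadow[i] = TAG_KERNEL;
}

/* Assumed shape of the validity check: invalid shadow always fails. */
static bool model_byte_accessible(uint8_t ptr_tag, size_t off)
{
	assert(off < POOL_SIZE);
	uint8_t mem_tag = shadow[off / GRANULE_SIZE];

	return mem_tag != TAG_INVALID &&
	       (ptr_tag == TAG_KERNEL || ptr_tag == mem_tag);
}
```

In this model, a shadow initialized to 0xFF passes the check for a
pointer tagged 0xFF, a shadow initialized to 0xFE fails it, and calling
model_unpoison() over the pool after a 0xFE init makes it pass again,
which mirrors what the patch does for the KFENCE pool.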

Now that I understand the cause of the issue, I can reproduce it using
the KFENCE unit tests on a db845c board, on both the Android 5.10
branch and mainline (the latter with that change cherry-picked).
Here's an example crash from the unit tests (the failure was
originally also observed from ksize in the downstream kernel):

[ 46.692195][ T175] BUG: KASAN: invalid-access in test_krealloc+0x1c4/0xf98
[ 46.699282][ T175] Read of size 1 at addr ffffff80e9e7b000 by task kunit_try_catch/175
[ 46.707400][ T175] Pointer tag: [ff], memory tag: [fe]
[ 46.712710][ T175]
[ 46.714955][ T175] CPU: 4 PID: 175 Comm: kunit_try_catch Tainted: G B 5.12.0-rc5-mainline-09505-ga2ab5b26d445-dirty #1
[ 46.727193][ T175] Hardware name: Thundercomm Dragonboard 845c (DT)
[ 46.733636][ T175] Call trace:
[ 46.736841][ T175] dump_backtrace+0x0/0x2f8
[ 46.741295][ T175] show_stack+0x2c/0x3c
[ 46.745388][ T175] dump_stack+0x124/0x1bc
[ 46.749668][ T175] print_address_description+0x7c/0x308
[ 46.755178][ T175] __kasan_report+0x1a8/0x398
[ 46.759816][ T175] kasan_report+0x50/0x7c
[ 46.764103][ T175] __kasan_check_byte+0x3c/0x54
[ 46.768916][ T175] ksize+0x4c/0x94
[ 46.772573][ T175] test_krealloc+0x1c4/0xf98
[ 46.777108][ T175] kunit_try_run_case+0x94/0x1c4
[ 46.781990][ T175] kunit_generic_run_threadfn_adapter+0x30/0x44
[ 46.788196][ T175] kthread+0x20c/0x234
[ 46.792213][ T175] ret_from_fork+0x10/0x30

Since "kasan: initialize shadow to TAG_INVALID for SW_TAGS" hasn't
landed in mainline yet, it seems like we should insert this patch
before that one rather than adding a Fixes: tag.

Peter