linux-kernel - Re: Possible incorrect handling of fault injection inside KMSAN instrumentation


Message-ID: <CAG_fn=V57m0om5HUHHFOQr9R9TWHtfm4+jO96Smf+Q+XjRkxtQ@mail.gmail.com>
Date:   Wed, 12 Apr 2023 16:39:05 +0200
From:   Alexander Potapenko <glider@...gle.com>
To:     Dipanjan Das <mail.dipanjan.das@...il.com>
Cc:     Marco Elver <elver@...gle.com>, Dmitry Vyukov <dvyukov@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        kasan-dev@...glegroups.com, linux-mm@...ck.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        syzkaller <syzkaller@...glegroups.com>,
        Marius Fleischer <fleischermarius@...glemail.com>,
        Priyanka Bose <its.priyanka.bose@...il.com>
Subject: Re: Possible incorrect handling of fault injection inside KMSAN instrumentation

On Sat, Apr 8, 2023 at 5:51 PM Dipanjan Das <mail.dipanjan.das@...il.com> wrote:
>
> Hi,

Hi Dipanjan, thanks a lot for the elaborate analysis!


> kmsan's allocation of shadow or origin memory in
> kmsan_vmap_pages_range_noflush() fails silently due to fault injection
> (FI). KMSAN sort of “swallows” the allocation failure, and moves on.
> When either of them is later accessed while updating the metadata,
> there are no checks to test the validity of the respective pointers,
> which results in a page fault.

You are absolutely right.

> Our conclusions/Questions:
>
> - Should KMSAN fail silently? Probably not. Otherwise, the
> instrumentation always needs to check whether shadow/origin memory
> exists.

KMSAN shouldn't fail silently in any case.
kmsan_vmap_pages_range_noflush() used to have KMSAN_WARN_ON() to catch
such cases, but unfortunately I failed to check the return values
of the kcalloc() calls.
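
[Editorial illustration, not part of the original message.] The missing
check can be sketched in userspace C. This is only a minimal sketch of the
pattern, not the kernel code: fi_calloc(), fail_nth, struct meta_pages and
meta_alloc() are hypothetical names, and calloc() stands in for kcalloc().
The point is that both metadata allocations are checked, and a failure of
either one is reported to the caller instead of leaving a NULL pointer
behind for later use.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for fault injection: when set to N > 0, the
 * N-th upcoming allocation fails. 0 disables injection.
 */
static int fail_nth;

static void *fi_calloc(size_t n, size_t size)
{
	if (fail_nth > 0 && --fail_nth == 0)
		return NULL;	/* injected failure */
	return calloc(n, size);
}

/* Hypothetical container for the two metadata arrays. */
struct meta_pages {
	void **shadow;
	void **origin;
};

/*
 * Allocate both metadata arrays. Returns 0 on success or -ENOMEM if
 * either allocation failed; on failure nothing stays allocated and
 * both pointers are NULL.
 */
static int meta_alloc(struct meta_pages *m, size_t nr_pages)
{
	assert(nr_pages > 0);

	m->shadow = fi_calloc(nr_pages, sizeof(*m->shadow));
	m->origin = fi_calloc(nr_pages, sizeof(*m->origin));
	if (!m->shadow || !m->origin) {
		free(m->shadow);	/* free(NULL) is a no-op */
		free(m->origin);
		m->shadow = NULL;
		m->origin = NULL;
		return -ENOMEM;
	}
	return 0;
}
```

Setting fail_nth to 1 or 2 before calling meta_alloc() makes the shadow or
the origin allocation fail, respectively; in both cases the caller sees
-ENOMEM rather than a half-initialized structure.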

> - Should KMSAN even be tested using fault injection? We are not sure.

At least our deployment of KMSAN on syzbot uses fault injection, so
having the two play well together is important.

> On one hand, the primary purpose of FI should be testing the
> application code. But also, inducing faults inside instrumentation
> clearly helps to find mistakes in that, too.

At first I considered adding a special GFP flag that prohibits
fault injection for the tool's allocations.
But that would just push the allocation failures further out, making them
harder to detect, because they would occur less often.
We'd better handle the failures properly instead.

> - What is a fix for this? Should a failure in the KMSAN
> instrumentation be propagated up so that the kernel allocator
> (vzalloc() in this case) can “pretend” to fail, too?

Yes, I think so.
Here are two patches that fix the problem:
 - https://github.com/google/kmsan/commit/b793a6d5a1c1258326b0f53d6e3ac8aa3eeb3499
   for kmsan_vmap_pages_range_noflush();
 - https://github.com/google/kmsan/commit/cb9e33e0cd7ff735bc302ff69c02274f24060cff
   for kmsan_ioremap_page_range().
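
[Editorial illustration, not part of the original message.] The idea of
propagating an instrumentation failure up to the allocator can be sketched
in userspace C as follows. This is a sketch under stated assumptions and
not the content of the commits linked above: tool_map_hook(),
inject_hook_failure and toy_vzalloc() are hypothetical names, and calloc()
stands in for the real mapping work. The hook returns an error code
instead of void, and the allocator undoes its own work and reports failure
when the hook fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical switch standing in for fault injection inside the hook. */
static bool inject_hook_failure;

/*
 * Hypothetical instrumentation hook that would set up metadata for a
 * new mapping. It returns 0 on success or -ENOMEM on failure, so the
 * caller can react. The sketch models only success versus failure.
 */
static int tool_map_hook(size_t nr_pages)
{
	(void)nr_pages;
	if (inject_hook_failure)
		return -ENOMEM;
	return 0;
}

/*
 * Hypothetical allocator in the spirit of vzalloc(): if the hook
 * fails, release the memory and "pretend" the allocation itself
 * failed by returning NULL.
 */
static void *toy_vzalloc(size_t nr_pages, size_t page_size)
{
	void *mem;

	assert(nr_pages > 0 && page_size > 0);

	mem = calloc(nr_pages, page_size);
	if (!mem)
		return NULL;

	if (tool_map_hook(nr_pages) != 0) {
		free(mem);	/* undo our own work before failing */
		return NULL;
	}
	return mem;
}
```

With inject_hook_failure set, toy_vzalloc() returns NULL, so callers go
through their ordinary out-of-memory path and never touch memory that
lacks metadata. The two commits linked above are the actual fixes.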

Can you please try them out?

Alex