linux-kernel - Re: [RFC PATCH v2 5/8] mseal: Check seal flag for munmap(2)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALmYWFvgM7DOihdUpUC5SREhUMn9t53HYCX+YioeHHhLSD1KHw@mail.gmail.com>
Date:   Wed, 18 Oct 2023 10:14:02 -0700
From:   Jeff Xu <jeffxu@...gle.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     jeffxu@...omium.org, akpm@...ux-foundation.org,
        keescook@...omium.org, jannh@...gle.com, sroettger@...gle.com,
        willy@...radead.org, gregkh@...uxfoundation.org,
        jorgelo@...omium.org, groeck@...omium.org,
        linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
        linux-mm@...ck.org, surenb@...gle.com, alex.sierra@....com,
        apopple@...dia.com, aneesh.kumar@...ux.ibm.com,
        axelrasmussen@...gle.com, ben@...adent.org.uk,
        catalin.marinas@....com, david@...hat.com, dwmw@...zon.co.uk,
        ying.huang@...el.com, hughd@...gle.com, joey.gouly@....com,
        corbet@....net, wangkefeng.wang@...wei.com,
        Liam.Howlett@...cle.com, lstoakes@...il.com, mawupeng1@...wei.com,
        linmiaohe@...wei.com, namit@...are.com, peterx@...hat.com,
        peterz@...radead.org, ryan.roberts@....com, shr@...kernel.io,
        vbabka@...e.cz, xiujianfeng@...wei.com, yu.ma@...el.com,
        zhangpeng362@...wei.com, dave.hansen@...el.com, luto@...nel.org,
        linux-hardening@...r.kernel.org
Subject: Re: [RFC PATCH v2 5/8] mseal: Check seal flag for munmap(2)

On Wed, Oct 18, 2023 at 8:08 AM Jeff Xu <jeffxu@...gle.com> wrote:
>
> On Tue, Oct 17, 2023 at 9:54 AM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > On Tue, 17 Oct 2023 at 02:08, <jeffxu@...omium.org> wrote:
> > >
> > > Of all the call paths that call into do_vmi_munmap(),
> > > this is the only place where checkSeals = MM_SEAL_MUNMAP.
> > > The rest has checkSeals = 0.
> >
> > Why?
> >
> > None of this makes sense.
> >
> > So you say "we can't munmap in this *one* place, but all others ignore
> > the sealing".
> >
> I apologize that previously, I described what this code does, and not reasoning.
>
> In our threat model, as Stephen Röttger point out in [1], and I quote:
>
> V8 exploits typically follow a similar pattern: an initial bug leads
> to memory corruption but often the initial corruption is limited and
> the attacker has to find a way to arbitrarily read/write in the whole
> address space.
>
> The memory correction is in the user space process, e.g. Chrome.
> Attackers will try to modify permission of the memory, by calling
> mprotect,  or munmap then mmap to the same address but with different
> permission, etc.
>
> Sealing blocks mprotect/munmap/mremap/mmap call from the user space
> process, e.g. Chrome.
>
> At time of handling those 4 syscalls, we need to check the seal (
> can_modify_mm), this requires locking the VMA (
> mmap_write_lock_killable), and ideally, after validating the syscall
> input. The reasonable place for can_modify_mm() is from utility
> functions, such as do_mmap(), do_vmi_munmap(), etc.
>
> However, there is no guarantee that do_mmap() and do_vmi_munmap() are
> only reachable from mprotect/munmap/mremap/mmap syscall entry point
> (SYSCALL_DEFINE_XX). In theory,  the kernel can call those in other
> scenarios, and some of them can be perfectly legit. Those other
> scenarios are not covered by our threat model at this time. Therefore,
> we need a flag, passed from the SYSCALL_DEFINE_XX entry , down to
> can_modify_mm(), to differentiate those other scenarios.
>
> Now, back to code, it did some optimization, i.e. doesn't pass the
> flag from SYSCALL_DEFINE_XX  in all cases. If SYSCALL_DEFINE_XX calls
> do_a, and do_a has only one caller, I will set the flag in do_a,
> instead of SYSCALL_DEFINE_XX. Doing this reduces the size of the
> patchset, but it also makes the code less readable indeed. I could
> remove this optimization in V3. I welcome suggestions to improve
> readability on this.
>
> When handing the mmap/munmap/mremap/mmap, once the code passed
> can_modify_mm(), it means the memory area is not sealed, if the code
> continues to call the other utility functions, we don't need to check
> the seal again. This is the case for mremap(), the seal of src address
> and dest address (when applicable) are checked first, later when the
> code calls  do_vmi_munmap(), it no longer needs to check the seal
> again.
>
> [1] https://v8.dev/blog/control-flow-integrity
>
> -Jeff

There is also alternative approach:

For all the places that call do_vmi_munmap(), find out which
case should ignore the sealing flag legitimately, set an ignore_seal
flag and pass it down into do_vmi_munmap(). For the rest case,
use default behavior.

All future API will automatically be covered for sealing, by using default.

The risky side, if I missed a case that requires setting ignore_seal,
there will be a bug.

Also if a driver calls the utility functions to unmap a memory, the
seal will be checked as well. (Driver is not in our threat model,
but Chrome probably doesn't mind it.)

Which of those two approaches are better ? I appreciate the direction on this.

Thanks!
-Jeff


-Jeff