linux-kernel - Re: [PATCH v2 0/6] mm: Optimize mseal checks

Message-ID: <CABi2SkXPRr6Tc_=KQQO3swC78T18wd7S3E=EQ7eD4rbpBpqzNA@mail.gmail.com>
Date: Fri, 16 Aug 2024 10:30:05 -0700
From: Jeff Xu <jeffxu@...omium.org>
To: Pedro Falcato <pedro.falcato@...il.com>
Cc: Jeff Xu <jeffxu@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	"Liam R. Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka <vbabka@...e.cz>, 
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, oliver.sang@...el.com, 
	torvalds@...ux-foundation.org, Michael Ellerman <mpe@...erman.id.au>
Subject: Re: [PATCH v2 0/6] mm: Optimize mseal checks

Hi Pedro,

(Resent; the previous email contained an HTML link.)

On Thu, Aug 15, 2024 at 6:58 PM Pedro Falcato <pedro.falcato@...il.com> wrote:
>
> On Thu, Aug 15, 2024 at 11:10 PM Jeff Xu <jeffxu@...omium.org> wrote:
> >
> > Hi Andrew, Pedro
> >
> > On Thu, Aug 8, 2024 at 6:03 PM Jeff Xu <jeffxu@...gle.com> wrote:
> > >
> > > On Thu, Aug 8, 2024 at 5:34 PM Pedro Falcato <pedro.falcato@...il.com> wrote:
> > > >
> > > > On Fri, Aug 9, 2024 at 12:12 AM Andrew Morton <akpm@...ux-foundation.org> wrote:
> > > > >
> > > > > On Wed,  7 Aug 2024 22:13:03 +0100 Pedro Falcato <pedro.falcato@...il.com> wrote:
> > > > >
> > > > > > This series also depends on (and will eventually very slightly conflict with)
> > > > > > the powerpc series that removes arch_unmap[2].
> > > > >
> > > > > That's awkward.  Please describe the dependency?
> > > >
> > > > One of the transformations done in this patch series (patch 2) assumes
> > > > that arch_unmap either doesn't exist or does nothing.
> > > > PPC is the only architecture with an arch_unmap implementation, and
> > > > through the series I linked they're going to make it work via
> > > > ->close().
> > > >
> > > > What's the easiest way to deal with this? Can the PPC series go
> > > > through the mm tree?
> > > >
> > > This patch can't be merged until arch_unmap() is completely removed
> > > (the ppc change).
> > >
> > > Also, I'm still testing and reviewing this patch; perhaps it is
> > > better to wait until my testing is done.
> > >
> > Sorry for the late update to this thread.
> >
> > Now that the change removing arch_unmap() has landed, this patch no
> > longer has a dependency. However, I have other comments:
> >
> > 1. Testing
> > Testing is 90% of the work when I modify kernel code and send a
> > patch, so I'm a little disappointed that this series neither updates
> > existing tests nor adds new ones. That makes me less confident about
> > the regression risk to mseal itself, i.e. a sealed mapping being
> > overwritten by mprotect/mmap/mremap/munmap. I posted this comment
> > in [1], and I would like to repeat it to stress my point.
> >
> > The V2 series has no selftest changes, which indicates a lack of
> > testing. The out-of-loop check is positioned nearer to the API entry
> > point and separated from internal business logic, thereby minimizing
> > the testing requirements. However, as we move the sealing check
> > further inward and intertwine it with business logic, greater test
> > coverage becomes necessary to ensure the correctness of sealing
> > is preserved.
>
> Sorry, cut the crap. Your backhanded concerns are very funny.
> mseal was _your_ feature. You wrote the tests. You either didn't write
> or didn't understand what kinds of tests need/should be done,
> otherwise you would've done them. I wrote some code to fix the awful
> performance hit. It is a _fix_, not a feature. Your previous mseal
> tests should've covered this. If they didn't, that's okay (we all make
> mistakes), but pushing your problems onto me is seriously laughable.
>
I posted the comments about the lack of a test update in V2 last
Monday, and there was no response from you until Thursday night (which
is OK). If you were expecting me to update the test cases to support
your patch series, you should have said so explicitly.

So your point here is that you wrote the code, passed the existing
selftest, fixed the perf regression, and the job is done. If the tests
don't cover the new cases you added, that is not your problem; you only
need to care about the perf part.

This is how regressions have happened in the past, e.g.:

commit 77795f900e2a07c1cbedc375789aefb43843b6c2
Author: Liam R. Howlett <Liam.Howlett@...cle.com>
Date:   Tue Jun 6 14:29:12 2023 -0400

    mm/mprotect: fix do_mprotect_pkey() limit check

    The return of do_mprotect_pkey() can still be incorrectly returned as
    success if there is a gap that spans to or beyond the end address passed
    in.  Update the check to ensure that the end address has indeed been seen.

    Link: https://lore.kernel.org/all/CABi2SkXjN+5iFoBhxk71t3cmunTk-s=rB4T7qo0UQRh17s49PQ@mail.gmail.com/
    Link: https://lkml.kernel.org/r/20230606182912.586576-1-Liam.Howlett@oracle.com
    Fixes: 82f951340f25 ("mm/mprotect: fix do_mprotect_pkey() return on error")
    Signed-off-by: Liam R. Howlett <Liam.Howlett@...cle.com>
    Reported-by: Jeff Xu <jeffxu@...omium.org>
    Reviewed-by: Lorenzo Stoakes <lstoakes@...il.com>
    Acked-by: David Hildenbrand <david@...hat.com>
    Acked-by: Vlastimil Babka <vbabka@...e.cz>
    Cc: <stable@...r.kernel.org>
    Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>


Had I not written a selftest that discovered this mprotect regression,
it would have gone unnoticed and stayed that way.

My point is: the existing mseal selftest was written for the
out-of-loop check and will not fully exercise the in-loop check. Your
patch makes a big change to mseal, so more tests are needed.
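
To make the out-of-loop / in-loop distinction concrete, here is a toy
user-space model. It is only a sketch: struct toy_vma and the toy_*
functions are hypothetical names and do not correspond to any kernel
code path. Whether a given syscall actually shows the partial behaviour
under the series is exactly what additional tests would need to
establish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA; not the kernel's vm_area_struct. */
struct toy_vma {
	bool mapped;
	bool sealed;
};

/*
 * Out-of-loop: scan the whole range for a sealed entry first and
 * modify nothing if one is found. Costs an extra pass.
 */
static int toy_unmap_out_of_loop(struct toy_vma *v, size_t n)
{
	assert(v != NULL);
	for (size_t i = 0; i < n; i++)
		if (v[i].mapped && v[i].sealed)
			return -1;	/* stands in for -EPERM */
	for (size_t i = 0; i < n; i++)
		v[i].mapped = false;
	return 0;
}

/*
 * In-loop: check each entry while operating on it. Single pass, but in
 * this toy model the entries before the first sealed one are already
 * gone by the time the error is returned.
 */
static int toy_unmap_in_loop(struct toy_vma *v, size_t n)
{
	assert(v != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!v[i].mapped)
			continue;
		if (v[i].sealed)
			return -1;	/* stands in for -EPERM */
		v[i].mapped = false;
	}
	return 0;
}

/* Count the entries that are still mapped. */
static size_t toy_count_mapped(const struct toy_vma *v, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (v[i].mapped)
			count++;
	return count;
}
```

With three mapped entries of which only the last is sealed, both
variants return an error, but the first leaves all three entries mapped
while the second leaves only the sealed one.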

To move your patch series forward from here, either you or I will
need to update the tests. How about sharing the load?

> I want to stress this bit: There's no mseal feature, there's no
> business logic. There's mmap, munmap, mprotect, madvise, mremap (among
> others). These are the things people actually care about, and need to
> stay fast. Memory management is a core part of the kernel. All of the
> very pretty abstractions you're talking about aren't applicable to
> kernel code (for any kernel, really) in general. No one wants to pay
> the cost of walking through a data structure 2 or 3 times just to
> "separate the business logic" for a random, minimally simple feature.
>
The testing is about making sure that sealing is preserved across
mmap/mremap/munmap/mprotect calls. No real software will do that kind
of testing, even in the future; the selftest is the only way.
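
As an illustration of the kind of check I mean, here is a minimal
user-space sketch. It assumes mseal is reachable through syscall() with
number 462 when the libc headers do not define __NR_mseal (an
assumption; that is the value on x86_64 and arm64), and the function
name is hypothetical. It treats any mseal() failure as "not available"
rather than as a test failure.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462	/* assumption: value on x86_64 and arm64 */
#endif

/*
 * Returns  1 if a sealed page rejected mprotect() with EPERM,
 *          0 if mseal() is not usable here (old kernel, 32-bit,
 *            filtered syscall),
 *         -1 if sealing succeeded but was not honoured.
 */
static int check_seal_survives_mprotect(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	assert(page > 0);
	p = mmap(NULL, (size_t)page, PROT_READ,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	if (syscall(__NR_mseal, p, (size_t)page, 0UL) != 0) {
		munmap(p, (size_t)page);
		return 0;
	}

	/* From here on the page is sealed and is leaked on purpose. */
	errno = 0;
	if (mprotect(p, (size_t)page, PROT_READ | PROT_WRITE) == 0)
		return -1;
	return errno == EPERM ? 1 : -1;
}
```

The same pattern extends to munmap, mremap and mmap(MAP_FIXED) over the
sealed page; each call should fail and leave the mapping as it was.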

Security features never come quickly; adding a syscall to the kernel
is the first step toward allowing apps to use it.

> >
> > Yes, I promised to run some tests, which I did: the existing
> > selftest passed, and I also added more tests to the mremap selftest.
> > However, I'm limited in the time I can spend on this (given my other
> > duties and deliverables), so I can't test the in-loop change as much
> > as I would like in the time frame demanded by a dev on this ML.
> > Because this patch is not getting tested as it should be, my
> > confidence in the V2 patch is low.
>
> Ok so you: tried to explain to me how to run selftests in v1 (when I
> actively did _before sending_, and found a bug in your tests, and
> wrote about it in-depth), pledge to "run some tests", never get back
> to us, push the "the testsuite I wrote is lacking" concern onto me,
> send a whole separate parallel patch set that tries to address _one_
> of the regressions with complete disregard for the work done here,
> complain about a lack of time, and now say a backhanded "your
> confidence for the V2 patch is low".
>
> I seriously have no words.
> I want to stress I have no way to test real software that uses mseal
> because APPARENTLY THERE IS NONE. THE FEATURE ADDED EXPLICITLY FOR
> CHROMIUM IS NOT USED BY UPSTREAM CHROMIUM.
>
> >
> > 2. Perf testing
> > stress-ng is not stable in my tests on a Chromebook, and I'm asking
> > Oliver to take more samples [2]. This due diligence assures that
> > this patch accurately fulfills its purpose. The in-loop approach adds
> > complexity to the code, i.e. it is harder for a future dev to
> > understand the sealing logic. Additionally, it sacrifices a security
> > feature that makes it harder for an attacker to modify a mapping:
> > currently, if an attacker calls munmap on a large address range and
> > one of the addresses is sealed, the entire range is left unmodified,
> > whereas with the in-loop approach memory will be unmapped until the
> > sealed memory is hit.
>
> Wrong. munmap is atomic and always has been. It's required by POSIX.
>
Please run this test on the latest kernel branch to verify:

static void test_munmap_free_multiple_ranges(bool seal)
{
        void *ptr;
        unsigned long page_size = getpagesize();
        unsigned long size = 8 * page_size;
        int ret;
        int prot;

        setup_single_address(size, &ptr);
        FAIL_TEST_IF_FALSE(ptr != (void *)-1);

        /* unmap one page from beginning. */
        ret = sys_munmap(ptr, page_size);
        FAIL_TEST_IF_FALSE(!ret);

        /* unmap one page from middle. */
        ret = sys_munmap(ptr + 4 * page_size, page_size);
        FAIL_TEST_IF_FALSE(!ret);

        /* seal the last page */
        if (seal) {
                ret = sys_mseal(ptr + 7 * page_size, page_size);
                FAIL_TEST_IF_FALSE(!ret);
        }

        /* munmap all 8 pages from beginning */
        ret = sys_munmap(ptr, 8 * page_size);
        if (seal) {
                FAIL_TEST_IF_FALSE(ret < 0);

                /* verify none of the existing pages in the range are unmapped */
                size = get_vma_size(ptr + page_size, &prot);
                FAIL_TEST_IF_FALSE(size == 3 * page_size);
                FAIL_TEST_IF_FALSE(prot == 4);

                size = get_vma_size(ptr + 5 * page_size, &prot);
                FAIL_TEST_IF_FALSE(size == 2 * page_size);
                FAIL_TEST_IF_FALSE(prot == 4);

                size = get_vma_size(ptr + 7 * page_size, &prot);
                FAIL_TEST_IF_FALSE(size == 1 * page_size);
                FAIL_TEST_IF_FALSE(prot == 4);
        } else {
                FAIL_TEST_IF_FALSE(!ret);

                /* verify all pages are unmapped */
                for (int i = 0; i < 8; i++) {
                        size = get_vma_size(ptr + i * page_size, &prot);
                        FAIL_TEST_IF_FALSE(size == 0);
                }
        }

        REPORT_TEST_PASS();
}

test_munmap_free_multiple_ranges(true);
test_munmap_free_multiple_ranges(false);
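
The test relies on the selftest helper get_vma_size(). For readers
without the tree at hand, here is a rough sketch of what such a helper
might look like. It assumes the helper looks up the VMA that starts
exactly at the given address in /proc/self/maps and encodes permissions
as r=4, w=2, x=1, which is what the prot == 4 checks above suggest. The
sketch_* names are hypothetical and this is not the selftest's code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Return the size of the VMA that starts exactly at addr, or 0 if no
 * VMA starts there. *prot receives r=4, w=2, x=1 (assumed encoding).
 */
static unsigned long sketch_get_vma_size(const void *addr, int *prot)
{
	char line[1024];
	char perms[5];
	unsigned long start, end, size = 0;
	FILE *maps = fopen("/proc/self/maps", "r");

	assert(prot != NULL);
	*prot = 0;
	if (!maps)
		return 0;

	while (fgets(line, sizeof(line), maps)) {
		if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
			continue;
		if (start != (unsigned long)(uintptr_t)addr)
			continue;
		size = end - start;
		if (perms[0] == 'r')
			*prot |= 4;
		if (perms[1] == 'w')
			*prot |= 2;
		if (perms[2] == 'x')
			*prot |= 1;
		break;
	}
	fclose(maps);
	return size;
}

/*
 * Usage: carve a 2-page read-only VMA out of a PROT_NONE mapping so it
 * cannot merge with its neighbours, then look it up. Returns 1 on match.
 */
static int sketch_demo(void)
{
	unsigned long page = (unsigned long)sysconf(_SC_PAGESIZE);
	int prot, ok;
	char *base = mmap(NULL, 4 * page, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (base == MAP_FAILED)
		return 0;
	if (mprotect(base + page, 2 * page, PROT_READ) != 0) {
		munmap(base, 4 * page);
		return 0;
	}
	ok = sketch_get_vma_size(base + page, &prot) == 2 * page && prot == 4;
	munmap(base, 4 * page);
	return ok;
}
```

Because the lookup matches on the exact start address, a query in the
middle of a VMA returns 0, which is why the checks above probe the
first page of each expected VMA.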

> I would also ask you to back these partial mprotect (assuming that's
> what you meant) concerns with a real attack that takes this into
> account, instead of hand waving "security".
> While noting that all of these operations can already partially fail,
> this is not new (and is explicitly permitted by POSIX for
> syscalls-not-named-munmap).
>
As defences such as seccomp, SELinux, and Landlock get stronger,
attackers have to look for an easier route.

> > Therefore, I would like to ascertain the gain.
>
> The gain is real. I've proven it with will-it-scale, Oliver ran his
> tests and found the regression to be gone (and they do seem to do
> proper statistical analysis).
> I would wager you to find a workload or hardware where *doing
> substantially less work* makes for slower code.
>
> >
> > 3. mremap refactor work.
>
> mremap refactoring is not related to these mmap regressions. In the v3
> I'm cleaning up and sending out tomorrow, I minimally change mremap
> (as Liam wanted).
>
If the test issue is not resolved, V3 will be in the same situation as V2.

Best Regards,
-Jeff

> --
> Pedro