lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <94924bd5-c0e2-4d97-a693-57786d9c7fc9@lucifer.local>
Date: Tue, 11 Feb 2025 10:58:19 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Jeff Xu <jeffxu@...omium.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Mark Brown <broonie@...nel.org>,
        Shuah Khan <skhan@...uxfoundation.org>,
        LKML <linux-kernel@...r.kernel.org>, linux-hardening@...r.kernel.org,
        linux-kselftest@...r.kernel.org, linux-mm@...ck.org,
        Jorge Lucangeli Obes <jorgelo@...omium.org>,
        Kees Cook <keescook@...omium.org>,
        Pedro Falcato <pedro.falcato@...il.com>,
        Randy Dunlap <rdunlap@...radead.org>, Jann Horn <jannh@...gle.com>
Subject: Re: [RFC PATCH v1 1/1] selftest/mm: refactor mseal_test

On Mon, Feb 10, 2025 at 01:35:52PM -0800, Jeff Xu wrote:
> Hi Lorenzo,
>
> Gentle ping for my clarification questions.
>
> I also tried the new ioctl PROCMAP_QUERY, please see below for details.
>

Hi Jeff,

Sorry I thought you'd send a new version which is why I didn't reply. I will
take a look through and get back to you, it's on my whiteboard now...

Apologies for the delay, this was just an error!

And thanks for taking a look at the ioctl, will take a look and that when I
get a sec also.

Cheers, Lorenzo


> On Wed, Jan 15, 2025 at 12:47 PM Jeff Xu <jeffxu@...omium.org> wrote:
> >
> > Hi Lorenzo,
> >
> > On Thu, Jan 2, 2025 at 9:30 AM Lorenzo Stoakes
> > <lorenzo.stoakes@...cle.com> wrote:
> > >
> > > Sorry for delay, was super busy leading up to xmas break, then had ~2.5
> > > weeks off.
> > >
> > Thanks for reviewing.  There are lots of comments, so it takes some
> > time to go through comments and experiment with some of the suggested
> > changes. Please see below.
> >
> > > And happy new year! :)
> > >
> > Happy new year !
> >
> > > On Wed, Dec 11, 2024 at 05:33:11AM +0000, jeffxu@...omium.org wrote:
> > > > From: Jeff Xu <jeffxu@...omium.org>
> > > >
> > > > This change creates the initial version of memorysealing.c.
> > > >
> > > > The introduction of memorysealing.c, which replaces mseal_test.c and
> > >
> > > I mean I don't want to be a pain but I would kinda prefer to have 'mseal'
> > > everywhere mseal is to avoid confusion vs. memfd seals.
> > >
> > As I tried to explain, the memorysealing.c will eventually replace
> > mseal_test.c, after all the tests are moved to use kselftest_karness.
> > How about mseal_test_new.c ? I'm open to new names if you have one.
> >
> > > Of course we in the kernel absolutely _love_ to overload the meaning of terms
> > > but some traditions are worth breaking :)
> > >
> > > > uses the kselftest_harness, aims to initiate a discussion on using the
> > > > selftest harness for memory sealing tests. Upon approval of this
> > > > approach, the migration of tests from mseal_test.c to memorysealing.c
> > > > can be implemented in a step-by-step manner.
> > >
> > > Well, I for one like this approach so (unsurprisingly) :) but perhaps
> > > sensible to do it iteratively so we can also tweak things as we go.
> > >
> > Yes, that is the idea.
> >
> > > >
> > > > This tests addresses following feedback from previous reviews:
> > > >
> > > > 1> Use kselftest_harness instead of custom macro, such as EXPECT_XX,
> > > > ASSERT_XX, etc.  (Lorenzo Stoakes, Mark Brown) [1]
> > > >
> > > > 2> Use MAP_FAILED to check the return of mmap (Lorenzo Stoakes).
> > >
> > > Thanks!
> > >
> > > >
> > > > 3>  Adding a check for vma size and prot bits. The discussion for
> > > >     this can be found in [2] [3], here is a brief summary:
> > > >     This is to follow up on Pedro’s in-loop change (from
> > > >     can_modify_mm to can_modify_vma). When mseal_test is initially
> > > >     created, they have a common pattern:  setup memory layout,
> > > >     seal the memory, perform a few mm-api steps, verify return code
> > > >     (not zero).  Because of the nature of out-of-loop,  it is sufficient
> > > >     to just verify the error code in a few cases.
> > > >
> > > >     With Pedro's in-loop change, the sealing check happens later in the
> > > >     stack, thus there are more things and scenarios to verify. And there
> > > >     was feedbacks to me that mseal_test should be extensive enough to
> > > >     discover all regressions. Hence I'm adding check for vma size and prot
> > > >     bits.
> > > >
> > > > In this change: we created two fixtures:
> > > >
> > > > Fixture basic:   This creates a single VMA, the VMA has a
> > > > PROT_NONE page at each end to prevent auto-merging.
> > > >
> > > > Fixture wo_vma: Two VMAs back to end, a PROT_NONE page at each
> > > > end to prevent auto-merging.
> > > >
> > > > In addition, I add one test (mprotec) in each fixture to demo the tests.
> > > >
> > > > [1] https://lore.kernel.org/all/20240830180237.1220027-5-jeffxu@chromium.org/
> > > > [2] https://lore.kernel.org/all/CABi2SkUgDZtJtRJe+J9UNdtZn=EQzZcbMB685P=1rR7DUhg=6Q@mail.gmail.com/
> > > > [3] https://lore.kernel.org/all/2qywbjb5ebtgwkh354w3lj3vhaothvubjokxq5fhyri5jeeton@duqngzo3swjz/
> > > >
> > > > Signed-off-by: Jeff Xu <jeffxu@...omium.org>
> > > > ---
> > > >  tools/testing/selftests/mm/.gitignore      |   1 +
> > > >  tools/testing/selftests/mm/Makefile        |   1 +
> > > >  tools/testing/selftests/mm/memorysealing.c | 182 +++++++++++++++++++++
> > > >  tools/testing/selftests/mm/memorysealing.h | 116 +++++++++++++
> > > >  tools/testing/selftests/mm/mseal_test.c    |  67 +-------
> > > >  5 files changed, 301 insertions(+), 66 deletions(-)
> > > >  create mode 100644 tools/testing/selftests/mm/memorysealing.c
> > > >  create mode 100644 tools/testing/selftests/mm/memorysealing.h
> > > >
> > > > diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
> > > > index a51a947b2d1d..982234a99f20 100644
> > > > --- a/tools/testing/selftests/mm/.gitignore
> > > > +++ b/tools/testing/selftests/mm/.gitignore
> > > > @@ -57,3 +57,4 @@ hugetlb_dio
> > > >  pkey_sighandler_tests_32
> > > >  pkey_sighandler_tests_64
> > > >  guard-pages
> > > > +memorysealing
> > > > diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
> > > > index 93a46ac633df..08876624f46d 100644
> > > > --- a/tools/testing/selftests/mm/Makefile
> > > > +++ b/tools/testing/selftests/mm/Makefile
> > > > @@ -67,6 +67,7 @@ TEST_GEN_FILES += map_populate
> > > >  ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64))
> > > >  TEST_GEN_FILES += memfd_secret
> > > >  endif
> > > > +TEST_GEN_FILES += memorysealing
> > > >  TEST_GEN_FILES += migration
> > > >  TEST_GEN_FILES += mkdirty
> > > >  TEST_GEN_FILES += mlock-random-test
> > > > diff --git a/tools/testing/selftests/mm/memorysealing.c b/tools/testing/selftests/mm/memorysealing.c
> > > > new file mode 100644
> > > > index 000000000000..e10032528b64
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/mm/memorysealing.c
> > > > @@ -0,0 +1,182 @@
> > > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > > +#define _GNU_SOURCE
> > > > +#include "../kselftest_harness.h"
> > > > +#include <asm-generic/unistd.h>
> > > > +#include <errno.h>
> > > > +#include <syscall.h>
> > > > +#include "memorysealing.h"
> > > > +
> > > > +/*
> > > > + * To avoid auto-merging, create a VMA with PROT_NONE pages at each end.
> > > > + * If unsuccessful, return MAP_FAILED.
> > > > + */
> > > > +static void *setup_single_address(int size, int prot)
> > >
> > > Nitty, but size should really be an unsigned long.
> > >
> > sure, thanks
> >
> > > > +{
> > > > +     int ret;
> > > > +     void *ptr;
> > > > +     unsigned long page_size = getpagesize();
> > >
> > > Trivial, but we store page size in self->page_size so why not pass it as a
> > > parameter instead of looking up?
> > >
> > sure.
> >
> > > > +
> > > > +     ptr = mmap(NULL, size + 2 * page_size, prot,
> > > > +             MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > > > +
> > > > +     if (ptr == MAP_FAILED)
> > > > +             return MAP_FAILED;
> > > > +
> > > > +     /* To avoid auto-merging, change to PROT_NONE at each end. */
> > > > +     ret = sys_mprotect(ptr, page_size, PROT_NONE);
> > > > +     if (ret != 0)
> > > > +             return MAP_FAILED;
> > > > +
> > > > +     ret = sys_mprotect(ptr + size + page_size, page_size, PROT_NONE);
> > > > +     if (ret != 0)
> > > > +             return MAP_FAILED;
> > > > +
> > > > +     return ptr + page_size;
> > > > +}
> > >
> > > This could be done more easily with a single PROT_NONE, like:
> > >
> > >         static void *setup_single_address(unsigned long size, int prot,
> > >                         unsigned long page_size)
> > >         {
> > >                 void *ptr;
> > >
> > >                 /* Ensure we only merge what we intend to. */
> > >                 ptr = mmap(NULL, size + 2 * page_size, PROT_NONE,
> > >                            MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > >                 if (ptr == MAP_FAILED)
> > >                         return MAP_FAILED;
> > >
> > >                 return mmap(&ptr[page_size], size, prot,
> > >                             MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > >         }
> > >
> > sure, create PROT_NONE mapping first, then use mprotect to change the
> > middle part.
> >
> > > It's probably worth having an equivalent freeing function to avoid leaks:
> > >
> > >         static int free_single_address(void *ptr, unsigned long size,
> > >                         unsigned long page_size)
> > >         {
> > >                 /* Also free the PROT_NONE sentinels. */
> > >                 return munmap(ptr - page_size, size + 2 * page_size);
> >
> > [resp 1] could you clarify? I think we won't need freeing because
> > mapping will be sealed. You might recall, mseal() blocks munmap(), and
> > munmap() is atomic,  so the above line that you suggested will result
> > in no-action during FIXTURE_TEARDOWN. There will be separate test
> > cases to test munmap under sealing. In addition, the virtual address
> > space is big enough for this test complete without worry about the
> > out-of-memory situation. Do I miss anything ?
> >
> > >         }
> > >
> > > > +
> > > > +FIXTURE(basic)
> > > > +{
> > > > +     unsigned long page_size;
> > > > +     unsigned long size;
> > > > +     void *ptr;
> > > > +};
> > > > +
> > > > +/*
> > > > + * Setup for basic:
> > > > + * Single VMA with 4 pages, prot = (PROT_READ | PROT_WRITE).
> > > > + */
> > > > +FIXTURE_SETUP(basic)
> > > > +{
> > > > +     int prot;
> > > > +
> > > > +     self->page_size = getpagesize();
> > > > +
> > > > +     if (!mseal_supported())
> > > > +             SKIP(return, "mseal is not supported");
> > > > +
> > > > +     /* Create a single VMA with 4 pages, prot as PROT_READ | PROT_WRITE. */
> > > > +     self->size = self->page_size * 4;
> > > > +     self->ptr = setup_single_address(self->size, PROT_READ | PROT_WRITE);
> > > > +     EXPECT_NE(self->ptr, MAP_FAILED);
> > > > +
> > > > +     EXPECT_EQ(self->size, get_vma_size(self->ptr, &prot));
> > >
> > > I mean fine but this check seems a bit overkill here, we don't need to
> > > check to ensure that PROT_NONE works.
> > >
> > [resp 2] Those checks are there,  for the same reason that test cases
> > usually check the return value of mmap(), mprotect() in the
> > FIXTURE_SETUP. You are right that those calls are unlikely to fail,
> > however, test cases generally do those  "housekeeping tasks"  to make
> > sure everything is OK before  the actual test. If you don't want to
> > keep those, what is the downside of keeping them ?
> >
> > > > +     EXPECT_EQ(PROT_READ | PROT_WRITE, prot);
> > >
> > > Nitty, but this seems a bit 'Yoda style', that is placing the expectation
> > > prior to the value whose expectation we are assessing, that is perhaps it'd
> > > be clearer as:
> > >
> > >         EXPECT_EQ(prot, PROT_READ | PROT_WRITE);
> > >
> > [resp 3] Both styles are used in self tests and I believe it is
> > equally clear,  I can accommodate this request for this patch.
> >
> > > And again, I think this is really overkill here, if the kernel is unable to
> > > do an ordinary mapping correctly we have bigger problems than an mseal bug!
> > >
> > please see [resp 2]
> >
> > > > +};
> > > > +
> > > > +FIXTURE_TEARDOWN(basic)
> > > > +{
> > >
> > > Probably worth adding munmap() teardown here.
> > >
> > please see [resp 1]
> >
> > > > +}
> > > > +
> > > > +FIXTURE(two_vma)
> > > > +{
> > > > +     unsigned long page_size;
> > > > +     unsigned long size;
> > > > +     void *ptr;
> > > > +};
> > > > +
> > > > +/*
> > > > + * Setup for two_vma:
> > > > + * Two consecutive VMAs, each with 2 pages.
> > > > + * The first VMA:  prot = PROT_READ.
> > > > + * The second VMA: prot = (PROT_READ | PROT_WRITE).
> > > > + */
> > > > +FIXTURE_SETUP(two_vma)
> > > > +{
> > > > +     int prot;
> > > > +     int ret;
> > > > +
> > > > +     self->page_size = getpagesize();
> > > > +
> > > > +     if (!mseal_supported())
> > > > +             SKIP(return, "mseal is not supported");
> > > > +
> > > > +     /* Create a single VMA with 4 pages, prot as PROT_READ | PROT_WRITE. */
> > > > +     self->size = getpagesize() * 4;
> > > > +     self->ptr = setup_single_address(self->size, PROT_READ | PROT_WRITE);
> > > > +     EXPECT_NE(self->ptr, MAP_FAILED);
> > > > +
> > > > +     /* Use mprotect to split as two VMA. */
> > > > +     ret = sys_mprotect(self->ptr, self->page_size * 2, PROT_READ);
> > > > +     ASSERT_EQ(ret, 0);
> > > > +
> > >
> > > > +     /* Verify the first VMA is 2 pages and prot bits */
> > > > +     EXPECT_EQ(self->page_size * 2, get_vma_size(self->ptr, &prot));
> > > > +     EXPECT_EQ(PROT_READ, prot);
> > > > +
> > > > +     /* Verify the second VMA is 2 pages and prot bits */
> > > > +     EXPECT_EQ(self->page_size * 2,
> > > > +             get_vma_size(self->ptr + self->page_size * 2, &prot));
> > > > +     EXPECT_EQ(PROT_READ | PROT_WRITE, prot);
> > >
> > > Again, as with the prior fixture, it seems a bit overkill, as this is
> > > essentially testing 'does mmap() and mprotect()' work. We should hope so :)
> > Please see [resp 2]
> >
> > > > +};
> > > > +
> > > > +FIXTURE_TEARDOWN(two_vma)
> > > > +{
> > >
> > > Same comment as for the basic fixture - let's clean up after ourselves.
> > >
> > please see [resp 1]
> >
> > > > +}
> > > > +
> > > > +/*
> > > > + * Verify mprotect is blocked.
> > > > + */
> > > > +TEST_F(basic, mprotect_basic)
> > > > +{
> > > > +     int ret;
> > > > +     unsigned long size;
> > > > +     int prot;
> > > > +
> > > > +     /* Seal the mapping. */
> > > > +     ret = sys_mseal(self->ptr, self->size, 0);
> > > > +     ASSERT_EQ(ret, 0);
> > > > +
> > > > +     /* Verify mprotect is blocked. */
> > > > +     ret = sys_mprotect(self->ptr, self->size, PROT_READ);
> > > > +     EXPECT_GT(0, ret);
> > >
> > > Yeah this one is kinda egregious, this is genuinely hard to read. Wouldn't
> > > this be better as:
> > >
> > >         EXPECT_NE(ret, 0);
> > >
> > > As it explicitly says 'not equal to a success case'. Rather than 'wait 0 is
> > > _greater than_ the result which means the result is _less than_ 0 which
> > > means an error but let me check the man page can it be positive, oh ok it's
> > > -1 on error, 0 on success', which was the Lorenzo thought process... :)
> > >
> > [resp 4] The man page says: "On success, mprotect() and
> > pkey_mprotect() return zero.  On error, these system calls return -1,
> > and errno is set to indicate the error."
> >
> > I wll switch EXPECT_EQ(ret, -1) which is precisely what the man page says.
> >
> > > > +     EXPECT_EQ(EPERM, errno);
> > >
> > > Let's invert these, Yoda style isn't really useful here and is a little
> > > harder to read, e.g. we should do:
> > >
> > >         EXPECT_EQ(errno, EPERM);
> > >
> > please see [resp 3]
> >
> > > > +
> > > > +     /* Verify the VMA (sealed) isn't changed */
> > > > +     size = get_vma_size(self->ptr, &prot);
> > > > +     EXPECT_EQ(self->size, size);
> > > > +     EXPECT_EQ(PROT_READ | PROT_WRITE, prot);
> > >
> > > Again, no to Yoda :)
> > >
> > please see [resp 3]
> >
> > > I also wonder how useful this is - I guess belts + braces, and linux _can_
> > > do partial failures, but is that something we would care about in practice?
> > >
> > [resp 5]
> > Those checks will help to detect regression, such as  when Pedro
> > Falcato implemented in-loop change [1]. Liam R. Howlett also commented
> > that mseal_test should be extensive enough to check for size and prot
> > to avoid future regression [2]
> >
> > For your question about partial failure, there will be additional
> > tests for partial-sealed-address range, and those will check the
> > sealed mapping only.
> >
> > > > +}
> > > > +
> > > > +/*
> > > > + * Seal both VMAs in one mseal call.
> > > > + * Verify mprotect is blocked on both VMAs in various cases.
> > > > + */
> > > > +TEST_F(two_vma, mprotect)
> > >
> > > I'm guessing you aren't necessarily covering _all_ mprotect cases here
> > > (though perhaps you intend to later, iteratively?)
> > >
> > Yes.
> >
> > > > +{
> > > > +     int ret;
> > > > +     int prot;
> > > > +     unsigned long size;
> > >
> > > I think 'size' is a bit confusing as it refers to the size ascertained from
> > > the get_vma_size() later on. Perhaps call it actual_size?
> > >
> > To help  readability, I will change size to vma_size, which
> > corresponds to the return of get_vma_size().
> >
> > > Maybe worth adding:
> > >
> > >         void *ptr = self->ptr;
> > >         unsigned long size = self->size;
> > >         unsigned long page_size = self->page_size;
> > >
> > > For convenience?
> > >
> > > This is totally optional though, maybe more effort than it's worth.
> > >
> > I don't think it is worth it.
> >
> > > > +
> > > > +     /* Seal both VMAs in one mseal call. */
> > > > +     ret = sys_mseal(self->ptr, self->size, 0);
> > > > +     ASSERT_EQ(ret, 0);
> > > > +
> > > > +     /* Verify mprotect is rejected on the first VMA. */
> > > > +     ret = sys_mprotect(self->ptr, self->page_size * 2,
> > > > +             PROT_READ | PROT_EXEC);
> > >
> > > Why PROT_READ | PROT_EXEC? Is this arbitrary? To be thorough shouldn't you
> > > loop through all of the prot flags that you think should fail? And perhaps
> > > check what happens if you mprotect() the VMA to set it to what it already
> > > is?
> > >
> > [resp 6]
> > Not arbitrary. As commented in FIXTURE_SETUP, the first VMA is
> > PROT_READ, and the second VMA is PROT_READ | PROT_WRITE.   The test
> > wants to use mprotect to change the proc bit to a different set.
> >
> > For your question about mprotect() the sealed vma to what it already
> > is (doesn't change any attribute), I don't want to solidify the
> > current behavior (which is to block). It should be successful instead.
> > With in-loop check and partial update behavior,  there is no reason to
> > block it, i.e. no security benefit. I already plan to address this.
> >
> > > > +     EXPECT_GT(0, ret);
> > >
> > > Please convert to EXPECT_NE(ret, 0)
> > >
> > please see [resp 4]
> >
> > > > +     EXPECT_EQ(EPERM, errno);
> > >
> > > Please invert to EXPECT_EQ(errno, EPERM)
> > >
> > please see [resp 3]
> >
> > > > +
> > > > +     /* Verify mprotect is rejected on the second VMA. */
> > > > +     ret = sys_mprotect(self->ptr, self->page_size * 2,
> > > > +             PROT_READ | PROT_EXEC);
> > >
> > > Hang on, isn't this just a repeat of the first test? Shouldn't this be:
> > >
> > >         ret = sys_mprotect(&self->ptr[self->page_size * 2],
> > >                 self->page_size * 2, PROT_READ | PROT_EXEC);
> > >
> > [resp 7] Not the same, this test has two VMA, and  the mseal() covers
> > both VMAs in one call. In the first test, i.e. basic, mprotect_basic,
> > mseal is called on single VMA.
> >
> > > > +     EXPECT_GT(0, ret);
> > > > +     EXPECT_EQ(EPERM, errno);
> > >
> > > Same comment as above re: ordering.
> > >
> > please see [resp 3]
> >
> > > > +
> > > > +     /* Attempt of mprotect two VMAs at the same call is blocked */
> > > > +     ret = sys_mprotect(self->ptr, self->size,
> > > > +             PROT_READ | PROT_EXEC);
> > >
> > > Same comment about iterating through prot flags that should fail as above.
> > >
> > please see [resp 7]
> >
> > > > +     EXPECT_GT(0, ret);
> > > > +     EXPECT_EQ(EPERM, errno);
> > >
> > > Same comment as above re: ordering.
> > >
> > please see [resp 3]
> >
> > > > +
> > > > +     /* Verify both VMAs aren't changed. */
> > > > +     size = get_vma_size(self->ptr, &prot);
> > > > +     EXPECT_EQ(self->page_size * 2, size);
> > > > +     EXPECT_EQ(PROT_READ, prot);
> > >
> > > Same comment as above re: ordering.
> > >
> > please see [resp 3]
> >
> > > > +
> > > > +     size = get_vma_size(self->ptr + self->page_size * 2, &prot);
> > > > +     EXPECT_EQ(self->page_size * 2, size);
> > > > +     EXPECT_EQ(PROT_READ | PROT_WRITE, prot);
> > >
> > > Same comment as above re: ordering.
> > >
> > please see [resp 3]
> >
> > > > +}
> > > > +
> > > > +TEST_HARNESS_MAIN
> > > > diff --git a/tools/testing/selftests/mm/memorysealing.h b/tools/testing/selftests/mm/memorysealing.h
> > > > new file mode 100644
> > > > index 000000000000..aee6e6700028
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/mm/memorysealing.h
> > > > @@ -0,0 +1,116 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > +#include <syscall.h>
> > > > +
> > > > +/*
> > > > + * Define sys_xyx to call syscall directly.
> > > > + * This is needed because we want to avoid calling glibc and
> > > > + * test syscall directly.
> > >
> > > I do wonder if this is worth it, but I mean, I guess it's not too egregious
> > > if it's wrapped like this. Same comment for other wrappers.
> > >
> > The self-test should directly test the kernel syscall without any
> > intermediate layer. selftest is not a libc test. It is practical too,
> > glibc's behavior can change across versions which result inconsistent,
> > and Padro Falcato reported one of this and fixed in [3]
> >
> > > > + * The only exception is mmap, which _NR_mmap2 is not defined for
> > > > + * some ARM architecture.
> > > > + */
> > > > +static inline int sys_mseal(void *start, size_t len, unsigned long flags)
> > > > +{
> > > > +     int sret;
> > > > +
> > > > +     errno = 0;
> > >
> > > Why are we setting this to 0? Same comment for all sys_XXX() wrappers.
> > >
> > The test is only interested in the latest error associated with the
> > failed syscall, and I'm not sure the kernel clears the errno for each
> > system call. Note, there is no glibc layer here.
> >
> > > > +     sret = syscall(__NR_mseal, start, len, flags);
> > > > +     return sret;
> > >
> > > Nit, but this seems a bit redundant, why not just drop the sret assignment,
> > > and simply:
> > >
> > >         return syscall(__NR_mseal, start, len, flags);
> > >
> > > Same comment for all sys_XXX() wrappers.
> > >
> > [resp 6] I think this is a different taste in coding style. For the
> > test, I prefer to have local variables so it is easy to show them in
> > the debugger, when needed.
> >
> > > > +}
> > > > +
> > > > +static inline int sys_mprotect(void *ptr, size_t size, unsigned long prot)
> > > > +{
> > > > +     int sret;
> > > > +
> > > > +     errno = 0;
> > > > +     sret = syscall(__NR_mprotect, ptr, size, prot);
> > > > +     return sret;
> > > > +}
> > > > +
> > >
> > > The wrappers below don't seem to be used yet, could we delay putting them
> > > in until they are actually used?
> > >
> > Sure, I will remove sys_mprotect_pkey for this patch.
> >
> > > > +static inline int sys_mprotect_pkey(void *ptr, size_t size,
> > > > +     unsigned long orig_prot, unsigned long pkey)
> > > > +{
> > > > +     int sret;
> > > > +
> > > > +     errno = 0;
> > > > +     sret = syscall(__NR_pkey_mprotect, ptr, size, orig_prot, pkey);
> > > > +     return sret;
> > > > +}
> > > > +
> > > > +static inline int sys_munmap(void *ptr, size_t size)
> > > > +{
> > > > +     int sret;
> > > > +
> > > > +     errno = 0;
> > > > +     sret = syscall(__NR_munmap, ptr, size);
> > > > +     return sret;
> > > > +}
> > > > +
> > > > +static inline int sys_madvise(void *start, size_t len, int types)
> > > > +{
> > > > +     int sret;
> > > > +
> > > > +     errno = 0;
> > > > +     sret = syscall(__NR_madvise, start, len, types);
> > > > +     return sret;
> > > > +}
> > > > +
> > > > +static inline void *sys_mremap(void *addr, size_t old_len, size_t new_len,
> > > > +     unsigned long flags, void *new_addr)
> > > > +{
> > > > +     void *sret;
> > > > +
> > > > +     errno = 0;
> > > > +     sret = (void *) syscall(__NR_mremap, addr, old_len, new_len, flags, new_addr);
> > > > +     return sret;
> > > > +}
> > > > +
> > > > +/*
> > > > + * Parsing /proc/self/maps to get VMA's size and prot bit.
> > > > + */
> > > > +static unsigned long get_vma_size(void *addr, int *prot)
> > > > +{
> > > > +     FILE *maps;
> > > > +     char line[256];
> > > > +     int size = 0;
> > > > +     uintptr_t  addr_start, addr_end;
> > >
> > > Why not add a:
> > >
> > >         uintptr_t addr_val = (uintptr_t)addr;
> > >
> > > To avoid having to constantly cast below?
> > >
> > sure.
> >
> > > > +     char protstr[5];
> > > > +     *prot = 0;
> > >
> > > Nit but not sure there's much point in setting this to 0 even in failure
> > > cases?
> > >
> > > With an error code return you can avoid having to do this.
> > >
> > [resp 8]
> > I don't think the error code return is worth the effort, it currently
> > returns 0 size in case of error or not-found. For easy debug parsing
> > errors, I can add logs.  Notes, the parsing should always be
> > successful.
> >
> > > > +
> > > > +     maps = fopen("/proc/self/maps", "r");
> > > > +     if (!maps)
> > > > +             return 0;
> > >
> > > Might it be better to return an error code or something so we can identify
> > > this case? I guess any sensible call will fail on a 0 but you're sort of
> > > just implicitly hoping for this.
> > >
> > [resp 9] we could add logging in case of error.
> >
> > > > +
> > > > +     while (fgets(line, sizeof(line), maps)) {
> > > > +             if (sscanf(line, "%lx-%lx %4s", &addr_start, &addr_end, protstr) == 3) {
> > > > +                     if (addr_start == (uintptr_t) addr) {
> > >
> > > This is a Yoda comparison again, let's put what we're comparing first, and
> > > the thing we're comparing to afterwards, e.g.
> > >
> > >         if ((uintptr_t)addr == addr_start) ...
> > >
> > addr is the expected value from input and  addr_start changes after
> > parsing each line. The expected value (addr) is in second position,
> > why is this yoda syntax ? IMO, introducing the yoda syntax concept for
> > "=="  is more confusing than useful, A==B is as clear as B==A.
> >
> > > Also I don't think a space after the (type) cast is necesary.
> > >
> > sure
> >
> > > > +                             size = addr_end - addr_start;
> > > > +                             if (protstr[0] == 'r')
> > > > +                                     *prot |= PROT_READ;
> > > > +                             if (protstr[1] == 'w')
> > > > +                                     *prot |= PROT_WRITE;
> > > > +                             if (protstr[2] == 'x')
> > > > +                                     *prot |= PROT_EXEC;
> > > > +                             break;
> > > > +                     }
> > > > +             }
> > > > +     }
> > >
> > > On the impl: This would be neater if you used guard clauses:
> > >
> > >         while (fgets(line, sizeof(line), maps)) {
> > >                 if (sscanf(line, "%lx-%lx %4s", &addr_start, &addr_end, protstr) != 3)
> > >                         continue;
> > >
> > >                 if (addr_val != addr_start)
> > >                         continue;
> > >
> > >                 size = addr_end - addr_start;
> > >                 if (protstr[0] == 'r')
> > >                         *prot |= PROT_READ;
> > >                 if (protstr[1] == 'w')
> > >                         *prot |= PROT_WRITE;
> > >                 if (protstr[2] == 'x')
> > >                         *prot |= PROT_EXEC;
> > >                 break;
> > >         }
> > >
> > > This is a lot more readable?
> > >
> > sure.
> >
> > > On doing this at all - it feels a bit crazy to check this, but I guess it
> > > might be the only way to assert things about merge/split.
> > >
> > > I'm not sure it's _really useful_ to do that though - merges/splits are
> > > internal implementation details in effect (modulo mremap single VMA
> > > requirements), and mseal is an external interface, really the check should
> > > be around 'can forbidden operations be performed on sealed mappings'.
> > >
> > This is related to resp [5].
> > This test not only verifies the size, but also verifies the prot bits.
> > The argument that VMA is an internal behavior is questionable, the
> > /proc/pid/maps is user-facing and applications can read it and parse
> > it.
> >
> > > However, if you do really want to do this, I think it would be much better
> > > to use the new ioctl interface for this [0] which should avoid having to do
> > > unpleasant text wrangling :)
> > >
> > > [0]:https://lore.kernel.org/all/20240627170900.1672542-1-andrii@kernel.org/
> > >
> > I could try this. This is a new feature though, I do think this
> > selftest should be self-standing and has as minimal dependency as
> > possible.  Is there harm of using /proc/pid/maps ? parsing
> > /proc/pid/maps already used by selftest in lots of places (see below),
> > I would prefer testing to use well-established patterns than relying
> > on a newly developed feature.
> >
> > filesystems/overlayfs/dev_in_maps.c
> > mm/mlock2-tests.c
> > mm/mremap_test.c
> > mm/virtual_address_range.c
> > mm/split_huge_page_test.c
> > filesystems/overlayfs/dev_in_maps.c
> > exec/load_address.c
> > bpf/trace_helpers.c
> >
> I have tried new ioctl PROCMAP_QUERY to replace get_vma_size, and
> observed a few facts:
> 1.The ioctl requires the caller to set the correct PROT bits for the
> query to succeed.
> 2. The ioctl returns the VMA containing the query address, which may
> not start at that address.
> 3. The ioctl doesn't return the sealing flag.
>
> What get_vma_size wants is: for a given address, return the VMA
> started from that address, with PROT bits, size, and the sealing flag
> information.
>
> In the next version, I'll add a check for the VMA's seal flag (by
> querying /proc/pid/smaps). Since the ioctl doesn't support returning
> the sealing flag and smaps already contain length/prot bit
> information, using an additional ioctl doesn't provide any benefit.
>
> Thanks
> -Jeff
>
> > >
> > > > +     fclose(maps);
> > > > +     return size;
> > >
> > > If we don't find the line we don't seem to be indicating this, again worth
> > > being able to pass back an error?
> > >
> > It can be normal that we don't find a line, i.e. tests want to verify
> > that a VMA does not exist at address, return 0 size is correct in this
> > case. If a parsing error ever happened, the test would have failed at
> > FIXTURE_SETUP to begin with.
> >
> > > > +}
> > > > +
> > > > +static inline bool mseal_supported(void)
> > > > +{
> > > > +     int ret;
> > > > +     void *ptr;
> > > > +     unsigned long page_size = getpagesize();
> > > > +
> > > > +     ptr = mmap(NULL, page_size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > > > +     if (ptr == MAP_FAILED)
> > > > +             return false;
> > > > +
> > > > +     ret = sys_mseal(ptr, page_size, 0);
> > > > +     if (ret < 0)
> > > > +             return false;
> > >
> > > This is kind of cute, I mean it speaks to the need I think for us to be
> > > able to expose 'is X available?' programmatically. But I digress.
> > >
> > > I guess this is ok, as this is something that should never fail.
> > >
> > You might recall that mseal is not available on the 32 bits kernel,
> > automation might run the mseal_test on a 32 bit build.
> > mseal_supported() detects this so tests can be skipped.
> >
> > > But we should definitely clean up after ourselves here whether this
> > > succeeds or fails.
> > >
> > please see resp [1].
> >
> > Thanks!
> > -Jeff
> > [1] https://lore.kernel.org/all/CAKbZUD31EK2ah=vWJ46y4nL9OygzAa6ZxPnPHmWYO-sop5t+5Q@mail.gmail.com/
> > [2] https://lore.kernel.org/all/ko3pjllsgufbz33hqvwdpdsyxvgrgzqiodxexnpcng3mssffeh@dfgfkqlg46xa/
> > [3] https://lore.kernel.org/all/20240807153544.2754247-1-jeffxu@chromium.org/
> >
> >
> >
> >
> >
> >
> >
> > > > +
> > > > +     return true;
> > > > +}
> > > > diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c
> > > > index ad17005521a8..8dd20341de3d 100644
> > > > --- a/tools/testing/selftests/mm/mseal_test.c
> > > > +++ b/tools/testing/selftests/mm/mseal_test.c
> > > > @@ -517,30 +517,6 @@ static void test_seal_twice(void)
> > > >       REPORT_TEST_PASS();
> > > >  }
> > > >
> > > > -static void test_seal_mprotect(bool seal)
> > > > -{
> > > > -     void *ptr;
> > > > -     unsigned long page_size = getpagesize();
> > > > -     unsigned long size = 4 * page_size;
> > > > -     int ret;
> > > > -
> > > > -     setup_single_address(size, &ptr);
> > > > -     FAIL_TEST_IF_FALSE(ptr != (void *)-1);
> > > > -
> > > > -     if (seal) {
> > > > -             ret = seal_single_address(ptr, size);
> > > > -             FAIL_TEST_IF_FALSE(!ret);
> > > > -     }
> > > > -
> > > > -     ret = sys_mprotect(ptr, size, PROT_READ | PROT_WRITE);
> > > > -     if (seal)
> > > > -             FAIL_TEST_IF_FALSE(ret < 0);
> > > > -     else
> > > > -             FAIL_TEST_IF_FALSE(!ret);
> > > > -
> > > > -     REPORT_TEST_PASS();
> > > > -}
> > > > -
> > > >  static void test_seal_start_mprotect(bool seal)
> > > >  {
> > > >       void *ptr;
> > > > @@ -658,41 +634,6 @@ static void test_seal_mprotect_unalign_len_variant_2(bool seal)
> > > >       REPORT_TEST_PASS();
> > > >  }
> > > >
> > > > -static void test_seal_mprotect_two_vma(bool seal)
> > > > -{
> > > > -     void *ptr;
> > > > -     unsigned long page_size = getpagesize();
> > > > -     unsigned long size = 4 * page_size;
> > > > -     int ret;
> > > > -
> > > > -     setup_single_address(size, &ptr);
> > > > -     FAIL_TEST_IF_FALSE(ptr != (void *)-1);
> > > > -
> > > > -     /* use mprotect to split */
> > > > -     ret = sys_mprotect(ptr, page_size * 2, PROT_READ | PROT_WRITE);
> > > > -     FAIL_TEST_IF_FALSE(!ret);
> > > > -
> > > > -     if (seal) {
> > > > -             ret = seal_single_address(ptr, page_size * 4);
> > > > -             FAIL_TEST_IF_FALSE(!ret);
> > > > -     }
> > > > -
> > > > -     ret = sys_mprotect(ptr, page_size * 2, PROT_READ | PROT_WRITE);
> > > > -     if (seal)
> > > > -             FAIL_TEST_IF_FALSE(ret < 0);
> > > > -     else
> > > > -             FAIL_TEST_IF_FALSE(!ret);
> > > > -
> > > > -     ret = sys_mprotect(ptr + page_size * 2, page_size * 2,
> > > > -                     PROT_READ | PROT_WRITE);
> > > > -     if (seal)
> > > > -             FAIL_TEST_IF_FALSE(ret < 0);
> > > > -     else
> > > > -             FAIL_TEST_IF_FALSE(!ret);
> > > > -
> > > > -     REPORT_TEST_PASS();
> > > > -}
> > > > -
> > > >  static void test_seal_mprotect_two_vma_with_split(bool seal)
> > > >  {
> > > >       void *ptr;
> > > > @@ -1876,7 +1817,7 @@ int main(void)
> > > >       if (!pkey_supported())
> > > >               ksft_print_msg("PKEY not supported\n");
> > > >
> > > > -     ksft_set_plan(88);
> > > > +     ksft_set_plan(84);
> > > >
> > > >       test_seal_addseal();
> > > >       test_seal_unmapped_start();
> > > > @@ -1889,9 +1830,6 @@ int main(void)
> > > >       test_seal_zero_length();
> > > >       test_seal_twice();
> > > >
> > > > -     test_seal_mprotect(false);
> > > > -     test_seal_mprotect(true);
> > > > -
> > > >       test_seal_start_mprotect(false);
> > > >       test_seal_start_mprotect(true);
> > > >
> > > > @@ -1904,9 +1842,6 @@ int main(void)
> > > >       test_seal_mprotect_unalign_len_variant_2(false);
> > > >       test_seal_mprotect_unalign_len_variant_2(true);
> > > >
> > > > -     test_seal_mprotect_two_vma(false);
> > > > -     test_seal_mprotect_two_vma(true);
> > > > -
> > > >       test_seal_mprotect_two_vma_with_split(false);
> > > >       test_seal_mprotect_two_vma_with_split(true);
> > > >
> > > > --
> > > > 2.47.1.613.gc27f4b7a9f-goog
> > > >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ