Message-ID: <CAJuCfpE7_4B2v1Fw+6RHWJwM99+pgOPT80ntRSW-1nSTbDxPyg@mail.gmail.com>
Date: Fri, 19 Apr 2024 16:54:13 +0000
From: Suren Baghdasaryan <surenb@...gle.com>
To: Jeff Xu <jeffxu@...omium.org>
Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>, akpm@...ux-foundation.org,
keescook@...omium.org, jannh@...gle.com, sroettger@...gle.com,
willy@...radead.org, gregkh@...uxfoundation.org,
torvalds@...ux-foundation.org, usama.anjum@...labora.com, corbet@....net,
merimus@...gle.com, rdunlap@...radead.org, jeffxu@...gle.com,
jorgelo@...omium.org, groeck@...omium.org, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org, linux-mm@...ck.org, pedro.falcato@...il.com,
dave.hansen@...el.com, linux-hardening@...r.kernel.org, deraadt@...nbsd.org
Subject: Re: [PATCH v10 0/5] Introduce mseal
On Fri, Apr 19, 2024 at 3:15 PM Jeff Xu <jeffxu@...omium.org> wrote:
>
> On Fri, Apr 19, 2024 at 7:57 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
> >
> > On Thu, Apr 18, 2024 at 6:22 PM Jeff Xu <jeffxu@...omium.org> wrote:
> > >
> > > On Thu, Apr 18, 2024 at 1:19 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
> > > >
> > > > On Tue, Apr 16, 2024 at 12:40 PM Jeff Xu <jeffxu@...omium.org> wrote:
> > > > >
> > > > > On Tue, Apr 16, 2024 at 8:13 AM Liam R. Howlett <Liam.Howlett@...cle.com> wrote:
> > > > > >
> > > > > > * jeffxu@...omium.org <jeffxu@...omium.org> [240415 12:35]:
> > > > > > > From: Jeff Xu <jeffxu@...omium.org>
> > > > > > >
> > > > > > > This is the v10 version; it rebases the v9 patch onto 6.9-rc3.
> > > > > > > We have also applied and tested mseal() in Chrome and on Chromebooks.
> > > > > > >
> > > > > > > ------------------------------------------------------------------
> > > > > > ...
> > > > > >
> > > > > > > MM perf benchmarks
> > > > > > > ==================
> > > > > > > This patch adds a loop in mprotect/munmap/madvise(DONTNEED) to
> > > > > > > check the VMAs' sealing flag, so that no partial update is made
> > > > > > > when any segment within the given memory range is sealed.
> > > > > > >
> > > > > > > To measure the performance impact of this loop, two tests were
> > > > > > > developed [8].
> > > > > > >
> > > > > > > The first measures the time taken by a particular system call,
> > > > > > > using clock_gettime(CLOCK_MONOTONIC). The second uses
> > > > > > > PERF_COUNT_HW_REF_CPU_CYCLES (excluding user space). Both tests
> > > > > > > show similar results.
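
For reference, the second style of measurement can be set up with
perf_event_open(2). The sketch below is only a guess at the general
shape; the helper name is made up, and the real test in [8] may
configure the event differently:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a counter for kernel-side reference CPU cycles on the
 * calling thread; returns a perf fd, or -1 on error. */
static int open_ref_cycles(void)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_REF_CPU_CYCLES;
        attr.exclude_user = 1;  /* kernel cycles only ("exclude user space") */
        attr.disabled = 1;      /* start stopped; enable around the syscall */

        return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Typical use around a syscall under test:
 *
 *      uint64_t count;
 *
 *      ioctl(fd, PERF_EVENT_IOC_RESET, 0);
 *      ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
 *      mprotect(ptr, size, PROT_READ);
 *      ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
 *      read(fd, &count, sizeof(count));
 */
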
> > > > > > >
> > > > > > > The tests roughly follow this sequence:
> > > > > > >   for (i = 0; i < 1000; i++)
> > > > > > >     create 1000 mappings (1 page per VMA)
> > > > > > >     start the sampling
> > > > > > >     for (j = 0; j < 1000; j++)
> > > > > > >       mprotect one mapping
> > > > > > >     stop and save the sample
> > > > > > >     delete 1000 mappings
> > > > > > >   calculate all samples.
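
By way of illustration, a minimal self-contained harness for the
single-VMA timing case could look like the sketch below. The
constants, the alternating-protection split, and the PROT_NONE toggle
are assumptions of mine, not taken from the test code in [8]:

#include <stdio.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define NR_SAMPLES 1000

int main(void)
{
        long page = sysconf(_SC_PAGESIZE);
        struct timespec t0, t1;
        long long total = 0;
        char *base;
        int i;

        /* One contiguous anonymous region... */
        base = mmap(NULL, NR_SAMPLES * page, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        /* ...split into one VMA per page: alternating protections keep
         * neighbouring VMAs from merging. */
        for (i = 0; i < NR_SAMPLES; i += 2)
                mprotect(base + i * page, page, PROT_READ);

        /* Time one single-VMA mprotect() per page. (As pages turn
         * PROT_NONE they may re-merge with neighbours; the harness in
         * [8] is more careful about this.) */
        for (i = 0; i < NR_SAMPLES; i++) {
                clock_gettime(CLOCK_MONOTONIC, &t0);
                mprotect(base + i * page, page, PROT_NONE);
                clock_gettime(CLOCK_MONOTONIC, &t1);
                total += (t1.tv_sec - t0.tv_sec) * 1000000000LL +
                         (t1.tv_nsec - t0.tv_nsec);
        }

        printf("avg mprotect (1 VMA): %lld ns\n", total / NR_SAMPLES);
        munmap(base, NR_SAMPLES * page);
        return 0;
}
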
> > > > > >
> > > > > >
> > > > > > Thank you for doing this performance testing.
> > > > > >
> > > > > > >
> > > > > > > The tests below were performed on an Intel(R) Pentium(R) Gold
> > > > > > > 7505 @ 2.00GHz Chromebook with 4GB of memory.
> > > > > > >
> > > > > > > Based on the latest upstream code:
> > > > > > > The first test (measuring time)
> > > > > > > syscall__ vmas t(ns) t_mseal(ns) delta_ns per_vma(ns) %
> > > > > > > munmap__ 1 909 944 35 35 104%
> > > > > > > munmap__ 2 1398 1502 104 52 107%
> > > > > > > munmap__ 4 2444 2594 149 37 106%
> > > > > > > munmap__ 8 4029 4323 293 37 107%
> > > > > > > munmap__ 16 6647 6935 288 18 104%
> > > > > > > munmap__ 32 11811 12398 587 18 105%
> > > > > > > mprotect 1 439 465 26 26 106%
> > > > > > > mprotect 2 1659 1745 86 43 105%
> > > > > > > mprotect 4 3747 3889 142 36 104%
> > > > > > > mprotect 8 6755 6969 215 27 103%
> > > > > > > mprotect 16 13748 14144 396 25 103%
> > > > > > > mprotect 32 27827 28969 1142 36 104%
> > > > > > > madvise_ 1 240 262 22 22 109%
> > > > > > > madvise_ 2 366 442 76 38 121%
> > > > > > > madvise_ 4 623 751 128 32 121%
> > > > > > > madvise_ 8 1110 1324 215 27 119%
> > > > > > > madvise_ 16 2127 2451 324 20 115%
> > > > > > > madvise_ 32 4109 4642 534 17 113%
> > > > > > >
> > > > > > > The second test (measuring CPU cycles)
> > > > > > > syscall__ vmas cpu(cycles) cmseal(cycles) delta_cpu per_vma(cycles) %
> > > > > > > munmap__ 1 1790 1890 100 100 106%
> > > > > > > munmap__ 2 2819 3033 214 107 108%
> > > > > > > munmap__ 4 4959 5271 312 78 106%
> > > > > > > munmap__ 8 8262 8745 483 60 106%
> > > > > > > munmap__ 16 13099 14116 1017 64 108%
> > > > > > > munmap__ 32 23221 24785 1565 49 107%
> > > > > > > mprotect 1 906 967 62 62 107%
> > > > > > > mprotect 2 3019 3203 184 92 106%
> > > > > > > mprotect 4 6149 6569 420 105 107%
> > > > > > > mprotect 8 9978 10524 545 68 105%
> > > > > > > mprotect 16 20448 21427 979 61 105%
> > > > > > > mprotect 32 40972 42935 1963 61 105%
> > > > > > > madvise_ 1 434 497 63 63 115%
> > > > > > > madvise_ 2 752 899 147 74 120%
> > > > > > > madvise_ 4 1313 1513 200 50 115%
> > > > > > > madvise_ 8 2271 2627 356 44 116%
> > > > > > > madvise_ 16 4312 4883 571 36 113%
> > > > > > > madvise_ 32 8376 9319 943 29 111%
> > > > > > >
> > > > > >
> > > > > > If I am reading this right, madvise() is affected more than the other
> > > > > > calls? Is that expected or do we need to have a closer look?
> > > > > >
> > > > > madvise() has a bigger percentage (per_vma %), but it also has a
> > > > > smaller base value (cpu).
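
(Reading the numbers from the cycle table: madvise_ with 1 VMA goes
from 434 to 497 cycles, +63 cycles but roughly +15%, while munmap__
with 1 VMA goes from 1790 to 1890 cycles, +100 cycles yet only about
+6%. The larger percentage mostly reflects the smaller baseline.)
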
> > > >
> > > > Sorry, it's unclear to me what the "vmas" column denotes. Is that how
> > > > many VMAs were created before timing the syscall? If so, then 32 is
> > > > the max that you show here while you seem to have tested with 1000
> > > > VMAs. What is the overhead with 1000 VMAs?
> > >
> > > The vmas column is the number of VMAs used in one call.
> > >
> > > For example: with vmas = 32 and mprotect(ptr, size), the memory range
> > > passed to mprotect spans 32 VMAs.
> >
> > Ok, so the 32 here denotes how many VMAs one mprotect() call spans?
> >
> Yes.
>
> > >
> > > It also matters how many memory ranges are in use at the time of the
> > > test; this is where 1000 comes in. The test creates 1000 memory
> > > ranges, each with 32 VMAs, then calls mprotect on each of the 1000
> > > memory ranges. (The pseudocode was included in the original email.)
> >
> > So, if each range has 32 VMAs and you have 1000 ranges, then you are
> > creating 32000 VMAs? Sorry, your pseudocode does not clarify that. My
> > current understanding is this:
> >
> > for (i = 0; i < 1000; i++)
> >     mmap N*1000 areas (N = [1-32])
> >     start the sampling
> >     for (j = 0; j < 1000; j++)
> >         mprotect N areas with one syscall
> >     stop and save the sample
> >     munmap N*1000 areas
> > calculate all samples.
> >
> > Is that correct?
> >
> Yes, there will be 32000 VMAs in the system.
>
> The pseudocode is correct in concept.
> The test implementation is slightly different: it uses mprotect to
> split the memory and make sure the VMAs don't merge. For details,
> reference [8] in the original email links to the test code.
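
A hypothetical helper illustrating that splitting trick (this is not
the code from [8]): neighbouring VMAs with different protection bits
cannot be merged, so alternating protections yields one VMA per page,
and a later mprotect(base, n * page_size, ...) then spans n VMAs in a
single syscall, matching the "vmas" column above.

#include <sys/mman.h>
#include <unistd.h>

/* Map nr_vmas pages, then split the mapping into one VMA per page
 * by giving every other page a different protection. */
static void *map_split_vmas(long nr_vmas)
{
        long page = sysconf(_SC_PAGESIZE);
        char *base = mmap(NULL, nr_vmas * page, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        long i;

        if (base == MAP_FAILED)
                return NULL;

        for (i = 0; i < nr_vmas; i += 2)
                mprotect(base + i * page, page, PROT_READ);

        return base;
}
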
Ok, thanks for the clarifications. I don't think the overhead is high
enough to worry about.
Thanks,
Suren.
>
> -Jeff