Date:   Wed, 14 Dec 2022 18:05:44 -0700
From:   Nico Pache <npache@...hat.com>
To:     Sidhartha Kumar <sidhartha.kumar@...cle.com>
Cc:     Mike Kravetz <mike.kravetz@...cle.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        muchun.song@...ux.dev, akpm@...ux-foundation.org,
        willy@...radead.org, gerald.schaefer@...ux.ibm.com
Subject: Re: [RFC V2] mm: add the zero case to page[1].compound_nr in set_compound_order

On Tue, Dec 13, 2022 at 11:38 PM Sidhartha Kumar
<sidhartha.kumar@...cle.com> wrote:
>
> On 12/13/22 5:02 PM, Mike Kravetz wrote:
> > On 12/13/22 17:27, Nico Pache wrote:
> >> According to the document linked the following approach is even faster
> >> than the one I used due to CPU parallelization:
> >
> > I do not think we are very concerned with speed here.  This routine is being
> > called in the creation of compound pages, and in the case of hugetlb the
> > tear down of gigantic pages.  In general, creation and tear down of gigantic
> > pages happens infrequently.  Usually only at system/application startup and
> > system/application shutdown.
> >
> Hi Nico,
>
> I wrote a bpftrace script to track the time spent in
> __prep_compound_gigantic_folio both with and without the branch in
> folio_set_order() and the resulting histogram was the same for both
> versions. This is probably because the for loop through every base page
> has a much higher overhead than the singular call to folio_set_order().
> I am not sure what the performance difference for THP would be.

Hi Sidhartha,

OK, great! We may still want to proactively implement a branchless version
so that if/when THP starts using this path we don't see a regression.

Furthermore, Waiman brought up a good point off-list: the bit math is
needlessly complex, and the same result can be achieved with:
           page[1].compound_nr = (1U << order) & ~1U;

Tested:
order 0 output : 0
order 1 output : 2
order 2 output : 4
order 3 output : 8
order 4 output : 16
order 5 output : 32
order 6 output : 64
order 7 output : 128
order 8 output : 256
order 9 output : 512
order 10 output : 1024
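
For anyone who wants to reproduce the table above in userspace, here is a
minimal sketch. The helper names are hypothetical and are not kernel
functions; the branching form is only my assumption of what the ternary
being discussed looks like. It relies on 1U << 0 being 1, so clearing bit 0
yields 0 for order 0 and leaves every higher power of two untouched. Orders
are assumed to be below 32 so the shift is well defined.

```c
#include <assert.h>

/*
 * Hypothetical userspace helper (not a kernel function): the branching
 * form, which returns 0 for order 0 and 2^order otherwise.
 */
unsigned int nr_pages_branch(unsigned int order)
{
	return order ? 1U << order : 0;
}

/*
 * Hypothetical userspace helper: the branchless form from this mail.
 * 1U << 0 == 1, and masking with ~1U clears bit 0, giving 0 for order 0.
 * For order >= 1 bit 0 is already clear, so the value is unchanged.
 * Assumes order < 32.
 */
unsigned int nr_pages_branchless(unsigned int order)
{
	return (1U << order) & ~1U;
}
```

Calling both helpers for orders 0 through 10 should give identical values,
matching the table above.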


> Below is the script.
> Thanks,
> Sidhartha Kumar

Thanks for the script!!
Cheers,
-- Nico

> k:__prep_compound_gigantic_folio
> {
>          @prep_start[pid] = nsecs;
> }
>
> kr:__prep_compound_gigantic_folio
> {
>          @prep_nsecs = hist((nsecs - @prep_start[pid]));
>          delete(@prep_start[pid]);
> }
