Message-ID: <CAA1CXcDmhQErLnEvoVE2_8EeY0EGTBVWSTC4UXP5F2n-JOgvfw@mail.gmail.com>
Date: Wed, 14 Dec 2022 18:05:44 -0700
From: Nico Pache <npache@...hat.com>
To: Sidhartha Kumar <sidhartha.kumar@...cle.com>
Cc: Mike Kravetz <mike.kravetz@...cle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
muchun.song@...ux.dev, akpm@...ux-foundation.org,
willy@...radead.org, gerald.schaefer@...ux.ibm.com
Subject: Re: [RFC V2] mm: add the zero case to page[1].compound_nr in set_compound_order
On Tue, Dec 13, 2022 at 11:38 PM Sidhartha Kumar
<sidhartha.kumar@...cle.com> wrote:
>
> On 12/13/22 5:02 PM, Mike Kravetz wrote:
> > On 12/13/22 17:27, Nico Pache wrote:
> >> According to the linked document, the following approach is even faster
> >> than the one I used, due to CPU parallelization:
> >
> > I do not think we are very concerned with speed here. This routine is
> > called when creating compound pages and, in the case of hugetlb, when
> > tearing down gigantic pages. In general, gigantic pages are created and
> > torn down infrequently, usually only at system/application startup and
> > shutdown.
> >
> Hi Nico,
>
> I wrote a bpftrace script to track the time spent in
> __prep_compound_gigantic_folio both with and without the branch in
> folio_set_order(), and the resulting histogram was the same for both
> versions. This is probably because the for loop over every base page
> has a much higher overhead than the single call to folio_set_order().
> I am not sure what the performance difference for THP would be.
Hi Sidhartha,
OK, great! We may want to proactively implement a branchless version so
that if THP ever starts using this path, we won't see a regression.
Also, Waiman raised a good point off-list: this bit math is needlessly
complex, and the same result can be achieved with
page[1].compound_nr = (1U << order) & ~1U;
Tested:
order 0 output : 0
order 1 output : 2
order 2 output : 4
order 3 output : 8
order 4 output : 16
order 5 output : 32
order 6 output : 64
order 7 output : 128
order 8 output : 256
order 9 output : 512
order 10 output : 1024
> Below is the script.
> Thanks,
> Sidhartha Kumar
Thanks for the script!!
Cheers,
-- Nico
> k:__prep_compound_gigantic_folio
> {
> @prep_start[pid] = nsecs;
> }
>
> kr:__prep_compound_gigantic_folio
> {
> @prep_nsecs = hist((nsecs - @prep_start[pid]));
> delete(@prep_start[pid]);
> }