Message-ID: <fb2fb91c-1e54-a4ee-bf69-299e9114ae1e@oracle.com>
Date: Tue, 13 Dec 2022 22:38:15 -0800
From: Sidhartha Kumar <sidhartha.kumar@...cle.com>
To: Mike Kravetz <mike.kravetz@...cle.com>,
Nico Pache <npache@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
muchun.song@...ux.dev, akpm@...ux-foundation.org,
willy@...radead.org, gerald.schaefer@...ux.ibm.com
Subject: Re: [RFC V2] mm: add the zero case to page[1].compound_nr in
set_compound_order
On 12/13/22 5:02 PM, Mike Kravetz wrote:
> On 12/13/22 17:27, Nico Pache wrote:
>> According to the document linked the following approach is even faster
>> than the one I used due to CPU parallelization:
>
> I do not think we are very concerned with speed here. This routine is being
> called in the creation of compound pages, and in the case of hugetlb the
> tear down of gigantic pages. In general, creation and tear down of gigantic
> pages happens infrequently. Usually only at system/application startup and
> system/application shutdown.
>
Hi Nico,
I wrote a bpftrace script to track the time spent in
__prep_compound_gigantic_folio both with and without the branch in
folio_set_order(), and the resulting histogram was the same for both
versions. This is probably because the for loop over every base page
has a much higher overhead than the single call to folio_set_order().
I am not sure what the performance difference for THP would be.
@prep_nsecs:
[1M, 2M)
50|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
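For reading the bucket label: bpftrace's hist() uses power-of-2 buckets and
the K/M suffixes are binary, so [1M, 2M) covers 1,048,576 ns up to (but not
including) 2,097,152 ns, roughly 1.05 ms to 2.10 ms. A minimal Python sketch
of that mapping follows; the helper name bucket_bounds is hypothetical and
it only handles positive durations.

```python
def bucket_bounds(ns: int) -> tuple[int, int]:
    """Return the [lo, hi) power-of-2 bucket that a positive duration falls in.

    Illustrative helper (not part of bpftrace): mirrors how hist() groups
    positive values into [2^k, 2^(k+1)) ranges.
    """
    if ns < 1:
        raise ValueError("sketch only covers positive durations")
    lo = 1 << (ns.bit_length() - 1)  # largest power of two <= ns
    return lo, lo << 1


# A 1.5 ms call lands in the same bucket as every sample reported above.
lo, hi = bucket_bounds(1_500_000)
print(f"[{lo} ns, {hi} ns) = [{lo / 1e6:.3f} ms, {hi / 1e6:.3f} ms)")
```

All 50 samples falling in this one bucket means any difference between the
two versions is smaller than the bucket width, not necessarily zero.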
Below is the script.
Thanks,
Sidhartha Kumar
k:__prep_compound_gigantic_folio
{
	@prep_start[pid] = nsecs;
}

kr:__prep_compound_gigantic_folio
{
	@prep_nsecs = hist((nsecs - @prep_start[pid]));
	delete(@prep_start[pid]);
}
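The entry/return pairing the script relies on can be sketched in Python as
below. This is an illustration only, not the script itself: the function
name pair_durations and the (kind, pid, nsecs) event tuples are assumptions
made for the example. One deliberate difference is that the sketch skips a
return with no recorded entry, whereas the script has no such guard.

```python
def pair_durations(events):
    """Pair entry/return events per key and return the elapsed nanoseconds.

    events: iterable of (kind, pid, nsecs) tuples, kind is "entry" or
    "return". The tuple layout is hypothetical, chosen for this sketch.
    """
    start = {}      # plays the role of @prep_start[pid]
    durations = []  # values the script would feed to hist()
    for kind, pid, nsecs in events:
        if kind == "entry":
            start[pid] = nsecs
        elif kind == "return" and pid in start:
            # pop() combines the subtraction's lookup with delete()
            durations.append(nsecs - start.pop(pid))
    return durations


trace = [
    ("entry", 1, 100),
    ("entry", 2, 150),
    ("return", 1, 1_500_100),
    ("return", 2, 1_700_150),
]
print(pair_durations(trace))
```

Because the map is keyed by pid, two threads of the same process inside the
function at once would overwrite each other's start time; keying by tid
avoids that if it ever matters for a given workload.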
> I think the only case where we 'might' be concerned with speed is in the
> creation of compound pages for THP. Do note that this code path is
> still using set_compound_order as it has not been converted to folios.