Message-ID: <d2e0c1a4-1ef1-d895-300b-179d33b83b41@redhat.com>
Date:   Wed, 23 Jun 2021 10:43:48 +1000
From:   Gavin Shan <gshan@...hat.com>
To:     Alexander Duyck <alexander.duyck@...il.com>
Cc:     linux-mm <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>,
        David Hildenbrand <david@...hat.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>, shan.gavin@...il.com
Subject: Re: [PATCH v2 2/3] mm/page_reporting: Allow driver to specify
 threshold

On 6/23/21 3:39 AM, Alexander Duyck wrote:
> On Mon, Jun 21, 2021 at 10:48 PM Gavin Shan <gshan@...hat.com> wrote:
>>
>> The page reporting threshold is currently sticky to @pageblock_order.
>> The page reporting can never be triggered because the freeing page
>> can't come up with a free area like that huge. The situation becomes
>> worse when the system memory becomes heavily fragmented.
>>
>> For example, the following configurations are used on ARM64 when 64KB
>> base page size is enabled. In this specific case, the page reporting
>> won't be triggered until the freeing page comes up with a 512MB free
>> area. That's hard to be met, especially when the system memory becomes
>> heavily fragmented.
>>
>>     PAGE_SIZE:          64KB
>>     HPAGE_SIZE:         512MB
>>     pageblock_order:    13       (512MB)
>>     MAX_ORDER:          14
>>
>> This allows the drivers to specify the threshold when the page
>> reporting device is registered. The threshold falls back to
>> @pageblock_order if it's not specified by the driver. The existing
>> users (hv_balloon and virtio_balloon) don't specify the threshold
>> and @pageblock_order is still taken as their page reporting order.
>> So this shouldn't introduce functional changes.
>>
>> Signed-off-by: Gavin Shan <gshan@...hat.com>
>> ---
>>   include/linux/page_reporting.h |  3 +++
>>   mm/page_reporting.c            | 14 ++++++++++----
>>   mm/page_reporting.h            | 10 ++--------
>>   3 files changed, 15 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
>> index 3b99e0ec24f2..fe648dfa3a7c 100644
>> --- a/include/linux/page_reporting.h
>> +++ b/include/linux/page_reporting.h
>> @@ -18,6 +18,9 @@ struct page_reporting_dev_info {
>>
>>          /* Current state of page reporting */
>>          atomic_t state;
>> +
>> +       /* Minimal order of page reporting */
>> +       unsigned int order;
>>   };
>>
>>   /* Tear-down and bring-up for page reporting devices */
>> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
>> index df9c5054e1b4..27670360bae6 100644
>> --- a/mm/page_reporting.c
>> +++ b/mm/page_reporting.c
> 
> <snip>
> 
>> @@ -324,6 +324,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
>>                  goto err_out;
>>          }
>>
>> +       /*
>> +        * We need to choose the minimal order of page reporting if it's
>> +        * not specified by the driver.
>> +        */
>> +       prdev->order = prdev->order ? prdev->order : pageblock_order;
>> +
>>          /* initialize state and work structures */
>>          atomic_set(&prdev->state, PAGE_REPORTING_IDLE);
>>          INIT_DELAYED_WORK(&prdev->work, &page_reporting_process);
> 
> Rather than using prdev->order directly it might be better to have a
> reporting_order value you could export for use by
> page_reporting_notify_free. That way you avoid the overhead of having
> to make a function call per page freed.
> 

Yes, I obviously missed the overhead added by the extra function call.
In the next revision, I will introduce @page_reporting_order for this.
It will also be exported as a module parameter so that it can be changed
dynamically, as David suggested before.
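
A minimal userspace sketch of the idea, not the actual next revision: the
threshold lives in a global that the inline helper compares against, so
low-order frees return without making a call. The names
mock_page_reporting_notify() and slow_path_calls are hypothetical stand-ins
for illustration, and the initial value 13 is simply the pageblock_order from
the ARM64 example above.

```c
/* Userspace mock only: these are stand-ins, not kernel code. */

/* Assumed global threshold; 13 mirrors pageblock_order in the example. */
static unsigned int page_reporting_order = 13;

/* Hypothetical counter showing how often the out-of-line path runs. */
static unsigned long slow_path_calls;

/* Stand-in for the out-of-line __page_reporting_notify(). */
static void mock_page_reporting_notify(void)
{
	slow_path_calls++;
}

static inline void page_reporting_notify_free(unsigned int order)
{
	/* Cheap inline comparison: frees below the threshold stop here. */
	if (order < page_reporting_order)
		return;

	/* Only frees at or above the threshold pay for the function call. */
	mock_page_reporting_notify();
}
```

With the threshold at 13, freeing one block of each order from 0 to 13 reaches
the out-of-line path once; lowering page_reporting_order lets smaller frees
through without changing the inline helper.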

>> diff --git a/mm/page_reporting.h b/mm/page_reporting.h
>> index 2c385dd4ddbd..d9f972e72649 100644
>> --- a/mm/page_reporting.h
>> +++ b/mm/page_reporting.h
>> @@ -10,11 +10,9 @@
>>   #include <linux/pgtable.h>
>>   #include <linux/scatterlist.h>
>>
>> -#define PAGE_REPORTING_MIN_ORDER       pageblock_order
>> -
>>   #ifdef CONFIG_PAGE_REPORTING
>>   DECLARE_STATIC_KEY_FALSE(page_reporting_enabled);
>> -void __page_reporting_notify(void);
>> +void __page_reporting_notify(unsigned int order);
>>
>>   static inline bool page_reported(struct page *page)
>>   {
>> @@ -37,12 +35,8 @@ static inline void page_reporting_notify_free(unsigned int order)
>>          if (!static_branch_unlikely(&page_reporting_enabled))
>>                  return;
>>
>> -       /* Determine if we have crossed reporting threshold */
>> -       if (order < PAGE_REPORTING_MIN_ORDER)
>> -               return;
>> -
>>          /* This will add a few cycles, but should be called infrequently */
>> -       __page_reporting_notify();
>> +       __page_reporting_notify(order);
>>   }
>>   #else /* CONFIG_PAGE_REPORTING */
>>   #define page_reported(_page)   false
> 
> With us making the function call per page freed we are likely to have
> a much more significant impact on performance with page reporting
> enabled. Ideally we want to limit this impact so that we only take the
> cost for the conditional check on the lower order pages.
> 

Yep, thanks for the explanation, Alex.

Thanks,
Gavin
