Message-Id: <239e7a04-c0fc-40f6-b383-627603f27a99@app.fastmail.com>
Date: Mon, 18 Dec 2023 09:27:35 -0800
From: "Stefan Roesch" <shr@...kernel.io>
To: "David Hildenbrand" <david@...hat.com>, kernel-team@...com
Cc: "Andrew Morton" <akpm@...ux-foundation.org>, hannes@...xchg.org,
 riel@...riel.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v4 1/4] mm/ksm: add ksm advisor



On Mon, Dec 18, 2023, at 3:29 AM, David Hildenbrand wrote:
> On 13.12.23 19:27, Stefan Roesch wrote:
>> This adds the ksm advisor. The ksm advisor automatically manages the
>> pages_to_scan setting to achieve a target scan time. The target scan
>> time defines how many seconds it should take to scan all the candidate
>> KSM pages. In other words the pages_to_scan rate is changed by the
>> advisor to achieve the target scan time. The algorithm has a max and min
>> value to:
>> - guarantee responsiveness to changes
>> - limit CPU resource consumption
>> 
>> The respective parameters are:
>> - ksm_advisor_target_scan_time (how many seconds a scan should take)
>> - ksm_advisor_max_cpu (maximum value for cpu percent usage)
>> 
>> - ksm_advisor_min_pages (minimum value for pages_to_scan per batch)
>> - ksm_advisor_max_pages (maximum value for pages_to_scan per batch)
>> 
>> The algorithm calculates the change value based on the target scan time
>> and the previous scan time. To avoid perturbations, an exponentially
>> weighted moving average is applied.
>> 
>> The advisor is managed by two main parameters: the target scan time and
>> the max cpu time for the ksmd background thread. These parameters
>> determine how aggressively ksmd scans.
>> 
>> In addition there are min and max values for the pages_to_scan parameter
>> to make sure that its initial and max values are not set too low or too
>> high. This ensures that it is able to react to changes quickly enough.
>> 
>> The default values are:
>> - target scan time: 200 secs
>> - max cpu: 70%
>> - min pages: 500
>> - max pages: 30000
>> 
>> By default the advisor is disabled. Currently there are two advisors:
>> none and scan-time.
>> 
>> Tests with various workloads have shown considerable CPU savings. Most
>> of the workloads I have investigated have more candidate pages during
>> startup; once the workload is stable in terms of memory, the number of
>> candidate pages is reduced. Without the advisor, the pages_to_scan needs
>> to be sized for the maximum number of candidate pages. So having this
>> advisor definitely helps in reducing CPU consumption.
>> 
>> For the Instagram workload, the advisor achieves a 25% CPU reduction.
>> Once the memory is stable, the pages_to_scan parameter gets reduced to
>> about 40% of its max value.
>> 
>> Signed-off-by: Stefan Roesch <shr@...kernel.io>
>> ---
>>   mm/ksm.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 160 insertions(+), 1 deletion(-)
>> 
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index 7efcc68ccc6ea..4f7b71a1f3112 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -21,6 +21,7 @@
>>   #include <linux/sched.h>
>>   #include <linux/sched/mm.h>
>>   #include <linux/sched/coredump.h>
>> +#include <linux/sched/cputime.h>
>>   #include <linux/rwsem.h>
>>   #include <linux/pagemap.h>
>>   #include <linux/rmap.h>
>> @@ -248,6 +249,9 @@ static struct kmem_cache *rmap_item_cache;
>>   static struct kmem_cache *stable_node_cache;
>>   static struct kmem_cache *mm_slot_cache;
>>   
>> +/* Default number of pages to scan per batch */
>> +#define DEFAULT_PAGES_TO_SCAN 100
>> +
>>   /* The number of pages scanned */
>>   static unsigned long ksm_pages_scanned;
>>   
>> @@ -276,7 +280,7 @@ static unsigned int ksm_stable_node_chains_prune_millisecs = 2000;
>>   static int ksm_max_page_sharing = 256;
>>   
>>   /* Number of pages ksmd should scan in one batch */
>> -static unsigned int ksm_thread_pages_to_scan = 100;
>> +static unsigned int ksm_thread_pages_to_scan = DEFAULT_PAGES_TO_SCAN;
>>   
>>   /* Milliseconds ksmd should sleep between batches */
>>   static unsigned int ksm_thread_sleep_millisecs = 20;
>> @@ -297,6 +301,155 @@ unsigned long ksm_zero_pages;
>>   /* The number of pages that have been skipped due to "smart scanning" */
>>   static unsigned long ksm_pages_skipped;
>>   
>> +/* Don't scan more than max pages per batch. */
>> +static unsigned long ksm_advisor_max_pages = 30000;
>> +
>> +/* At least scan this many pages per batch. */
>> +static unsigned long ksm_advisor_min_pages = 500;
>> +
>> +/* Min CPU for scanning pages per scan */
>> +static unsigned int ksm_advisor_min_cpu =  10;
>
> That will never be modified, right? Either mark it const or just turn it 
> into a define.
>


Changed it to a define.

> [...]
>
>> +/*
>> + * The scan time advisor is based on the current scan rate and the target
>> + * scan rate.
>> + *
>> + *      new_pages_to_scan = pages_to_scan * (scan_time / target_scan_time)
>> + *
>> + * To avoid perturbations it calculates a change factor of previous changes.
>> + * A new change factor is calculated for each iteration and it uses an
>> + * exponentially weighted moving average. The new pages_to_scan value is
>> + * multiplied with that change factor:
>> + *
>> + *      new_pages_to_scan *= change factor
>> + *
>> + * The new_pages_to_scan value is limited by the cpu min and max values. It
>> + * calculates the cpu percent for the last scan and calculates the new
>> + * estimated cpu percent cost for the next scan. That value is capped by the
>> + * cpu min and max setting.
>> + *
>> + * In addition the new pages_to_scan value is capped by the max and min
>> + * limits.
>> + */
>> +static void scan_time_advisor(void)
>> +{
>> +	unsigned int cpu_percent;
>> +	unsigned long cpu_time;
>> +	unsigned long cpu_time_diff;
>> +	unsigned long cpu_time_diff_ms;
>> +	unsigned long pages;
>> +	unsigned long per_page_cost;
>> +	unsigned long factor;
>> +	unsigned long change;
>> +	unsigned long last_scan_time;
>> +	unsigned long scan_time;
>> +
>> +	/* Convert scan time to seconds */
>> +	scan_time = div_s64(ktime_ms_delta(ktime_get(), advisor_ctx.start_scan),
>> +			    MSEC_PER_SEC);
>> +	scan_time = scan_time ? scan_time : 1;
>> +
>> +	/* Calculate CPU consumption of ksmd background thread */
>> +	cpu_time = task_sched_runtime(current);
>> +	cpu_time_diff = cpu_time - advisor_ctx.cpu_time;
>> +	cpu_time_diff_ms = cpu_time_diff / 1000 / 1000;
>> +
>> +	cpu_percent = (cpu_time_diff_ms * 100) / (scan_time * 1000);
>> +	cpu_percent = cpu_percent ? cpu_percent : 1;
>> +	last_scan_time = prev_scan_time(&advisor_ctx, scan_time);
>
> I'd simply inline prev_scan_time() here and get rid of it. Whatever you 
> think is best.
>

I think prev_scan_time is a bit more expressive.
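
For anyone checking the unit conversions in the quoted hunk, here is a minimal userspace sketch of the cpu percent arithmetic. The helper name ksmd_cpu_percent and its parameters are made up for illustration and are not part of the patch; it only mirrors the quoted steps (nanoseconds to milliseconds, then a percentage of the scan time in seconds, with a floor of 1).

```c
#include <assert.h>

/*
 * Hypothetical helper, not from the patch: mirrors the quoted
 * arithmetic in plain userspace C.
 *
 * cpu_time_diff_ns: CPU time consumed since the last scan, in ns
 * scan_time_sec:    wall-clock duration of the last scan, in seconds
 */
static unsigned int ksmd_cpu_percent(unsigned long long cpu_time_diff_ns,
				     unsigned long scan_time_sec)
{
	unsigned long long cpu_time_diff_ms;
	unsigned int cpu_percent;

	/* The quoted code forces scan_time to at least 1 beforehand. */
	assert(scan_time_sec > 0);

	/* ns -> ms, same two-step division as in the quoted hunk */
	cpu_time_diff_ms = cpu_time_diff_ns / 1000 / 1000;

	/* CPU ms as a percentage of the scan duration in ms */
	cpu_percent = (unsigned int)((cpu_time_diff_ms * 100) /
				     (scan_time_sec * 1000ULL));

	/* Never report 0 percent, as in the quoted code */
	return cpu_percent ? cpu_percent : 1;
}
```

With 3 seconds of CPU time over a 10 second scan this gives 30; a scan that used only 1 ms of CPU rounds down to 0 and is then raised to 1.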

>
> Acked-by: David Hildenbrand <david@...hat.com>
>
> -- 
> Cheers,
>
> David / dhildenb
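
The scaling rule from the quoted comment block can be sketched roughly as below. This is only an illustration under stated assumptions: the function name next_pages_to_scan is hypothetical, and the sketch deliberately leaves out the exponentially weighted moving average and the cpu min/max capping, whose exact form is not shown in the quoted part of the patch. It covers just the base formula and the min/max pages clamp.

```c
#include <assert.h>

/*
 * Hypothetical sketch, not the patch's implementation:
 *
 *   new_pages_to_scan = pages_to_scan * (scan_time / target_scan_time)
 *
 * followed by clamping to [min_pages, max_pages]. The EWMA change
 * factor and the cpu percent limits are intentionally omitted.
 */
static unsigned long next_pages_to_scan(unsigned long pages_to_scan,
					unsigned long scan_time,
					unsigned long target_scan_time,
					unsigned long min_pages,
					unsigned long max_pages)
{
	unsigned long long pages;

	assert(target_scan_time > 0);
	assert(min_pages <= max_pages);

	/* Multiply first so integer division does not lose the ratio */
	pages = (unsigned long long)pages_to_scan * scan_time /
		target_scan_time;

	/* Keep the result inside the configured per-batch limits */
	if (pages < min_pages)
		pages = min_pages;
	if (pages > max_pages)
		pages = max_pages;

	return (unsigned long)pages;
}
```

With the defaults from the commit message (target 200 secs, min 500, max 30000), a scan that took 400 seconds at 1000 pages per batch doubles the value to 2000, while a scan that took 50 seconds would yield 250 and is raised to the minimum of 500.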
