Message-ID: <60eb2c69-8644-45de-9499-a57dc6dbab61@redhat.com>
Date:   Tue, 17 Oct 2023 17:28:35 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Stefan Roesch <shr@...kernel.io>
Cc:     kernel-team@...com, akpm@...ux-foundation.org, hannes@...xchg.org,
        riel@...riel.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v1 0/4] mm/ksm: Add ksm advisor

On 10.10.23 18:02, Stefan Roesch wrote:
> 
> David Hildenbrand <david@...hat.com> writes:
> 
>> On 06.10.23 18:17, Stefan Roesch wrote:
>>> David Hildenbrand <david@...hat.com> writes:
>>>
>>>> On 04.10.23 21:02, Stefan Roesch wrote:
>>>>> What is the KSM advisor?
>>>>> =========================
>>>>> The ksm advisor automatically manages the pages_to_scan setting to
>>>>> achieve a target scan time. The target scan time defines how many seconds
>>>>> it should take to scan all the candidate KSM pages. In other words the
>>>>> pages_to_scan rate is changed by the advisor to achieve the target scan
>>>>> time.
>>>>> Why do we need a KSM advisor?
>>>>> ==============================
>>>>> The number of candidate pages for KSM is dynamic. It can often be observed
>>>>> that during the startup of an application more candidate pages need to be
>>>>> processed. Without an advisor the pages_to_scan parameter needs to be
>>>>> sized for the maximum number of candidate pages. With the scan time
>>>>> advisor the pages_to_scan parameter can be changed based on demand.
>>>>> Algorithm
>>>>> ==========
>>>>> The algorithm calculates the change value based on the target scan time
>>>>> and the previous scan time. To avoid perturbations an exponentially
>>>>> weighted moving average is applied.
>>>>> The algorithm has a max and min
>>>>> value to:
>>>>> - guarantee responsiveness to changes
>>>>> - avoid spending too much CPU
>>>>> Parameters to influence the KSM scan advisor
>>>>> =============================================
>>>>> The respective parameters are:
>>>>> - ksm_advisor_mode
>>>>>      0: None (default), 1: scan time advisor
>>>>> - ksm_advisor_target_scan_time
>>>>>      how many seconds a scan of all candidate pages should take
>>>>> - ksm_advisor_min_pages
>>>>>      minimum value for pages_to_scan per batch
>>>>> - ksm_advisor_max_pages
>>>>>      maximum value for pages_to_scan per batch
>>>>> The parameters are exposed as knobs in /sys/kernel/mm/ksm.
>>>>> By default the scan time advisor is disabled.
>>>>
>>>> What would be the main reason to not have this enabled as default?
>>>>
>>> There might already be existing users who directly set pages_to_scan
>>> and tuned the KSM settings accordingly, as the default setting of 100 for
>>> pages_to_scan is too low for typical workloads.
>>
>> Good point.
>>
>>>
>>>> IIUC, it is kind-of an auto-tuning of pages_to_scan. Would "auto-tuning"
>>>> describe it better than "advisor" ?
>>>>
>>>> [...]
>>>>
>>> I'm fine with auto-tune. I was also thinking about that name, but I
>>> chose advisor; it's a bit less strong and it needs input from the user.
>>>
>>
>> I'm not a native speaker, but "adviser" to me implies that no action is taken,
>> only advice is given :) But again, no native speaker.
>>
>>>>> How is defining a target scan time better?
>>>>> ===========================================
>>>>> For an administrator it is more logical to set a target scan time. The
>>>>> administrator can determine how many pages are scanned on each scan.
>>>>> Therefore setting a target scan time makes more sense.
>>>>> In addition the administrator might have a good idea about the
>>>>> memory sizing of the respective workloads.
>>>>
>>>> Is there any way you could imagine where we could have this just do something
>>>> reasonable without any user input? IOW, true auto-tuning?
>>>>
>>> True auto-tuning might be difficult as users might want to be able to
>>> choose how aggressive KSM is. Some might want it to be as aggressive as
>>> possible to get the maximum de-duplication rate. Others might want a
>>> more balanced approach that takes CPU-consumption into consideration.
>>> I guess it depends if you are memory-bound, cpu-bound or both.
>>
>> Agreed, more below.
>>
>>>
>>>> I read above:
>>>>> - guarantee responsiveness to changes
>>>>> - avoid spending too much CPU
>>>>
>>>> Aren't both things accountable/measurable, so that we could use them as the
>>>> input for auto-tuning?
>>>>
>>> I'm not sure a true auto-tuning can be achieved. I think we need
>>> some input from the user
>>> - How many resources to consume
>>> - How fast memory changes or how stable memory is
>>>     (this we might be able to detect)
>>
>> Setting the pages_to_scan is a bit mystical. Setting upper/lower pages_to_scan
>> bounds is similarly mystical, and highly workload dependent.
>>
>> So I agree that a better abstraction to automatically tune the scanning is
>> reasonable. I wonder if we can let the user give better inputs that are less
>> workload dependent.
>>
>> For example, do we need min/max values for pages_to_scan, or can we replace them
>> by something better suited to the auto-tuning algorithm?
>>
>> IMHO "target scan time" goes in the right direction, but it can still be
>> fairly workload dependent. Maybe a "max CPU consumption" or sth. like that would
>> similarly help to limit CPU waste, and it could be fairly workload dependent.
> 
> I can look into replacing min/max values for pages_to_scan with min/max
> cpu utilization. This might be easier for users to decide on. However I
> still think that we need a target value like scan time to optimize for.

Agreed, it can't be completely automatic.

-- 
Cheers,

David / dhildenb
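
For readers of the archive, the scan-time advisor described in the quoted
cover letter can be illustrated with a minimal sketch. This is not the code
from the patch series. The function names, the EWMA weight of 0.3, and the
proportional scaling rule (scale pages_to_scan by previous scan time divided
by target scan time) are assumptions chosen for illustration. Only the
ingredients come from the cover letter: a target scan time, the previous scan
time, an exponentially weighted moving average, and min/max bounds on
pages_to_scan.

```python
def ewma(previous: float, new: float, weight: float) -> float:
    """Exponentially weighted moving average of two values.

    `weight` is the share given to the new value (0 < weight <= 1).
    """
    return weight * new + (1.0 - weight) * previous


def next_pages_to_scan(
    pages_to_scan: int,
    last_scan_time: float,
    target_scan_time: float,
    min_pages: int,
    max_pages: int,
    weight: float = 0.3,  # assumed smoothing factor, not from the patch
) -> int:
    """Return a new pages_to_scan value for the next scan (hypothetical helper)."""
    if last_scan_time <= 0 or target_scan_time <= 0:
        # No usable measurement: keep the current value, but enforce the bounds.
        return max(min_pages, min(max_pages, pages_to_scan))

    # A scan that took longer than the target gives ratio > 1, so more pages
    # must be scanned per batch; a faster scan gives ratio < 1.
    ratio = last_scan_time / target_scan_time
    proposed = pages_to_scan * ratio

    # Smooth the change so one unusual scan does not cause a large jump.
    smoothed = ewma(pages_to_scan, proposed, weight)

    # The min/max bounds limit CPU usage and keep the scanner responsive.
    return max(min_pages, min(max_pages, round(smoothed)))
```

Calling next_pages_to_scan() once per completed scan moves pages_to_scan
toward the value that would meet the target, while the bounds play the role
of ksm_advisor_min_pages and ksm_advisor_max_pages.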
