Message-ID: <d33fce4d-6018-0235-5391-debc8974eda5@oracle.com>
Date:   Tue, 24 Sep 2019 08:11:09 -0600
From:   Khalid Aziz <khalid.aziz@...cle.com>
To:     Vlastimil Babka <vbabka@...e.cz>, Nitin Gupta <nigupta@...dia.com>,
        "dan.j.williams@...el.com" <dan.j.williams@...el.com>,
        "mhocko@...e.com" <mhocko@...e.com>,
        "mgorman@...hsingularity.net" <mgorman@...hsingularity.net>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>
Cc:     "cai@....pw" <cai@....pw>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "aryabinin@...tuozzo.com" <aryabinin@...tuozzo.com>,
        "jannh@...gle.com" <jannh@...gle.com>, "guro@...com" <guro@...com>,
        "hannes@...xchg.org" <hannes@...xchg.org>,
        "keescook@...omium.org" <keescook@...omium.org>,
        "yuzhao@...gle.com" <yuzhao@...gle.com>,
        "arunks@...eaurora.org" <arunks@...eaurora.org>,
        "willy@...radead.org" <willy@...radead.org>,
        "janne.huttunen@...ia.com" <janne.huttunen@...ia.com>,
        "khlebnikov@...dex-team.ru" <khlebnikov@...dex-team.ru>
Subject: Re: [RFC] mm: Proactive compaction

On 9/24/19 7:39 AM, Vlastimil Babka wrote:
> On 9/20/19 1:37 AM, Nitin Gupta wrote:
>> On Tue, 2019-08-20 at 10:46 +0200, Vlastimil Babka wrote:
>>>
>>> That's a lot of control knobs - how is an admin supposed to tune them
>>> to their needs?
>>
>>
>> Yes, it's difficult for an admin to get so many tunables right unless
>> targeting a very specific workload.
>>
>> How about a simpler solution where we expose just one per-node tunable:
>>    /sys/.../node-x/compaction_effort
>> which accepts [0, 100]
>>
>> This parallels /proc/sys/vm/swappiness but for compaction. With this
>> single number, we can estimate per-order [low, high] watermarks for external
>> fragmentation like this:
>>  - For now, map this range to [low, medium, high], which corresponds
>> to specific low, high thresholds for extfrag.
>>  - Apply more relaxed thresholds for higher orders than for lower orders.
>>
>> With this single tunable we remove the burden of setting explicit
>> per-order [low, high] thresholds, and it should be easier to experiment
>> with (one possible mapping is sketched after the quoted thread below).
> 
> What about instead autotuning based on the number of allocations hitting
> direct compaction recently? IIRC there were attempts in the past (myself
> included), and recently Khalid's, which was quite elaborate.
> 
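A minimal sketch of how the single compaction_effort knob proposed
above could map to per-order [low, high] extfrag thresholds (the level
boundaries and threshold values below are purely illustrative, not an
existing interface):

/*
 * Illustrative only: map a per-node compaction_effort value in
 * [0, 100] to per-order [low, high] external fragmentation
 * thresholds, relaxing the thresholds for higher orders.
 */
#include <stdio.h>

#define MAX_ORDER	11

struct extfrag_limits {
	int low;	/* stop compacting once extfrag drops below this */
	int high;	/* start compacting once extfrag exceeds this    */
};

static void compute_extfrag_limits(int effort, int order,
				   struct extfrag_limits *lim)
{
	/* Coarse [low, medium, high] levels, as proposed above. */
	static const struct extfrag_limits base[] = {
		{ .low = 60, .high = 80 },	/* effort  0-33: relaxed */
		{ .low = 40, .high = 60 },	/* effort 34-66: medium  */
		{ .low = 20, .high = 40 },	/* effort 67-100: strict */
	};
	int level = effort <= 33 ? 0 : (effort <= 66 ? 1 : 2);
	int slack = 2 * order;		/* relax higher orders more */

	lim->low = base[level].low + slack;
	lim->high = base[level].high + slack;
	if (lim->high > 100)
		lim->high = 100;
	if (lim->low > lim->high)
		lim->low = lim->high;
}

int main(void)
{
	struct extfrag_limits lim;
	int order;

	for (order = 0; order < MAX_ORDER; order++) {
		compute_extfrag_limits(50, order, &lim);
		printf("order %2d: low=%3d high=%3d\n",
		       order, lim.low, lim.high);
	}
	return 0;
}

The point is only the shape of the mapping: one number picks a coarse
level, and each order gets a bit more slack than the one below it.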

I do think the right way forward with this longstanding problem is to
take the burden of managing free memory away from the end user and let
the kernel autotune itself to the demands of the workload. We can start
with a simpler algorithm in the kernel that adapts to the workload and
refine it as we move forward. As long as the initial implementation
performs at least as well as the current free page management, we have
a workable path for improvements. I am moving the implementation I put
together in the kernel to a userspace daemon just to test it out on a
larger variety of workloads. Userspace is more limited, with restricted
access to the statistics the algorithm needs for trend analysis, so I
would rather be doing this in the kernel.
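
As a very rough illustration of the userspace approach (a stripped-down
sketch, not the actual daemon; the 0-100 effort knob it adjusts is the
hypothetical one discussed above), the loop boils down to sampling
compact_stall from /proc/vmstat and nudging the effort up whenever
allocations have recently been falling into direct compaction:

/*
 * Stripped-down sketch, not the actual daemon: sample compact_stall
 * from /proc/vmstat and nudge a hypothetical 0-100 effort knob based
 * on how many allocations hit direct compaction since the last sample.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static long read_vmstat(const char *key)
{
	char name[64];
	long val;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return -1;
	while (fscanf(f, "%63s %ld", name, &val) == 2) {
		if (!strcmp(name, key)) {
			fclose(f);
			return val;
		}
	}
	fclose(f);
	return -1;
}

int main(void)
{
	long prev = read_vmstat("compact_stall");
	int effort = 50;	/* hypothetical per-node knob, 0-100 */

	for (;;) {
		long cur, delta;

		sleep(5);
		cur = read_vmstat("compact_stall");
		delta = cur - prev;
		prev = cur;

		/* More direct-compaction stalls recently -> work harder. */
		if (delta > 0)
			effort = effort + 5 > 100 ? 100 : effort + 5;
		else
			effort = effort - 5 < 0 ? 0 : effort - 5;

		/*
		 * Writing the new value out is omitted; the sysfs file it
		 * would go to is only a proposal in this thread.
		 */
		printf("compact_stall delta=%ld -> effort=%d\n",
		       delta, effort);
	}
	return 0;
}

A real version would want per-node counters and some smoothing or trend
analysis rather than a single global delta, which is exactly the data
that is hard to get at from userspace.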

--
Khalid
