linux-kernel - RE: [PATCH V2] mm: compaction: support triggering of proactive compaction by user

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BYAPR12MB34163A80AD9567746F7904CDD8239@BYAPR12MB3416.namprd12.prod.outlook.com>
Date:   Thu, 27 May 2021 23:52:30 +0000
From:   Nitin Gupta <nigupta@...dia.com>
To:     Charan Teja Kalla <charante@...eaurora.org>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "mcgrof@...nel.org" <mcgrof@...nel.org>,
        "keescook@...omium.org" <keescook@...omium.org>,
        "yzaikin@...gle.com" <yzaikin@...gle.com>,
        "vbabka@...e.cz" <vbabka@...e.cz>,
        "bhe@...hat.com" <bhe@...hat.com>,
        "mateusznosek0@...il.com" <mateusznosek0@...il.com>,
        "sh_def@....com" <sh_def@....com>,
        "iamjoonsoo.kim@....com" <iamjoonsoo.kim@....com>,
        "vinmenon@...eaurora.org" <vinmenon@...eaurora.org>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>
Subject: RE: [PATCH V2] mm: compaction: support triggering of proactive
 compaction by user



> -----Original Message-----
> From: charante=codeaurora.org@...codeaurora.org
> <charante=codeaurora.org@...codeaurora.org> On Behalf Of Charan Teja
> Kalla
> Sent: Thursday, May 27, 2021 2:28 AM
> To: Nitin Gupta <nigupta@...dia.com>; akpm@...ux-foundation.org;
> mcgrof@...nel.org; keescook@...omium.org; yzaikin@...gle.com;
> vbabka@...e.cz; bhe@...hat.com; mateusznosek0@...il.com;
> sh_def@....com; iamjoonsoo.kim@....com; vinmenon@...eaurora.org
> Cc: linux-kernel@...r.kernel.org; linux-mm@...ck.org; linux-
> fsdevel@...r.kernel.org
> Subject: Re: [PATCH V2] mm: compaction: support triggering of proactive
> compaction by user
> 
> External email: Use caution opening links or attachments
> 
> 
> Thanks Nitin for your inputs!!
> 
> On 5/26/2021 2:05 AM, Nitin Gupta wrote:
> > The proactive compaction[1] gets triggered for every 500msec and run
> > compaction on the node for COMPACTION_HPAGE_ORDER (usually order-
> 9)
> > pages based on the value set to sysctl.compaction_proactiveness.
> > Triggering the compaction for every 500msec in search of
> >
> > COMPACTION_HPAGE_ORDER pages is not needed for all applications,
> >> especially on the embedded system usecases which may have few MB's of
> >> RAM. Enabling the proactive compaction in its state will endup in
> >> running almost always on such systems.
> >>
> > You can disable proactive compaction by setting
> sysctl.compaction_proactiveness to 0.
> 
> Agree. But proactive compaction got its own uses too like it knows when to
> stop the compaction, instead of simply doing the full node compaction, thus
> we don't want to disable it always.
> 
> >
> >
> >> As an example, say app
> >> launcher decide to launch the memory heavy application which can be
> >> launched fast if it gets more higher order pages thus launcher can
> >> prepare the system in advance by triggering the proactive compaction
> >> from userspace.
> >>
> > You can always do: echo 1 > /proc/sys/vm/compact_memory On a small
> > system, this should not take much time.
> 
> Hmm... With 3GB Snapdragon system, we have observed that write to
> compact_memory is taking peak time of 400+msec, could be that
> MIGRATE_SYNC on a full node is causing this peak, which is much time.
> 
>
> >
> > Hijacking proactive compaction for one-off compaction (say, before a
> > large app launch) does not sound right to me.
> 
> Actually we are using the proactive compaction to 'just prepare the system
> before asking for huge memory' as compact_memory can take longer and is
> not controllable like proactive compaction.
> 
> In the V1 of this patch, we actually created a /proc interface(similar to
> compact_memory), providing a way to trigger the proactive compaction from
> user space. https://lore.kernel.org/patchwork/patch/1417064/. But since
> this involved a new /proc interface addition, in V2 we just implemented an
> alternative way to it.
> 
> Another problem, I think, this patch tried to address is that, in the existing
> implementation it is not guaranteed the user set value of
> compaction_proactiveness is effective unless atleast
> HPAGE_FRAG_CHECK_INTERVAL_MSEC(500msec) is elapsed, Right? Does this
> seems correct provided we had given this user interface and can't specified
> any where when this value will be effective(where it comes into effect in the
> next compact thread wake up for proactive compaction).
> 
> Consider the below testcase where a user thinks that the application he is
> going to run is performance critical thus decides to do the below steps:
> 1) Save the present the compaction_proactiveness (Say it is zero thus
> disabled)
> 2) Set the compaction_proactiveness to 100.
> 3) Allocate memory for the application.
> 4) Restore the compaction_proactiveness.(set to disabled again)
> 5) Then proactive compaction is tried to run.
> 
> First, Does the user doing the above steps are valid?
> If yes, then we should guarantee to the user that proactive compaction
> atleast tried to run when the user changed the proactiveness.
> If not, I feel, we should document that 'once user changed the
> compaction_proactiveness, he need to wait atleast
> HPAGE_FRAG_CHECK_INTERVAL_MSEC before considering that the value he
> tried to set is effective and proactive compaction tried to run on that'.
> Doesn't this seem okay?

Proactive compaction does not guarantee if the kernel will be able to achieve
fragmentation targets implied from the compaction_proactiveness sysctl. It also
does not guarantee how much time it will take to reach desired fragmentation
levels (if at all possible). Also, HPAGE_FRAG_CHECK_INTERVAL_MSEC is the
maximum sleep time, depending on relative timing of your sysctl input and
the timeout.

The intent of proactive compaction is to maintain desired fragmentation levels
wrt the default hugepage size in the background so we don't have to pay latency
cost associated with on-demand compaction. I don't like the idea of forcing a round
of compaction on sysctl write. Maybe add a Kconfig parameter for setting
HPAGE_FRAG_CHECK_INTERVAL_MSEC to say 1msec?

Thanks,
Nitin