Message-ID: <4e2aa5c2-3d8c-2a2f-691b-218e23e7271f@bytedance.com>
Date:   Wed, 28 Sep 2022 11:09:47 +0800
From:   Abel Wu <wuyun.abel@...edance.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Zhongkun He <hezhongkun.hzk@...edance.com>, corbet@....net,
        akpm@...ux-foundation.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org
Subject: Re: [RFC] proc: Add a new isolated /proc/pid/mempolicy type.

On 9/27/22 9:58 PM, Michal Hocko wrote:
> On Tue 27-09-22 21:07:02, Abel Wu wrote:
>> On 9/27/22 6:49 PM, Michal Hocko wrote:
>>> On Tue 27-09-22 11:20:54, Abel Wu wrote:
>>> [...]
>>>>>> Btw, in order to add a per-thread-group mempolicy, is it possible to add
>>>>>> a mempolicy to mm_struct?
>>>>>
>>>>> I dunno. This would make the mempolicy interface even more confusing.
>>>>> Per mm behavior makes a lot of sense but we already do have per-thread
>>>>> semantic so I would stick to it rather than introducing a new semantic.
>>>>>
>>>>> Why is this really important?
>>>>
>>>> We want soft control on memory footprint of background jobs by applying
>>>> NUMA preferences when necessary, so the impact on different NUMA nodes
>>>> can be managed to some extent. These NUMA preferences are given by the
>>>> control panel, and it might not be suitable to override tasks that
>>>> already have specific memory policies (or vice versa).
>>>
>>> Maybe the answer is somehow implicit but I do not really see any
>>> argument for the per thread-group semantic here. In other words why a
>>> new interface has to cover more than the local [sg]et_mempolicy?
>>> I can see convenience as one potential argument. Also if there is a
>>> requirement to change the policy in atomic way then this would require a
>>> single syscall.
>>
>> Convenience is not our major concern. A well-tuned workload can have
>> specific memory policies for different tasks/vmas in one process, and
>> this can be achieved by set_mempolicy()/mbind() respectively. Other
>> workloads are not tuned this way: they don't care where their memory
>> resides, so the impact they have on co-located workloads might vary
>> across NUMA nodes.
>>
>> The control panel, which has full knowledge of the workload profiles,
>> may want to adjust the behavior of the non-mempolicied processes
>> by giving them NUMA preferences, to better serve the co-located jobs.
>>
>> So in this scenario, a process's memory policy can be assigned
>> dynamically by two parties:
>>
>>   a) the process itself, through set_mempolicy()/mbind()
>>   b) the control panel, for which no API is available right now
>>
>> Considering the two policies should not fight each other, it sounds
>> reasonable to introduce a new syscall to assign memory policy to a
>> process through struct mm_struct.
> 
> So you want to allow restoring the original local policy if the external
> one is disabled?

Pretty much, but the internal policies are expected to have precedence
over the external ones, since they are set for some reason to meet their
specific requirements. The external ones are used only when there is no
internal policy active.
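
A minimal sketch of that precedence, assuming hypothetical, simplified
structures (struct policy, struct mm, struct task and effective_policy()
are illustrative names, not the kernel's real mempolicy code, and
per-VMA policies set by mbind() are left out):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's real structures. */
enum policy_mode { POLICY_DEFAULT, POLICY_PREFERRED, POLICY_BIND, POLICY_INTERLEAVE };

struct policy {
    enum policy_mode mode;
    unsigned long nodemask;         /* one bit per NUMA node */
};

struct mm {
    const struct policy *external;  /* set by the control panel, NULL if none */
};

struct task {
    const struct policy *internal;  /* set by the task itself, NULL if none */
    struct mm *mm;                  /* shared by all threads of the process */
};

static const struct policy system_default = { POLICY_DEFAULT, 0 };

/*
 * The internal (per-task) policy wins. The external (per-mm) policy is
 * consulted only when no internal policy is active, and the system
 * default is the last resort.
 */
static const struct policy *effective_policy(const struct task *t)
{
    assert(t != NULL);
    if (t->internal)
        return t->internal;
    if (t->mm && t->mm->external)
        return t->mm->external;
    return &system_default;
}
```

With this layout, clearing mm->external makes every thread without an
internal policy fall back to the default, while threads that set their
own policy are unaffected either way.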

> 
> Anyway, pidfd_$FOO behavior should be semantically very similar to the
> original $FOO. Moving from per-task to per-mm is a major shift in the
> semantic.  I can imagine having a dedicated flag for the syscall to
> enforce the policy on the full thread group. But having a different
> semantic is both tricky and also constrained because per-thread binding
> is then impossible.

Agreed. What about a syscall that applies only per-mm? There are precedents
like process_madvise(2).
