Message-ID: <aa3382b4-4046-988f-42ea-8812dba7882b@bytedance.com>
Date: Tue, 11 Apr 2023 21:04:18 +0800
From: Gang Li <ligang.bdlg@...edance.com>
To: Michal Koutný <mkoutny@...e.com>
Cc: Waiman Long <longman@...hat.com>, Michal Hocko <mhocko@...e.com>,
cgroups@...r.kernel.org, linux-mm@...ck.org, rientjes@...gle.com,
Zefan Li <lizefan.x@...edance.com>,
linux-kernel@...r.kernel.org
Subject: Re: Re: [PATCH v4] mm: oom: introduce cpuset oom
On 2023/4/11 20:23, Michal Koutný wrote:
> Hello.
>
> On Tue, Apr 11, 2023 at 02:58:15PM +0800, Gang Li <ligang.bdlg@...edance.com> wrote:
>> +	cpuset_for_each_descendant_pre(cs, pos_css, &top_cpuset) {
>> +		if (nodes_equal(cs->mems_allowed, task_cs(current)->mems_allowed)) {
>> +			css_task_iter_start(&(cs->css), CSS_TASK_ITER_PROCS, &it);
>> +			while (!ret && (task = css_task_iter_next(&it)))
>> +				ret = fn(task, arg);
>> +			css_task_iter_end(&it);
>> +		}
>> +	}
>> +	rcu_read_unlock();
>> +	cpuset_read_unlock();
>> +	return ret;
>> +}
>
> I see this traverses all cpusets without the hierarchy actually
> mattering that much. Wouldn't the CONSTRAINT_CPUSET be better achieved by
> globally (or per-memcg) scanning all processes and filtering with:
Oh I see, you mean scanning all processes in all cpusets and scanning
all processes globally are equivalent.
> nodes_intersect(current->mems_allowed, p->mems_allowed)
Perhaps it would be better to use nodes_equal first and, if no suitable
victim is found, fall back to nodes_intersect?
The NUMA balancing mechanism tends to keep a task's memory on the same NUMA
node. If the selected victim's memory happens to sit on a node outside the
current process's allowed nodes, killing it still won't free any memory
that the current process can use.
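To make the two-pass idea concrete, here is a rough userspace sketch. It is
not kernel code: the nodemask typedef, struct task, the badness field and
all function names are hypothetical stand-ins, and skipping the triggering
task is a simplification I assume only for this illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the kernel's nodemask_t or task_struct. */
typedef uint64_t nodemask;          /* bit n set => NUMA node n is allowed */

struct task {
    const char *name;
    nodemask mems_allowed;
    unsigned long badness;          /* assumed score: larger = better victim */
};

static int mask_equal(nodemask a, nodemask b)      { return a == b; }
static int mask_intersects(nodemask a, nodemask b) { return (a & b) != 0; }

/* Return the highest-badness task accepted by match(), or NULL if none. */
static const struct task *pick(const struct task *tasks, size_t n,
                               const struct task *cur,
                               int (*match)(nodemask, nodemask))
{
    const struct task *best = NULL;

    for (size_t i = 0; i < n; i++) {
        const struct task *p = &tasks[i];

        /* Skipping cur is a simplification for this sketch. */
        if (p == cur || !match(cur->mems_allowed, p->mems_allowed))
            continue;
        if (!best || p->badness > best->badness)
            best = p;
    }
    return best;
}

/* Pass 1: identical masks. Pass 2: fall back to overlapping masks. */
const struct task *select_victim(const struct task *tasks, size_t n,
                                 const struct task *cur)
{
    const struct task *v = pick(tasks, n, cur, mask_equal);

    return v ? v : pick(tasks, n, cur, mask_intersects);
}
```

With this ordering, a task whose mask only overlaps is considered only when
no task with an identical mask exists, regardless of its score.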
In this example:
A->mems_allowed: 0,1
B->mems_allowed: 1,2
nodes_intersect(A->mems_allowed, B->mems_allowed) == true
Memory Distribution:
+=======+=======+=======+
| Node0 | Node1 | Node2 |
+=======+=======+=======+
|   A   |       |       |
+-------+-------+-------+
|       |       |   B   |
+-------+-------+-------+
Process A invokes the OOM killer, which then kills B.
But A still can't get any free memory on Node0 or Node1.
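The same scenario can be written down with plain bitmasks. Again this is
only a userspace sketch under my own assumptions: struct proc, its
"resident" field (the nodes where a task's pages currently sit) and both
helper functions are hypothetical and do not exist in the kernel.

```c
#include <stdint.h>

typedef uint64_t nodemask;              /* hypothetical: bit n = NUMA node n */
#define NODE(n) ((nodemask)1 << (n))

struct proc {
    nodemask mems_allowed;  /* nodes the task may allocate from */
    nodemask resident;      /* assumed: nodes currently holding its pages */
};

/* The filter under discussion: do the two allowed masks overlap? */
int masks_intersect(const struct proc *a, const struct proc *b)
{
    return (a->mems_allowed & b->mems_allowed) != 0;
}

/* Killing victim helps cur only if victim's pages sit on a node cur may use. */
int kill_frees_usable_memory(const struct proc *cur, const struct proc *victim)
{
    return (cur->mems_allowed & victim->resident) != 0;
}
```

Setting A to { NODE(0) | NODE(1), NODE(0) } and B to
{ NODE(1) | NODE(2), NODE(2) } makes masks_intersect(&A, &B) non-zero while
kill_frees_usable_memory(&A, &B) is zero, which is the case drawn above.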
> (`current` triggers the OOM, `p` is the iterated task)
> ?
>
> Thanks,
> Michal