Message-ID: <2ac6f207-e08a-2a7f-01ae-dfaf15eefaf6@redhat.com>
Date:   Wed, 23 Nov 2022 15:23:55 -0500
From:   Waiman Long <longman@...hat.com>
To:     "haifeng.xu" <haifeng.xu@...pee.com>
Cc:     lizefan.x@...edance.com, tj@...nel.org, hannes@...xchg.org,
        cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cgroup/cpuset: Optimize update_tasks_nodemask()

On 11/23/22 03:21, haifeng.xu wrote:
> When 'cpuset.mems' is changed under some cgroup, the system can hang
> for a long time. From dmesg, many processes or threads are
> stuck in fork/exit. The reason is shown below.
>
> thread A:
> cpuset_write_resmask /* takes cpuset_rwsem */
>    ...
>      update_tasks_nodemask
>        mpol_rebind_mm /* waits mmap_lock */
>
> thread B:
> worker_thread
>    ...
>      cpuset_migrate_mm_workfn
>        do_migrate_pages /* takes mmap_lock */
>
> thread C:
> cgroup_procs_write /* takes cgroup_mutex and cgroup_threadgroup_rwsem */
>    ...
>      cpuset_can_attach
>        percpu_down_write /* waits cpuset_rwsem */
>
> Once the nodemask of the cpuset is updated, thread A wakes up thread B
> to migrate the mm. But when thread A iterates through all tasks,
> including child threads and the group leader, it has to wait for the
> mmap_lock, which has been taken by thread B. Unfortunately, thread C
> wants to migrate tasks into the cgroup at this moment, so it must wait
> for thread A to release cpuset_rwsem. If thread B spends a long time
> migrating the mm, fork/exit, which acquire cgroup_threadgroup_rwsem,
> also have to wait for a long time.
>
> There is no need to migrate the mm of child threads, which is
> shared with the group leader. It is enough to do this for the
> group leader only.
>
> Signed-off-by: haifeng.xu <haifeng.xu@...pee.com>
> ---
>   kernel/cgroup/cpuset.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 589827ccda8b..43cbd09546d0 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1968,6 +1968,9 @@ static void update_tasks_nodemask(struct cpuset *cs)
>   
>   		cpuset_change_task_nodemask(task, &newmems);
>   
> +		if (!thread_group_leader(task))
> +			continue;
> +
>   		mm = get_task_mm(task);
>   		if (!mm)
>   			continue;
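
For readers following the quoted hunk, the effect of the leader-only check can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `toy_mm`, `toy_task`, `toy_is_group_leader()` and `toy_update_tasks()` are hypothetical stand-ins invented here, not kernel structures or APIs, and the model ignores locking and any case where a group leader has no mm while its threads still do.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's types. */
struct toy_mm {
    int rebinds;            /* how many times this mm was "rebound" */
};

struct toy_task {
    int pid;
    int tgid;               /* equals pid for the thread group leader */
    struct toy_mm *mm;      /* shared by every thread in the group */
};

/* Assumption for this model: a leader is the task whose pid == tgid. */
int toy_is_group_leader(const struct toy_task *t)
{
    return t->pid == t->tgid;
}

/*
 * Walk all tasks the way the quoted loop does. The per-task nodemask
 * update would apply to every task; the mm work is done either for
 * every task or, with leaders_only set, once per thread group.
 * Returns the number of mm "rebind" operations performed.
 */
int toy_update_tasks(struct toy_task *tasks, size_t n, int leaders_only)
{
    int calls = 0;

    for (size_t i = 0; i < n; i++) {
        /* (per-task nodemask change would happen here for all tasks) */

        if (leaders_only && !toy_is_group_leader(&tasks[i]))
            continue;       /* mirrors the check added by the patch */

        if (!tasks[i].mm)
            continue;       /* task without an mm: nothing to rebind */

        tasks[i].mm->rebinds++;
        calls++;
    }
    return calls;
}
```

In this model, a group of three threads sharing one mm causes three rebinds without the check and one with it, which is the redundant work on a shared mm that the quoted patch aims to avoid.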

Could you try the attached test patch to see if it fixes your problem?
Something along the lines of this patch would be more acceptable.

Thanks,
Longman


View attachment "test.patch" of type "text/x-patch" (1275 bytes)
