Date:   Thu, 7 Sep 2017 19:45:05 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Prateek Sood <prsood@...eaurora.org>
Cc:     tj@...nel.org, lizefan@...wei.com, cgroups@...r.kernel.org,
        mingo@...nel.org, longman@...hat.com, boqun.feng@...il.com,
        tglx@...utronix.de, linux-kernel@...r.kernel.org,
        sramana@...eaurora.org
Subject: Re: [PATCH] cgroup/cpuset: remove circular dependency deadlock

On Thu, Sep 07, 2017 at 07:26:23PM +0530, Prateek Sood wrote:
> Remove a circular dependency deadlock in a scenario where CPU hotplug
> is in progress while a cgroup/cpuset update triggered from userspace
> is also underway.
> 
> Process A => kthreadd => Process B => Process C => Process A

> Process A
> cpu_subsys_offline();
>   cpu_down();
>     _cpu_down();
>       percpu_down_write(&cpu_hotplug_lock); //held
>       cpuhp_invoke_callback();
>         workqueue_offline_cpu();
>           wq_update_unbound_numa();
>             kthread_create_on_node();
>               wake_up_process();  //wakeup kthreadd

TJ, I'm puzzled: why would we need to spawn new threads to update NUMA
affinity when taking a CPU out? That doesn't make sense to me; we can
either shrink the affinity of an existing thread or completely kill off
a thread if its mask becomes empty. But why spawn a new thread?
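
For illustration, that alternative might look roughly like the sketch
below. shrink_or_kill_worker() is a hypothetical helper, not workqueue
code; set_cpus_allowed_ptr() and kthread_stop() are the real kernel
APIs for the two cases:

#include <linux/cpumask.h>
#include <linux/kthread.h>
#include <linux/sched.h>

/* Hypothetical helper: adjust or stop an existing worker instead of
 * spawning a new one when a CPU goes away. */
static void shrink_or_kill_worker(struct task_struct *worker,
				  const struct cpumask *new_mask)
{
	if (cpumask_empty(new_mask))
		kthread_stop(worker);			/* mask now empty: kill it */
	else
		set_cpus_allowed_ptr(worker, new_mask);	/* just shrink affinity */
}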

>           flush_work();
>           wait_for_completion();
> 
> kthreadd
> kthreadd();
>   kernel_thread();
>     do_fork();
>       copy_process();
>         percpu_down_read(&cgroup_threadgroup_rwsem);
>           __rwsem_down_read_failed_common(); //waiting

So this will eventually do the:

		complete()

that makes A go.
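
Condensed, the handoff looks roughly like this (a simplified sketch of
the kernel/kthread.c flow, not the literal code; "done" is a completion
on the requester's stack):

	/* Requester (process A), inside kthread_create_on_node(): */
	list_add_tail(&create->list, &kthread_create_list);
	wake_up_process(kthreadd_task);		/* the "wakeup kthreadd" above */
	wait_for_completion(&done);		/* A now waits on kthreadd's fork */

	/* kthreadd pops the request and forks; copy_process() must first
	 * get past percpu_down_read(&cgroup_threadgroup_rwsem). Only then
	 * does the new thread, in kthread(), run: */
	complete(&done);			/* ...which finally lets A go */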


> Process B
> kernfs_fop_write();
>   cgroup_file_write();
>     cgroup_procs_write();
>       percpu_down_write(&cgroup_threadgroup_rwsem); //held
>       cgroup_attach_task();
>         cgroup_migrate();
>           cgroup_migrate_execute();
>             cpuset_can_attach();
>               mutex_lock(&cpuset_mutex); //waiting
> 
> Process C
> kernfs_fop_write();
>   cgroup_file_write();
>     cpuset_write_resmask();
>       mutex_lock(&cpuset_mutex); //held
>       update_cpumask();
>         update_cpumasks_hier();
>           rebuild_sched_domains_locked();
>             get_online_cpus();
>               percpu_down_read(&cpu_hotplug_lock); //waiting


So the whole thing looks like (L() = take lock, WFC() =
wait_for_completion(), C() = complete(); D is the kthreadd fork path):


A		B			C		D

L(hotplug)
		L(threadgroup)
					L(cpuset)
							L(threadgroup)
WFC(c)
		L(cpuset)
					L(hotplug)
							C(c)
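
To make the cycle concrete, here is a minimal userspace analogue in C
using only pthreads: rwlocks stand in for the two percpu rwsems, a
plain mutex for cpuset_mutex, and a flag plus condvar for the
completion; the sleeps merely force the interleaving shown above.
Running it hangs with each of the four threads blocked on one edge of
the cycle:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t hotplug     = PTHREAD_RWLOCK_INITIALIZER; /* cpu_hotplug_lock */
static pthread_rwlock_t threadgroup = PTHREAD_RWLOCK_INITIALIZER; /* cgroup_threadgroup_rwsem */
static pthread_mutex_t  cpuset      = PTHREAD_MUTEX_INITIALIZER;  /* cpuset_mutex */

/* A tiny "completion": flag + condvar. */
static pthread_mutex_t c_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c_cond = PTHREAD_COND_INITIALIZER;
static int c_done;

static void wait_for_completion_c(void)
{
	pthread_mutex_lock(&c_lock);
	while (!c_done)
		pthread_cond_wait(&c_cond, &c_lock);
	pthread_mutex_unlock(&c_lock);
}

static void complete_c(void)
{
	pthread_mutex_lock(&c_lock);
	c_done = 1;
	pthread_cond_signal(&c_cond);
	pthread_mutex_unlock(&c_lock);
}

static void *thread_a(void *unused)	/* cpu_down() path */
{
	pthread_rwlock_wrlock(&hotplug);	/* L(hotplug) */
	sleep(1);				/* let B and C take their locks */
	wait_for_completion_c();		/* WFC(c): D never signals it */
	pthread_rwlock_unlock(&hotplug);
	return NULL;
}

static void *thread_b(void *unused)	/* cgroup_procs_write() path */
{
	pthread_rwlock_wrlock(&threadgroup);	/* L(threadgroup) */
	sleep(2);
	pthread_mutex_lock(&cpuset);		/* L(cpuset): blocks behind C */
	pthread_mutex_unlock(&cpuset);
	pthread_rwlock_unlock(&threadgroup);
	return NULL;
}

static void *thread_c(void *unused)	/* cpuset_write_resmask() path */
{
	pthread_mutex_lock(&cpuset);		/* L(cpuset) */
	sleep(2);
	pthread_rwlock_rdlock(&hotplug);	/* L(hotplug): blocks behind A */
	pthread_rwlock_unlock(&hotplug);
	pthread_mutex_unlock(&cpuset);
	return NULL;
}

static void *thread_d(void *unused)	/* kthreadd's copy_process() */
{
	sleep(2);
	pthread_rwlock_rdlock(&threadgroup);	/* L(threadgroup): blocks behind B */
	complete_c();				/* C(c): never reached */
	pthread_rwlock_unlock(&threadgroup);
	return NULL;
}

int main(void)
{
	void *(*fn[4])(void *) = { thread_a, thread_b, thread_c, thread_d };
	pthread_t t[4];

	for (int i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, fn[i], NULL);
	for (int i = 0; i < 4; i++)
		pthread_join(t[i], NULL);	/* never returns: the cycle closes */
	puts("no deadlock");			/* unreachable with this ordering */
	return 0;
}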

Yes, inverting cpuset and hotplug would break that chain, but I'm still
wondering why workqueue needs to spawn threads on CPU down.
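
In the analogue above, that inversion amounts to thread C taking the
hotplug lock before the cpuset mutex, so nothing ever waits on hotplug
while holding cpuset and the cycle cannot close:

/* With this ordering, C blocks on hotplug *without* holding cpuset,
 * so B can take cpuset and finish, D then gets threadgroup and
 * completes c, A releases hotplug, and C proceeds. */
static void *thread_c_inverted(void *unused)	/* cpuset_write_resmask() */
{
	pthread_rwlock_rdlock(&hotplug);	/* L(hotplug) first... */
	pthread_mutex_lock(&cpuset);		/* ...then L(cpuset) */
	/* update_cpumask() / rebuild_sched_domains_locked() work here */
	pthread_mutex_unlock(&cpuset);
	pthread_rwlock_unlock(&hotplug);
	return NULL;
}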
