lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4668d1ec-dc43-8a9c-4f94-a421683d3c17@codeaurora.org>
Date:   Mon, 9 Oct 2017 18:57:46 +0530
From:   Prateek Sood <prsood@...eaurora.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     tj@...nel.org, lizefan@...wei.com, cgroups@...r.kernel.org,
        mingo@...nel.org, longman@...hat.com, boqun.feng@...il.com,
        tglx@...utronix.de, linux-kernel@...r.kernel.org,
        sramana@...eaurora.org
Subject: Re: [PATCH] cgroup/cpuset: remove circular dependency deadlock

On 09/07/2017 11:21 PM, Peter Zijlstra wrote:
> On Thu, Sep 07, 2017 at 07:26:23PM +0530, Prateek Sood wrote:
>> Remove circular dependency deadlock in a scenario where hotplug of CPU is
>> being done while there is updation in cgroup and cpuset triggered from
>> userspace.
> 
> You've forgotten to mention your solution to the deadlock, namely
> inverting cpuset_mutex and cpu_hotplug_lock.
> 
>> Signed-off-by: Prateek Sood <prsood@...eaurora.org>
>> ---
>>  kernel/cgroup/cpuset.c | 32 +++++++++++++++++++-------------
>>  1 file changed, 19 insertions(+), 13 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 2f4039b..60dc0ac 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -816,16 +816,15 @@ static int generate_sched_domains(cpumask_var_t **domains,
>>   * 'cpus' is removed, then call this routine to rebuild the
>>   * scheduler's dynamic sched domains.
>>   *
>> - * Call with cpuset_mutex held.  Takes get_online_cpus().
>>   */
>> -static void rebuild_sched_domains_locked(void)
>> +static void rebuild_sched_domains_cpuslocked(void)
>>  {
>>  	struct sched_domain_attr *attr;
>>  	cpumask_var_t *doms;
>>  	int ndoms;
>>  
>> +	lockdep_assert_cpus_held();
>>  	lockdep_assert_held(&cpuset_mutex);
>> -	get_online_cpus();
>>  
>>  	/*
>>  	 * We have raced with CPU hotplug. Don't do anything to avoid
>> @@ -833,27 +832,27 @@ static void rebuild_sched_domains_locked(void)
>>  	 * Anyways, hotplug work item will rebuild sched domains.
>>  	 */
>>  	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>> -		goto out;
>> +		return;
>>  
>>  	/* Generate domain masks and attrs */
>>  	ndoms = generate_sched_domains(&doms, &attr);
>>  
>>  	/* Have scheduler rebuild the domains */
>>  	partition_sched_domains(ndoms, doms, attr);
>> -out:
>> -	put_online_cpus();
>>  }
>>  #else /* !CONFIG_SMP */
>> -static void rebuild_sched_domains_locked(void)
>> +static void rebuild_sched_domains_cpuslocked(void)
>>  {
>>  }
>>  #endif /* CONFIG_SMP */
>>  
>>  void rebuild_sched_domains(void)
>>  {
>> +	get_online_cpus();
>>  	mutex_lock(&cpuset_mutex);
>> -	rebuild_sched_domains_locked();
>> +	rebuild_sched_domains_cpuslocked();
>>  	mutex_unlock(&cpuset_mutex);
>> +	put_online_cpus();
>>  }
> 
> But if you invert these locks, the need for cpuset_hotplug_workfn() goes
> away, at least for the CPU part, and we can make in synchronous again.
> Yay!!
> 
> Also, I think new code should use cpus_read_lock() instead of
> get_online_cpus().
> 

Thanks for the review comments Peter.
For patch related to circular deadlock, I will send an updated version.

The callback making a call to cpuset_hotplug_workfn()in hotplug path are
        [CPUHP_AP_ACTIVE] = {
                .name                   = "sched:active",
                .startup.single         = sched_cpu_activate,
                .teardown.single        = sched_cpu_deactivate,
        },

if we make cpuset_hotplug_workfn() synchronous, deadlock might happen:
_cpu_down()
   cpus_write_lock()  //held
      cpuhp_kick_ap_work()
        cpuhp_kick_ap()
           __cpuhp_kick_ap()
              wake_up_process() //cpuhp_thread_fun
                wait_for_ap_thread() //wait for complete from cpuhp_thread_fun()

cpuhp_thread_fun()
   cpuhp_invoke_callback()
     sched_cpu_deactivate()
       cpuset_cpu_inactive()
          cpuset_update_active_cpus()
             cpuset_hotplug_work()
                rebuild_sched_domains()
                   cpus_read_lock() //waiting as acquired in _cpu_down()
                  

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation
Center, Inc., is a member of Code Aurora Forum, a Linux Foundation
Collaborative Project

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ