lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <77575407-0332-8734-2697-bc2099fd9fe0@redhat.com>
Date:   Tue, 22 Jun 2021 16:05:33 +0200
From:   Daniel Bristot de Oliveira <bristot@...hat.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Phil Auld <pauld@...hat.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Kate Carcia <kcarcia@...hat.com>,
        Jonathan Corbet <corbet@....net>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Alexandre Chartre <alexandre.chartre@...cle.com>,
        Clark Willaims <williams@...hat.com>,
        John Kacur <jkacur@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V4 05/12] trace/hwlat: Support hotplug operations

On 6/21/21 7:46 PM, Steven Rostedt wrote:
> On Mon, 21 Jun 2021 18:14:36 +0200
> Daniel Bristot de Oliveira <bristot@...hat.com> wrote:
> 
>>>> Yep! I tried to take the trace_type_lock here, and got the lockdep info about
>>>> this problem.
>>>>  
>>>>> The only thing I could think of is to wake up a worker thread to do the
>>>>> work. That is, this just wakes the worker thread, then the worker grabs
>>>>> the trace_types_lock, iterates through the cpu mask of expect running
>>>>> threads, and then starts or kills them depending on the hwlat_busy
>>>>> value.    
>>>> So, it will not wait for the kworker to run?  
>>> What wont wait?  
>>
>> For example, at the shutdown, should the hotplug callback wait for the workqueue
>> to run & kill the thread, or not?
> 
> Doing that won't help the deadlock situation.

yep, that is why I asked... :-(

> 	CPU 1			CPU 2
> 	-----			-----
>     Start shutdown
>     down online_cpus()
> 
> 			   mutex_lock(trace_types_lock);
> 			   get_online_cpus()
> 			   [BLOCK]
> 
>     wake_up_thread;
>     [schedule worker]
> 
>     mutex_lock(trace_types_lock);
> 
>  [ DEADLOCK ]
> 
> 
> Make all access to save_cpumask and hwlat_per_cpu_data inside the
> get_online_cpus() protection. (like in move_to_next_cpu(),
> start_single_thread() expand the get_online_cpus()).
> 
> Then in the cpu going down case, we can simply kill the thread and
> update the save_cpumask, as it will be protected by the
> get_online_cpus() code.
> 
> That is, don't even check if hwlat_busy is set or not. Just simply do:
> 
> 
> CPU_DOWN:
> 
> 	stop_cpu_kthead(cpu);
> 
> That will stop the kthread if it is running.  But we should update
> that function to also set per_cpu(hwlat_per_cpu_data).kthread = NULL;
> Like stop_single_kthread() does.
> 
> But for CPU_UP, we should do the work via a worker thread.
> 
> CPU_UP:
> 	schedule_work_on(&update_kthreads, cpu);
> 
> Which in the work function for that update_kthreads work queue:
> 
> 	mutex_lock(&trace_types_lock);
> 	if (!hwlat_busy || hwlat_data.thread_mode != MODE_PER_CPU)
> 		goto out_unlock;
> 
> 	get_online_cpus();
> 	if (!this_cpu(hwlat_per_cpu_data).kthread)
> 		start_per_cpu_kthread(smp_processor_id());
> 	put_online_cpus();
> 
>  out_unlock:
> 	mutex_unlock(&trace_types_lock);
> 
> Or something like that.

It works!

I will send a v5 with all the fixes requested, including this one.

Thanks Steven!
-- Daniel

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ