linux-kernel - Re: per cpun+ spin locks coexistence?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <47E0033E.4010300@cosmosbay.com>
Date:	Tue, 18 Mar 2008 19:00:30 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Peter Teoh <htmldeveloper@...il.com>
Cc:	Johannes Weiner <hannes@...urebad.de>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Tejun Heo <htejun@...il.com>,
	Dipankar Sarma <dipankar@...ibm.com>
Subject: Re: per cpun+ spin locks coexistence?

Peter Teoh a écrit :
> On 3/18/08, Eric Dumazet <dada1@...mosbay.com> wrote:
>
>   
>> You are right Peter, that fs/file.c contains some leftover from previous
>>  implementation of defer queue,
>>  that was using a timer.
>>
>>  So we can probably provide a patch that :
>>
>>  - Use spin_lock() & spin_unlock() instead of spin_lock_bh() &
>>  spin_unlock_bh() in free_fdtable_work()
>>  since we dont anymore use a softirq (timer) to reschedule the workqueue.
>>
>>  ( this timer was deleted by the following patch :
>>  http://readlist.com/lists/vger.kernel.org/linux-kernel/50/251040.html
>>
>>
>>  But, you cannot avoid use of spin_lock()/spin_unlock() because
>>  schedule_work() makes no garantee that the work will be done by this cpu.
>>     
>
> Ah.....u have hit the nail....and combine with Johannes Weiner's
> explanation, I have pieced together the full scenario:
>
> First, the following is possible:
>
>                 fddef = &get_cpu_var(fdtable_defer_list);
>                 spin_lock(&fddef->lock);
>                 fdt->next = fddef->next;
>                 fddef->next = fdt;==============>executing at CPU A
>                 /* vmallocs are handled from the workqueue context */
>                 schedule_work(&fddef->wq);
>                 spin_unlock(&fddef->lock);==============>executing at CPU B
>                 put_cpu_var(fdtable_defer_list);
>
> where the execution can switch CPU after the schedule_work() API, then
> LOGICALLY u definitely need the spin_lock(), and the per_cpu data is
> really not necessary.
>
> But without the per_cpu structure, then the following "dedicated
> chunk" can only execute on one processor, with the possibility of
> switching to another processor after schedule_work():
>   
Hum, you misunderstood the point.

schedule_work(); wont switch your current CPU, since you are inside a spin_lock
()/spin_unlock() pair, so preemption is not possible.



> So then we introduce the per_cpu structure - so that the "dedicated
> chunk" can be executing on multiple processor ALL AT THE SAME TIME,
> without interferring each other, as fddef are per-cpu (rightfully
> owned only before schedule_work() is called, but after schedule_work()
> is called, an arbitrary CPU will be executing this fddef).
>
> spin_lock() is necessary because of the possibility of CPU switch
> (schedule_work()).
>
> and per_cpu is so that the same chunk of code can be executing at
> multiple CPUs all at the same time.
>
> Now the key issue rises up - as I have just asked before u answered my question:
>
> http://mail.nl.linux.org/kernelnewbies/2008-03/msg00236.html
>
> can schedule_work() sleep?   (just like schedule(), whcih can sleep right?)
> schedule_work() is guaranteed to execute the work queue at least once,
> and so this thread may or may not sleep. correct?   Or wrong?
>
>   
schedule_work() cannot sleep. It only queues a work to be done later by 
a special thread.

We need this because vfree() should not be called from softirq handler 
(rcu in this case), so we queue a (small) job.
> Problem is when u sleep and never wake up, then the spin_lock become
> permanently locked, and when later the same CPU (have to be the same
> fddef CPU) is being reschedule to execute the get_cpu_var() again, it
> will spin_lock() infinitely, resulting in 100% CPU utilization error.
>
> To prevent these types of error, spin_lock are always not to be used
> with to wrap around functions that can sleep, and can only containing
> short routines between lock and unlock.
>
> Is my analysis correct?
>
>   
Not exactly :) , but please continue to learn :)





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/