linux-kernel - Re: per cpun+ spin locks coexistence?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <804dabb00803181000g408b8ab9oe1075952dc859823@mail.gmail.com>
Date:	Wed, 19 Mar 2008 01:00:27 +0800
From:	"Peter Teoh" <htmldeveloper@...il.com>
To:	"Eric Dumazet" <dada1@...mosbay.com>
Cc:	"Johannes Weiner" <hannes@...urebad.de>,
	"Jeremy Fitzhardinge" <jeremy@...p.org>,
	LKML <linux-kernel@...r.kernel.org>,
	"Tejun Heo" <htejun@...il.com>,
	"Dipankar Sarma" <dipankar@...ibm.com>
Subject: Re: per cpun+ spin locks coexistence?

On 3/18/08, Eric Dumazet <dada1@...mosbay.com> wrote:

>
> You are right Peter, that fs/file.c contains some leftover from previous
>  implementation of defer queue,
>  that was using a timer.
>
>  So we can probably provide a patch that :
>
>  - Use spin_lock() & spin_unlock() instead of spin_lock_bh() &
>  spin_unlock_bh() in free_fdtable_work()
>  since we dont anymore use a softirq (timer) to reschedule the workqueue.
>
>  ( this timer was deleted by the following patch :
>  http://readlist.com/lists/vger.kernel.org/linux-kernel/50/251040.html
>
>
>  But, you cannot avoid use of spin_lock()/spin_unlock() because
>  schedule_work() makes no garantee that the work will be done by this cpu.

Ah.....u have hit the nail....and combine with Johannes Weiner's
explanation, I have pieced together the full scenario:

First, the following is possible:

                fddef = &get_cpu_var(fdtable_defer_list);
                spin_lock(&fddef->lock);
                fdt->next = fddef->next;
                fddef->next = fdt;==============>executing at CPU A
                /* vmallocs are handled from the workqueue context */
                schedule_work(&fddef->wq);
                spin_unlock(&fddef->lock);==============>executing at CPU B
                put_cpu_var(fdtable_defer_list);

where the execution can switch CPU after the schedule_work() API, then
LOGICALLY u definitely need the spin_lock(), and the per_cpu data is
really not necessary.

But without the per_cpu structure, then the following "dedicated
chunk" can only execute on one processor, with the possibility of
switching to another processor after schedule_work():

So then we introduce the per_cpu structure - so that the "dedicated
chunk" can be executing on multiple processor ALL AT THE SAME TIME,
without interferring each other, as fddef are per-cpu (rightfully
owned only before schedule_work() is called, but after schedule_work()
is called, an arbitrary CPU will be executing this fddef).

spin_lock() is necessary because of the possibility of CPU switch
(schedule_work()).

and per_cpu is so that the same chunk of code can be executing at
multiple CPUs all at the same time.

Now the key issue rises up - as I have just asked before u answered my question:

http://mail.nl.linux.org/kernelnewbies/2008-03/msg00236.html

can schedule_work() sleep?   (just like schedule(), whcih can sleep right?)
schedule_work() is guaranteed to execute the work queue at least once,
and so this thread may or may not sleep. correct?   Or wrong?

Problem is when u sleep and never wake up, then the spin_lock become
permanently locked, and when later the same CPU (have to be the same
fddef CPU) is being reschedule to execute the get_cpu_var() again, it
will spin_lock() infinitely, resulting in 100% CPU utilization error.

To prevent these types of error, spin_lock are always not to be used
with to wrap around functions that can sleep, and can only containing
short routines between lock and unlock.

Is my analysis correct?

-- 
Regards,
Peter Teoh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/