Date:	Wed, 08 Sep 2010 10:20:27 +0200
From:	Tejun Heo <tj@...nel.org>
To:	Dave Chinner <david@...morbit.com>
CC:	linux-kernel@...r.kernel.org, xfs@....sgi.com,
	linux-fsdevel@...r.kernel.org
Subject: Re: [2.6.36-rc3] Workqueues, XFS, dependencies and deadlocks

Hello,

On 09/08/2010 09:34 AM, Dave Chinner wrote:
>> I see.  The use case itself shouldn't be problematic at all for cmwq
>> (sans bugs of course).  In the other reply, you said "the system is
>> 100% unresponsive when the livelock occurs", which is kind of
>> puzzling.  It isn't really a livelock.
> 
> Actually, it is. You don't need to burn CPU to livelock, you just
> need a loop in the state machine that cannot be broken by internal
> or external events to be considered livelocked.

Yeah, but for the system to be completely unresponsive even to sysrq,
it needs to be live/dead locked in a pretty specific way.
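To put that definition in code: a sketch of such a non-CPU-burning
loop (all names here are hypothetical, nothing is from XFS) would be a
work item that sleeps and then requeues itself waiting on a condition
that no other event ever sets:

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/delay.h>

static bool condition;			/* never set by anyone: the broken link */
static struct work_struct retry_work;

static void retry_fn(struct work_struct *work)
{
	if (!condition) {
		msleep(100);			/* sleeps, burns no CPU... */
		schedule_work(&retry_work);	/* ...but loops forever    */
	}
}

static int __init livelock_init(void)
{
	INIT_WORK(&retry_work, retry_fn);
	schedule_work(&retry_work);
	return 0;
}
module_init(livelock_init);
MODULE_LICENSE("GPL");

No CPU is burnt and no lock is held, yet no internal or external event
can break the loop.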

> However, this is not what I was calling the livelock problem - this
> is what I was calling the deadlock problem because to all external
> appearences the state machine is deadlocked on the inode lock....
> 
> The livelock case I described where the system is completely
> unresponsive is the one I'm testing the WQ_HIGHPRI mod against.
> 
> FWIW, having considered the above case again, and seeing what the
> WQ_HIGHPRI mod does in terms of queuing, I think that it may also
> solve this deadlock, as the log IO completion will always be queued
> ahead of the data IO completion now.

Cool, but please keep in mind that the nr_active underflow bug may end
up stalling a workqueue or loosening its ordering rules.  Linus pulled
in the pending fixes today.
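For reference, the WQ_HIGHPRI mod being tested boils down to creating
the log IO completion queue with the new 2.6.36 flag.  A minimal
sketch of the idea (hypothetical setup function, not the actual XFS
patch):

#include <linux/init.h>
#include <linux/errno.h>
#include <linux/workqueue.h>

static struct workqueue_struct *log_wq;

static int __init setup_log_wq(void)
{
	/*
	 * WQ_HIGHPRI work items are inserted at the head of the
	 * per-CPU worklist, so log IO completions queued here run
	 * ahead of data IO completions sitting on a normal queue.
	 */
	log_wq = alloc_workqueue("xfslogd", WQ_HIGHPRI, 1);
	return log_wq ? 0 : -ENOMEM;
}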

>> Hmm... The point where I'm confused is that *delay()'s are busy waits.
>> They burn CPU cycles.  I suppose you're referring to *sleep()'s,
>> right?
> 
> fs/xfs/linux-2.6/time.h:
> 
> static inline void delay(long ticks)
> {
>         schedule_timeout_uninterruptible(ticks);
> }

Heh yeah, there's my confusion.
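For anyone else tripping over the naming, the standard kernel
primitives split like this (shown only for contrast):

#include <linux/delay.h>
#include <linux/jiffies.h>
#include <linux/sched.h>

static void wait_styles(void)
{
	udelay(100);	/* busy-waits ~100us, burning CPU */
	mdelay(10);	/* busy-waits ~10ms, burning CPU  */

	msleep(10);	/* sleeps: schedules away >= 10ms */
	schedule_timeout_uninterruptible(HZ / 10);
			/* sleeps ~100ms; this is what the
			   XFS delay() wrapper above calls */
}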

>> Probably I have overloaded the term 'concurrency' too much.  In this
>> case, I meant the number of workers assigned to work items of the wq.
>> If you fire off N work items which sleep at the same time, cmwq will
>> eventually try to create N workers as each previous worker goes to
>> sleep so that the CPU doesn't sit idle while there are work items to
>> process, as long as N does not exceed @wq->max_active.
> 
> Ok, so if I queue N items on a single CPU when max_active == N, they
> get spread across N worker threads on different CPUs? 

They may, if that's necessary to keep the workqueue progressing.
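A toy version of that experiment (hypothetical module code, not from
this thread) queues N immediately-sleeping items on one CPU of a queue
created with max_active == N; as each blocks, cmwq brings in another
worker:

#include <linux/module.h>
#include <linux/errno.h>
#include <linux/workqueue.h>
#include <linux/delay.h>

#define N 4

static struct workqueue_struct *test_wq;
static struct work_struct items[N];

static void sleepy_fn(struct work_struct *work)
{
	msleep(1000);	/* blocks at once; cmwq wakes or creates
			   another worker to keep items flowing */
}

static int __init cmwq_test_init(void)
{
	int i;

	test_wq = alloc_workqueue("cmwq_test", 0, N);	/* max_active == N */
	if (!test_wq)
		return -ENOMEM;
	for (i = 0; i < N; i++) {
		INIT_WORK(&items[i], sleepy_fn);
		queue_work_on(0, test_wq, &items[i]);	/* all on CPU 0 */
	}
	return 0;
}

static void __exit cmwq_test_exit(void)
{
	destroy_workqueue(test_wq);	/* drains pending items first */
}
module_init(cmwq_test_init);
module_exit(cmwq_test_exit);
MODULE_LICENSE("GPL");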

Thanks.

-- 
tejun
