Message-ID: <CAKywueT4xRHZGomoPvnA9rob3FpzkPYuuUPe+vEDiOFvTbUiGA@mail.gmail.com>
Date: Fri, 21 Mar 2014 12:32:12 +0400
From: Pavel Shilovsky <piastry@...rsoft.ru>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Jeff Layton <jlayton@...hat.com>,
LKML <linux-kernel@...r.kernel.org>,
linux-cifs <linux-cifs@...r.kernel.org>,
Steve French <sfrench@...ba.org>,
Peter Zijlstra <peterz@...radead.org>,
Clark Williams <williams@...hat.com>,
"Luis Claudio R. Goncalves" <lclaudio@...g.org>,
Thomas Gleixner <tglx@...utronix.de>,
Tejun Heo <tj@...nel.org>, uobergfe@...hat.com
Subject: Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues
2014-03-21 6:23 GMT+04:00 Steven Rostedt <rostedt@...dmis.org>:
> On Thu, 20 Mar 2014 17:02:39 -0400
> Jeff Layton <jlayton@...hat.com> wrote:
>
>> Eventually the server should just allow the read to complete even if
>> the client doesn't respond to the oplock break. It has to since clients
>> can suddenly drop off the net while holding an oplock. That should
>> allow everything to unwedge eventually (though it may take a while).
>>
>> If that's not happening then I'd be curious as to why...
>
> The problem is that the data is being filled in the page and the reader
> is waiting for the page lock to be released. The kworker for the reader
> will issue the complete() and unlock the page to wake up the reader.
>
> But because the other workqueue callback calls down_read(), and there
> can be a down_write() waiting for the reader to finish, this
> down_read() will block on the lock as well (rwsems are fair locks).
> This blocks the other workqueue callback from issuing the complete and
> page_unlock() that will wake up the reader that is holding the rwsem
> with down_read().
>
> DEADLOCK.
Thank you for reporting and clarifying the issue!

The read and write codepaths both take lock_sem for read and then wait
for their work item on cifsiod_wq to complete before releasing lock_sem.
They don't do any lock_sem operations inside the work items they queue
to cifsiod_wq. The oplock code, however, can take and release lock_sem
inside its work item. That's why I agree with Jeff and suggest moving
the oplock code to a different workqueue (cifsioopd_wq?) while leaving
the read and write codepaths on cifsiod_wq.
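
To make the dependency concrete, here is a toy userspace model in
Python. It is not kernel code and only a sketch under loud assumptions:
each workqueue is modelled as a single worker that runs items in FIFO
order and stalls while its head item is blocked (real workqueues are
more elaborate), lock_sem is modelled as a fair rwsem where a new reader
queues behind a waiting writer, and the names simulate, "reader",
"writer", "oplock_break" and "read_complete" are made up for the
illustration.

```python
from collections import deque


def simulate(separate_oplock_queue):
    """Return the list of events that managed to run, in order.

    Toy model only (hypothetical names, simplified semantics):
    - "reader" holds lock_sem for read and waits for "read_complete"
    - "writer" is already queued in down_write() behind that reader
    - "oplock_break" is a work item that needs down_read(lock_sem)
    """
    holders = {"reader"}          # current read holders of lock_sem
    waiters = deque(["writer"])   # fair FIFO wait list of lock_sem

    if separate_oplock_queue:
        io_wq = deque(["read_complete"])
        queues = [io_wq, deque(["oplock_break"])]
    else:
        # Shared queue: the oplock work item sits ahead of the completion.
        queues = [deque(["oplock_break", "read_complete"])]

    log = []
    progress = True
    while progress:
        progress = False
        for wq in queues:
            if not wq:
                continue
            item = wq[0]  # the single worker only looks at the head item
            if item == "read_complete":
                wq.popleft()
                log.append("read_complete ran")
                holders.discard("reader")  # reader wakes up and does up_read()
                progress = True
            elif "oplock_break" in holders:
                holders.discard("oplock_break")  # up_read() after the work
                wq.popleft()
                log.append("oplock_break ran")
                progress = True
            elif "oplock_break" not in waiters:
                if waiters:
                    # Fairness: queue behind the waiting writer.
                    waiters.append("oplock_break")
                else:
                    holders.add("oplock_break")
                progress = True
        # Hand the lock to the next waiter once nobody holds it.
        if not holders and waiters:
            nxt = waiters.popleft()
            if nxt == "writer":
                log.append("writer ran")  # takes and drops the lock at once
            else:
                holders.add(nxt)
            progress = True
    return log


if __name__ == "__main__":
    print("shared queue:  ", simulate(False))
    print("separate queue:", simulate(True))
```

In this model the shared queue makes no progress at all: the oplock item
blocks at the head of the queue, so the read completion behind it never
runs and the reader never drops lock_sem. With the oplock item on its
own queue, the completion runs first and everything else unwinds after
it.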
--
Best regards,
Pavel Shilovsky.