Date:	Wed, 11 Jun 2008 16:11:09 +0900
From:	Tejun Heo <htejun@...il.com>
To:	Tejun Heo <htejun@...il.com>,
	James Bottomley <James.Bottomley@...senPartnership.com>,
	Jens Axboe <jens.axboe@...cle.com>, linux-ide@...r.kernel.org,
	linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Prevent busy looping

Picking up a dropped ball.

Elias Oltmanns wrote:
> Jens Axboe <jens.axboe@...cle.com> wrote:
>> On Thu, Apr 17 2008, Elias Oltmanns wrote:
>>> Jens Axboe <jens.axboe@...cle.com> wrote:
>>>> On Wed, Apr 16 2008, Elias Oltmanns wrote:
>>>>> blk_run_queue() as well as blk_start_queue() plug the device on reentry
>>>>> and schedule blk_unplug_work() right afterwards. However,
>>>>> blk_plug_device() takes care of that already and makes sure that there is
>>>>> a short delay before blk_unplug_work() is scheduled. This is important
>>>>> to prevent busy looping and possibly system lockups as observed here:
>>>>> <http://permalink.gmane.org/gmane.linux.ide/28351>.
>>>> If you call blk_start_queue() and blk_run_queue(), you better mean it.
>>>> There should be no delay. The only reason it does blk_plug_device() is
>>>> so that the work queue function will actually do some work.
>>> Well, I'm mainly concerned with blk_run_queue(). In a comment it says
>>> that it should recurse only once so as not to overrun the stack. On my
>>> machine, however, immediate rescheduling may have exactly as disastrous
>>> consequences as an overrunning stack would have since the system locks
>>> up completely.
>>>
>>> Just to get this straight: Are low level drivers allowed to rely on
>>> blk_run_queue() that there will be no loops or do they have to make sure
>>> that this function is not called from the request_fn() of the same
>>> queue?
>> It's not really designed for being called recursively. Which isn't the
>> problem imo, the problem is SCSI apparently being dumb and calling
>> blk_run_queue() all the time. blk_run_queue() must run the queue NOW. If
>> SCSI wants something like 'run the queue in a bit', it should use
>> blk_plug_device() instead.
> 
> James would probably argue that this is alright as long as
> max_device_blocked and max_host_blocked are bigger than one.
> 
>>>> In the newer kernels we just do:
>>>>
>>>>         set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags);
>>>>         kblockd_schedule_work(q, &q->unplug_work);
>>>>
>>>> instead, which is much better.
>>> Only as long as it doesn't get called from the request_fn() of the same
>>> queue. Otherwise, there may be no chance for other threads to clear the
>>> condition that caused blk_run_queue() to be called in the first place.
>> Broken usage.
> 
> Right. Tejun, would it be possible to apply the patch below (2.6.25) or
> do you see any alternative?

Okay, I (finally) looked into this.  The meaning of the blocked counts
is to wait (count - 1) * plug delay before retrying if the target (be
it device or host) is idle.  libata uses deferring to implement
command scheduling, and as such there shouldn't be any delay if the
target is not busy.

Elias's synthetic test case triggered an infinite loop because it
wasn't a proper ->qc_defer().  A ->qc_defer() should never defer
commands when the target is idle.

Attached is a debug patch to monitor libata command deferring.  It
will whine if a command is retried 10 times or more, or if
->qc_defer() is called in rapid succession.  I couldn't find anything
wrong with it.  When an IDENTIFY was queued while NCQ commands were in
flight, it waited several hundred milliseconds for the NCQ commands to
drain, with each ->qc_defer() call spaced several milliseconds apart
as determined by in-flight NCQ command completions.

So, a blocked count of 1 is just fine as long as ->qc_defer() doesn't
try to defer a command when the target is idle.  That said, there's no
harm in increasing the blocked count to two, or even leaving it at the
default: the blocked counters are reset to 0 whenever a command
completes, and by the same logic that makes a blocked count of 1 okay,
every deferred command is guaranteed to have a matching command
completion to clear its blocked count.

As the current code has been working well for quite some time now, I'm
more inclined to leave it as it is.

Thanks.

-- 
tejun

View attachment "defer-debug.patch" of type "text/x-patch" (2156 bytes)
