linux-kernel - Re: Block: Prevent busy looping

Message-ID: <20080417071335.GR12774@kernel.dk>
Date:	Thu, 17 Apr 2008 09:13:35 +0200
From:	Jens Axboe <jens.axboe@...cle.com>
To:	Elias Oltmanns <eo@...ensachen.de>
Cc:	linux-kernel@...r.kernel.org, stable@...nel.org
Subject: Re: Block: Prevent busy looping

On Thu, Apr 17 2008, Elias Oltmanns wrote:
> Jens Axboe <jens.axboe@...cle.com> wrote:
> > On Wed, Apr 16 2008, Elias Oltmanns wrote:
> >> blk_run_queue() as well as blk_start_queue() plug the device on reentry
> >> and schedule blk_unplug_work() right afterwards. However,
> >> blk_plug_device() takes care of that already and makes sure that there is
> >> a short delay before blk_unplug_work() is scheduled. This is important
> >> to prevent busy looping and possibly system lockups as observed here:
> >> <http://permalink.gmane.org/gmane.linux.ide/28351>.
> >
> > If you call blk_start_queue() and blk_run_queue(), you better mean it.
> > There should be no delay. The only reason it does blk_plug_device() is
> > so that the work queue function will actually do some work.
> 
> Well, I'm mainly concerned with blk_run_queue(). In a comment it says
> that it should recurse only once so as not to overrun the stack. On my
> machine, however, immediate rescheduling may have exactly as disastrous
> consequences as an overrunning stack would have since the system locks
> up completely.
> 
> Just to get this straight: Are low level drivers allowed to rely on
> blk_run_queue() not looping, or do they have to make sure that this
> function is not called from the request_fn() of the same queue?

It's not really designed to be called recursively. That isn't the
problem imo; the problem is SCSI apparently being dumb and calling
blk_run_queue() all the time. blk_run_queue() must run the queue NOW. If
SCSI wants something like 'run the queue in a bit', it should use
blk_plug_device() instead.

> > In the newer kernels we just do:
> >
> >         set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags);
> >         kblockd_schedule_work(q, &q->unplug_work);
> >
> > instead, which is much better.
> 
> Only as long as it doesn't get called from the request_fn() of the same
> queue. Otherwise, there may be no chance for other threads to clear the
> condition that caused blk_run_queue() to be called in the first place.

Broken usage.
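
As a rough illustration of the busy-looping concern quoted above (a
userspace toy model, not kernel code and not taken from this thread):
suppose the condition that blocks the queue is cleared by some other
context at tick release_tick, and every failed queue run re-arms itself
after delay_ticks ticks. The function wasted_runs(), its parameters, and
the spins_per_tick bound are all hypothetical names chosen for the
example; the model only counts how many queue runs fail before the
condition clears.

```c
#include <assert.h>

/*
 * Toy model (hypothetical, userspace only): count the queue runs that fail
 * before another context clears the blocking condition at release_tick.
 *
 * delay_ticks == 0 models immediate rescheduling: the worker re-runs
 *                  spins_per_tick times during every tick it has to wait.
 * delay_ticks  > 0 models a delayed unplug: one failed run, then sleep for
 *                  delay_ticks ticks before trying again.
 */
int wasted_runs(int release_tick, int delay_ticks, int spins_per_tick)
{
    assert(release_tick >= 0 && delay_ticks >= 0 && spins_per_tick > 0);

    int wasted = 0;
    int tick = 0;

    while (tick < release_tick) {
        if (delay_ticks == 0) {
            /* Busy loop: the whole tick is burned on failing runs. */
            wasted += spins_per_tick;
            tick += 1;
        } else {
            /* One failing run, then wait before the next attempt. */
            wasted += 1;
            tick += delay_ticks;
        }
    }
    return wasted;
}
```

Under these assumptions, wasted_runs(9, 0, 1000) yields 9000 failed runs,
while wasted_runs(9, 3, 1000) yields 3. The model deliberately ignores
whether the other context can make progress at all while the worker
spins, which is the stronger form of the problem described in the quoted
text.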

-- 
Jens Axboe
