Message-ID: <20130624081838.GB21768@gmail.com>
Date: Mon, 24 Jun 2013 10:18:38 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Jens Axboe <axboe@...nel.dk>
Cc: Matthew Wilcox <willy@...ux.intel.com>,
Al Viro <viro@...iv.linux.org.uk>,
Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
linux-nvme@...ts.infradead.org, linux-scsi@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: RFC: Allow block drivers to poll for I/O instead of sleeping
* Jens Axboe <axboe@...nel.dk> wrote:
> - With the former note, the app either needs to opt in (and hence
> willingly sacrifice CPU cycles of its scheduling slice) or it needs to
> be nicer in when it gives up and goes back to irq driven IO.
The scheduler could look at sleep latency averages of the task in question
- we measure that already in most cases.
If the 'average sleep latency' is below a certain threshold, the
scheduler, if it sees that the CPU is about to go idle, could delay doing
the context switch and do "light idle-polling" for, say, twice the length
of the expected sleep latency - assuming the CPU is otherwise idle -
before it really schedules away the task and the CPU goes idle.
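
To make the heuristic above concrete, here is a rough user-space sketch in C. It is not scheduler code: struct task_model, POLL_THRESHOLD_NS, update_avg_sleep() and idle_poll_budget_ns() are all hypothetical names, the 20 usec cutoff is an arbitrary placeholder, and the 1/8 weighting of the running average is just one plausible choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cutoff: only poll for tasks whose sleeps average under 20 usecs. */
#define POLL_THRESHOLD_NS 20000ULL

/* Hypothetical per-task state; stands in for whatever the scheduler tracks. */
struct task_model {
	uint64_t avg_sleep_ns;	/* running average of observed sleep latency */
};

/* Fold a new sleep sample into the average: avg += (sample - avg) / 8. */
static void update_avg_sleep(struct task_model *t, uint64_t sample_ns)
{
	int64_t diff;

	assert(t);
	diff = (int64_t)sample_ns - (int64_t)t->avg_sleep_ns;
	t->avg_sleep_ns = (uint64_t)((int64_t)t->avg_sleep_ns + diff / 8);
}

/*
 * Return how long to idle-poll before really scheduling away, in nsecs.
 * 0 means "do the context switch right away".
 */
static uint64_t idle_poll_budget_ns(const struct task_model *t,
				    bool cpu_otherwise_idle)
{
	/* Only burn cycles that would have gone to the idle loop anyway. */
	if (!cpu_otherwise_idle)
		return 0;

	/* No history yet, or sleeps are too long for polling to pay off. */
	if (t->avg_sleep_ns == 0 || t->avg_sleep_ns >= POLL_THRESHOLD_NS)
		return 0;

	/* Poll for twice the expected sleep latency. */
	return 2 * t->avg_sleep_ns;
}
```

With an average sleep of 8 usecs on an otherwise idle CPU this gives a 16 usec polling budget; if other tasks are runnable, or the average reaches the cutoff, the budget drops to 0.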
This would still require an IRQ and a wakeup to be taken, but would avoid
the context switch.
Yet I have an uneasy feeling about depending on actual latency values so
explicitly. There will have to be a cutoff value, and if a workload is
just below or just above that threshold then behavior will change
markedly. Such schemes have rarely worked out nicely in the past. [It might
still be worth trying.]
Couldn't the block device driver itself estimate the expected latency of
IO completion and simply poll if that's expected to be very short [such as
when there's only a single outstanding IO to a RAM-backed device]? IO drivers
doing some polling and waiting in the microseconds range isn't overly
controversial. I'd even do that if the CPU is otherwise busy: the task
should see a proportional slowdown as load increases, with no change in IO
queueing behavior.
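
A rough sketch of the driver-side variant, again as a user-space model in C rather than real block layer code. struct dev_model, POLL_CUTOFF_NS, expected_latency_ns() and wait_for_io() are hypothetical, the 5 usec cutoff is a placeholder, and the estimate simply assumes that outstanding requests complete one after another. The polls_until_done field only exists to simulate a device; a real driver would check its completion queue instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cutoff: poll only if completion is expected within 5 usecs. */
#define POLL_CUTOFF_NS 5000ULL

/* Hypothetical per-device state a driver might keep. */
struct dev_model {
	unsigned int outstanding;	/* IOs currently in flight */
	uint64_t avg_completion_ns;	/* measured per-IO completion time */
	unsigned int polls_until_done;	/* simulation only: polls left until done */
};

enum wait_mode { WAIT_POLL, WAIT_IRQ };

/* Crude estimate: assume in-flight requests complete serially. */
static uint64_t expected_latency_ns(const struct dev_model *d)
{
	return d->avg_completion_ns * d->outstanding;
}

/* Simulated completion check; stands in for reading a completion queue. */
static bool io_done(struct dev_model *d)
{
	if (d->polls_until_done == 0)
		return true;
	d->polls_until_done--;
	return false;
}

/*
 * Poll for completion if it is expected to be very short, otherwise (or
 * once the polling budget runs out) fall back to sleeping until the IRQ.
 * Note that CPU load is deliberately not consulted here.
 */
static enum wait_mode wait_for_io(struct dev_model *d, unsigned int max_polls)
{
	unsigned int i;

	if (expected_latency_ns(d) >= POLL_CUTOFF_NS)
		return WAIT_IRQ;

	for (i = 0; i < max_polls; i++) {
		if (io_done(d))
			return WAIT_POLL;
	}

	return WAIT_IRQ;
}
```

A single outstanding IO with a 1 usec average completion time gets polled; eight such IOs in flight push the estimate past the cutoff and the wait goes back to being IRQ driven.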
Thanks,
Ingo