linux-kernel - Re: [rfc] direct IO submission and completion scalability issues

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070731171403.GL3318@linux-os.sc.intel.com>
Date:	Tue, 31 Jul 2007 10:14:03 -0700
From:	"Siddha, Suresh B" <suresh.b.siddha@...el.com>
To:	Nick Piggin <npiggin@...e.de>
Cc:	"Siddha, Suresh B" <suresh.b.siddha@...el.com>,
	Christoph Lameter <clameter@....com>,
	linux-kernel@...r.kernel.org, arjan@...ux.intel.com, mingo@...e.hu,
	ak@...e.de, jens.axboe@...cle.com, James.Bottomley@...elEye.com,
	andrea@...e.de, akpm@...ux-foundation.org,
	andrew.vasquez@...gic.com
Subject: Re: [rfc] direct IO submission and completion scalability issues

On Tue, Jul 31, 2007 at 06:19:17AM +0200, Nick Piggin wrote:
> On Mon, Jul 30, 2007 at 01:35:19PM -0700, Suresh B wrote:
> > So any suggestions for making this clean and acceptable to everyone?
> 
> It is obviously a good idea to hand over the IO at the point which
> requires the least number of cachelines to be moved, and I think doing
> it in the block layer is right. Mostly you have to convince the block
> and driver maintainers I guess.

Yes. Implementation is the challenging part I guess.

> The scheduler really should be made interrupt-load aware anyway, so I
> don't have a problem with changing that; or scheduling kblockd at a
> higher priority, but I don't know if SCHED_FIFO is a good idea. Couldn't
> it be done in a softirq instead?

Yes, softirq context is one way. But just didn't want to penalize the running
task by taking away some of its cpu time. With CFS micro accounting, perhaps
we can track irq, softirq time and avoid penalizing the running task's cpu
time.

> Latency for IO migration could be the most difficult problem to solve
> really. You don't give much details of the workload, profiles, etc... I
> hope this is for a real world test?

Improvement numbers quoted are from the OLTP database workload. We can look
into other workloads.

> Can the locking be improved in simpler ways first?
> 
> Just some random questions...
> 
> It looks like the main source of cacheline bouncing you're eliminating
> is from the initial starting of IO from an empty queue (ie. unplug).
> From then on, the submission is driven by completion, right?
> 
> Why is the queue allowed to go empty in the first place in an IO critical
> workload?

This workload is using direct IO and there is no batching at the block layer
for direct IO. IO is submitted to the HW as it arrives.

> Are you loading up each CPU with as many disks as it can possibly handle
> plus a few more? If so, is that realistic? (I honestly don't know).

There is 3-4% iowait time in the system. So the cpu's are not 100% busy,
but there is quite a bit of direct IO going on.

> You say that you'd like to do this for direct IO only, but if it is more
> efficient, why not for buffered IO as well? (or is it not more efficient
> for buffered IO? if not, why?)

It is applicable for both direct IO and buffered IO. But the implementations
will differ. For example in buffered IO, we can setup in such a way that the
block plug timeout function runs on the IO completion cpu.

> AFAIKS, you'd still have significant queue_lock contention from other
> CPUs inserting requests into the list?

Correct. We have more potential to explore. Current implementation
is very elementary.

> What IO scheduler are you using? I assume noop...

yes.

> as a crazy experiment, what happens if you create per-cpu request queues?

or in other words, each kblockd thread catering multiple request queues
(perhaps one for each cpu or one for group of cpu's).

softirq context and each kblockd thread handling multiple request queues will
lead to further improvements.

thanks,
suresh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/