Date:	Wed, 1 Aug 2007 03:24:21 +0200
From:	Nick Piggin <npiggin@...e.de>
To:	"Siddha, Suresh B" <suresh.b.siddha@...el.com>
Cc:	Christoph Lameter <clameter@....com>, linux-kernel@...r.kernel.org,
	arjan@...ux.intel.com, mingo@...e.hu, ak@...e.de,
	jens.axboe@...cle.com, James.Bottomley@...elEye.com,
	andrea@...e.de, akpm@...ux-foundation.org,
	andrew.vasquez@...gic.com
Subject: Re: [rfc] direct IO submission and completion scalability issues

On Tue, Jul 31, 2007 at 05:55:13PM -0700, Suresh B wrote:
> On Wed, Aug 01, 2007 at 02:41:18AM +0200, Nick Piggin wrote:
> > On Tue, Jul 31, 2007 at 10:14:03AM -0700, Suresh B wrote:
> > > task by taking away some of its cpu time. With CFS micro accounting, perhaps
> > > we can track irq, softirq time and avoid penalizing the running task's cpu
> > > time.
> > 
> > But you "penalize" the running task in the completion handler as well
> > anyway.
> 
> Yes.
> 
> Ingo, in general with CFS micro accounting, we should be able to avoid
> penalizing the running task by tracking irq/softirq time, shouldn't we?
> 
> > Doing this with a SCHED_FIFO task is sort of like doing interrupt
> > threading which AFAIK has not been accepted (yet).
> 
> I am not recommending SCHED_FIFO. I will take a look at softirq
> infrastructure for this.

I think that would be a fine way to go.


> > > This workload is using direct IO and there is no batching at the block layer
> > > for direct IO. IO is submitted to the HW as it arrives.
> > 
> > So you aren't putting concurrent requests into the queue? Sounds like
> > userspace should be improved.
> 
> Nick remember that there are hundreds of disks in this setup and at
> an instance, there will be max 1 or 2 requests per disk.

Well, if there are 2 requests per disk, that's a good thing; you won't
need to unplug. If there is only 1, then as well as the plugging cost,
the hardware loses some ability to pipeline things effectively.

I'm not saying the kernel shouldn't be improved in the latter case, but
if you're looking for performance, it is nice to ensure you have at
least 2 requests. Presumably you're using some pretty well tuned db
software though, so I guess this is not always possible.

Do you have stats for these things (queue empty vs not empty events,
unplugs, etc)? 


> > > It is applicable for both direct IO and buffered IO. But the implementations
> > > will differ. For example in buffered IO, we can setup in such a way that the
> > > block plug timeout function runs on the IO completion cpu.
> > 
> > It would be nice to be doing that anyway. But unplug via request submission
> > rather than timeout is fairly common in buffered loads too.
> 
> Ok. Currently the patch handles both direct and buffered IO. While making
> improvements to this patch I will make sure that both the paths take
> advantage of this.

Sounds good!
