linux-kernel - Re: IO scheduler based IO controller V10


Message-ID: <20091002080417.GG14918@kernel.dk>
Date:	Fri, 2 Oct 2009 10:04:18 +0200
From:	Jens Axboe <jens.axboe@...cle.com>
To:	Mike Galbraith <efault@....de>
Cc:	Vivek Goyal <vgoyal@...hat.com>,
	Ulrich Lukas <stellplatz-nr.13a@...enparkplatz.de>,
	linux-kernel@...r.kernel.org,
	containers@...ts.linux-foundation.org, dm-devel@...hat.com,
	nauman@...gle.com, dpshah@...gle.com, lizf@...fujitsu.com,
	mikew@...gle.com, fchecconi@...il.com, paolo.valente@...more.it,
	ryov@...inux.co.jp, fernando@....ntt.co.jp, jmoyer@...hat.com,
	dhaval@...ux.vnet.ibm.com, balbir@...ux.vnet.ibm.com,
	righi.andrea@...il.com, m-ikeda@...jp.nec.com, agk@...hat.com,
	akpm@...ux-foundation.org, peterz@...radead.org,
	jmarchan@...hat.com, torvalds@...ux-foundation.org, mingo@...e.hu,
	riel@...hat.com
Subject: Re: IO scheduler based IO controller V10

On Fri, Oct 02 2009, Mike Galbraith wrote:
> On Thu, 2009-10-01 at 20:58 +0200, Jens Axboe wrote:
> > On Thu, Oct 01 2009, Mike Galbraith wrote:
> > > > CIC_SEEK_THR is 8K jiffies so that would be 8 seconds on a 1000 HZ system. Try
> > > > using one "slice_idle" period of 8 ms. But it might turn out to be too
> > > > short depending on the disk speed.
> > > 
> > > Yeah, it is too short, as is even _400_ ms.  Trouble is, by the time
> > > some new task is determined to be seeky, the damage is already done.
> > > 
> > > The below does better, though not as well as "just say no to overload"
> > > of course ;-)
> > 
> > So this essentially takes the "avoid impact from previous slice" to a
> > new extreme, by idling even before dispatching requests from the new
> > queue. We basically do two things to prevent this already - one is to
> > only set the slice when the first request is actually serviced, and the
> > other is to drain async requests completely before starting sync ones.
> > I'm a bit surprised that the former doesn't solve the problem fully, I
> > guess what happens is that if the drive has been flooded with writes, it
> > may service the new read immediately and then return to finish emptying
> > its writeback cache. This will cause an impact for any sync IO until
> > that cache is flushed, and then cause that sync queue to not get as much
> > service as it should have.
> 
> I did the stamping selection other than how long have we been solo based
> on these possibly wrong speculations:
> 
> If we're in the idle window and doing the async drain thing, we're at
> the spot where Vivek's patch helps a ton.  Seemed like a great time to
> limit the size of any io that may land in front of my sync reader to
> plain "you are not alone" quantity.

You can't be in the idle window and doing async drain at the same time,
the idle window doesn't start until the sync queue has completed a
request. Hence my above rant on device interference.
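
To make the ordering concrete, here is a toy model of the three phases as
described above: async requests drain first, sync requests are dispatched
next, and the idle window only opens once a sync request has completed.
This is a minimal sketch, not kernel code; the names ToyScheduler, Phase,
complete_async and complete_sync are hypothetical and do not correspond to
anything in cfq-iosched.c.

```python
from enum import Enum, auto


class Phase(Enum):
    ASYNC_DRAIN = auto()    # in-flight async requests are still being drained
    SYNC_DISPATCH = auto()  # sync requests dispatched, none completed yet
    IDLE_WINDOW = auto()    # a sync request has completed; idling may start


class ToyScheduler:
    """Hypothetical model of the ordering described above; not kernel code."""

    def __init__(self, async_in_flight: int) -> None:
        self.async_in_flight = async_in_flight
        self.sync_completed = 0

    @property
    def phase(self) -> Phase:
        # The phases are mutually exclusive by construction, so the model
        # can never be in the idle window and in async drain at once.
        if self.async_in_flight > 0:
            return Phase.ASYNC_DRAIN
        if self.sync_completed == 0:
            return Phase.SYNC_DISPATCH
        return Phase.IDLE_WINDOW

    def complete_async(self) -> None:
        if self.async_in_flight == 0:
            raise RuntimeError("no async request in flight")
        self.async_in_flight -= 1

    def complete_sync(self) -> None:
        # Assumption of this sketch: sync requests are not dispatched
        # until the async drain has finished.
        if self.phase is Phase.ASYNC_DRAIN:
            raise RuntimeError("sync requests are not dispatched during async drain")
        self.sync_completed += 1
```

The model deliberately leaves out the device itself. A drive that services
the new read immediately and then goes back to emptying its writeback cache
would look like IDLE_WINDOW here while still being busy with writes, which
is the interference discussed above.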

> If we've got sync io in flight, that should mean that my new or old
> known seeky queue has been serviced at least once.  There's likely to be
> more on the way, so delay overloading then too. 
> 
> The seeky bit is supposed to be the earlier "last time we saw a seeker"
> thing, but known seeky is too late to help a new task at all unless you
> turn off the overloading for ages, so I added the if incalculable check
> for good measure, hoping that meant the task is new, may want to exec.
> 
> Stamping any place may (see below) possibly limit the size of the io the
> reader can generate as well as writer, but I figured what's good for the
> goose is good for the gander, or it ain't really good.  The overload
> was causing the observed pain, definitely ain't good for both at these
> times at least, so don't let it do that.
> 
> > Perhaps the "set slice on first complete" isn't working correctly? Or
> > perhaps we just need to be more extreme.
> 
> Dunno, I was just tossing rocks and sticks at it.
> 
> I don't really understand the reasoning behind overloading:  I can see
> that allows cutting thicker slabs for the disk, but with the streaming
> writer vs reader case, seems only the writers can do that.  The reader
> is unlikely to be alone isn't it?  Seems to me that either dd, a flusher
> thread or kjournald is going to be there with it, which gives dd a huge
> advantage.. it has two proxies to help it squabble over disk, konsole
> has none.

That is true, async queues have a huge advantage over sync ones. But
sync vs async is only part of it: any combination of queued sync, queued
sync random, etc. has different ramifications for the behaviour of the
individual queue.

It's not hard to make the latency good, the hard bit is making sure we
also perform well for all other scenarios.

-- 
Jens Axboe
