linux-kernel - Re: [PATCH] Priorities in Anticipatory I/O scheduler

Message-ID: <2846be6b0810281704r5092c415n3fea9c849c6086ca@mail.gmail.com>
Date:	Tue, 28 Oct 2008 17:04:53 -0700
From:	"Naveen Gupta" <ngupta@...gle.com>
To:	"Naveen Gupta" <ngupta@...gle.com>, linux-kernel@...r.kernel.org,
	jens.axboe@...cle.com, akpm@...ux-foundation.org,
	s-uchida@...jp.nec.com
Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler

2008/10/28 Dave Chinner <david@...morbit.com>:
> On Tue, Oct 28, 2008 at 03:48:44PM -0700, Naveen Gupta wrote:
>> 2008/10/28 Dave Chinner <david@...morbit.com>:
>> > On Tue, Oct 28, 2008 at 10:14:20AM -0700, Naveen Gupta wrote:
>> >> 2008/10/27 Dave Chinner <david@...morbit.com>:
>> >> > On Mon, Oct 27, 2008 at 12:01:32PM -0700, ngupta@...gle.com wrote:
>> >> >>
>> >> >> Modifications to the Anticipatory I/O scheduler to add multiple priority
>> >> >> levels. It makes use of anticipation and batching in current
>> >> >> anticipatory scheduler to implement priorities.
>> > .....
>> >> >> In this patch I have added a new class IOPRIO_CLASS_LATENCY to differentiate
>> >> >> notion of absolute priority over existing uses of various time-slice based
>> >> >> priority classes in cfq. Though internally within anticipatory scheduler all
>> >> >> of them map to best-effort levels. Hence, one can also use various best-effort
>> >> >> priority levels.
>> >> >
>> >> > Please don't introduce yet another incompatible behaviour between
>> >> > I/O schedulers. It's bad enough from an optimisation point of view
>> >> > that BIO_RW_SYNC and BIO_RW_META mean different things to different
>> >> > schedulers, let alone that only CFQ currently understands
>> >> > priorities. If you are going to introduce priorities into AS, then
>> >> > please, please, please make it use the same interface as CFQ.
>> >> >
>> >> > Why? Both the extN and XFS devs have been considering bumping the
>> >> > priority of journal writes using the existing CFQ-based I/O priority
>> >> > mechanism - the last thing I want to see is a different scheduler
>> >> > requiring a different priority configuration to achieve the same
>> >> > optimisation. There is no way we can support this sort of
>> >> > optimisation in the filesystem code if the interface changes when
>> >> > the I/O scheduler changes. So please use the existing IOPRIO classes
>> >> > to map the priorities for the AS scheduler.
>> >> >
>> >>
>> >> The anticipatory scheduler chooses its next i/o to be of highest
>> >> available priority level.
>> >
>> > That sounds exactly like what the current RT class is supposed to
>> > be used for - defining the absolute priority of dispatch. How
>> > is this latency class different to the current RT class semantics
>> > that are defined for CFQ?
>> >
>>
>> I/O from RT class in CFQ can still see a bubble with this new latency
>> class. An easy way to check this would be to submit ios at multiple
>> levels both in CFQ and AS and check max latency of the highest levels.
>> I will let Jens or Satoshi comment on exact algorithm for RT class.
>
> You're missing my point entirely.
>
> You're defining a new class that has the exact same meaning as
> the current RT class definition, then mapping the BE class over
> the top of that, hence changing what that means for everyone.
>
> The fact that the *implementation* of AS and CFQ is different is
> irrelevant; if you use the RT class then on CFQ you get the current
> RT behaviour, if you use the RT class on AS you should get your new
> priority dispatch mechanism. We don't need a new API just because
> the implementations are different.
>

There is nothing "real-time" about the current RT class anyway. It is
basically these small *implementation* differences that define these
classes in the current scheme of things; precise definitions would be
very hard to find if one started looking around.

The current implementation of AS is basically a flat structure with
multiple priority levels. Initially I planned them to be different
levels of the best-effort class, since "best-effort" is exactly what we
are doing from the scheduler/software point of view. So the question is
what to do with the other classes, for which we don't have a
significantly different behavior: to keep things simple, you map them
onto the existing flat structure. So I mapped RT (all levels) to BE 0
and idle (all levels) to BE 7.
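
To make the folding concrete, here is a minimal user-space sketch, not
code from the patch. The macros mirror the kernel's ioprio encoding
(class in the top bits, level in the low 13 bits); the function name
as_fold_ioprio(), the AS_NR_LEVELS constant and the choice of level 4
for requests with no class set are assumptions made only for
illustration. The LATENCY class is not modeled here.

```c
/* Illustrative sketch only; names below marked "hypothetical" are not
 * taken from the patch. */

/* Mirrors the kernel's ioprio encoding. */
#define IOPRIO_CLASS_SHIFT      13
#define IOPRIO_PRIO_CLASS(p)    ((p) >> IOPRIO_CLASS_SHIFT)
#define IOPRIO_PRIO_DATA(p)     ((p) & ((1 << IOPRIO_CLASS_SHIFT) - 1))
#define IOPRIO_PRIO_VALUE(c, d) (((c) << IOPRIO_CLASS_SHIFT) | (d))

enum {
	IOPRIO_CLASS_NONE,
	IOPRIO_CLASS_RT,
	IOPRIO_CLASS_BE,
	IOPRIO_CLASS_IDLE,
};

#define AS_NR_LEVELS 8	/* hypothetical: flat levels 0 (highest) .. 7 */

/*
 * Hypothetical helper: fold an ioprio value onto one flat AS level.
 * RT (any level) -> 0, IDLE (any level) -> 7, BE keeps its own level.
 */
static int as_fold_ioprio(int ioprio)
{
	int level = IOPRIO_PRIO_DATA(ioprio);

	switch (IOPRIO_PRIO_CLASS(ioprio)) {
	case IOPRIO_CLASS_RT:
		return 0;
	case IOPRIO_CLASS_IDLE:
		return AS_NR_LEVELS - 1;
	case IOPRIO_CLASS_BE:
		/* Clamp out-of-range levels to the lowest priority. */
		return level < AS_NR_LEVELS ? level : AS_NR_LEVELS - 1;
	default:
		/* Assumption: no class set is treated as a middle BE level. */
		return 4;
	}
}
```

With this folding, RT level 5 and BE level 0 both land on flat level 0,
which is exactly the loss of distinction being discussed.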

This leaves the RT and IDLE classes open for future implementations,
where one could use hardware priorities (maybe in NCQ) to implement the
RT class, or other software improvements outside the schedulers to map
to the RT class.

Now, the initial feedback was: since this *implementation* is different
from anything we have in CFQ, which is our current *standard* way of
thinking and comparing (it is the only thing that exists), why not
make it a new class :). And somehow map the others so that they
make some sense until we get something for those classes as well.

>> >> So, in some sense it kind of implements absolute priority and
>> >> is best used for jobs which are latency sensitive.  Since the
>> >> priorities can be and are mapped internally in anticipatory
>> >> scheduler, BEST_EFFORT class is mapped one-one with the LATENCY
>> >> class.
>> >
>> > So you map the BE class to something with the same semantics as
>> > the RT class? What mapping do you do when an application uses
>> > the RT class?
>> >
>>
>> Yes I could have used RT class but it was used in CFQ to implement
>> its time-slice based highest priority class.  If an application
>> uses RT class, AS maps all levels of RT class to BE class level 0
>> (i.e. to the highest priority available)
>
> Which means you are throwing away all the RT priority levels and
> so an application using the RT class would be subtly broken on AS....
>

As I said earlier, the organization of the AS levels is flat, so we
could use any class (RT, BE, LATENCY) and fold the remaining ones. The
other way, which you would probably prefer, is to increase the number of
levels and map the different classes so that they are not folded.
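
The second option could look roughly like the sketch below. It is only
an illustration under stated assumptions: 24 flat levels, with RT in
0..7, BE in 8..15 and IDLE in 16..23. The name as_unfolded_level() and
the treatment of requests with no class set are hypothetical; the
macros mirror the kernel's ioprio encoding.

```c
/* Illustrative sketch only; not code from the patch. */

/* Mirrors the kernel's ioprio encoding. */
#define IOPRIO_CLASS_SHIFT      13
#define IOPRIO_PRIO_CLASS(p)    ((p) >> IOPRIO_CLASS_SHIFT)
#define IOPRIO_PRIO_DATA(p)     ((p) & ((1 << IOPRIO_CLASS_SHIFT) - 1))
#define IOPRIO_PRIO_VALUE(c, d) (((c) << IOPRIO_CLASS_SHIFT) | (d))
#define IOPRIO_BE_NR            8	/* levels per class */

enum {
	IOPRIO_CLASS_NONE,
	IOPRIO_CLASS_RT,
	IOPRIO_CLASS_BE,
	IOPRIO_CLASS_IDLE,
};

/*
 * Hypothetical helper: give every class its own band of flat levels so
 * that no class is folded into another. Lower number = dispatched first.
 */
static int as_unfolded_level(int ioprio)
{
	int level = IOPRIO_PRIO_DATA(ioprio);

	/* Clamp out-of-range levels to the lowest level of the band. */
	if (level >= IOPRIO_BE_NR)
		level = IOPRIO_BE_NR - 1;

	switch (IOPRIO_PRIO_CLASS(ioprio)) {
	case IOPRIO_CLASS_RT:
		return level;				/* 0..7   */
	case IOPRIO_CLASS_BE:
		return IOPRIO_BE_NR + level;		/* 8..15  */
	case IOPRIO_CLASS_IDLE:
		return 2 * IOPRIO_BE_NR + level;	/* 16..23 */
	default:
		/* Assumption: no class set is treated as BE level 4. */
		return IOPRIO_BE_NR + 4;
	}
}
```

Under this layout, journal I/O submitted at BE level 0 lands on flat
level 8, below every RT level, so it would not preempt RT I/O.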

>> >> A filesystem can use best-effort class using similar interface
>> >> as for cfq.
>> >
>> > The folk using the RT priority classes greatly objected to using
>> > the RT class for journal I/O precisely because it would then
>> > preempt their application's RT I/O and introduce unpredictable
>> > latencies.
>> >
>> > Journal I/O will typically use the highest priority BE class so
>> > that it is promoted above BE I/O but does not preempt RT I/O.
>> > With your mapping of BE classes to this new "absolute priority
>> > latency" class, this configuration will give journal I/O the
>> > highest priority in the scheduler. This will cause preemption of
>> > your latency sensitive I/O and so those latencies you are trying
>> > to avoid won't go away....
>> >
>>
>> I see your problem, we could make the LATENCY class different from
>> and above BE class (instead of one-one mapping).
>
> Like the RT class is currently defined to be? ;)
>

I agree with you, and we could use RT (though you and I know that it
is basically best effort). LATENCY was invented due to a previous
suggestion.

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@...morbit.com
>