linux-kernel - Re: [PATCH V3 00/11] block-throttle: add .high limit

Message-Id: <EE90477C-FD56-48C1-8A45-97AF39A8BB0C@unimore.it>
Date:   Tue, 4 Oct 2016 19:43:48 +0200
From:   Paolo Valente <paolo.valente@...more.it>
To:     Shaohua Li <shli@...com>
Cc:     Tejun Heo <tj@...nel.org>, Vivek Goyal <vgoyal@...hat.com>,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
        Jens Axboe <axboe@...com>, Kernel-team@...com,
        jmoyer@...hat.com, Mark Brown <broonie@...nel.org>,
        Linus Walleij <linus.walleij@...aro.org>,
        Ulf Hansson <ulf.hansson@...aro.org>
Subject: Re: [PATCH V3 00/11] block-throttle: add .high limit


> On 4 Oct 2016, at 19:28, Shaohua Li <shli@...com> wrote:
> 
> On Tue, Oct 04, 2016 at 07:01:39PM +0200, Paolo Valente wrote:
>> 
>>> On 4 Oct 2016, at 18:27, Tejun Heo <tj@...nel.org> wrote:
>>> 
>>> Hello,
>>> 
>>> On Tue, Oct 04, 2016 at 06:22:28PM +0200, Paolo Valente wrote:
>>>> Could you please elaborate more on this point?  BFQ uses sectors
>>>> served to measure service, and, on all the fast devices on which
>>>> we have tested it, it accurately distributes
>>>> bandwidth as desired, redistributes excess bandwidth without any issue,
>>>> and guarantees high responsiveness and low latency at application and
>>>> system level (e.g., ~0 drop rate in video playback, with every background
>>>> workload tested).
>>> 
>>> The same argument as before.  Bandwidth is a very bad measure of IO
>>> resources spent.  For specific use cases (like desktop or whatever),
>>> this can work but not generally.
>>> 
>> 
>> Actually, we have already discussed this point, and IMHO the arguments
>> that (apparently) convinced you that bandwidth is the most relevant
>> service guarantee for I/O in desktops and the like prove that
>> bandwidth is the most important service guarantee in servers too.
>> 
>> Again, all the examples I can think of seem to confirm it:
>> . file hosting: a good service must guarantee reasonable read/write,
>> i.e., download/upload, speeds to users
>> . file streaming: a good service must guarantee low drop rates, and
>> this can be guaranteed only by guaranteeing bandwidth and latency
>> . web hosting: high bandwidth and low latency needed here too
>> . clouds: high bw and low latency needed to let, e.g., users of VMs
>> enjoy high responsiveness and, for example, reasonable file-copy
>> time
>> ...
>> 
>> To put it yet another way, with packet I/O in, e.g., clouds, there are
>> basically the same issues, and the main goal is again guaranteeing
>> bandwidth and low latency among nodes.
>> 
>> Could you please provide a concrete server example (assuming we still
>> agree about desktops), where I/O bandwidth does not matter while time
>> does?
> 
> I'm not saying IO bandwidth doesn't matter. The problem is that bandwidth
> can't measure IO cost. For example, you can't say an 8k IO costs twice as
> much IO resource as a 4k IO.
> 

For what goal do you need to be able to say this, once you have succeeded
in guaranteeing bandwidth and low latency to each
process/client/group/node/user?
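To make the 8k-versus-4k point quoted above concrete, here is a minimal sketch under a purely hypothetical cost model in which every request pays a fixed per-request overhead plus a per-KiB transfer cost. The constants FIXED_OVERHEAD_PER_REQ and COST_PER_KIB, and the helpers request_cost(), sector_charge() and cost_ratio(), are invented for illustration and not measured on any device. Under such a model a sector-based charge doubles from 4k to 8k, while the modelled device cost grows by much less.

```c
/* Hypothetical cost-model constants, in arbitrary cost units (assumed, not measured). */
#define FIXED_OVERHEAD_PER_REQ 100.0
#define COST_PER_KIB             5.0

/* Modelled device cost of one request of size_kib KiB (an assumption). */
static double request_cost(unsigned int size_kib)
{
	return FIXED_OVERHEAD_PER_REQ + COST_PER_KIB * size_kib;
}

/* What a purely sector-based scheme charges: 512-byte sectors, 2 per KiB. */
static unsigned long sector_charge(unsigned int size_kib)
{
	return 2ul * size_kib;
}

/* Ratio between the modelled costs of two request sizes. */
static double cost_ratio(unsigned int a_kib, unsigned int b_kib)
{
	return request_cost(a_kib) / request_cost(b_kib);
}
```

With the assumed constants, request_cost(4) is 120 and request_cost(8) is 140 units, a ratio of about 1.17, whereas sector_charge() exactly doubles. Whether a real device behaves like this depends on the hardware; the sketch only shows that size-proportional accounting and device cost can diverge.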

>>>> Could you please suggest a test that shows how sector-based
>>>> guarantees fail?
>>> 
>>> Well, mix 4k random and sequential workloads and try to distribute the
>>> actual IO resources.
>>> 
>> 
>> 
>> If I'm not mistaken, we have already gone through this example too,
>> and I thought we agreed on what service scheme worked best, again
>> focusing only on desktops.  To make a long story short(er), here is a
>> snippet from one of our last exchanges.
>> 
>> ----------
>> 
>> On Sat, Apr 16, 2016 at 12:08:44AM +0200, Paolo Valente wrote:
>>> Maybe the source of confusion is the fact that a simple sector-based,
>>> proportional share scheduler always distributes total bandwidth
>>> according to weights. The catch is the additional BFQ rule: random
>>> workloads get only time isolation, and are charged for full budgets,
>>> so as to not affect the schedule of quasi-sequential workloads. So,
>>> the correct claim for BFQ is that it distributes total bandwidth
>>> according to weights (only) when all competing workloads are
>>> quasi-sequential. If some workloads are random, then these workloads
>>> are just time scheduled. This does break proportional-share bandwidth
>>> distribution with mixed workloads, but, much more importantly, saves
>>> both total throughput and individual bandwidths of quasi-sequential
>>> workloads.
>>> 
>>> We could then check whether I did succeed in tuning timeouts and
>>> budgets so as to achieve the best tradeoffs. But this is probably a
>>> second-order problem as of now.
> 
> I don't see why random/sequential matters for SSD. What really matters is
> request size and IO depth. Time scheduling is questionable too, as workloads
> can dispatch all their IO in almost zero time on disks with a high queue depth.
> 
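The BFQ rule quoted above (quasi-sequential queues charged for the sectors they are served, random queues charged for their full budget) can be sketched roughly as follows. This is a simplified toy, not BFQ's actual code: struct toy_queue, charge_queue(), normalized_service() and pick_next() are hypothetical names, and picking the queue with the smallest service/weight ratio stands in for BFQ's real B-WF2Q+ timestamp-based scheduling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-queue state (not BFQ's struct bfq_queue). */
struct toy_queue {
	unsigned int weight;    /* share weight, must be > 0 */
	unsigned long budget;   /* max sectors the queue may be served per slot */
	unsigned long service;  /* total charged service, in sectors */
	bool seeky;             /* classified as random/seeky */
};

/*
 * Charge a queue at the end of its service slot. Sequential queues pay
 * for the sectors actually transferred; seeky queues pay their whole
 * budget, so they only get time isolation.
 */
static void charge_queue(struct toy_queue *q, unsigned long sectors_served)
{
	q->service += q->seeky ? q->budget : sectors_served;
}

/* Service normalized by weight: lower means "owed" more service. */
static double normalized_service(const struct toy_queue *q)
{
	return (double)q->service / q->weight;
}

/* Pick the queue with the smallest normalized service (n must be >= 1). */
static struct toy_queue *pick_next(struct toy_queue *qs, size_t n)
{
	struct toy_queue *best = &qs[0];

	for (size_t i = 1; i < n; i++)
		if (normalized_service(&qs[i]) < normalized_service(best))
			best = &qs[i];
	return best;
}
```

For example, a seeky queue that transfers only 8 sectors in its slot is still charged its whole budget, so it falls behind a sequential queue of equal weight at the next selection; this is the trade-off the quoted text describes as breaking proportional bandwidth sharing while protecting quasi-sequential workloads.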

That's an orthogonal issue.  If what matters is, e.g., size, then it is
enough to replace "sequential I/O" with "large-request I/O".  In case
I have been too vague, here is an example: I mean that, e.g., in an I/O
scheduler you replace the function that decides whether a queue is
seeky based on request distance with a function based on
request size.  And this is exactly what has already been done, for
example, in CFQ:

	if (blk_queue_nonrot(cfqd->queue))
		cfqq->seek_history |= (n_sec < CFQQ_SECT_THR_NONROT);
	else
		cfqq->seek_history |= (sdist > CFQQ_SEEK_THR);
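A self-contained sketch of the same idea outside the kernel: each request shifts one bit into a 32-bit history, where the bit means "small request" on non-rotational devices and "long seek" on rotational ones, and a queue is treated as seeky when more than 4 of the last 32 bits are set. The struct, the function names, the "> 4 of 32" rule and the threshold values SEEK_THR_SECTORS and SMALL_REQ_THR_SECTORS are assumptions chosen for illustration; CFQ has its own constants and works on struct request via helpers such as blk_rq_pos() and blk_rq_sectors().

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative thresholds, in 512-byte sectors (assumed values). */
#define SEEK_THR_SECTORS      800u  /* rotational: "long seek" distance */
#define SMALL_REQ_THR_SECTORS  64u  /* non-rotational: "small request" size */

/* Hypothetical per-queue seek-tracking state. */
struct toy_seek_state {
	uint32_t history;   /* one bit per recent request, 1 = "seeky" */
	uint64_t last_end;  /* sector just after the previous request */
	bool have_last;     /* false until the first request is seen */
};

/* Count set bits (portable, no compiler builtins). */
static unsigned int popcount32(uint32_t x)
{
	unsigned int n = 0;

	while (x) {
		x &= x - 1;  /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Record one request starting at sector pos and spanning n_sec sectors. */
static void update_seek_history(struct toy_seek_state *s, bool nonrot,
				uint64_t pos, uint32_t n_sec)
{
	uint64_t sdist = 0;

	if (s->have_last)
		sdist = pos > s->last_end ? pos - s->last_end : s->last_end - pos;

	s->history <<= 1;
	if (nonrot)
		s->history |= (n_sec < SMALL_REQ_THR_SECTORS);  /* size-based */
	else
		s->history |= (sdist > SEEK_THR_SECTORS);       /* distance-based */

	s->last_end = pos + n_sec;
	s->have_last = true;
}

/* Seeky if more than 1/8 of the last 32 requests set their bit. */
static bool is_seeky(const struct toy_seek_state *s)
{
	return popcount32(s->history) > 32 / 8;
}
```

For instance, on a non-rotational device a stream of random but 128 KiB (256-sector) requests never sets a bit, so the queue is not considered seeky, while a stream of 4 KiB requests is, regardless of where they land.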

Thanks,
Paolo

> Thanks,
> Shaohua


--
Paolo Valente
Algogroup
Dipartimento di Scienze Fisiche, Informatiche e Matematiche
Via Campi 213/B
41125 Modena - Italy
http://algogroup.unimore.it/people/paolo/