Date:	Sat, 1 May 2010 14:13:14 +0200
From:	Corrado Zoccolo <czoccolo@...il.com>
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	Miklos Szeredi <mszeredi@...e.cz>,
	Jens Axboe <jens.axboe@...cle.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Jan Kara <jack@...e.cz>, Suresh Jayaraman <sjayaraman@...e.de>
Subject: Re: CFQ read performance regression

On Wed, Apr 28, 2010 at 10:02 PM, Vivek Goyal <vgoyal@...hat.com> wrote:
> On Tue, Apr 27, 2010 at 07:25:14PM +0200, Corrado Zoccolo wrote:
>> On Mon, Apr 26, 2010 at 9:14 PM, Vivek Goyal <vgoyal@...hat.com> wrote:
>> > On Sat, Apr 24, 2010 at 10:36:48PM +0200, Corrado Zoccolo wrote:
>> >
>> > [..]
>> >> >> Anyway, if that's the case, then we probably need to allow IO from
>> >> >> multiple sequential readers and keep a watch on throughput. If throughput
>> >> >> drops, then reduce the number of parallel sequential readers. Not sure how
>> >> >> much code that is, but with multiple cfqqs going in parallel, the ioprio
>> >> >> logic will more or less stop working in CFQ (on multi-spindle hardware).
>> >> Hi Vivek,
>> >> I tried to implement exactly what you are proposing, see the attached patches.
>> >> I leverage the queue merging features to let multiple cfqqs share the
>> >> disk in the same timeslice.
>> >> I changed the queue split code to trigger on throughput drop instead
>> >> of on seeky pattern, so diverging queues can remain merged if they
>> >> have good throughput. Moreover, I measure the max bandwidth reached by
>> >> single queues and merged queues (you can see the values in the
>> >> bandwidth sysfs file).
>> >> If merged queues can outperform non-merged ones, the queue merging
>> >> code will try to opportunistically merge together queues that cannot
>> >> submit enough requests to fill half of the NCQ slots. I'd like to know
>> >> if you can see any improvements out of this on your hardware. There
>> >> are some magic numbers in the code, you may want to try tuning them.
>> >> Note that, since the opportunistic queue merging will start happening
>> >> only after merged queues have been shown to reach higher bandwidth than
>> >> non-merged queues, you should use the disk for a while before trying
>> >> the test (and you can check sysfs), or the merging will not happen.
>> >
>> > Hi Corrado,
>> >
>> > I ran these patches and I did not see any improvement. I think the reason
>> > is that no cooperative queue merging took place, so we did not have
>> > any throughput data with the coop flag on.
>> >
>> > #cat /sys/block/dm-3/queue/iosched/bandwidth
>> > 230     753     0
>> >
>> > I think we need to implement something similar to the hw_tag detection logic,
>> > where we allow dispatches from multiple sync-idle queues at a time and try
>> > to observe the BW. Once we have observed it over a certain window,
>> > we then set the system behavior accordingly.
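A rough sketch of that detection window, patterned on the existing hw_tag
sampling, might look as follows; the fields bw_probe_samples, bw_probe_total and
allow_multi_dispatch are hypothetical, while max_bw[] and sample_valid() are
taken from the patched tree:

   static void cfq_update_multi_dispatch(struct cfq_data *cfqd, unsigned long bw)
   {
           /* Probing phase: keep dispatching from several sync-idle queues
            * and accumulate the bandwidth we observe. */
           if (!sample_valid(cfqd->bw_probe_samples)) {
                   cfqd->bw_probe_samples++;
                   cfqd->bw_probe_total += bw;
                   return;
           }

           /* Window is over: keep multi-queue dispatch only if the probed
            * bandwidth beats the best single-queue figure. */
           cfqd->allow_multi_dispatch =
                   (cfqd->bw_probe_total / cfqd->bw_probe_samples) > cfqd->max_bw[0];
   }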
>> Hi Vivek,
>> thanks for testing. Can you try changing the condition that enables
>> queue merging so that it also covers the case in which max_bw[1] == 0 &&
>> max_bw[0] > 100MB/s (note that max_bw is measured in
>> sectors/jiffy)?
>> This should rule out low-end disks, and enable merging where it can be
>> beneficial.
>> If the results are good but we find this enabling condition
>> unreliable, then we can think of a better way; I'm curious to see
>> whether the results are promising before getting into the details.
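For reference, the 100 MB/s threshold converts to the sectors/jiffy unit used by
max_bw as follows (512-byte sectors, 1 MB taken as 2^20 bytes, HZ=1000):

   100 * 1048576 / 512 = 204800 sectors/s
   204800 / 1000       = 204.8  sectors/jiffy

i.e. the 204 used as a cutoff in the patch below.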
>
> Ok, I made some changes and now some queue merging seems to be happening
> and I am getting slightly better BW. This will require more debugging. I
> will try to spare some time later.
>
> Kernel=2.6.34-rc5-corrado-multicfq
> DIR= /mnt/iostmnt/fio          DEV= /dev/mapper/mpathe
> Workload=bsr       iosched=cfq      Filesz=1G    bs=16K
> ==========================================================================
> job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
> ---       --- --  ------------   -----------    -------------  -----------
> bsr       1   1   136457         67353          0              0
> bsr       1   2   148008         192415         0              0
> bsr       1   4   180223         535205         0              0
> bsr       1   8   166983         542326         0              0
> bsr       1   16  176617         832188         0              0
>
> Output of iosched/bandwidth
>
> 0       546     740
>
> I did following changes on top of your patch.
This is becoming interesting. I think a major limitation of the
current approach is that it is too easy for a merged queue to be
separated again.
My code:
   if (cfq_cfqq_coop(cfqq) && bw <= cfqd->max_bw[1] * 9/10)
           cfq_mark_cfqq_split_coop(cfqq);
will immediately split any merged queue as soon as max_bw[1] grows too
much, so it should be based on max_bw[0].
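A minimal change along those lines (untested sketch) is to compare against the
best single-queue bandwidth instead:

   if (cfq_cfqq_coop(cfqq) && bw <= cfqd->max_bw[0] * 9/10)
           cfq_mark_cfqq_split_coop(cfqq);

so a merged queue is only broken up when it stops beating what a single queue
can achieve, and a growing max_bw[1] no longer triggers the split by itself.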
Moreover, this code will likely split off all cics from the merged
queue, while it would be much better to split off only the cics that
are receiving less than their fair share of the BW (this will also
improve the fairness of the scheduler when queues are merged).
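Something along these lines could express that (sketch only; cfqq->nr_cics and a
per-cic bandwidth field cic->last_bw are assumptions for illustration, since
bandwidth is currently tracked only per cfqq):

   /* Return true if this cic gets noticeably less than its fair share
    * of the merged queue's bandwidth and should be split off. */
   static bool cic_below_fair_share(struct cfq_queue *cfqq,
                                    struct cfq_io_context *cic)
   {
           unsigned long fair_share;

           if (!cfq_cfqq_coop(cfqq) || !cfqq->nr_cics)
                   return false;

           fair_share = cfqq->last_bw / cfqq->nr_cics;
           return cic->last_bw < fair_share * 9 / 10;
   }

The split path in cfq_set_request() could then consult this per-cic test instead
of the per-queue split_coop flag, so that only the starved contexts get their own
queue back.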

Corrado
>
> Vivek
>
> ---
>  block/cfq-iosched.c |   11 +++++++++--
>  1 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 4e9e015..7589c44 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -243,6 +243,7 @@ struct cfq_data {
>         */
>        int hw_tag_est_depth;
>        unsigned int hw_tag_samples;
> +       unsigned int cfqq_merged_samples;
>        /*
>         * performance measurements
>         * max_bw is indexed by coop flag.
> @@ -1736,10 +1737,14 @@ static struct cfq_queue *cfqq_close(struct cfq_data *cfqd,
>        // Opportunistic queue merging could be beneficial even on far queues
>        // We enable it only on NCQ disks, if we observed that merged queues
>        // can reach higher bandwidth than single queues.
> +       // 204 sectors per jiffy is equivalent to 100MB/s on 1000 HZ conf.
> +       // Allow merge if we don't have sufficient merged cfqq samples.
>        rs = cur_cfqq->allocated[READ] + cur_cfqq->allocated[WRITE];
> -       if (cfqd->hw_tag && cfqd->max_bw[1] > cfqd->max_bw[0] &&
> +       if (cfqd->hw_tag
> +          && (cfqd->max_bw[1] > cfqd->max_bw[0]
> +              || (cfqd->max_bw[0] > 204 && !sample_valid(cfqd->cfqq_merged_samples)))
>            // Do not overload the device queue
> -           rs < cfqd->hw_tag_est_depth / 2) {
> +           && rs < cfqd->hw_tag_est_depth / 2) {
>                unsigned r1 = __cfqq->allocated[READ] + __cfqq->allocated[WRITE];
>                unsigned r2 = __cfqq1->allocated[READ] + __cfqq1->allocated[WRITE];
>                // Prefer merging with a queue that has fewer pending requests
> @@ -1750,6 +1755,8 @@ static struct cfq_queue *cfqq_close(struct cfq_data *cfqd,
>                // Do not merge if the merged queue would have too many requests
>                if (r1 + rs > cfqd->hw_tag_est_depth / 2)
>                        return NULL;
> +               cfqd->cfqq_merged_samples++;
> +
>                // Merge only if the BW of the two queues is comparable
>                if (abs(__cfqq->last_bw - cur_cfqq->last_bw) * 4 < cfqd->max_bw[0])
>                        return __cfqq;
> --
> 1.6.2.5
>
>>
>> Thanks,
>> Corrado
>>
>> >
>> > Kernel=2.6.34-rc5-corrado-multicfq
>> > DIR= /mnt/iostmnt/fio          DEV= /dev/mapper/mpathe
>> > Workload=bsr       iosched=cfq      Filesz=2G    bs=4K
>> > ==========================================================================
>> > job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
>> > ---       --- --  ------------   -----------    -------------  -----------
>> > bsr       1   1   126590         61448          0              0
>> > bsr       1   2   127849         242843         0              0
>> > bsr       1   4   131886         508021         0              0
>> > bsr       1   8   131890         398241         0              0
>> > bsr       1   16  129167         454244         0              0
>> >
>> > Thanks
>> > Vivek
>> >
>>
>



-- 
__________________________________________________________________________

dott. Corrado Zoccolo                          mailto:czoccolo@...il.com
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
                               Tales of Power - C. Castaneda
