Date:	Tue, 10 Sep 2013 14:02:54 +1000
From:	Dave Chinner <david@...morbit.com>
To:	linux-kernel@...r.kernel.org
Cc:	Joonsoo Kim <iamjoonsoo.kim@....com>, Paul Turner <pjt@...gle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>
Subject: [performance regression, bisected] scheduler: should_we_balance()
 kills filesystem performance

Hi folks,

I just updated my performance test VM to the current 3.12-git
tree after the XFS dev branch was merged. The first test I ran, a
16-way concurrent fsmark test that creates lots of files, came in
about 30% lower than expected: ~180k files/s where I was expecting
somewhere around 250k files/s.

I did a bisect, and the bisect landed on this commit:

commit 23f0d2093c789e612185180c468fa09063834e87
Author: Joonsoo Kim <iamjoonsoo.kim@....com>
Date:   Tue Aug 6 17:36:42 2013 +0900

    sched: Factor out code to should_we_balance()
    
    Now checking whether this cpu is appropriate to balance or not
    is embedded into update_sg_lb_stats() and this checking has no direct
    relationship to this function. There is not enough reason to place
    this checking at update_sg_lb_stats(), except saving one iteration
    for sched_group_cpus.
....
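
For reference, the factored-out check looks roughly like this (a
simplified sketch from my reading of kernel/sched/fair.c after that
commit; I've trimmed the group balance mask handling, so it's not a
verbatim quote):

static int should_we_balance(struct lb_env *env)
{
	struct sched_group *sg = env->sd->groups;
	int cpu, balance_cpu = -1;

	/* A newly idle CPU is always allowed to balance. */
	if (env->idle == CPU_NEWLY_IDLE)
		return 1;

	/* Otherwise, find the first idle CPU in this sched group. */
	for_each_cpu_and(cpu, sched_group_cpus(sg), env->cpus) {
		if (!idle_cpu(cpu))
			continue;
		balance_cpu = cpu;
		break;
	}

	/* No idle CPU: fall back to the group's designated balance CPU. */
	if (balance_cpu == -1)
		balance_cpu = group_balance_cpu(sg);

	/*
	 * Only that one CPU in the group goes on to do load balancing
	 * at this and higher domains; everyone else bails out early.
	 */
	return balance_cpu == env->dst_cpu;
}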

Now, I couldn't revert that patch by itself, but I reverted the
series of about 10 scheduler patches it belongs to from a current
TOT and the regression went away. Hence I'm pretty confident that
this is the patch causing the issue, as I've verified it in more
than one way and the difference between "good" and "bad" was
significantly greater than the variance of the test (1.5-2 stddev
difference).

In more detail:

			v4 filesystem		v5 filesystem
3.11+xfsdev:		220k files/s		225k files/s
3.12-git		180k files/s		185k files/s
3.12-git-revert		245k files/s		247k files/s

The test VM is a 16p/16GB RAM VM, with a sparse 100TB filesystem
image sitting on a 4-way RAID0 SSD array; the image is formatted
with XFS and accessed via virtio + direct IO. The fsmark command
line is:

time ./fs_mark  -D  10000  -S0  -n  100000  -s  0  -L  32 \
        -d  /mnt/scratch/0  -d  /mnt/scratch/1 \
        -d  /mnt/scratch/2  -d  /mnt/scratch/3 \
        -d  /mnt/scratch/4  -d  /mnt/scratch/5 \
        -d  /mnt/scratch/6  -d  /mnt/scratch/7 \
        -d  /mnt/scratch/8  -d  /mnt/scratch/9 \
        -d  /mnt/scratch/10  -d  /mnt/scratch/11 \
        -d  /mnt/scratch/12  -d  /mnt/scratch/13 \
        -d  /mnt/scratch/14  -d  /mnt/scratch/15 \
        | tee >(stats --trim-outliers | tail -1 1>&2)

On XFS this workload runs almost CPU bound - the effect of the
above patch is that a lot of idle time is left in the system. The
workload consumed the same amount of user and system CPU, but
instantaneous CPU usage was reduced by 20-30% and the elapsed time
increased by 20-30%.

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com