Message-ID: <20200602142305.GA31901@xsang-OptiPlex-9020>
Date:   Tue, 2 Jun 2020 22:23:05 +0800
From:   Oliver Sang <oliver.sang@...el.com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Ingo Molnar <mingo@...nel.org>, Ben Segall <bsegall@...gle.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Mel Gorman <mgorman@...e.de>, Mike Galbraith <efault@....de>,
        Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        OTC LSE PnP <otc.lse.pnp@...el.com>,
        "Huang, Ying" <ying.huang@...el.com>,
        "Si, Beibei" <beibei.si@...el.com>,
        "Du, Julie" <julie.du@...el.com>,
        "Chen, Yu C" <yu.c.chen@...el.com>,
        "Li, Aubrey" <aubrey.li@...el.com>,
        "Tang, Feng" <feng.tang@...el.com>,
        Rui Zhang <rui.zhang@...el.com>, oliver.sang@...el.com
Subject: Re: [sched/fair] 0b0695f2b3:
 phoronix-test-suite.compress-gzip.0.seconds 19.8% regression

On Tue, Jun 02, 2020 at 01:23:19PM +0800, Oliver Sang wrote:
> On Fri, May 29, 2020 at 07:26:01PM +0200, Vincent Guittot wrote:
> > On Mon, 25 May 2020 at 10:02, Vincent Guittot
> > <vincent.guittot@...aro.org> wrote:
> > >
> > > On Thu, 21 May 2020 at 10:28, Oliver Sang <oliver.sang@...el.com> wrote:
> > > >
> > > > On Wed, May 20, 2020 at 03:04:48PM +0200, Vincent Guittot wrote:
> > > > > On Thu, 14 May 2020 at 19:09, Vincent Guittot
> > > > > <vincent.guittot@...aro.org> wrote:
> > > > > >
> > > > > > Hi Oliver,
> > > > > >
> > > > > > On Thu, 14 May 2020 at 16:05, kernel test robot <oliver.sang@...el.com> wrote:
> > > > > > >
> > > > > > > Hi Vincent Guittot,
> > > > > > >
> > > > > > > Below report FYI.
> > > > > > > Last year, we actually reported an improvement "[sched/fair] 0b0695f2b3:
> > > > > > > vm-scalability.median 3.1% improvement" on link [1].
> > > > > > > but now we found the regression on pts.compress-gzip.
> > > > > > > > This seems to align with what was shown in "[v4,00/10] sched/fair: rework the CFS
> > > > > > > > load balance" (link [2]), which showed that the reworked load balance could have
> > > > > > > > both positive and negative effects on different test suites.
> > > > > >
> > > > > > > We have tried to run all possible use cases, but it's impossible to
> > > > > > > cover them all, so there was a possibility that one that is not covered
> > > > > > > would regress.
> > > > > >
> > > > > > > And also from link [3], the patch set risks regressions.
> > > > > > >
> > > > > > > We also confirmed this regression on another platform
> > > > > > > (Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 8G memory),
> > > > > > > below is the data (lower is better).
> > > > > > > v5.4    4.1
> > > > > > > fcf0553db6f4c79387864f6e4ab4a891601f395e    4.01
> > > > > > > 0b0695f2b34a4afa3f6e9aa1ff0e5336d8dad912    4.89
> > > > > > > v5.5    5.18
> > > > > > > v5.6    4.62
> > > > > > > v5.7-rc2    4.53
> > > > > > > v5.7-rc3    4.59
> > > > > > >
> > > > > > > > It seems there is some recovery on the latest kernels, but not fully back.
> > > > > > > > We were wondering whether you could shed some light on the further work
> > > > > > > > on the load balance after patch set [2] that could cause the performance
> > > > > > > > change?
> > > > > > > > And whether you have plans to refine the load balance algorithm further?
> > > > > >
> > > > > > I'm going to have a look at your regression to understand what is
> > > > > > going wrong and how it can be fixed
> > > > >
> > > > > I have run the benchmark on my local setups to try to reproduce the
> > > > > regression and I don't see the regression. But my setups are different
> > > > > > from yours so it might be a problem specific to yours
> > > >
> > > > Hi Vincent, which OS are you using? We found the regression on Clear OS,
> > > > > but cannot reproduce it on Debian.
> > > > On https://www.phoronix.com/scan.php?page=article&item=mac-win-linux2018&num=5
> > > > it was mentioned that -
> > > > Gzip compression is much faster out-of-the-box on Clear Linux due to it exploiting
> > > > multi-threading capabilities compared to the other operating systems Gzip support.
> > >
> > > I'm using Debian, I haven't noticed it was only on Clear OS.
> > > I'm going to look at it. Could you send me traces in the meantime ?
> > 
> > I ran more tests to understand the problem. Even if Clear Linux uses
> > multithreading, the system is not overloaded and there is a
> > significant amount of idle time. This means that we use the has_spare
> > capacity path that spreads tasks on the system. At least that is what
> > I have seen in the KVM image. Besides this, I think that I have been
> > able to reproduce the problem on my platform with Debian using pigz
> > instead of gzip for the compress-gzip-1.2.0 test. On my platform, I
> > can see a difference when I enable all CPU idle states whereas there
> > is no performance difference when only the shallowest idle state is
> > enabled.
> > 
> > The new load balance rework is more efficient at spreading tasks on
> > the system and one side effect could be that there is more idle time
> > between task wakeups on each CPU. As a result, CPUs have to wake up
> > from a deeper idle state. This could explain the +54% increase of C6
> > usage that is reported.  Is it possible to get All C-state statistics
> > ?
> 
> Hi Vincent, sorry for the late reply. I collected turbostat data while running
> compress-gzip on Clear OS, as attached. Besides v5.4 and v5.7-rc7,
> I got the data for v5.5 and v5.7, too. Not sure it's enough since you
> said "All C-state statistics". Hope it is useful; if you want
> more data, please let us know.
> 
> Again, I collected this data on our "Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
> with 8G memory" platform. For each release, I ran the test twice to ensure
> the results are consistent.
> 
> phoronix-test-suite.compress-gzip.0.seconds
> release		run1	run2
> v5.4		4.37	4.37
> v5.5		5.37	5.46
> v5.7-rc7	4.82	4.85
> v5.7		4.86	4.83
> 
> > 
> > Could you run the test after disabling deep idle states like C6 and
> > above and check what is the difference between v5.7-rc7 and v5.4 ?
> 
> Thanks for the suggestion, will collect this data later.

Hi Vincent, I tested with deep idle states disabled. The regression
still exists, but the gap becomes smaller:
release		run1	run2
v5.4		4.32	4.3	<-- little change compared to above
v5.5		5.04	5.06	<-- improves
v5.7-rc7	4.79	4.78
v5.7		4.77	4.77
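
For reference, the gap can be quantified from the two tables in this
thread. The following is only a minimal sketch in Python: it averages the
two runs per release and reports the slowdown relative to v5.4. The names
RUNS and slowdown_vs_baseline are hypothetical, picked for illustration;
the numbers are copied from the tables.

```python
from statistics import mean

# Run times in seconds (lower is better), copied from the two tables
# in this thread. The dictionary layout itself is just an illustration.
RUNS = {
    "deep idle states enabled": {
        "v5.4": [4.37, 4.37],
        "v5.5": [5.37, 5.46],
        "v5.7-rc7": [4.82, 4.85],
        "v5.7": [4.86, 4.83],
    },
    "deep idle states disabled": {
        "v5.4": [4.32, 4.3],
        "v5.5": [5.04, 5.06],
        "v5.7-rc7": [4.79, 4.78],
        "v5.7": [4.77, 4.77],
    },
}


def slowdown_vs_baseline(runs, baseline="v5.4"):
    """Return {release: percent slowdown of the mean run time vs. baseline}."""
    base = mean(runs[baseline])
    return {
        release: (mean(times) / base - 1.0) * 100.0
        for release, times in runs.items()
        if release != baseline
    }


for config, runs in RUNS.items():
    print(config)
    for release, pct in slowdown_vs_baseline(runs).items():
        print(f"  {release:10s} {pct:+.1f}% vs v5.4")
```

With only two runs per release this is a rough estimate, so small
differences between v5.7-rc7 and v5.7 should not be over-interpreted.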

The turbostat data for these runs is attached.
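
For anyone who wants to repeat the run with deep idle states disabled, one
possible way is the cpuidle sysfs interface
(/sys/devices/system/cpu/cpuN/cpuidle/stateM/{name,disable}). The sketch
below is an assumption about how this could be done, not a record of the
exact procedure behind the numbers above. The function name
disable_deep_idle_states and the "C6" name prefix are hypothetical choices;
idle state names differ between platforms and cpuidle drivers, and writing
to the disable files needs root.

```python
import re
from pathlib import Path


def disable_deep_idle_states(sysfs_cpu_root="/sys/devices/system/cpu",
                             first_deep="C6", dry_run=True):
    """Disable the first idle state whose name starts with `first_deep`
    and every deeper state, on all CPUs found under `sysfs_cpu_root`.

    Returns a list of (cpu, state, name) tuples that were (or, with
    dry_run=True, would be) disabled.
    """
    touched = []
    root = Path(sysfs_cpu_root)
    cpus = sorted(
        (p for p in root.glob("cpu[0-9]*") if re.fullmatch(r"cpu\d+", p.name)),
        key=lambda p: int(p.name[3:]),
    )
    for cpu in cpus:
        # State directories are ordered from shallowest to deepest by index.
        states = sorted(cpu.glob("cpuidle/state[0-9]*"),
                        key=lambda p: int(p.name[5:]))
        deep = False
        for state in states:
            name = (state / "name").read_text().strip()
            deep = deep or name.startswith(first_deep)
            if deep:
                if not dry_run:
                    # Writing "1" asks the kernel not to use this state.
                    (state / "disable").write_text("1")
                touched.append((cpu.name, state.name, name))
    return touched
```

Calling disable_deep_idle_states() with the default dry_run=True only lists
the states that would be affected; passing dry_run=False performs the
writes. Writing "0" to the same disable files restores the states.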

> 
> > comparing fcf0553db6 ("sched/fair: Remove meaningless imbalance
> > calculation") and
> >   0b0695f2b3 ("sched/fair: Rework load_balance()") is not really
> > useful because they are part of the same rework and should be
> > considered as one single change.
> > 
> > >
> > > >
> > > > >
> > > > > After analysing the benchmark, it doesn't overload the system and is
> > > > > mainly based on 1 main gzip thread with few others waking up and
> > > > > sleeping around.
> > > > >
> > > > > I thought that scheduler could be too aggressive when trying to
> > > > > balance the threads on your system, which could generate more task
> > > > > migrations and impact the performance. But this doesn't seem to be the
> > > > > case because perf-stat.i.cpu-migrations is -8%. On the other side, the
> > > > > context switch is +16% and more interestingly idle state C1E and C6
> > > > > > usages increase more than 50%. I don't know if we can rely on this
> > > > > value or not but I wonder if it could be that threads are now spread
> > > > > on different CPUs which generates idle time on the busy CPUs but the
> > > > > added time to enter/leave these states hurts the performance.
> > > > >
> > > > > Could you make some traces of both kernels ? Tracing sched events
> > > > > should be enough to understand the behavior
> > > > >
> > > > > Regards,
> > > > > Vincent
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > > Vincent
> > > > > >
> > > > > > > thanks
> > > > > > >
> > > > > > > [1] https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/SANC7QLYZKUNMM6O7UNR3OAQAKS5BESE/
> > > > > > > [2] https://lore.kernel.org/patchwork/cover/1141687/
> > > > > > > [3] https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.5-Scheduler



Download attachment "compress-gzip-turbostat-no-c6.tgz" of type "application/x-gtar-compressed" (8275 bytes)
