Date:   Sun, 23 Apr 2017 18:21:33 -0700
From:   Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
To:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Doug Smythies <dsmythies@...us.net>
Cc:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Rafael Wysocki <rafael.j.wysocki@...el.com>,
        Jörg Otte <jrg.otte@...il.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux PM <linux-pm@...r.kernel.org>
Subject: Re: Performance of low-cpu utilisation benchmark regressed severely
 since 4.6

On Mon, 2017-04-24 at 02:59 +0200, Rafael J. Wysocki wrote:
> On Sun, Apr 23, 2017 at 5:31 PM, Doug Smythies <dsmythies@...us.net>
> wrote:
[...]

> > > It looks like the cost is mostly related to moving the load from
> > > one CPU to
> > > another and waiting for the new one to ramp up then.
When we analyzed Mel's results last year, this was the conclusion.
The problem was more apparent on systems with per-core P-states.

> > > 
> > > I guess the workload consists of many small tasks that each start
> > > on new CPUs
> > > and cause that ping-pong to happen.
> > Yes, and (from trace data) many tasks are very very very small.
> > Also the test
> > appears to take a few holidays, of up to 1 second, during
> > execution.
> > 
> > > 
> > > > 
> > > > (performance governor, restated from a previous e-mail: 1776.05
> > > > seconds)
> > > But that causes the processor to stay in the maximum sustainable
> > > P-state all
> > > the time, which on Sandy Bridge is quite costly energetically.
> > Agreed. I only provide these data points as a reference and so that
> > we know
> > what the boundary conditions (limits) are.
> > 
> > > 
> > > We can do one more trick I forgot about.  Namely, if we are about
> > > to increase
> > > the P-state, we can jump to the average between the target and
> > > the max
> > > instead of just the target, like in the appended patch (on top of
> > > linux-next).
> > > 
> > > That will make the P-state selection really aggressive, so costly
> > > energetically,
> > > but it should cause small jumps of the average load above 0 to cause
> > > big jumps of the target P-state.
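
For reference, the trick described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
appended patch: the function name and parameters are hypothetical, and
P-states are treated as plain integers where a larger value means a
higher frequency.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the actual patch): choose the P-state
 * to request, given the current one, the computed target and the
 * maximum available P-state.
 */
int boosted_pstate(int current, int target, int max_pstate)
{
	/* Assumption: the target never exceeds the maximum P-state. */
	assert(target <= max_pstate);

	/*
	 * When the P-state is about to increase, jump to the average of
	 * the target and the maximum instead of just the target.
	 */
	if (target > current)
		return (target + max_pstate) / 2;

	/* Otherwise (same or lower P-state) use the target unchanged. */
	return target;
}
```

With current = 10, target = 20 and max_pstate = 40, this requests 30
rather than 20, which is why even a small rise in load would push the
P-state up sharply.
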
> > I'm already seeing the energy costs of some of this stuff.
> > 3050.2 Seconds.
> Is this with or without reducing the sampling interval?
> 
> > 
> > Idle power 4.06 Watts.
> > 
> > Idle power for kernel 4.11-rc7 (performance-based): 3.89 Watts.
> > Idle power for kernel 4.11-rc7, using load-based: 4.01 watts
> > Idle power for kernel 4.11-rc7 next linux-pm: 3.91 watts
> Power draw differences are not dramatic, so this might be a viable
> change depending on the influence on the results elsewhere.
Last time, one proposed solution was to use a higher floor instead of
the minimum P-state on Atom platforms. But this ended up increasing
power consumption on some Android workloads.

Thanks,
Srinivas
