lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 6 May 2020 18:43:35 -0400
From:   Daniel Jordan <>
To:     Alexander Duyck <>
Cc:     Daniel Jordan <>,
        Josh Triplett <>,
        Andrew Morton <>,
        Herbert Xu <>,
        Steffen Klassert <>,
        Alex Williamson <>,
        Alexander Duyck <>,
        Dan Williams <>,
        Dave Hansen <>,
        David Hildenbrand <>,
        Jason Gunthorpe <>,
        Jonathan Corbet <>,
        Kirill Tkhai <>,
        Michal Hocko <>, Pavel Machek <>,
        Pavel Tatashin <>,
        Peter Zijlstra <>,
        Randy Dunlap <>,
        Shile Zhang <>,
        Tejun Heo <>, Zi Yan <>,, linux-mm <>,
        LKML <>
Subject: Re: [PATCH 6/7] mm: parallelize deferred_init_memmap()

On Wed, May 06, 2020 at 03:36:54PM -0700, Alexander Duyck wrote:
> On Wed, May 6, 2020 at 3:21 PM Daniel Jordan <> wrote:
> >
> > On Tue, May 05, 2020 at 07:55:43AM -0700, Alexander Duyck wrote:
> > > One question about this data. What is the power management
> > > configuration on the systems when you are running these tests? I'm
> > > just curious if CPU frequency scaling, C states, and turbo are
> > > enabled?
> >
> > Yes, intel_pstate is loaded in active mode without hwp and with turbo enabled
> > (those power management docs are great by the way!) and intel_idle is in use
> > too.
> >
> > > I ask because that is what I have seen usually make the
> > > difference in these kind of workloads as the throughput starts
> > > dropping off as you start seeing the core frequency lower and more
> > > cores become active.
> >
> > If I follow, you're saying there's a chance performance would improve with the
> > above disabled, but how often would a system be configured that way?  Even if
> > it were faster, the machine is configured how it's configured, or am I missing
> > your point?
> I think you might be missing my point. What I was getting at is that I
> know for performance testing sometimes C states and P states get
> disabled in order to get consistent results between runs, it sounds
> like you have them enabled though. I was just wondering if you had
> disabled them or not. If they were disabled then you wouldn't get the
> benefits of turbo and as such adding more cores wouldn't come at a
> penalty, while with it enabled the first few cores should start to
> slow down as they fell out of turbo mode. So it may be part of the
> reason why you are only hitting about 10x at full core count.

All right, that makes way more sense.

> As it stands I think your code may speed up a bit if you split the
> work up based on section instead of max order. That would get rid of
> any cache bouncing you may be doing on the pageblock flags and reduce
> the overhead for splitting the work up into individual pieces since
> each piece will be bigger.

See my other mail.

Powered by blists - more mailing lists