Date:   Wed, 6 May 2020 15:36:54 -0700
From:   Alexander Duyck <>
To:     Daniel Jordan <>
Cc:     Josh Triplett <>,
        Andrew Morton <>,
        Herbert Xu <>,
        Steffen Klassert <>,
        Alex Williamson <>,
        Alexander Duyck <>,
        Dan Williams <>,
        Dave Hansen <>,
        David Hildenbrand <>,
        Jason Gunthorpe <>,
        Jonathan Corbet <>,
        Kirill Tkhai <>,
        Michal Hocko <>, Pavel Machek <>,
        Pavel Tatashin <>,
        Peter Zijlstra <>,
        Randy Dunlap <>,
        Shile Zhang <>,
        Tejun Heo <>, Zi Yan <>, linux-mm <>,
        LKML <>
Subject: Re: [PATCH 6/7] mm: parallelize deferred_init_memmap()

On Wed, May 6, 2020 at 3:21 PM Daniel Jordan <> wrote:
> On Tue, May 05, 2020 at 07:55:43AM -0700, Alexander Duyck wrote:
> > One question about this data. What is the power management
> > configuration on the systems when you are running these tests? I'm
> > just curious if CPU frequency scaling, C states, and turbo are
> > enabled?
> Yes, intel_pstate is loaded in active mode without hwp and with turbo enabled
> (those power management docs are great by the way!) and intel_idle is in use
> too.
> > I ask because that is what I have seen usually make the
> > difference in these kind of workloads as the throughput starts
> > dropping off as you start seeing the core frequency lower and more
> > cores become active.
> If I follow, you're saying there's a chance performance would improve with the
> above disabled, but how often would a system be configured that way?  Even if
> it were faster, the machine is configured how it's configured, or am I missing
> your point?

I think you might be missing my point. What I was getting at is that
for performance testing, C states and P states sometimes get disabled
in order to get consistent results between runs; it sounds like you
have them enabled, though. I was just wondering whether you had
disabled them or not. If they were disabled then you wouldn't get the
benefits of turbo, and as such adding more cores wouldn't come at a
penalty. With turbo enabled, the first few cores should start to slow
down as they fall out of turbo mode when more cores become active. So
it may be part of the reason why you are only hitting about 10x at
full core count.
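
To put rough numbers on the turbo effect, here is a minimal sketch of the
arithmetic. The function name and both frequency parameters are hypothetical
and for illustration only. It assumes the clock is the only thing that changes
as cores become active, and it ignores memory bandwidth, cache effects, and the
cost of splitting up the work.

```c
#include <assert.h>

/*
 * Ideal speedup of n cores over one core when clock speed is the only loss:
 * a single active core runs at f_single_ghz, while each of n active cores
 * runs at f_all_ghz. Both frequencies are hypothetical inputs, not
 * measurements from any system in this thread.
 */
double turbo_limited_speedup(unsigned int n, double f_single_ghz,
                             double f_all_ghz)
{
    assert(n > 0 && f_single_ghz > 0.0 && f_all_ghz > 0.0);
    return n * (f_all_ghz / f_single_ghz);
}
```

For example, with made-up clocks of 4.0 GHz for one active core and 3.0 GHz
with all 16 cores active, the best case is 12x rather than 16x. With turbo
disabled both clocks are the same and the best case is back to 16x.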

As it stands I think your code may speed up a bit if you split the
work up based on section instead of max order. That would get rid of
any cache bouncing you may be doing on the pageblock flags and reduce
the overhead for splitting the work up into individual pieces since
each piece will be bigger.
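
As a rough illustration of the difference in granularity, here is a sketch
that compares the two chunk sizes. The constants are assumptions for x86_64
with 4 KiB pages and SPARSEMEM (2 MiB pageblocks, 4 MiB max-order blocks,
128 MiB sections, 4 flag bits per pageblock) and will differ on other
configurations. The helper names are made up for this example and are not
taken from the patch.

```c
#include <assert.h>

/* Assumed x86_64 values with 4 KiB pages; other configurations differ. */
#define PAGEBLOCK_NR_PAGES  512UL    /* 2 MiB pageblock */
#define MAX_ORDER_NR_PAGES  1024UL   /* 4 MiB max-order block */
#define PAGES_PER_SECTION   32768UL  /* 128 MiB sparsemem section */
#define NR_PAGEBLOCK_BITS   4UL      /* flag bits kept per pageblock */

/* Number of work pieces needed to cover nr_pages, rounding up. */
unsigned long nr_chunks(unsigned long nr_pages, unsigned long chunk_pages)
{
    assert(chunk_pages > 0);
    return (nr_pages + chunk_pages - 1) / chunk_pages;
}

/* Bytes of pageblock flags covered by one chunk of chunk_pages pages. */
unsigned long flag_bytes_per_chunk(unsigned long chunk_pages)
{
    return (chunk_pages / PAGEBLOCK_NR_PAGES) * NR_PAGEBLOCK_BITS / 8;
}
```

Under these assumptions, 64 GiB of deferred memory (16777216 pages) is 16384
max-order pieces but only 512 section-sized pieces. A max-order piece covers a
single byte of pageblock flags, so threads working on neighbouring pieces can
end up writing to the same cache line, while a section-sized piece covers all
32 bytes of flags for its section.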
