Message-ID: <CAE4VaGA+GOh-wgHBbSsgpRVXgrGtz8egu6dYp143TAH0siL5fA@mail.gmail.com>
Date: Fri, 20 Mar 2020 16:33:43 +0100
From: Jirka Hladky <jhladky@...hat.com>
To: linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v6
> MPI or OMP and what is a low thread count? For MPI at least, I saw a 0.4%
> gain on a 4-node machine for bt_C and a 3.88% regression on 8 nodes. I
> think it must be OMP you are using because I found I had to disable UA
> for MPI at some point in the past for reasons I no longer remember.
Yes, it's indeed OMP. By low thread count, I mean up to 2x the number
of NUMA nodes (8 threads on 4-node NUMA servers, 16 threads on 8-node
NUMA servers).
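As a concrete reading of that rule, here is a minimal sketch. The helper
names `low_thread_limit` and `is_low_thread_count` are hypothetical, and
the factor of 2 is only the rule of thumb stated above, not something the
benchmarks or the kernel define.

```python
def low_thread_limit(numa_nodes: int, factor: int = 2) -> int:
    """Largest thread count treated as "low" on a machine with numa_nodes nodes.

    factor=2 mirrors the "up to 2x the number of NUMA nodes" rule of thumb;
    it is an assumption for illustration, not a kernel or benchmark constant.
    """
    if numa_nodes < 1:
        raise ValueError("numa_nodes must be at least 1")
    return factor * numa_nodes


def is_low_thread_count(threads: int, numa_nodes: int) -> bool:
    """True if threads falls in the "low thread count" range for this machine."""
    return 1 <= threads <= low_thread_limit(numa_nodes)


if __name__ == "__main__":
    for nodes in (4, 8):
        print(f"{nodes} NUMA nodes: low thread count is up to {low_thread_limit(nodes)}")
```

For the two server types mentioned, this gives 8 and 16 threads.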
> One possibility would be to spread wide always at clone time and assume
> wake_affine will pull related tasks but it's fragile because it breaks
> if the cloned task execs and then allocates memory from a remote node
> only to migrate to a local node immediately.
I think the only way to find out how it performs is to test it. If you
could prepare a patch like that, I'm more than happy to give it a try!
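
To illustrate the trade-off being discussed, here is a toy model. It is
not kernel code and not a proposed patch: it only contrasts an extreme
"stay on the first node while it has idle CPUs" placement with a "spread
wide at clone time" placement. The names `place_local`, `place_spread`
and `cpus_per_node` are assumptions for illustration, and the model
ignores wake_affine, exec and memory placement entirely.

```python
def place_local(threads: int, nodes: int, cpus_per_node: int) -> list[int]:
    """Caricature of "no pressure to spread": fill the first node that has an idle CPU."""
    load = [0] * nodes
    for _ in range(threads):
        # First node with an idle CPU; if every node is full, use the least loaded one.
        target = next((n for n in range(nodes) if load[n] < cpus_per_node), None)
        if target is None:
            target = load.index(min(load))
        load[target] += 1
    return load


def place_spread(threads: int, nodes: int) -> list[int]:
    """Spread wide at clone time: each new task goes to the least loaded node."""
    load = [0] * nodes
    for _ in range(threads):
        load[load.index(min(load))] += 1
    return load


if __name__ == "__main__":
    # 8 threads on a 4-node machine; 16 CPUs per node is an assumed value.
    print("local :", place_local(8, 4, cpus_per_node=16))
    print("spread:", place_spread(8, 4))
```

With 8 threads on 4 nodes, the local variant leaves all tasks on one node
while the spread variant ends up with 2 per node. Whether spreading helps
in practice depends on the effects the model leaves out, which is why it
needs testing.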
Jirka
On Fri, Mar 20, 2020 at 4:22 PM Mel Gorman <mgorman@...hsingularity.net> wrote:
>
> On Fri, Mar 20, 2020 at 03:37:44PM +0100, Jirka Hladky wrote:
> > Hi Mel,
> >
> > just a quick update. I have increased the testing coverage, and other tests
> > from NAS show a big performance drop for a low number of threads as
> > well:
> >
> > sp_C_x - still shows the biggest drop, up to 50%
> > bt_C_x - performance drop up to 40%
> > ua_C_x - performance drop up to 30%
> >
>
> MPI or OMP and what is a low thread count? For MPI at least, I saw a 0.4%
> gain on a 4-node machine for bt_C and a 3.88% regression on 8 nodes. I
> think it must be OMP you are using because I found I had to disable UA
> for MPI at some point in the past for reasons I no longer remember.
>
> > My point is that the performance drop for a low number of threads is more
> > common than we initially thought.
> >
> > Let me know if you need more data.
> >
>
> I just need a clarification on the thread count and a confirmation it's OMP. For
> MPI, I did note that some of the other NAS kernels show a slight dip but
> it was nowhere near as severe as SP and the problem was the same --
> two or more tasks stayed on the same node without spreading out because
> there was no pressure to do so. There was enough CPU and memory capacity
> with no obvious pattern that could be used to spread the load wide early.
>
> One possibility would be to spread wide always at clone time and assume
> wake_affine will pull related tasks but it's fragile because it breaks
> if the cloned task execs and then allocates memory from a remote node
> only to migrate to a local node immediately.
>
> --
> Mel Gorman
> SUSE Labs
>
--
-Jirka