Message-ID: <98b3df1-79b7-836f-e334-afbdd594b55@inria.fr>
Date: Tue, 19 Dec 2023 18:51:18 +0100 (CET)
From: Julia Lawall <julia.lawall@...ia.fr>
To: Vincent Guittot <vincent.guittot@...aro.org>
cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
    Dietmar Eggemann <dietmar.eggemann@....com>, Mel Gorman <mgorman@...e.de>, 
    linux-kernel@...r.kernel.org
Subject: Re: EEVDF and NUMA balancing
> > One CPU has 2 threads, and the others have one.  The one with two threads
> > is returned as the busiest one.  But nothing happens, because both of them
> > prefer the socket that they are on.
>
> This explains why load_balance uses migrate_util and not migrate_task.
> One CPU with 2 threads can be overloaded
>
> ok, so it seems that your 1st problem is that you have 2 threads on
> the same CPU whereas you should have an idle core in this numa node.
> All cores are sharing the same LLC, aren't they ?
Sorry, not following this.
Socket 1 has N-1 threads, and thus an idle CPU.
Socket 2 has N+1 threads, and thus one CPU with two threads.
Socket 1 tries to steal from that one CPU with two threads, but that
fails, because both threads prefer being on Socket 2.
Since most (or all?) of the threads on Socket 2 prefer being on Socket 2,
the only hope for Socket 1 to fill in its idle core is active balancing.
But active balancing is not triggered because of migrate_util and because
CPU_NEWLY_IDLE prevents the failure counter from being increased.
What I don't yet understand is why, when I convert
CPU_NEWLY_IDLE to CPU_IDLE, it typically picks a CPU with only one thread
as busiest.  I have the impression that the fbq_type intervenes to cause
it to avoid the CPU with two threads that already prefer Socket 2.  But I
don't know at the moment why that is the case.  In any case, it's fine to
active balance from a CPU with only one thread, because Socket 2 will
even itself out afterwards.
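
The interaction described above (migrate_util plus CPU_NEWLY_IDLE keeping
active balancing from ever triggering) can be illustrated with a toy
userspace model.  This is only a sketch, not kernel code: the enum and
field names mirror those used in the thread, but struct toy_domain, the
helper functions, and the value cache_nice_tries = 1 are assumptions made
for illustration.  The "original" test is assumed to check migrate_task
only, and the "patched" test is the variant quoted further down in this
message.

```c
#include <assert.h>
#include <stdbool.h>

/* Names mirror the scheduler's, but this is a toy userspace model. */
enum cpu_idle_type { CPU_IDLE, CPU_NOT_IDLE, CPU_NEWLY_IDLE };
enum migration_type { migrate_load, migrate_util, migrate_task, migrate_misfit };

/* Hypothetical stand-in for the two sched_domain fields involved. */
struct toy_domain {
	unsigned int nr_balance_failed;
	unsigned int cache_nice_tries;
};

/* Assumed original test: only migrate_task can lead to active balancing. */
static bool active_balance_orig(const struct toy_domain *sd,
				enum migration_type type)
{
	return type == migrate_task &&
	       sd->nr_balance_failed > sd->cache_nice_tries + 2;
}

/* Variant proposed in the thread: migrate_util qualifies as well. */
static bool active_balance_patched(const struct toy_domain *sd,
				   enum migration_type type)
{
	return (type == migrate_task || type == migrate_util) &&
	       sd->nr_balance_failed > sd->cache_nice_tries + 2;
}

/* A failed attempt only bumps the counter outside CPU_NEWLY_IDLE. */
static void record_failed_attempt(struct toy_domain *sd,
				  enum cpu_idle_type idle)
{
	if (idle != CPU_NEWLY_IDLE)
		sd->nr_balance_failed++;
}

/*
 * Count failed balance attempts until the active balance test fires.
 * Returns -1 if it never fires within max_attempts.
 */
static int attempts_until_active_balance(enum cpu_idle_type idle,
					 enum migration_type type,
					 bool patched, int max_attempts)
{
	/* cache_nice_tries = 1 is an arbitrary value chosen for the example. */
	struct toy_domain sd = { .nr_balance_failed = 0, .cache_nice_tries = 1 };

	assert(max_attempts >= 0);
	for (int i = 1; i <= max_attempts; i++) {
		bool fire;

		record_failed_attempt(&sd, idle);
		fire = patched ? active_balance_patched(&sd, type)
			       : active_balance_orig(&sd, type);
		if (fire)
			return i;
	}
	return -1;
}
```

In this model, attempts_until_active_balance(CPU_NEWLY_IDLE, ...) never
fires, whatever the migration type, because the counter stays at zero.
With CPU_IDLE, migrate_util fires only under the patched test.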
>
> You should not have more than 1 thread per CPU when there are N+1
> threads on a node with N cores / 2N CPUs.
Hmm, I think there is a miscommunication about cores and CPUs.  The
machine has two sockets with 16 physical cores each, and thus 32
hyperthreads per socket.  There are 64 threads running.
julia
> This will enable the
> load_balance to try to migrate a task instead of some util(ization)
> and you should reach the active load balance.
>
> >
> > > In theory you should have the
> > > local "group_has_spare" and the busiest "group_fully_busy" (at most).
> > > This means that no group should be overloaded and load_balance should
> > > not try to migrate util but only task
> >
> > I didn't collect information about the groups.  I will look into that.
> >
> > julia
> >
> > >
> > >
> > > >
> > > > and changing the above test to:
> > > >
> > > >         if ((env->migration_type == migrate_task || env->migration_type == migrate_util) &&
> > > >             (sd->nr_balance_failed > sd->cache_nice_tries+2))
> > > >
> > > > seems to solve the problem.
> > > >
> > > > I will test this on more applications.  But let me know if the above
> > > > solution seems completely inappropriate.  Maybe it violates some other
> > > > constraints.
> > > >
> > > > I have no idea why this problem became more visible with EEVDF.  It seems
> > > > to have to do with the time slices all turning out to be the same.  I got
> > > > the same behavior in 6.5 by overwriting the timeslice calculation to
> > > > always return 1.  But I don't see the connection between the timeslice and
> > > > the behavior of the idle task.
> > > >
> > > > thanks,
> > > > julia
> > >
>