Date:	Tue, 01 Jun 2010 10:16:42 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Nick Piggin <npiggin@...e.de>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Ingo Molnar <mingo@...hat.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Lee Schermerhorn <lee.schermerhorn@...com>
Subject: Re: [rfc] forked kernel task and mm structures imbalanced on NUMA

On Tue, 2010-06-01 at 17:33 +1000, Nick Piggin wrote:
> Another problem I found when testing this patch is that the scheduler
> has some issues of its own when balancing. This is improved by
> traversing the sd groups starting from a different spot each time, so
> processes get sprinkled around the nodes a bit better.

Right, makes sense. And I think we could merge that group iteration
without much trouble.
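
To illustrate (a standalone toy, not the actual find_idlest_group() code;
all names here are made up), rotating the start of the group scan spreads
successive forks across groups instead of always landing on the same one
when the loads tie:

/*
 * Toy userspace model of "start the group scan from a different spot
 * each time".  With equal loads, the strict '<' comparison keeps the
 * first candidate seen, so rotating the starting index breaks the tie
 * differently on each fork and the children get sprinkled around.
 */
#include <stdio.h>

#define NR_GROUPS 4

static unsigned int scan_start;	/* rotates on every invocation */

/* made-up per-group load; the real code derives this from the sched groups */
static unsigned long group_load[NR_GROUPS] = { 10, 10, 10, 10 };

static int find_idlest_group_rotating(void)
{
	unsigned int start = scan_start++ % NR_GROUPS;
	unsigned long best_load = ~0UL;
	int best = -1;

	for (unsigned int i = 0; i < NR_GROUPS; i++) {
		unsigned int g = (start + i) % NR_GROUPS;

		if (group_load[g] < best_load) {
			best_load = group_load[g];
			best = g;
		}
	}
	return best;
}

int main(void)
{
	for (int i = 0; i < 8; i++)
		printf("fork %d -> group %d\n", i, find_idlest_group_rotating());
	return 0;
}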

Your alternative placement for sched_exec() seems to make sense too; the
earlier we do that, the better the memory allocations will be.

Your changes to sched_fork() and wake_up_new_task() made my head hurt a
bit -- but that's not your fault. I'm not quite sure why you're changing
that though.

The addition of sched_fork_suggest_cpu() to select a target node seems
to make sense, but since you then call fork balancing a second time, we
have a chance of ending up on a totally different node altogether.

So I think it would make sense to rework the fork balancing muck to be
called only once and stick with its decision.

One thing that would make the whole fork path much easier is fully
ripping out that child_runs_first mess for CONFIG_SMP. I think it's been
disabled by default for long enough, and it's always been broken in the
face of fork balancing anyway.

So basically we have to move fork balancing back to sched_fork(). I'd
have to look again at what happens to ->cpus_allowed, but I guess it
should be fixable, and I don't think we should care overly much about
cpu-hotplug.
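
Roughly the shape I have in mind, as a standalone toy model (made-up
names and a trivial placement policy, not the actual kernel code):
sched_fork() makes the SD_BALANCE_FORK decision exactly once, and
wake_up_new_task() only revalidates that choice against ->cpus_allowed
and online CPUs, instead of running a second full balance pass that can
land on a different node:

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8

struct task {
	int cpu;                       /* placement chosen at fork time */
	bool cpus_allowed[NR_CPUS];    /* stand-in for the kernel cpumask */
};

static bool cpu_online[NR_CPUS] = { [0 ... NR_CPUS - 1] = true };

/* stand-in for select_task_rq(..., SD_BALANCE_FORK, ...) */
static int fork_balance(const struct task *p)
{
	/* pick the first allowed, online CPU; the real code would pick the idlest */
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (p->cpus_allowed[cpu] && cpu_online[cpu])
			return cpu;
	return 0;
}

static void sched_fork(struct task *p)
{
	/* single fork-balance pass; this decision is final */
	p->cpu = fork_balance(p);
}

static void wake_up_new_task(struct task *p)
{
	/*
	 * No second balance pass here.  Only fix things up if the choice
	 * made in sched_fork() has since become invalid (affinity change
	 * or the CPU going away).
	 */
	if (!p->cpus_allowed[p->cpu] || !cpu_online[p->cpu])
		p->cpu = fork_balance(p);
}

int main(void)
{
	struct task p = { .cpus_allowed = { [0 ... NR_CPUS - 1] = true } };

	sched_fork(&p);
	cpu_online[p.cpu] = false;	/* simulate hotplug between fork and first wakeup */
	wake_up_new_task(&p);
	printf("child placed on CPU %d\n", p.cpu);
	return 0;
}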


A specific code comment:

> @@ -2550,14 +2561,16 @@ void wake_up_new_task(struct task_struct
>          * We set TASK_WAKING so that select_task_rq() can drop rq->lock
>          * without people poking at ->cpus_allowed.
>          */
> -       cpu = select_task_rq(rq, p, SD_BALANCE_FORK, 0);
> -       set_task_cpu(p, cpu);
> -
> -       p->state = TASK_RUNNING;
> -       task_rq_unlock(rq, &flags);
> +       if (!cpumask_test_cpu(cpu, &p->cpus_allowed)) {
> +               p->state = TASK_WAKING;
> +               cpu = select_task_rq(rq, p, SD_BALANCE_FORK, 0);
> +               set_task_cpu(p, cpu);
> +               p->state = TASK_RUNNING;
> +               task_rq_unlock(rq, &flags);
> +               rq = task_rq_lock(p, &flags);
> +       }
>  #endif

That's iffy: p->cpus_allowed is only stable while we hold the task's
current rq->lock, or while p->state is TASK_WAKING.
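
For illustration, the rule I mean, as a standalone toy with a pthread
mutex standing in for rq->lock (made-up names, not kernel code):
->cpus_allowed may only be read while the task's current rq->lock is
held, or after p->state has been set to TASK_WAKING under that lock,
which is what keeps affinity updaters away once the lock is dropped:

#include <pthread.h>
#include <stdio.h>

enum { TASK_RUNNING, TASK_WAKING };

struct task {
	pthread_mutex_t rq_lock;	/* stand-in for the task's current rq->lock */
	int state;
	unsigned long cpus_allowed;	/* stand-in for the cpumask */
};

/*
 * Safe pattern: pin ->cpus_allowed by setting TASK_WAKING under the
 * lock, then the lock may be dropped while the placement decision reads
 * the mask.  Testing the mask without either protection is the race.
 */
static int pick_cpu_for_new_task(struct task *p)
{
	int cpu;

	pthread_mutex_lock(&p->rq_lock);
	p->state = TASK_WAKING;			/* pin ->cpus_allowed ... */
	pthread_mutex_unlock(&p->rq_lock);	/* ... before dropping the lock */

	/* placeholder for select_task_rq(); reads of the mask here are stable */
	cpu = __builtin_ctzl(p->cpus_allowed);

	pthread_mutex_lock(&p->rq_lock);
	p->state = TASK_RUNNING;
	pthread_mutex_unlock(&p->rq_lock);
	return cpu;
}

int main(void)
{
	struct task p = {
		.rq_lock = PTHREAD_MUTEX_INITIALIZER,
		.state = TASK_RUNNING,
		.cpus_allowed = 0xf0,	/* CPUs 4-7 allowed */
	};

	printf("new task placed on CPU %d\n", pick_cpu_for_new_task(&p));
	return 0;
}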


