lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <YzVbDbLOYUVNnWRu@hirez.programming.kicks-ass.net>
Date:   Thu, 29 Sep 2022 10:45:01 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Gang Li <ligang.bdlg@...edance.com>
Cc:     Jonathan Corbet <corbet@....net>, Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        linux-api@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-doc@...r.kernel.org
Subject: Re: [PATCH v4] sched/numa: add per-process numa_balancing


The alternative to this is ofcourse to have your latency critical
applications use mbind()/set_mempolicy() etc.., because surely, them
being timing critical, they have the infrastructure to do this right?

Because timing critical software doesn't want it's memory spread
randomly, because well random is bad for performance, hmm?

And once numa balancing sees all the memory has an expliciy policy, it
won't touch it.

On Thu, Sep 29, 2022 at 02:43:58PM +0800, Gang Li wrote:

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ef0e6b3e08ff..87215b3776c9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2818,6 +2818,24 @@ void task_numa_free(struct task_struct *p, bool final)
>  	}
>  }
>  
> +inline bool numa_balancing_enabled(struct task_struct *p)

Does this want to be static?

> +{
> +	if (p->mm) {
> +		int numab = p->mm->numab_enabled;
> +
> +		switch (numab) {
> +		case NUMAB_ENABLED:
> +			return true;
> +		case NUMAB_DISABLED:
> +			return false;
> +		case NUMAB_DEFAULT:
> +			break;
> +		}
> +	}
> +
> +	return static_branch_unlikely(&sched_numa_balancing);
> +}

Blergh, this sucks. Now you have the unconditional pointer chasing and
cache-misses. The advantage of sched_numa_balancing was that there is no
overhead when disabled.

Also, "numab" is a weird word.

What about something like:

static inline bool numa_balancing_enabled(struct task_struct *p)
{
	if (!static_branch_unlikely(&sched_numa_balancing))
		return false;

	if (p->mm) switch (p->mm->numa_balancing_mode) {
	case NUMA_BALANCING_ENABLED:
		return true;
	case NUMA_BALANCING_DISABLED:
		return false
	default:
		break;
	}

	return sysctl_numa_balancing_mode;
}

( Note how that all following the existing 'numa_balancing' wording
  without inventing weird new words. )

And then you frob the sysctl and prctl such that sched_numa_balancing
and sysctl_numa_balancing_mode are not tied together just so.
Specifically, I'm thinking you should use static_branch_inc() to count
how many enables you have, one for the default and one for each prctl().
Then it all just works.

> @@ -11581,8 +11599,10 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
>  		entity_tick(cfs_rq, se, queued);
>  	}
>  
> -	if (static_branch_unlikely(&sched_numa_balancing))
> +#ifdef CONFIG_NUMA_BALANCING
> +	if (numa_balancing_enabled(curr))
>  		task_tick_numa(rq, curr);
> +#endif
>  
>  	update_misfit_status(curr, rq);
>  	update_overutilized_status(task_rq(curr));

Surely you can make that #ifdef go away without much effort.

> diff --git a/kernel/sys.c b/kernel/sys.c
> index 8a6432465dc5..11720a35455a 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -59,6 +59,7 @@
>  #include <linux/sched/coredump.h>
>  #include <linux/sched/task.h>
>  #include <linux/sched/cputime.h>
> +#include <linux/sched/numa_balancing.h>
>  #include <linux/rcupdate.h>
>  #include <linux/uidgid.h>
>  #include <linux/cred.h>
> @@ -2101,6 +2102,23 @@ static int prctl_set_auxv(struct mm_struct *mm, unsigned long addr,
>  	return 0;
>  }
>  
> +#ifdef CONFIG_NUMA_BALANCING
> +static int prctl_pid_numa_balancing_write(int numa_balancing)
> +{
> +	if (numa_balancing != PR_SET_NUMAB_DEFAULT
> +	    && numa_balancing != PR_SET_NUMAB_DISABLED
> +	    && numa_balancing != PR_SET_NUMAB_ENABLED)
> +		return -EINVAL;

Operators go at the end of the line.

> +	current->mm->numab_enabled = numa_balancing;
> +	return 0;
> +}

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ