Message-ID: <YJzUAHQwFj1x2HCH@localhost.localdomain>
Date:   Thu, 13 May 2021 10:23:44 +0300
From:   Alexey Dobriyan <adobriyan@...il.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     mingo@...hat.com, peterz@...radead.org,
        linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH 1/4] sched: make nr_running() return 32-bit

On Thu, May 13, 2021 at 01:58:16AM +0200, Thomas Gleixner wrote:
> Alexey,
> 
> On Thu, Apr 22 2021 at 23:02, Alexey Dobriyan wrote:
> > Creating 2**32 tasks is impossible due to futex pid limits and wasteful
> > anyway. Nobody has done it.
> >
> 
> this whole pile lacks useful numbers. What's the actual benefit of that
> churn?

The long-term goal is to use 32-bit data more. People will see it in the
core kernel and copy it everywhere else.

> Just with the default config for one of my reference machines:
> 
>    text		data	bss	dec	 hex	 filename
> 16679864	6627950	1671296	24979110 17d26a6 ../build/vmlinux-before
> 16679894	6627950	1671296	24979140 17d26c4 ../build/vmlinux-after
> ------------------------------------------------------------------------
>      +30
> 
> I'm truly impressed by the massive savings of this change and I'm even
> more impressed by the justification:
> 
> > Bring nr_running() into 32-bit world to save on REX prefixes.

I collected numbers initially but then stopped because no one cared and
they can be config- and arch-dependent.

> Aside of the obvious useless churn,

oh... Sometimes I think churn is the whole point.

> REX prefixes are universally true for
> all architectures, right? There is a world outside x86 ...

In general, 32-bitness is preferred for code generation.

32-bit RISCs naturally prefer 32-bit.

64-bit RISCs don't care because they remember their 32-bit roots and
have the necessary 32-bit instructions in the same fixed-width(!) encoding.

x86_64 is the only arch where going 64-bit generally adds more bytes
to the instruction stream: a 64-bit operation typically needs a REX.W
prefix byte that the 32-bit form does without.
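To see the REX effect for yourself, a pair of functions like the following can be compiled with `gcc -O2 -fno-tree-vectorize -c` and disassembled with `objdump -d -M intel` (the syntax used in this thread). This is a sketch under assumptions: the function names are hypothetical, not kernel code, and the exact instructions depend on compiler version and flags. On x86_64, 32-bit operand size is the default and writing a 32-bit register zero-extends into the full register, so the 32-bit accumulator usually avoids the REX.W byte that the 64-bit one needs.

```c
#include <stddef.h>

/* Hypothetical helper: 64-bit accumulator. On x86_64 the add into
 * "sum" typically carries a REX.W prefix (0x48..0x4f). */
unsigned long sum_u64(const unsigned int *v, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];	/* v[i] is zero-extended to 64 bits */
	return sum;
}

/* Hypothetical helper: 32-bit accumulator. 32-bit operand size is
 * the default in 64-bit mode, so no REX.W is needed (a plain REX
 * byte may still appear if the compiler picks r8d-r15d). */
unsigned int sum_u32(const unsigned int *v, size_t n)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];	/* wraps modulo 2^32 on overflow */
	return sum;
}
```

The two are not equivalent: sum_u32() wraps modulo 2^32, which is only acceptable when the total is known to stay far below that, as with the number of tasks. The comparison assumes an LP64 target, where unsigned long is 64 bits wide.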

Compilers can smudge the effect, of course; in this case it was the
percpu code. That "unsigned int i" is a mistake. The proper diff looks
like this:

	-ffffffff811115fa: 8b 44 18 04      mov    eax,DWORD PTR [rax+rbx*1+0x4]
	-ffffffff811115fe: 49 01 c4         add    r12,rax
	+ffffffff811115fa: 44 03 64 18 04   add    r12d,DWORD PTR [rax+rbx*1+0x4]

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4348,9 +4348,10 @@ context_switch(struct rq *rq, struct task_struct *prev,
  * externally visible scheduler statistics: current number of runnable
  * threads, total number of context switches performed since bootup.
  */
-unsigned long nr_running(void)
+unsigned int nr_running(void)
 {
-	unsigned long i, sum = 0;
+	unsigned int sum = 0;
+	unsigned long i;
 
 	for_each_online_cpu(i)
 		sum += cpu_rq(i)->nr_running;
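For poking at the patched function outside the kernel, here is a minimal user-space model of it under stated assumptions: struct fake_rq, fake_rqs, NR_FAKE_CPUS and fake_nr_running() are hypothetical stand-ins, and a plain loop replaces for_each_online_cpu(), which has no user-space equivalent. The per-CPU counter is assumed to be unsigned int, like the nr_running field of struct rq. The types mirror the corrected diff: a 32-bit sum with the CPU index left as unsigned long.

```c
#include <stddef.h>

/* Hypothetical number of "online CPUs" in this model. */
#define NR_FAKE_CPUS 4

/* Hypothetical stand-in for the kernel's per-CPU runqueue. */
struct fake_rq {
	unsigned int nr_running;	/* runnable tasks on this CPU */
};

static struct fake_rq fake_rqs[NR_FAKE_CPUS];

/* Model of the patched nr_running(): 32-bit result and accumulator. */
unsigned int fake_nr_running(void)
{
	unsigned int sum = 0;	/* total is bounded by pid limits, fits in 32 bits */
	unsigned long i;	/* CPU index stays unsigned long, as in the diff */

	/* Plain loop in place of for_each_online_cpu(i). */
	for (i = 0; i < NR_FAKE_CPUS; i++)
		sum += fake_rqs[i].nr_running;
	return sum;
}
```

Setting a few fake_rqs[i].nr_running values and calling fake_nr_running() returns their total; swapping the index type to unsigned int reproduces the variant called a mistake above, and the two object files can be compared with objdump.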