lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87eee941rv.ffs@nanos.tec.linutronix.de>
Date:   Fri, 14 May 2021 20:18:44 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Ingo Molnar <mingo@...nel.org>
Cc:     Alexey Dobriyan <adobriyan@...il.com>, mingo@...hat.com,
        Borislav Petkov <bp@...en8.de>, linux-kernel@...r.kernel.org,
        x86@...nel.org, Linus Torvalds <torvalds@...ux-foundation.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH 1/4] sched: make nr_running() return 32-bit

On Thu, May 13 2021 at 11:58, Ingo Molnar wrote:
> * Thomas Gleixner <tglx@...utronix.de> wrote:
> As to the numbers:
> -	/* size: 1704, cachelines: 27, members: 13 */
> -	/* sum members: 1696, holes: 1, sum holes: 4 */
> +	/* size: 1696, cachelines: 27, members: 13 */
> +	/* sum members: 1688, holes: 1, sum holes: 4 */
>  	/* padding: 4 */
> -	/* last cacheline: 40 bytes */
> +	/* last cacheline: 32 bytes */
>
> 'struct rt_rq' got shrunk from 1704 bytes to 1696 bytes, an 8 byte 
> reduction.

Amazing and it still occupies 27 cache lines

>   ffffffffxxxxxxxx:      e8 49 8e fb ff          call   ffffffffxxxxxxxx <nr_iowait_cpu>
> - ffffffffxxxxxxxx:      48 85 c0                test   %rax,%rax
>
>   ffffffffxxxxxxxx:      e8 64 8e fb ff          call   ffffffffxxxxxxxx <nr_iowait_cpu>
> + ffffffffxxxxxxxx:      85 c0                   test   %eax,%eax
>
> Note how the 'test %rax,%rax' lost the 0x48 64-bit REX prefix and the 
> generated code got one byte shorter from "48 85 c0" to "85 c0".

Which will surely be noticable big time. None of this truly matters
because once the data is in L1 the REX prefix is just noise.

> ( Note, my asm generation scripts filter out some of the noise to make it 
>   easier to diff generated asm, hence the ffffffffxxxxxxxx placeholder. )
>
> The nr_iowait() function itself got shorter by two bytes as well, due to:
>
> The size of nr_iowait() shrunk from 78 bytes to 76 bytes.

That's important because nr_iowait() is truly a hotpath function...

> The nr_running() function itself got shorter by 2 bytes, due to shorter 
> instruction sequences.

along with nr_running() which both feed /proc/stat. The latter feeds
/proc/loadavg as well.

Point well taken.

But looking at the /proc/stat usage there is obviously way bigger fish
to fry.

   seq_printf(...., nr_running(), nr_iowait());

which is outright stupid because both functions iterate over CPUs
instead of doing it once, which definitely would be well measurable on
large machines.

But looking at nr_running() and nr_iowait():

    nr_running() walks all CPUs in cpu_online_mask

    nr_iowait() walks all CPUs in cpu_possible_mask

The latter is because rq::nr_iowait is not transferred to an online CPU
when a CPU goes offline. Oh well.

That aside:

I'm not against the change per se, but I'm disagreeing with patches
which come with zero information, are clearly focussed on one
architecture and obviously nobody bothered to check whether there is an
impact on others.

Thanks,

        tglx



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ