Open Source and information security mailing list archives
 
Message-ID: <20251029140755.GF4067720@noisy.programming.kicks-ass.net>
Date: Wed, 29 Oct 2025 15:07:55 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Dmitry Ilvokhin <d@...okhin.com>
Cc: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH RESEND] sched/stats: Optimize /proc/schedstat printing

On Wed, Oct 29, 2025 at 01:07:15PM +0000, Dmitry Ilvokhin wrote:
> Function seq_printf supports a rich format string for printing decimals,
> but there is no need for it in /proc/schedstat, since the majority of the
> data is space-separated decimals. Use seq_put_decimal_ull instead as a
> faster alternative.
> 
> Performance counter stats (truncated) for sh -c 'cat /proc/schedstat >
> /dev/null' on a machine with 72 CPUs, before and after applying the
> patch, are below.
> 
> Before:
> 
>       2.94 msec task-clock               #    0.820 CPUs utilized
>          1      context-switches         #  340.551 /sec
>          0      cpu-migrations           #    0.000 /sec
>        340      page-faults              #  115.787 K/sec
> 10,327,200      instructions             #    1.89  insn per cycle
>                                          #    0.10  stalled cycles per insn
>  5,458,307      cycles                   #    1.859 GHz
>  1,052,733      stalled-cycles-frontend  #   19.29% frontend cycles idle
>  2,066,321      branches                 #  703.687 M/sec
>     25,621      branch-misses            #    1.24% of all branches
> 
> 0.00357974 +- 0.00000209 seconds time elapsed  ( +-  0.06% )
> 
> After:
> 
>       2.50 msec task-clock              #    0.785 CPUs utilized
>          1      context-switches        #  399.780 /sec
>          0      cpu-migrations          #    0.000 /sec
>        340      page-faults             #  135.925 K/sec
>  7,371,867      instructions            #    1.59  insn per cycle
>                                         #    0.13  stalled cycles per insn
>  4,647,053      cycles                  #    1.858 GHz
>    986,487      stalled-cycles-frontend #   21.23% frontend cycles idle
>  1,591,374      branches                #  636.199 M/sec
>     28,973      branch-misses           #    1.82% of all branches
> 
> 0.00318461 +- 0.00000295 seconds time elapsed  ( +-  0.09% )
> 
> This is a ~11% (relative) improvement in elapsed time.

Yeah, but who cares? Why do we want less obvious code for a silly stats
file?
