Message-ID: <0d890fd4-5df0-4a19-a278-74c95aa19935@linux.ibm.com>
Date: Thu, 27 Feb 2025 13:28:30 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: frederic@...nel.org
Cc: mingo@...nel.org, peterz@...radead.org, vincent.guittot@...aro.org,
        maddy@...ux.ibm.com, dietmar.eggemann@....com, riel@...riel.com,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC] sched/cputime: issue with time accounting using default
 configs



On 2/12/25 01:15, Shrikanth Hegde wrote:
> While experimenting with IRQ time accounting, I stumbled upon this issue
> with cputime accounting while running simple benchmarks.
> 
> This is very likely a common issue across different archs unless one turns
> on IRQ_TIME_ACCOUNTING. I took a look at the src rpms of rhel and suse. Only
> rhel on x86 seems to enable it.
> 
> (default configs)
> CONFIG_VIRT_CPU_ACCOUNTING=y
> CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
> # CONFIG_IRQ_TIME_ACCOUNTING is not set
> CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> all    3.41    0.00   73.81    0.00   22.00    0.00    0.10    0.00    0.00    0.67
> all    3.39    0.00   73.30    0.00   22.71    0.01    0.01    0.00    0.00    0.58
> 
> (With CONFIG_IRQ_TIME_ACCOUNTING=y)
> CONFIG_VIRT_CPU_ACCOUNTING=y
> CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
> CONFIG_IRQ_TIME_ACCOUNTING=y
> CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> all    3.64    0.00   94.26    0.00    1.77    0.06    0.05    0.00    0.00    0.21
> all    3.42    0.00   93.89    0.00    1.94    0.07    0.00    0.00    0.00    0.68
> 
> 
> (Forced NATIVE to be enabled by removing the conditional check in NO_HZ_FULL)
> CONFIG_VIRT_CPU_ACCOUNTING=y
> # CONFIG_TICK_CPU_ACCOUNTING is not set
> CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
> CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> all    5.78    0.00   92.55    0.00    1.56    0.00    0.00    0.00    0.00    0.11
> all    6.14    0.00   91.86    0.00    1.68    0.02    0.00    0.00    0.00    0.29
> 
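
For anyone reading the three tables above: the columns match the layout
of mpstat output. Assuming they are derived from deltas of the "cpu" line
in /proc/stat, the arithmetic is roughly as sketched below. This is only
an illustration; the helper names are made up here and real tools may
differ in details.

```python
# Field order of the "cpu" line in /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Turn a 'cpu ...' line into a dict of counters (missing fields become 0)."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def percentages(before, after):
    """Percentage of time per column between two snapshots."""
    d = {k: after[k] - before[k] for k in FIELDS}
    # guest/guest_nice are already included in user/nice, so the total
    # is the sum of the first eight counters only.
    total = sum(d[k] for k in FIELDS[:8])
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")

    def pct(x):
        return 100.0 * x / total

    return {
        "%usr": pct(d["user"] - d["guest"]),
        "%nice": pct(d["nice"] - d["guest_nice"]),
        "%sys": pct(d["system"]),
        "%iowait": pct(d["iowait"]),
        "%irq": pct(d["irq"]),
        "%soft": pct(d["softirq"]),
        "%steal": pct(d["steal"]),
        "%guest": pct(d["guest"]),
        "%gnice": pct(d["guest_nice"]),
        "%idle": pct(d["idle"]),
    }
```

The point is that %irq and %sys are only as good as the counters the
kernel feeds into /proc/stat, so a change in the accounting method shows
up directly as a shift between those two columns.
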
> Given the code, NATIVE accounting seems most accurate,
> since it tracks enter/exit of user, hardirq, softirqs.
> Though it comes with its own overhead.
> 
> 
> Such a drastic difference w.r.t. *irq time*. That made me wonder why.
> This happens because when NO_HZ_FULL is chosen, NATIVE accounting
> cannot be enabled and GENeric is the only option.
> GEN -> account_process_tick ->
> 	-> if context tracking is enabled, do accounting based on it.
> 	-> if irq_time accounting is enabled, do that.
> 	-> If not, fall back to simple tick based accounting. With this, a
> 	   whole tick duration can be attributed to IRQ, which is not true.
> 
> NATIVE -> account_process_tick ->
> 	vtime_flush - native based accounting.
> 
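
To illustrate why the tick based fallback can inflate irq time, here is
a toy model. It is not the kernel code: the timeline, the tick period and
the sample offset are all hypothetical. It only shows the effect of
charging a whole tick to whatever context happens to be running when the
tick is sampled.

```python
def tick_based(timeline, tick_period, sample_offset):
    """Charge each whole tick to the context seen at the sample point."""
    charged = {}
    for start in range(0, len(timeline), tick_period):
        idx = start + sample_offset
        if idx >= len(timeline):
            break
        ctx = timeline[idx]
        charged[ctx] = charged.get(ctx, 0) + tick_period
    return charged


def precise(timeline):
    """Charge every time unit to the context that actually ran in it."""
    charged = {}
    for ctx in timeline:
        charged[ctx] = charged.get(ctx, 0) + 1
    return charged


# Hypothetical workload: every 10-unit tick period starts with 2 units
# of irq handling followed by 8 units of system time.
period = ["irq"] * 2 + ["sys"] * 8
timeline = period * 100

sampled_in_irq = tick_based(timeline, tick_period=10, sample_offset=1)
sampled_in_sys = tick_based(timeline, tick_period=10, sample_offset=5)
actual = precise(timeline)
```

In this model the real split is 20% irq and 80% sys, but the sampled
result is all irq or all sys depending only on where the sample lands.
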
> 
> The main concern is that context tracking is enabled only if NO_HZ_FULL=y
> and (nohz_full= or isolcpus=) is set. Most kernels are built with
> NO_HZ_FULL, but many may not pass nohz_full= (correct me if I am
> wrong). As a result, context tracking isn't enabled. Since irq
> time isn't enabled either, it falls back to simple tick based accounting.
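
A quick way to see whether a given boot passed either parameter is to
look at the kernel command line. The sketch below only mirrors the
condition as stated above (nohz_full= or isolcpus= present); it does not
inspect the kernel's internal context tracking state, and the function
names are made up for illustration.

```python
def boot_params(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Naive whitespace splitting: quoted values containing spaces are
    not handled.
    """
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def isolation_requested(cmdline):
    """True if nohz_full= or isolcpus= was given with a non-empty value."""
    params = boot_params(cmdline)
    return bool(params.get("nohz_full")) or bool(params.get("isolcpus"))
```

On a running system this could be called as
isolation_requested(open("/proc/cmdline").read()).
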
> 
> A few ways to fix this. Some may not be sane. These are the hacks that I
> have tried.
> 
> 1. Looking at irq_time vs native accounting, irq_time seems
> lightweight and close enough to native. Maybe that can be a middle
> ground. So enable it in the arch default configs. That way distros can
> enable it. The patch below uses this method.
> NOTE: this still needs more work w.r.t. measuring the overhead.
> 
> 2. Select IRQ_TIME_ACCOUNTING in case of NO_HZ_FULL. This would fix this
> accounting issue for all archs. But given a slight overhead, some archs
> may not want it.
> 
> 3. If context tracking is not enabled, then do it the native way if the
> arch supports it. Since native and irq_time are mutually exclusive, only
> one of them can be enabled. This needs a lot of changes given how the
> current code is structured with macros. It would also mean decoupling
> native from NO_HZ_FULL.
>   
> Is this a problem worth fixing? Is there a better way to fix it?

Hi Frederic. Any comments on this?

This is a generic problem across different archs. Not sure what the
best way to fix it is. Convincing people to enable IRQ_TIME_ACCOUNTING
in their configs may be one way.
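
For anyone who wants to check their own config, a small sketch follows.
It takes the text of a kernel config and reports which of the accounting
options discussed in this thread are set to =y. The function names are
made up for illustration.

```python
SYMBOLS = (
    "NO_HZ_FULL",
    "VIRT_CPU_ACCOUNTING_GEN",
    "VIRT_CPU_ACCOUNTING_NATIVE",
    "IRQ_TIME_ACCOUNTING",
)


def config_enabled(config_text, symbol):
    """True if CONFIG_<symbol>=y appears as a line of its own."""
    wanted = f"CONFIG_{symbol}=y"
    return any(line.strip() == wanted for line in config_text.splitlines())


def accounting_summary(config_text):
    """Map each accounting-related symbol to whether it is built in."""
    return {s: config_enabled(config_text, s) for s in SYMBOLS}
```

Many distros ship the config as /boot/config-<kernel release>, and
kernels built with CONFIG_IKCONFIG_PROC expose it as /proc/config.gz.
Either can be read and passed in as config_text.
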

> 
> Signed-off-by: Shrikanth Hegde <sshegde@...ux.ibm.com>
> ---
>   arch/powerpc/configs/ppc64_defconfig | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
> index 465eb96c755e..9bc678d92384 100644
> --- a/arch/powerpc/configs/ppc64_defconfig
> +++ b/arch/powerpc/configs/ppc64_defconfig
> @@ -3,6 +3,7 @@ CONFIG_POSIX_MQUEUE=y
>   CONFIG_AUDIT=y
>   CONFIG_NO_HZ_FULL=y
>   CONFIG_NO_HZ=y
> +CONFIG_IRQ_TIME_ACCOUNTING=y
>   CONFIG_HIGH_RES_TIMERS=y
>   CONFIG_BPF_SYSCALL=y
>   CONFIG_BPF_JIT=y

