Message-ID: <0d890fd4-5df0-4a19-a278-74c95aa19935@linux.ibm.com>
Date: Thu, 27 Feb 2025 13:28:30 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: frederic@...nel.org
Cc: mingo@...nel.org, peterz@...radead.org, vincent.guittot@...aro.org,
maddy@...ux.ibm.com, dietmar.eggemann@....com, riel@...riel.com,
linux-kernel@...r.kernel.org
Subject: Re: [RFC] sched/cputime: issue with time accounting using default configs
On 2/12/25 01:15, Shrikanth Hegde wrote:
> While experimenting with IRQ time accounting, I stumbled upon this issue
> with cputime accounting while running simple benchmarks.
>
> This is very likely a common issue across different archs unless one turns
> on IRQ_TIME_ACCOUNTING. I took a look at the source RPMs of RHEL and SUSE;
> only RHEL on x86 seems to enable it.
>
> (default configs)
> CONFIG_VIRT_CPU_ACCOUNTING=y
> CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
> # CONFIG_IRQ_TIME_ACCOUNTING is not set
> CPU  %usr  %nice   %sys  %iowait   %irq  %soft  %steal  %guest  %gnice  %idle
> all  3.41   0.00  73.81     0.00  22.00   0.00    0.10    0.00    0.00   0.67
> all  3.39   0.00  73.30     0.00  22.71   0.01    0.01    0.00    0.00   0.58
>
> (With CONFIG_IRQ_TIME_ACCOUNTING=y)
> CONFIG_VIRT_CPU_ACCOUNTING=y
> CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
> CONFIG_IRQ_TIME_ACCOUNTING=y
> CPU  %usr  %nice   %sys  %iowait   %irq  %soft  %steal  %guest  %gnice  %idle
> all  3.64   0.00  94.26     0.00   1.77   0.06    0.05    0.00    0.00   0.21
> all  3.42   0.00  93.89     0.00   1.94   0.07    0.00    0.00    0.00   0.68
>
>
> (Forced NATIVE to be enabled by removing the conditional check on NO_HZ_FULL)
> CONFIG_VIRT_CPU_ACCOUNTING=y
> # CONFIG_TICK_CPU_ACCOUNTING is not set
> CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
> CPU  %usr  %nice   %sys  %iowait   %irq  %soft  %steal  %guest  %gnice  %idle
> all  5.78   0.00  92.55     0.00   1.56   0.00    0.00    0.00    0.00   0.11
> all  6.14   0.00  91.86     0.00   1.68   0.02    0.00    0.00    0.00   0.29
>
> Going by the code, NATIVE accounting seems the most accurate, since it
> tracks entry to and exit from user, hardirq and softirq context, though
> it comes with its own overhead.
>
>
> Such a drastic difference in *irq time* (about 22% vs. about 2%) made me
> wonder why. This happens because, when NO_HZ_FULL is chosen, NATIVE
> accounting cannot be enabled and GENeric is the only option.
> GEN -> account_process_tick ->
>   -> if context tracking is enabled, do accounting based on it.
>   -> if irq_time accounting is enabled, do that.
>   -> if not, fall back to simple tick-based accounting. With this, the
>      whole tick duration can be attributed to IRQ whenever the tick finds
>      the CPU in IRQ context, which does not reflect the time actually
>      spent there.
>
> NATIVE -> account_process_tick ->
> vtime_flush - native based accounting.
>
>
> The main concern is that context tracking is enabled only if NO_HZ_FULL=y
> and nohz_full= (or isolcpus=) is passed on the command line. Most kernels
> are built with NO_HZ_FULL, but many systems may not pass nohz_full=
> (correct me if I am wrong). As a result, context tracking isn't enabled,
> and since IRQ time accounting isn't enabled either, accounting falls back
> to simple tick-based accounting.
>
> There are a few ways to fix this; some may not be sane. These are the
> hacks that I have tried:
>
> 1. Comparing irq_time with native accounting, irq_time seems lightweight
> and close enough to native, so it may be a good middle ground. Enable it
> in the arch default configs so that distros can enable it too. The patch
> below uses this method.
> NOTE: this still needs more work on measuring the overhead.
>
> 2. Select IRQ_TIME_ACCOUNTING when NO_HZ_FULL is enabled. This would fix
> the accounting issue for all archs, but given the slight overhead, some
> archs may not want it.
>
> 3. If context tracking is not enabled, use native accounting if the arch
> supports it. Since native and irq_time are mutually exclusive, only one of
> them can be enabled. This needs a lot of changes given how the current
> code is structured with macros. It would also mean decoupling native from
> NO_HZ_FULL.
>
> Is this a problem worth fixing? Are there better ways to fix it?
Hi Frederic, any comments on this?
This is a generic problem across different archs, and I'm not sure what
the best way to fix it is. Convincing people to enable
IRQ_TIME_ACCOUNTING in their configs may be one way.
>
> Signed-off-by: Shrikanth Hegde <sshegde@...ux.ibm.com>
> ---
> arch/powerpc/configs/ppc64_defconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
> index 465eb96c755e..9bc678d92384 100644
> --- a/arch/powerpc/configs/ppc64_defconfig
> +++ b/arch/powerpc/configs/ppc64_defconfig
> @@ -3,6 +3,7 @@ CONFIG_POSIX_MQUEUE=y
> CONFIG_AUDIT=y
> CONFIG_NO_HZ_FULL=y
> CONFIG_NO_HZ=y
> +CONFIG_IRQ_TIME_ACCOUNTING=y
> CONFIG_HIGH_RES_TIMERS=y
> CONFIG_BPF_SYSCALL=y
> CONFIG_BPF_JIT=y