Message-ID: <20250707203057.1b2af73d@gandalf.local.home>
Date: Mon, 7 Jul 2025 20:30:57 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: "Li,Rongqing" <lirongqing@...du.com>
Cc: Oleg Nesterov <oleg@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
David Laight <david.laight.linux@...il.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "vschneid@...hat.com"
<vschneid@...hat.com>, "mgorman@...e.de" <mgorman@...e.de>,
"bsegall@...gle.com" <bsegall@...gle.com>, "dietmar.eggemann@....com"
<dietmar.eggemann@....com>, "vincent.guittot@...aro.org"
<vincent.guittot@...aro.org>, "juri.lelli@...hat.com"
<juri.lelli@...hat.com>, "mingo@...hat.com" <mingo@...hat.com>
Subject: Re: divide error in x86 and cputime
On Tue, 8 Jul 2025 00:10:54 +0000
"Li,Rongqing" <lirongqing@...du.com> wrote:
> >  	stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
> > +	/*
> > +	 * Because mul_u64_u64_div_u64() can approximate on some
> > +	 * achitectures; enforce the constraint that: a*b/(b+c) <= a.
> > +	 */
> > +	if (unlikely(stime > rtime))
> > +		stime = rtime;
>
>
> My 5.10 has not this patch " sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime ",
> but I am sure this patch can not fix this overflow issue, Since division error happened in mul_u64_u64_div_u64()
Have you tried it? Or are you just making an assumption?

How can you be so sure? Did you even *look* at the commit?

    sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime

    In extreme test scenarios:
    the 14th field utime in /proc/xx/stat is greater than sum_exec_runtime,
    utime = 18446744073709518790 ns, rtime = 135989749728000 ns

    In cputime_adjust() process, stime is greater than rtime due to
    mul_u64_u64_div_u64() precision problem.
    before call mul_u64_u64_div_u64(),
    stime = 175136586720000, rtime = 135989749728000, utime = 1416780000.
    after call mul_u64_u64_div_u64(),
    stime = 135989949653530

    unsigned reversion occurs because rtime is less than stime.
    utime = rtime - stime = 135989749728000 - 135989949653530
                          = -199925530
                          = (u64)18446744073709518790

    Trigger condition:
      1). User task run in kernel mode most of time
      2). ARM64 architecture
      3). TICK_CPU_ACCOUNTING=y
          CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set

    Fix mul_u64_u64_div_u64() conversion precision by reset stime to rtime

When stime ends up greater than rtime, it causes utime to go NEGATIVE!
That means *YES* it can overflow a u64 number. That's your bug.
Next time, check whether there are fixes for the code that is triggering
issues for you, and test them out, before bothering upstream.
Goodbye.
-- Steve