Message-ID: <20130411083634.GB1380@redhat.com>
Date:	Thu, 11 Apr 2013 10:36:35 +0200
From:	Stanislaw Gruszka <sgruszka@...hat.com>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	Frederic Weisbecker <fweisbec@...il.com>,
	Peter Zijlstra <peterz@...radead.org>, hpa@...or.com,
	rostedt@...dmis.org, akpm@...ux-foundation.org, tglx@...utronix.de,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC 4/4] cputime: remove scaling

On Wed, Apr 10, 2013 at 02:02:28PM +0200, Ingo Molnar wrote:
> 
> * Stanislaw Gruszka <sgruszka@...hat.com> wrote:
> 
> > Scaling cputime cause problems, bunch of them was fixed, but still is possible 
> > to hit multiplication overflow issue, which make {u,s}time values incorrect. 
> > This problem has no good solution in kernel.
> 
> Wasn't 128-bit math a solution to the overflow problems? 128-bit math isn't nice, 
> but at least for multiplication it's defensible.

Unfortunately, 128-bit division is needed as well. In 99.9% of cases, though,
it will go through the 64-bit fast path.
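
As a rough illustration of the fast-path idea, here is a user-space sketch
(not the kernel code). The helper name scale_time_128 is hypothetical. It
assumes a GCC or Clang compiler that provides __builtin_mul_overflow and
unsigned __int128 on a 64-bit target, that total is non-zero, and that the
result fits in 64 bits.

```c
#include <stdint.h>

/*
 * Hypothetical user-space sketch: compute time * rtime / total.
 * Assumes total != 0 and that the true result fits in 64 bits.
 */
static uint64_t scale_time_128(uint64_t rtime, uint64_t total, uint64_t time)
{
        uint64_t prod;

        /* Fast path: the product fits in 64 bits, use plain 64-bit division. */
        if (!__builtin_mul_overflow(time, rtime, &prod))
                return prod / total;

        /* Slow path: 128-bit multiply followed by a 128-by-64-bit division. */
        return (uint64_t)(((unsigned __int128)time * rtime) / total);
}
```

With the failing values quoted further down in this mail (rtime 1954463459156,
total 1771603722423, stime 354320744484) the product does not fit in 64 bits,
so the slow path is taken and the result is 390892691830.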

> > This patch remove scaling code and export raw values of {u,t}ime . Procps 
> > programs can use newly introduced sum_exec_runtime to find out precisely 
> > calculated process cpu time and scale utime, stime values accordingly.
> > 
> > Unfortunately times(2) syscall has no such option.
> > 
> > This change affect kernels compiled without CONFIG_VIRT_CPU_ACCOUNTING_*.
> 
> So, the concern here is that 'top hiding' code can now hide again. It's also that 
> we are not really solving the problem, we are pushing it to user-space - which in 
> the best case gets updated to solve the problem in some similar fashion - and in 
> the worst case does not get updated or does it in a buggy way.
>
> So while user-space has it a bit easier because it can do floating point math, is 
> there really no workable solution to the current kernel side integer overflow bug? 

I do not see any. Basically, everything we have so far either makes the problem
less reproducible or just defers it. The best solution I found, short of full
128-bit math, is something like this (dropping precision when the values are
big and an overflow would otherwise happen):

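/*
 * Compute time * rtime / total, dropping low-order bits when the 64-bit
 * multiplication could overflow. clzll() counts the leading zero bits
 * of a 64-bit value.
 */
u64 _scale_time(u64 rtime, u64 total, u64 time)
{
        const int zero_bits = clzll(time) + clzll(rtime);
        u64 scaled;

        if (zero_bits < 64) {
                /* Drop precision: the product may not fit in 64 bits */
                const int drop_bits = 64 - zero_bits;

                /* Shift total by twice as much to keep the ratio */
                time >>= drop_bits;
                rtime >>= drop_bits;
                total >>= 2*drop_bits;

                /* total lost all of its bits, give up on scaling */
                if (total == 0)
                        return time;
        }

        scaled = (time * rtime) / total;

        return scaled;
}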

It defers the problem for quite a long period. My testing script detects a failure at:

FAIL!
rtime: 1954463459156 <- 22621 days (one thread, CONFIG_HZ=1000)
total: 1771603722423
stime: 354320744484
kernel: 391351504748 <- kernel value
python: 390892691830 <- correct value

For one thread this is fine, but with 512 threads the inaccuracy will appear
after only 40 days (because too many bits of the "total" variable are dropped).
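
The failing case above can be reproduced in user space with a sketch like the
following. The names clz64, shr64, scale_time_drop and scale_time_exact are
hypothetical. The code assumes GCC or Clang (__builtin_clzll and
unsigned __int128), and it guards the cases where the builtin or a shift by
64 or more bits would be undefined in C.

```c
#include <stdint.h>

/* Count leading zeros; returns 64 for 0, where __builtin_clzll is undefined. */
static int clz64(uint64_t x)
{
        return x ? __builtin_clzll(x) : 64;
}

/* Right shift that yields 0 for counts of 64 or more (undefined in plain C). */
static uint64_t shr64(uint64_t x, int n)
{
        return n >= 64 ? 0 : x >> n;
}

/* The drop-precision variant from this mail, written with user-space types. */
static uint64_t scale_time_drop(uint64_t rtime, uint64_t total, uint64_t time)
{
        const int zero_bits = clz64(time) + clz64(rtime);

        if (zero_bits < 64) {
                const int drop_bits = 64 - zero_bits;

                time = shr64(time, drop_bits);
                rtime = shr64(rtime, drop_bits);
                total = shr64(total, 2 * drop_bits);

                if (total == 0)
                        return time;
        }

        /* Like the original, this assumes total != 0 on this path. */
        return (time * rtime) / total;
}

/* Exact reference using the compiler's 128-bit integer extension. */
static uint64_t scale_time_exact(uint64_t rtime, uint64_t total, uint64_t time)
{
        return (uint64_t)(((unsigned __int128)time * rtime) / total);
}
```

Called with rtime = 1954463459156, total = 1771603722423 and
time = 354320744484, scale_time_drop() returns 391351504748 and
scale_time_exact() returns 390892691830, matching the two values reported
above. Here drop_bits is 16, so total is shifted right by 32 bits and only
412 is left of it for the division.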

> I really prefer robust kernel side accounting/instrumentation.

We have CONFIG_IRQ_TIME_ACCOUNTING and CONFIG_VIRT_CPU_ACCOUNTING_GEN.
Perhaps we can switch to using one of those options by default. I wonder
if the additional performance cost associated with them is really something
that we should care about. Are there any measurements showing that they
make performance worse?

Stanislaw