lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 28 Sep 2009 20:41:41 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Martin Schwidefsky <schwidefsky@...ibm.com>
CC:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	John Stultz <johnstul@...ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Ingo Molnar <mingo@...e.hu>,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: Linux 2.6.32-rc1

Martin Schwidefsky a écrit :
> On Mon, 28 Sep 2009 08:39:31 -0700 (PDT)
> Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
>> On Mon, 28 Sep 2009, Eric Dumazet wrote:
>>> Linus Torvalds a écrit :
>>>> Go wild, test it out, and let us know about any regressions you find,
>>> Something seems wrong with process time accounting.
>>>
>>> Following program should consume 10*8 seconds of cpu on a 8 cpu machine, but
>>> with 2.6.32-rc1 numbers are crazy.
>> Ok, so the top-level process time looks sane _while_ the thing is running 
>> (sum of all threads), but the per-thread times look broken (as if they had 
>> randomly had the times of the other threads added into them - looks to me 
>> like they on average had half the other threads' time accounted into 
>> them).
>>
>> And then at the end, it looks like the time of the threads (which was 
>> over-accounted) get re-accounted back into the main process, so the final 
>> time is indeed wildly inflated.
>>
>> IOW, looks like we're adding thread times multiple times to other threads 
>> (and then finally to the parent).
>>
>> I'm adding the usual suspects to the cc, and leaving your results and 
>> test-program quoted here for them.. Thomas, Martin, John - if you're not 
>> the people I should blame for this, let me know.
> 
> Uh-oh.. usual suspects, eh?
> 
> For starters I run the test program on an s390 with
> VIRT_CPU_ACCOUNTING=y:
> 
> time ./burn-cpu 
>   PID TTY      STAT   TIME COMMAND 
>  2979 pts/0    -      0:08 ./burn-cpu
>     - -        Sl+    0:00 -         
>     - -        Rl+    0:01 -         
>     - -        Rl+    0:01 -         
>     - -        Rl+    0:01 -         
>     - -        Rl+    0:01 -         
>     - -        Rl+    0:01 -         
>     - -        Rl+    0:01 -         
>     - -        Rl+    0:01 -         
>     - -        Rl+    0:01 -         
>   PID TTY      STAT   TIME COMMAND   
>  2979 pts/0    -      0:16 ./burn-cpu
>     - -        Sl+    0:00 -         
>     - -        Rl+    0:02 -         
>     - -        Rl+    0:02 -         
>     - -        Rl+    0:02 -         
>     - -        Rl+    0:02 -         
>     - -        Rl+    0:02 -         
>     - -        Rl+    0:02 -         
>     - -        Rl+    0:02 -         
>     - -        Rl+    0:02 -         
>   PID TTY      STAT   TIME COMMAND   
>  2979 pts/0    -      0:25 ./burn-cpu
>     - -        Sl+    0:00 -         
>     - -        Rl+    0:03 -         
>     - -        Rl+    0:03 -         
>     - -        Rl+    0:03 -         
>     - -        Rl+    0:03 -         
>     - -        Rl+    0:03 -         
>     - -        Rl+    0:03 -         
>     - -        Rl+    0:03 -         
>     - -        Rl+    0:03 -         
>   PID TTY      STAT   TIME COMMAND   
>  2979 pts/0    -      0:33 ./burn-cpu
>     - -        Sl+    0:00 -         
>     - -        Rl+    0:04 -         
>     - -        Rl+    0:04 -         
>     - -        Rl+    0:04 -         
>     - -        Rl+    0:04 -         
>     - -        Rl+    0:04 -         
>     - -        Rl+    0:04 -         
>     - -        Rl+    0:04 -         
>     - -        Rl+    0:04 -         
>   PID TTY      STAT   TIME COMMAND   
>  2979 pts/0    -      0:41 ./burn-cpu
>     - -        Sl+    0:00 -         
>     - -        Rl+    0:05 -         
>     - -        Rl+    0:05 -         
>     - -        Rl+    0:05 -         
>     - -        Rl+    0:05 -         
>     - -        Rl+    0:05 -         
>     - -        Rl+    0:05 -         
>     - -        Rl+    0:05 -         
>     - -        Rl+    0:05 -         
>   PID TTY      STAT   TIME COMMAND   
>  2979 pts/0    -      0:50 ./burn-cpu
>     - -        Sl+    0:00 -
>     - -        Rl+    0:06 -
>     - -        Rl+    0:06 -
>     - -        Rl+    0:06 -
>     - -        Rl+    0:06 -
>     - -        Rl+    0:06 -
>     - -        Rl+    0:06 -
>     - -        Rl+    0:06 -
>     - -        Rl+    0:06 -
>   PID TTY      STAT   TIME COMMAND
>  2979 pts/0    -      0:58 ./burn-cpu
>     - -        Sl+    0:00 -
>     - -        Rl+    0:07 -
>     - -        Rl+    0:07 -
>     - -        Rl+    0:07 -
>     - -        Rl+    0:07 -
>     - -        Rl+    0:07 -
>     - -        Rl+    0:07 -
>     - -        Rl+    0:07 -
>     - -        Rl+    0:07 -
>   PID TTY      STAT   TIME COMMAND
>  2979 pts/0    -      1:07 ./burn-cpu
>     - -        Sl+    0:00 -
>     - -        Rl+    0:08 -
>     - -        Rl+    0:08 -
>     - -        Rl+    0:08 -
>     - -        Rl+    0:08 -
>     - -        Rl+    0:08 -
>     - -        Rl+    0:08 -
>     - -        Rl+    0:08 -
>     - -        Rl+    0:08 -
>   PID TTY      STAT   TIME COMMAND
>  2979 pts/0    -      1:15 ./burn-cpu
>     - -        Sl+    0:00 -
>     - -        Rl+    0:09 -
>     - -        Rl+    0:09 -
>     - -        Rl+    0:09 -
>     - -        Rl+    0:09 -
>     - -        Rl+    0:09 -
>     - -        Rl+    0:09 -
>     - -        Rl+    0:09 -
>     - -        Rl+    0:09 -
>   PID TTY      STAT   TIME COMMAND
>  2979 pts/0    -      1:25 ./burn-cpu
>     - -        S+     1:25 -
> 
> real    0m10.708s
> user    1m25.051s
> sys     0m0.174s
> 
> looks better, gives an average of 10.63 seconds per thread and the per
> thread numbers are fine. I'll see what happens with the test case on my
> pc@...e.
> 


I did a bisection and found commit def0a9b2573e00ab0b486cb5382625203ab4c4a6
was the origin of the problem on my x86_32 machine.

def0a9b2573e00ab0b486cb5382625203ab4c4a6 is first bad commit
commit def0a9b2573e00ab0b486cb5382625203ab4c4a6
Author: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Date:   Fri Sep 18 20:14:01 2009 +0200

    sched_clock: Make it NMI safe

    Arjan complained about the suckyness of TSC on modern machines, and
    asked if we could do something about that for PERF_SAMPLE_TIME.

    Make cpu_clock() NMI safe by removing the spinlock and using
    cmpxchg. This also makes it smaller and more robust.

    Affects architectures that use HAVE_UNSTABLE_SCHED_CLOCK, i.e. IA64
    and x86.

    Signed-off-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
    LKML-Reference: <new-submission>
    Signed-off-by: Ingo Molnar <mingo@...e.hu>

git bisect start
# bad: [17d857be649a21ca90008c6dc425d849fa83db5c] Linux 2.6.32-rc1
git bisect bad 17d857be649a21ca90008c6dc425d849fa83db5c
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# good: [5d1fe0c98f2aef99fb57aaf6dd25e793c186cea3] Staging: vt6656: Integrate vt6656 into build system.
git bisect good 5d1fe0c98f2aef99fb57aaf6dd25e793c186cea3
# good: [5d1fe0c98f2aef99fb57aaf6dd25e793c186cea3] Staging: vt6656: Integrate vt6656 into build system.
git bisect good 5d1fe0c98f2aef99fb57aaf6dd25e793c186cea3
# bad: [c720f5655df159a630fa0290a0bd67c93e92b0bf] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
git bisect bad c720f5655df159a630fa0290a0bd67c93e92b0bf
# good: [a03fdb7612874834d6847107198712d18b5242c7] Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect good a03fdb7612874834d6847107198712d18b5242c7
# good: [3530c1886291df061e3972c55590777ef1cb67f8] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
git bisect good 3530c1886291df061e3972c55590777ef1cb67f8
# good: [84d6ae431f315e8973aac3c3fe1d550fc9240ef3] V4L/DVB (13033): pt1: Don't use a deprecated DMA_BIT_MASK macro
git bisect good 84d6ae431f315e8973aac3c3fe1d550fc9240ef3
# bad: [ebc79c4f8da0f92efa968e0328f32334a2ce80cf] Merge git://git.kernel.org/pub/scm/linux/kernel/git/jaswinder/linux-2.6
git bisect bad ebc79c4f8da0f92efa968e0328f32334a2ce80cf
# good: [7bd032dc2793afcbaf4a350056768da84cdbd89b] USB serial: update the console driver
git bisect good 7bd032dc2793afcbaf4a350056768da84cdbd89b
# good: [1281a49b7aa865a1f0d60e2b28410e6234fc686b] perf trace: Sample timestamp and cpu when using record flag
git bisect good 1281a49b7aa865a1f0d60e2b28410e6234fc686b
# bad: [e11c675ede0d42a405ae595528bf0b29ce1ae56f] Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
git bisect bad e11c675ede0d42a405ae595528bf0b29ce1ae56f
# bad: [6161352142d5fed4cd753b32e5ccde66e705b14e] tracing, perf: Convert the power tracer into an event tracer
git bisect bad 6161352142d5fed4cd753b32e5ccde66e705b14e
# bad: [929bf0d0156562ce631728b6fa53d68004d456d2] Merge branch 'linus' into perfcounters/core
git bisect bad 929bf0d0156562ce631728b6fa53d68004d456d2
# good: [5622f295b53fb60dbf9bed3e2c89d182490a8b7f] x86, perf_counter, bts: Optimize BTS overflow handling
git bisect good 5622f295b53fb60dbf9bed3e2c89d182490a8b7f
# bad: [def0a9b2573e00ab0b486cb5382625203ab4c4a6] sched_clock: Make it NMI safe
git bisect bad def0a9b2573e00ab0b486cb5382625203ab4c4a6


My .config file is attached to this mail


Download attachment ".config" of type "application/xml" (58848 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ