Message-ID: <CAK1hUK-EXT3J5xM63oJVmdWX2OpuM73951cJcXkJAqJiJUN8TQ@mail.gmail.com>
Date:   Tue, 30 Aug 2016 00:26:16 +0800
From:   Mac Lin <mkl0301@...il.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Vegard Nossum <vegard.nossum@...il.com>,
        John Stultz <john.stultz@...aro.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: Random abnormal high CPU sys usage related to timer

On Mon, Aug 29, 2016 at 6:45 PM, Thomas Gleixner <tglx@...utronix.de> wrote:
> Tracing will tell you exactly what's going on in the system.
Well, it seems that I had lost my way. Anyway, there is something to be
gained from revisiting the old tests.

>>> I've checked the /proc/timer_stats, /proc/interrupts, and perf, all
>>> the irq counter, timer counter, timer/irq event didn't show any
>>> abnormal value or useful clue.
I have to take that back. The result above came from a different test
program, with which the issue is harder to reproduce.

Attached is the debug info collected with and without the issue.
Comparing the two shows the following:
 * /proc/timer_stats is almost the same, but the perf events show
extra softirq/timer events.
 * perf collected many more samples in the failing case than in the
 good case, but the ratio of the sampled functions is basically the same.
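
A minimal sketch of the comparison in the second point, assuming the
per-function sample counts have already been extracted from each perf
report into a plain dict. The function names and the dict layout here are
hypothetical; parsing the attached logs is not shown, since their exact
format is not reproduced in this message.

```python
def sample_shares(counts):
    """Convert raw per-function sample counts into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


def compare_profiles(good, fail):
    """Return (name, good_share, fail_share) rows, largest difference first.

    `good` and `fail` map a function name to its sample count
    (hypothetical layout, not the format of the attached logs).
    """
    g = sample_shares(good)
    f = sample_shares(fail)
    rows = [(name, g.get(name, 0.0), f.get(name, 0.0))
            for name in set(g) | set(f)]
    rows.sort(key=lambda r: abs(r[2] - r[1]), reverse=True)
    return rows
```

If the failing run has many more samples but every row shows nearly equal
shares, the extra time is spread over the same functions rather than
concentrated in a new one.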




On Mon, Aug 29, 2016 at 6:45 PM, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Sat, 27 Aug 2016, Mac Lin wrote:
>> Hi Vegard,
>> Thanks for the prompt response.
>> The commit was introduced in 4.6, but the issue can be reproduced on
>> 3.10 (the earliest I have tested). And when testing on buildroot+4.7 with
>> the commit reverted, the issue still happens.
>>
>> In fact, I did a test that ran a script that keeps incrementing a counter
>> for 10 seconds on the same CPU. If I run 2 of them, the number is half
>> of that when running 1. But if I run it while the issue is happening, the
>> counter reported is around the same value as in the 1-process case. So I
>> suspect that it might be an issue with the reported number.
>>
>> Is there other way to ensure the CPU is "really" doing something?
>
> Tracing will tell you exactly what's going on in the system.
>
> Thanks,
>
>         tglx
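
A minimal sketch of the kind of counter test described in the quoted
message. This is not the original script, which is not included in the
thread. It assumes Linux, where Python's os.sched_setaffinity() is
available; the helper names pin_to_cpu() and count_for() are made up for
illustration.

```python
import os
import time


def pin_to_cpu(cpu):
    """Pin the calling process to a single CPU; return True on success."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # not available on this platform
    try:
        os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    except OSError:
        return False  # e.g. the CPU is not in the allowed set
    return True


def count_for(seconds):
    """Increment a counter in a tight loop for `seconds` of wall-clock time."""
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        count += 1
    return count

# Example use (takes 10 seconds):
#   pin_to_cpu(0)
#   print(count_for(10.0))
```

Running two copies pinned to the same CPU should give each roughly half
the count of a single copy. A count close to the single-copy value while
the high sys usage is reported would suggest the CPU is not really busy.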

View attachment "debug-fail.log" of type "text/x-log" (26422 bytes)

View attachment "debug-good.log" of type "text/x-log" (12165 bytes)
