linux-kernel - /proc/stat information incorrect

Message-ID: <CABWpNyZkL8K3R=ObOvV9SGMVGTcs1R83H0LSWyXFrQ-Wwb5p+w@mail.gmail.com>
Date:	Mon, 16 Apr 2012 10:04:48 +0200
From:	Bas van der Oest <bassvdo@...il.com>
To:	linux-kernel@...r.kernel.org
Subject: /proc/stat information incorrect

Hi,

I am trying to figure out what my benchmark system is spending CPU
time on by looking at the /proc/stat information provided by the
kernel.
I manually map my benchmark application (in this case an
Ethernet-benchmark) to CPU4-7 and the interrupts are assigned to CPU5.
I am running kernel version 3.2.8 on an 8-core Nehalem system
(2x Intel E5540).
The interrupts are generated by a 10Gb Ethernet adapter (Intel AT2).
I benchmark for a ten second period and take the /proc/stat
information before and after this period.
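
For reference, the before/after capture can be sketched roughly as follows. This is a minimal illustration, not the exact procedure used for the numbers below; the function names `parse_proc_stat`, `snapshot` and `measure` are made up for this example.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: [counter, ...]} for every 'cpu*' line of /proc/stat text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only the aggregate "cpu" line and the per-CPU "cpuN" lines.
        if fields and fields[0].startswith("cpu"):
            counters[fields[0]] = [int(v) for v in fields[1:]]
    return counters


def snapshot(path="/proc/stat"):
    """Read and parse the current counters (Linux only)."""
    with open(path) as f:
        return parse_proc_stat(f.read())


def measure(seconds, path="/proc/stat"):
    """Take one snapshot, wait for the benchmark period, take another."""
    before = snapshot(path)
    time.sleep(seconds)
    after = snapshot(path)
    return before, after
```

On a Linux machine, `before, after = measure(10)` would give two dictionaries keyed by "cpu", "cpu0", "cpu1", and so on.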

The /proc/stat information looks like this:
before:
cpu   10384  0  136128  14417335  15115  6  21414  0  0  0
cpu0    371  0    1136   1808846  14606  3    398  0  0  0
cpu1    238  0     451   1827606    126  0     50  0  0  0
cpu2     89  0     345   1828021     39  0     15  0  0  0
cpu3    314  0     524   1827629     26  0     10  0  0  0
cpu4    268  0    3008   1804530    106  2   1827  0  0  0
cpu5   2320  0   33947   1767454     99  0  18941  0  0  0
cpu6   3429  0   48610   1776287     70  0     76  0  0  0
cpu7   3352  0   48105   1776957     41  0     95  0  0  0
after:
cpu   10389  0  136562  14425011  15115  6  21467  0  0  0
cpu0    371  0    1136   1809913  14606  3    398  0  0  0
cpu1    238  0     452   1828676    126  0     50  0  0  0
cpu2     89  0     345   1829092     39  0     15  0  0  0
cpu3    315  0     524   1828699     26  0     10  0  0  0
cpu4    269  0    3102   1805504    106  2   1827  0  0  0
cpu5   2322  0   34039   1767989     99  0  18993  0  0  0
cpu6   3429  0   48692   1777274     70  0     76  0  0  0
cpu7   3353  0   48270   1777862     41  0     95  0  0  0

After dropping the trailing all-zero columns and subtracting the
"before" counters from the "after" counters, the result is:
       user    nice    system  idle    iowait  irq     softirq sum
cpu     5       0       434     7676    0       0       53      8168
cpu0    0       0       0       1067    0       0       0       1067
cpu1    0       0       1       1070    0       0       0       1071
cpu2    0       0       0       1071    0       0       0       1071
cpu3    1       0       0       1070    0       0       0       1071
cpu4    1       0       94      974     0       0       0       1069
cpu5    2       0       92      535     0       0       52      681
cpu6    0       0       82      987     0       0       0       1069
cpu7    1       0       165     905     0       0       0       1071
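
The subtraction behind this table can be sketched as below. It is only an illustration: the helper name `cpu_delta` is made up, and the column names follow the header of the table above. The sample values are the cpu5 and cpu7 rows from the two snapshots.

```python
# Column names as used in the table above; the three trailing
# all-zero columns of /proc/stat are dropped.
COLUMNS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]


def cpu_delta(before, after):
    """Per-mode tick difference for one CPU line, plus the row total."""
    # zip() stops at len(COLUMNS), which drops the trailing columns.
    diff = {name: a - b for name, a, b in zip(COLUMNS, after, before)}
    diff["sum"] = sum(diff.values())
    return diff


cpu5_before = [2320, 0, 33947, 1767454, 99, 0, 18941, 0, 0, 0]
cpu5_after = [2322, 0, 34039, 1767989, 99, 0, 18993, 0, 0, 0]
cpu7_before = [3352, 0, 48105, 1776957, 41, 0, 95, 0, 0, 0]
cpu7_after = [3353, 0, 48270, 1777862, 41, 0, 95, 0, 0, 0]

print(cpu_delta(cpu5_before, cpu5_after)["sum"])  # 681
print(cpu_delta(cpu7_before, cpu7_after)["sum"])  # 1071
```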

I added a sum column which totals the time spent in the different
modes, so the table shows how long each CPU spent in each mode and
how much time was accounted for in total.
Now I am wondering how it is possible that CPU5 accounts for much less
total time than all the other CPUs. I expected every CPU to account
for roughly the same time (10s). The total includes idle time, so the
gap cannot be explained by some CPUs being busier than others.
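
To put the totals in seconds: the /proc/stat counters are in units of USER_HZ. The sketch below assumes USER_HZ is 100, which is the usual value but is an assumption here, not something measured on this system; the helper name `ticks_to_seconds` is made up for the example.

```python
def ticks_to_seconds(ticks, user_hz=100):
    """Convert /proc/stat ticks to seconds; user_hz=100 is an assumed default."""
    return ticks / user_hz


# Row totals from the table above.
cpu5_total = 681
cpu7_total = 1071

print(ticks_to_seconds(cpu7_total))  # 10.71
print(ticks_to_seconds(cpu5_total))  # 6.81
```

Under that assumption the other CPUs account for about 10.7 seconds each, while CPU5 accounts for only about 6.8 seconds. The actual tick rate can be checked with `os.sysconf("SC_CLK_TCK")`.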

I know for a fact that this effect follows the IRQ assignment:
whichever CPU I map the interrupts to shows the same shortfall.
I looked through the scheduler's statistics handling in the kernel
source but could not find a cause for this effect.

Can anyone reproduce this behaviour?
Does anyone know where/what might be the cause of this?

Regards,

Bas