linux-kernel - Bogus high interrupt load?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Date:	Wed, 19 Sep 2007 02:56:48 +0300
From:	amanous@...ccessnetworks.com (Aggelos Manousarides)
To:	linux-kernel@...r.kernel.org
Subject: Bogus high interrupt load?

I am using kernel 2.6.13 on a PXA. The system has a DSP driver which
triggers an interrupt every 5ms with a free running clock, unrelated to
the PXA clock.

During the debugging of the DSP driver a strange load peak was detected with
a very precise period. Normally the total load figure was about 1%-2%,
but every 4:40 minutes, the interrupt load was almost 30% of the system
for about 10 seconds. All tools which are based on the /proc/stat
numbers reported this (top, a custom top build on /proc/stat, snmp
agent).  This system supports multiple slots on a custom bus and another
DSP card was used (which obviously had a different crystal).  The same
phenomenon was repeated but with a different period.

The symptoms suggest that this a clock drift related problem. Various
debugging methods where used (including the oprofile facility) and the
possibility of a bug in the DSP driver has been eliminated. Also, the
logic analyzer did not reveal any prolonged hardware cycles during this
period in which there was load.

Then, we came across a bug in 2.6.21, related to the load average and
the calc_load function. The load I am referring is not based on the load
average figures but the ones reported in /proc/stat, so this is probably
not the same issue.  But this gave us the idea that the load might not
actually be real! So the following experiment was performed.

A silly C program (about 5 lines long) sets a counter to zero and
increases the counter in a busy loop until it reaches a certain bit
boundary. The boundary was set to a number which made the loop last
about 3.4 seconds. The following script was executed:

while true; do time ./test; done

Obviously this produces a 100% load on the system. When no interrupt
load is present, time reported 3,4 seconds real, 3,4 seconds user. When
actual interrupt load was introduced (high ethernet traffic) time
reported 3,4 seconds user and a bigger number for real. Everything looks
normal.

However, during the supposed load produced by the DSP driver, the following
amazing thing happened. Time reported that the real time was again 3,4
seconds (and it was, there was no delay) and that 1,7 seconds was user
load and 1,7 was system load!

How can a process that does "i = 0; while (i++) ;" produce system load?

There is obviously something happening to the load reporting facility of
the kernel when a periodic interrupt source is present. Has anyone seen
something like this? I am not using any real time patches, this is a
vanilla kernel patched only to support the platform at hand.

--
Angelos Manousaridis
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/