linux-kernel - Kernel getting hosed?

Date:	Fri, 25 Sep 2009 21:02:44 -0400
From:	Loren Rogers <loren.rogers@...il.com>
To:	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Kernel getting hosed?

Hello,
I am developing a multi-threaded media application for an iMX27-based
processor running kernel 2.6.24, and I'm seeing a weird "phenomenon"
where certain processes/threads are not being serviced and my clock
(according to gettimeofday()) gets set back as well.  This behavior has
many symptoms; here are some of them:

1. It's usually the same application threads that are serviced and the
same ones that are not serviced
2. The problem usually lasts for about 5 and a half minutes and then
appears to correct itself
3. According to top, the CPU load for my application process quickly
jumps to 99% right before the phenomenon
4. My telnet (over IP) and serial terminal sessions both become unusable.
5. I have a logging utility that timestamps messages with
gettimeofday().  Once the problem corrects itself, the clock has been
set back to the exact time the problem started.  For example, if the
problem starts at 12:00:00, I'll be logging messages stamped 12:01:00,
12:04:22, etc.; then after the problem "stops", the timestamps on my
logger are once again 12:00:00.  And when I run the "date" command,
the clock says 12:00:00!
6. I think all of my IP-based network threads are being serviced.
7. A colleague wrote a utility that runs on one of the "alive" threads
and starts collecting /proc data once we know we are in this state; he
told me that the /proc counters have pretty much halted.
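
One way to pin down symptom 5 independently of the application's own logger is a small watcher that samples the wall clock (time.time(), which reads the same system clock that gettimeofday() reports) alongside CLOCK_MONOTONIC (time.monotonic()), which settimeofday() and "date" cannot step, and reports any interval where the two disagree. This is only a sketch in Python for brevity: the names find_clock_steps() and watch_clock() and the 0.5 s tolerance are hypothetical choices, not an existing tool. It assumes CLOCK_MONOTONIC itself keeps advancing; if the watcher also stalls during an episode, that would hint at kernel timekeeping rather than at something explicitly resetting the clock.

```python
import time

# Illustrative sketch only; function names and tolerance are hypothetical.

def find_clock_steps(samples, tolerance=0.5):
    """Return (index, wall_step, mono_step) for each interval where the wall
    clock moved differently from the monotonic clock by more than
    `tolerance` seconds.

    samples: list of (monotonic_seconds, wall_seconds) pairs, in order.
    """
    steps = []
    for i in range(1, len(samples)):
        mono_step = samples[i][0] - samples[i - 1][0]
        wall_step = samples[i][1] - samples[i - 1][1]
        # A backward reset (or a forward jump) shows up as a mismatch.
        if abs(wall_step - mono_step) > tolerance:
            steps.append((i, wall_step, mono_step))
    return steps


def watch_clock(duration, interval=1.0, log=print):
    """Sample both clocks every `interval` seconds for `duration` seconds
    and log each mismatch as soon as it is seen."""
    prev = (time.monotonic(), time.time())
    end = prev[0] + duration
    while prev[0] < end:
        time.sleep(interval)
        cur = (time.monotonic(), time.time())
        for _, wall_step, mono_step in find_clock_steps([prev, cur]):
            log("wall clock moved %+.3f s while monotonic moved %.3f s"
                % (wall_step, mono_step))
        prev = cur
```

For example, running watch_clock(3600) in a separate process during a test session would log every mismatch with both step sizes; a wall step of several minutes backwards paired with a roughly one-second monotonic step would match the reset described in symptom 5.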

My colleagues and I have been chasing this for three weeks now, and I
have no idea how to determine the culprit(s).  At first I thought it
was some bad code in the user-space application, but can someone tell
me with 100% certainty whether this is a user-space problem or a
kernel problem?  If it is a kernel problem, how can a user-space
application hose the kernel to this extent?

If anybody can point me to a tool or tools to help diagnose the cause
of the problem, or even tell me where to start looking, I would REALLY
appreciate it.  Thank you
/Loren