linux-kernel - Re: 2.6.35-rc3 deadlocks on semaphore operations

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <AANLkTilm19O8nPGHSzExdCOPWR2Cridxq_sjs_uRkd8E@mail.gmail.com>
Date:	Sat, 26 Jun 2010 14:52:22 +0200
From:	Luca Tettamanti <kronos.it@...il.com>
To:	Manfred Spraul <manfred@...orfullife.com>
Cc:	Christoph Lameter <cl@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, Julia Lawall <julia@...u.dk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	maciej.rutecki@...il.com
Subject: Re: 2.6.35-rc3 deadlocks on semaphore operations

On Fri, Jun 25, 2010 at 11:09 PM, Manfred Spraul
<manfred@...orfullife.com> wrote:
> On 06/24/2010 09:22 PM, Luca Tettamanti wrote:
>>
>> Argh, "seems" was indeed appropriate. Manfred your patch does
>> alleviate the problem but something is still wrong. I noticed (I'm
>> developing an ajax heavy web app) that sometimes an apache worker
>> hangs; I can reproduce the problem with ab (apache benchmark) and a
>> high concurrency level (I'm testing with 100 and 10k requests, and I
>> get only 2-5 dropped requests). This does not happen with 2.4.34.
>> Any idea on how I can debug this further?
>>
>>
>
> Are there still hung httpd processes?

They don't seem really hung as before, I see two different behaviours:
* Near the end of the run ab is frozen for a few seconds, but in the
end all requests are processed; however I see a few "length" errors,
meaning that the received page does not match the expected content
(I'm testing a static page):

Concurrency Level:      100
Time taken for tests:   14.682 seconds
Complete requests:      10000
Failed requests:        10
   (Connect: 0, Receive: 0, Length: 10, Exceptions: 0)
Write errors:           0
Total transferred:      79987946 bytes
HTML transferred:       77190466 bytes
Requests per second:    681.11 [#/sec] (mean)
Time per request:       146.819 [ms] (mean)
Time per request:       1.468 [ms] (mean, across all concurrent requests)
Transfer rate:          5320.36 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.4      0       3
Processing:     3   29 428.9     15   14678
Waiting:        0   15   1.3     15      19
Total:          5   30 429.0     16   14682
ERROR: The median and mean for the initial connection time are more
than twice the standard
       deviation apart. These results are NOT reliable.
Percentage of the requests served within a certain time (ms)
  50%     16
  66%     17
  75%     17
  80%     17
  90%     17
  95%     17
  98%     18
  99%     18
 100%  14682 (longest request)

strace on apache shows:
[pid  3787] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid  3789] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid  3788] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid  3784] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid  3783] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid  3782] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid  3239] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid  3233] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid  3238] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid  3237] restart_syscall(<... resuming interrupted call ...>


* Sometimes ab pauses at the end but not all requests are successful; ab says:

apr_socket_recv: Connection timed out (110)
Total of 9932 requests completed

The problem seems more acute if there's other application running on
the system (e.g. right now I have kde, firefox, thunderbird active and
I see more errors; note that nothing is using much CPU time). I've
taken a sysrq-t log during one of ab pauses, it's attached.

Might be related: KDE (4.4) takes an usual amount of time to start up:
the splash screen is fronzen for a while on the screen before the
desktop is available (.xsession-error does not show anything
interesting).

Luca
PS: I'll be offline until Monday 5th...

Download attachment "sysrq-t.txt.gz" of type "application/x-gzip" (7706 bytes)