linux-kernel - Re: 2.6.21-rc3-mm2

Message-Id: <200703090152.49253.kernel@kolivas.org>
Date:	Fri, 9 Mar 2007 01:52:49 +1100
From:	Con Kolivas <kernel@...ivas.org>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: 2.6.21-rc3-mm2

On Thursday 08 March 2007 15:19, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc3/2.6.21-rc3-mm2/
>
> - This is the same as 2.6.21-rc3-mm1, except Con's CPU scheduler changes
>   were dropped.

As for benchmarking between them I haven't seen anyone post anything yet. 
Test.kernel.org is having a field day with both of them ABORTing or WARNing 
without one GOOD so far. Urgh what was the description you gave of -mm to 
Gene?...I don't envy you :(

Apart from your big fat oops on that massive config, I've had people boot it 
also on alpha and IA64. The bitmap error also manifested on alpha but not on 
IA64 (neither oopsed).

So on qemu I can reproduce the oops you're getting with your config (make
oldconfig, all defaults, on top of your config), but I'm getting other wonderful
related problems too on rc3-mm2. On qemu, -mm1 boots mostly without error and
then crashes nicely when I type 'ps': a long pause of about twenty seconds,
then a combination of soft lockups and bitmap errors, and eventually it hits
the BUG_ON I put in bitmap_error(). However, -mm2 also vomits on
typing 'ps'.

It pauses and then spits out (fun lines selected from ps output):

    7 ?        serial8250: too much work for irq4
00:00:00 watchdog/1
   88 ?        00:00:0serial8250: too much work for irq4
0 cqueue/1
  137 ?        00:00serial8250: too much work for irq4
:00 aio/0

Checking a few /proc files I see that "serial8250" info littered
throughout /proc/stat as well. -mm2 does not oops but the proc output is 
variously corrupted.

Interestingly, if I don't type 'ps' in the -mm1 qemu it runs fine with no sign
of a bug... In summary, here I can only reproduce your big fat oops when it is
triggered by some corruption elsewhere on this config, related to /proc
breakage that I haven't managed to track down. I checked the broken-out
patches to see which touched /proc and it was, oh, most of them. I tried on rc3
and had the same thing happen. I haven't tried rc3 without RSDL (your config
takes too darn long to build!).

The bitmap error that you're hitting on ppc I believe is a different issue. I 
don't have hardware that does it at my end but one user has hit it also on 
alpha and is trying to help me find the root cause. If we change the message
from a once-only printk to one that prints every time, the error seems to go
away after 10 mins of runtime. What could possibly change over that time for
the error never to return again is a mystery. This part definitely looks to 
be my fault but a bug "getting better and going away" is a new one for me. I 
have to think about some timer interaction although I don't use jiffies or 
timers directly at all in my code. It also happens on 2.6.20 with RSDL.

Summary from what I've been able to find:
x86_32: ok
x86_64: ok
x86_64 fat config: scheduler code oops brought on by accessing /proc
IA64: ok
Alpha: bitmap error, runs ok

I'm off to pass out for now so that I'm of some use to the rest of my 
existence tomorrow.

-- 
-ck