linux-kernel - Re: 2.6.25-mm1: not looking good

Message-ID: <19f34abd0804180622l4f89191cp4cc7833822e058f5@mail.gmail.com>
Date:	Fri, 18 Apr 2008 15:22:57 +0200
From:	"Vegard Nossum" <vegard.nossum@...il.com>
To:	"Jason Wessel" <jason.wessel@...driver.com>
Cc:	"Ingo Molnar" <mingo@...e.hu>,
	"Andrew Morton" <akpm@...ux-foundation.org>, tglx@...utronix.de,
	penberg@...helsinki.fi, linux-usb@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	jmorris@...ei.org, sds@...ho.nsa.gov
Subject: Re: 2.6.25-mm1: not looking good

On Fri, Apr 18, 2008 at 3:02 PM, Jason Wessel
<jason.wessel@...driver.com> wrote:
> Vegard Nossum wrote:
>  > On Fri, Apr 18, 2008 at 2:34 PM, Ingo Molnar <mingo@...e.hu> wrote:
>  >
>  >>  * Vegard Nossum <vegard.nossum@...il.com> wrote:
>  >>
>  >>  > With the patch below, it seems 100% reproducible to me (7 out of 7
>  >>  > bootups hung).
>  >>  >
>  >>  > The numbers of loops it completed before hanging were, in order: 697,
>  >>  > 898, 237, 55, 45, 92, 59
>  >>
>  >>  cool! Jason: i think that particular self-test should be repeated 1000
>  >>  times before reporting success ;-)
>  >>
>  >
>  > BTW, I just tested a 32-bit config and it hung after 55 iterations as well.
>  >
>  > Vegard
>  >
>  >
>  >
>  I assume this was SMP?

Yes. With that in mind, I tried running the same kernel under qemu
with -smp 16, and it seems to be stuck here:

[   16.562659] kgdb: Registered I/O driver kgdbts.
[   16.565875] kgdbts:RUN plant and detach test

and the code it is stuck in is this loop in kgdb_handle_exception():

        /*
         * Wait for the other CPUs to be notified and be waiting for us:
         */
        for_each_online_cpu(i) {
                while (!atomic_read(&cpu_in_kgdb[i]))
                        cpu_relax();
        }
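The loop above spins until every online CPU has set its cpu_in_kgdb flag, so a single CPU that never arrives (for example, one that never takes the round-up) leaves the master CPU spinning forever. That would be consistent with the hang at the kgdbts test, though the thread has not confirmed which CPU is missing. The sketch below is a userspace illustration only, not kernel code and not a fix: POSIX threads stand in for CPUs, and the names NCPUS, secondary_cpu(), wait_for_cpus(), run_round() and max_spins are hypothetical. Unlike the kernel loop, the wait is bounded and uses sched_yield() in place of cpu_relax(), so a missing CPU is reported instead of hanging.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NCPUS 4 /* hypothetical number of simulated CPUs */

/* Userspace stand-in for the kernel's cpu_in_kgdb[] array. */
static atomic_int cpu_in_kgdb[NCPUS];

/* A simulated secondary CPU: once "rounded up", it checks in. */
static void *secondary_cpu(void *arg)
{
	int cpu = (int)(long)arg;

	atomic_store(&cpu_in_kgdb[cpu], 1);
	return NULL;
}

/*
 * Same wait as in kgdb_handle_exception(), but bounded: returns -1 if
 * all CPUs checked in, else the index of the first CPU that did not
 * arrive within max_spins polls.
 */
static int wait_for_cpus(long max_spins)
{
	for (int i = 0; i < NCPUS; i++) {
		long spins = 0;

		while (!atomic_load(&cpu_in_kgdb[i])) {
			if (++spins > max_spins)
				return i;
			sched_yield(); /* userspace stand-in for cpu_relax() */
		}
	}
	return -1;
}

/*
 * One round: CPU 0 plays the master and is already "in kgdb"; every
 * other CPU except lost_cpu (use -1 for none) is started and checks in.
 */
static int run_round(int lost_cpu, long max_spins)
{
	pthread_t tid[NCPUS];
	bool started[NCPUS] = { false };
	int missing;

	assert(lost_cpu != 0); /* the master CPU cannot be the lost one */

	for (int i = 0; i < NCPUS; i++)
		atomic_store(&cpu_in_kgdb[i], 0);
	atomic_store(&cpu_in_kgdb[0], 1);

	for (int i = 1; i < NCPUS; i++) {
		if (i == lost_cpu)
			continue; /* simulate a CPU that never gets rounded up */
		started[i] = pthread_create(&tid[i], NULL, secondary_cpu,
					    (void *)(long)i) == 0;
	}

	missing = wait_for_cpus(max_spins);

	for (int i = 1; i < NCPUS; i++)
		if (started[i])
			pthread_join(tid[i], NULL);
	return missing;
}
```

For instance, run_round(2, 200000) should return 2 because simulated CPU 2 is never started, while run_round(-1, 200000) should return -1. Build with -pthread on systems where libpthread is separate from libc.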


>
>  While I had not tried it yet, my guess would have been this did not
>  happen on a UP kernel.  If it does occur on a UP kernel it means the
>  problem is squarely between the task scheduling after the exception is
>  handled and the kgdb state logic for re-entering the debug state after a
>  single step exception occurs.
>
>  It seems reasonable to go for 1000 iterations of this particular test to
>  declare success as pointed out by Ingo.  Previous versions of kgdb
>  handled some of the irq + single step + cpu sync slightly differently
>  and it is entirely possible there is a regression there.
>
>  Jason.
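Ingo's suggestion, which Jason agrees with, is to repeat this self-test 1000 times before declaring success. A minimal sketch of such a repeat harness follows; it is not how kgdbts is actually structured, and selftest_fn, repeat_selftest(), flaky_ctx and flaky_test() are hypothetical names. Note that it only catches iterations that return failure: a hang like the ones reported above would block inside the test itself and would need a separate watchdog, which is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical signature for one self-test iteration: true on pass. */
typedef bool (*selftest_fn)(void *ctx);

/*
 * Run test up to iterations times. Returns 0 if every iteration
 * passed, otherwise the 1-based number of the first failing one.
 */
static long repeat_selftest(selftest_fn test, void *ctx, long iterations)
{
	assert(iterations > 0);

	for (long n = 1; n <= iterations; n++) {
		if (!test(ctx)) {
			fprintf(stderr, "self-test failed on iteration %ld of %ld\n",
				n, iterations);
			return n;
		}
	}
	return 0;
}

/* Hypothetical example test that fails once its call count hits fail_at. */
struct flaky_ctx {
	long calls;
	long fail_at; /* 0 means never fail */
};

static bool flaky_test(void *ctx)
{
	struct flaky_ctx *c = ctx;

	c->calls++;
	return c->calls != c->fail_at;
}
```

Returning the failing iteration number, rather than just pass/fail, keeps the kind of per-run counts quoted earlier in the thread easy to collect.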


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036