linux-kernel - Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.00.0904230935060.24310@gandalf.stny.rr.com>
Date:	Thu, 23 Apr 2009 09:53:52 -0400 (EDT)
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Ingo Molnar <mingo@...e.hu>
cc:	Frédéric Weisbecker <fweisbec@...il.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Glauber de Oliveira Costa <gcosta@...hat.com>,
	Chris Wright <chrisw@...s-sol.org>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Pekka Enberg <penberg@...helsinki.fi>
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


On Thu, 23 Apr 2009, Ingo Molnar wrote:

> > 
> > Here's the situation:
> > 
> > We've added selftests for the event tracer. What this basically does is 
> > enables each event one at a time and runs tests. The tests include 
> > creating a kernel thread, executing workqueues and grabbing locks.
> > 
> > We also have PROVE_LOCKING (LOCKDEP) enabled, that keeps track of 
> > interrupts being enabled. When they are, we set a flag in the task struct 
> > "hardirqs_enabled". When they are disabled, this flag is cleared.
> > 
> > When we fork a process, a test is made to see if the flag is set for the 
> > new process and if it is not, a warning is printed (as is done in the 
> > above dump).
> > 
> > I investigated this and found that the flag is mysteriously being set and 
> > cleared for no apparent reason. The flag is in the task struct and nothing 
> > should be touching it. In fact, it is a full int, not even a bit in a 
> > flags variable.
> > 
> > I had a test that would print the flag and irqs_disabled() in copy_process 
> > before the warning and it showed that the flag was cleared but irqs was 
> > enabled. The funny part is, I if the test triggered, I printed the flag 
> > again, and the second print it was set again!!
> > 
> > 	if (!p->hardirqs_enabled) {
> > 		printk("irqs:%d flag:%d\n", irqs_disabled(),
> > 			p->hardirqs_enabled);
> > 		printk("try again: %d\n", p->hardirqs_enabled);
> > 	}
> > 
> > The first print showed that it was cleared, the second showed it was set 
> > again??
> 
> function tracer was active? So somewhere there we corrupted this 
> state? It's unlikely that printk itself did this.

The problem always arises at the same spot. I'm not saying printk was the 
culprit, I'm saying that printk actually "fixed" the issue. Which can be a 
sign of a corrupted register somewhere.

I currently tracked it down to something in "prep_new_page". It is hard to 
debug because as I add tests into the code, it makes the race window 
smaller, and I need to run multiple boots to trigger the code. But when I 
do trigger it (and I try to trigger it a few times) it always happens at 
the same spot.

Note, everytime I catch the issue, the printk again "fixes" the problem 
:-/

Since I need both DEBUG_PAGEALLOC and PARAVIRT on, I'm thinking that the 
debug page alloc might be messing with the paravirt "modified" code (or 
vice versa). It could also be the function tracer doing it too. But it 
looks like the DEBUG_PAGEALLOC code is causing issues because the bug 
always appears in the alloc code.

I'm not blaming any subsystem yet. It just seems that the combination of 
these systems are broken.

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/