linux-kernel - Re: [git pull] kgdb-light -v10

Message-ID: <alpine.LFD.1.00.0802121001180.2920@woody.linux-foundation.org>
Date:	Tue, 12 Feb 2008 10:11:13 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Andi Kleen <andi@...stfloor.org>
cc:	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
	"Frank Ch. Eigler" <fche@...hat.com>,
	Roland McGrath <roland@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [git pull] kgdb-light -v10

On Tue, 12 Feb 2008, Andi Kleen wrote:
>
> >  - the kgdb commands should always act on the *current* CPU only
> >  - add one command that says "switch over to CPU #n" which just releases 
> >    the current CPU and sends an IPI to that CPU #n (no timeouts, no 
> >    synchronous waiting, no nothing - it's like a "continue", but with a 
> >    "try to get the other CPU to stop")
> 
> The problem I see here is that the kernel tends to get badly confused
> if one CPU just stops responding. At some point someone does a global
> IPI and that then hangs.  You would need to hotunplug the CPU which
> is theoretically possible, but quite intrusive.

You're thinking about this totally *wrong*.

You definitely do not want to hot-unplug or isolate anything at all. 
That's explicitly against the whole point of kgdb not changing what it is 
trying to measure.

Just let the other CPU's hang naturally if they need to wait for IPI's 
etc. What's the downside? That's what you were trying to do in the first 
place by having the kgdb callback!

So you can't have it both ways. If serializing other cpu's with kgdb is 
good (the whole "kgdb_nmicallback" thing or whatever it was called), then 
it's also perfectly ok to just let them stop when waiting for IPI's.

My point was *not* that kgdb should take control of one CPU, and the other 
CPU's should continue to work as if nothing happened. That is insane and 
impossible (since you may be stopping a CPU while it holds central 
spinlocks etc). No, my point was that I think kgdb should be as light and 
non-intrusive as possible, and that any "higher level behaviour" (like the 
decision of whether to try to synchronize other CPU's or not) should be 
left to the debugger.

But only if that makes kgdb patches less intrusive!

In other words, I'm not at all trying to push any particular solution 
here, except for the "keep it simple, and anything even remotely debatable 
or intrusive to the system should be excised". And I wanted to point out 
that maybe all these timeout etc decisions can be pushed to the debugger.

So I think we can either:

 - have no timeouts or other fancy crap _at_all_, with very simple locking 
   (ie what v10 mostly seems to do)

 - or you do the fancy dance entirely in the remote debugger.

I don't care. The only thing I care about is that kgdb support never 
_ever_ shows up in any interesting code, and that it remains totally 
invisible to essentially all of the kernel except the place that would 
otherwise print out an oops.

And I absolutely don't want it to be fancy, I want it to be so simple that 
even _I_ can look at it and say "I think this is crap, but it's _trivial_ 
crap".

IOW: as long as people keep arguing about it, I sure as hell won't ever 
merge it. It needs to be so _obvious_ and so _minimal_ that I can feel 
that I finally don't need to care.

			Linus
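
Editorial sketch (not part of the original message): the "switch over to 
CPU #n" command quoted at the top can be modelled as a tiny state machine 
with no timeouts and no synchronous waiting. Everything below is a 
hypothetical userspace model. The names dbg_model, dbg_enter, 
dbg_switch_cpu and dbg_handle_ipi are assumptions made for illustration 
and do not come from the kgdb patches, and a flag stands in for a real IPI.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4      /* hypothetical CPU count for this model */
#define NO_CPU  (-1)   /* nobody is currently being debugged */

/* Hypothetical model state; not taken from any kgdb patch. */
struct dbg_model {
    int  active_cpu;               /* CPU the debugger is attached to */
    bool stop_requested[NR_CPUS];  /* stands in for "an IPI was sent" */
};

static void dbg_init(struct dbg_model *m)
{
    m->active_cpu = NO_CPU;
    for (int i = 0; i < NR_CPUS; i++)
        m->stop_requested[i] = false;
}

/* A CPU traps: it becomes the debugged CPU only if no other CPU is. */
static bool dbg_enter(struct dbg_model *m, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (m->active_cpu != NO_CPU && m->active_cpu != cpu)
        return false;              /* another CPU owns the debugger */
    m->active_cpu = cpu;
    m->stop_requested[cpu] = false;
    return true;
}

/*
 * "switch over to CPU #n": release the current CPU (like "continue")
 * and ask the target to stop. Returns immediately; it never waits.
 */
static void dbg_switch_cpu(struct dbg_model *m, int target)
{
    assert(target >= 0 && target < NR_CPUS);
    assert(m->active_cpu != NO_CPU);
    m->active_cpu = NO_CPU;
    m->stop_requested[target] = true;
}

/* What the target CPU would do if and when the stop request reaches it. */
static bool dbg_handle_ipi(struct dbg_model *m, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (!m->stop_requested[cpu])
        return false;              /* no request pending for this CPU */
    return dbg_enter(m, cpu);
}
```

In this model, whether the target ever arrives is not the stub's problem: 
if it never handles the request, nothing times out, and any retry or 
give-up policy would live in the remote debugger.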