linux-kernel - Re: [PATCH 2/5] x86: fix runtime error in arch/x86/kernel/cpu/mcheck/mce_amd

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Mon, 28 Jan 2008 20:01:49 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Greg KH <gregkh@...e.de>
Cc:	linux-kernel@...r.kernel.org, Yinghai Lu <yhlu.kernel@...il.com>,
	Jacob Shin <jacob.shin@....com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Alexander Viro <viro@....linux.org.uk>
Subject: Re: [PATCH 2/5] x86: fix runtime error in
	arch/x86/kernel/cpu/mcheck/mce_amd_64.c

* Greg KH <gregkh@...e.de> wrote:

> > Are kobjects protected against accidental copying? If not add &kobj 
> > to the 'magic value' too, and check that - it becomes 
> > copying-resistent that way and has the same cost to check. (which is 
> > negligible anyway)
> 
> Oh, that's a very cool idea, I like it :)

hey, you are welcome :-)

[ I guess i should not mention that i've implemented list debugging for 
  Linux that checksums the struct list contents and stores the checksum 
  in it (offset by a magic value plus to address of the list head), and 
  thus protects it against accidental corruption? It was capable of 
  reliably detecting mixed up list_add() arguments for example, it 
  detected list corruption of _every_ sort, it detected double
  list_del() and list_add() of an already active list member as well. It
  was even capable of detecting SMP races: two parallel unserialized
  list_del()'s on the same list head were detected and warned about as 
  well. I guess i should release it one of these days? =B-) ]

> So I guess the complaint is that CONFIG_KOBJECT is too verbose?  I can 
> understand that, I'll look to clean things up as we have a lot of 
> "tracing" type functions in the debug output.  Hm, I wonder if the new 
> marker code could help with that...

that would certainly be helpful and nice as a second line of defense. 
(as a second line of defense the DEBUG_KOBJECT stuff was pretty 
effective as well, as you have shown it with this bug.)

but the "first line of defense" must be an as automatically and as 
transparently fault-resilient data structure as humanly possible.

"please enable DEBUG_KOBJECT and send me the logs" reduces the active 
tester base to somewhere around 0.1% of the full tester base - and if 
the bug is in any way spurious, the tester base goes down to 0.001% or 
down to outright zero.

i really think that in terms of central kernel data structures, no 
amount of debugging infrastructure is "too much" (within taste limits) - 
as long as it's cleanly structured. Even 10,000 lines of debugging code 
dwarves the 9,000,000 lines of code that it ultimately covers.

> > For example locks currently have 4000 lines of code of debugging 
> > infrastructure and 1500 lines of code of self-tests. The total 
> > amount of core locking code is less 1000 lines of code. So for every 
> > line of locking code there's more than 5 lines of debugging 
> > infrastructure (!).
> 
> Heh, good point, I'll look into fixing this up to be more useful to 
> the "normal" developer, and not just the two of us doing kobject core 
> development :)

you are right, that is another important argument: meaningful debugging 
checks that catch difficult to debug problems speed up development and 
allows one to iterate the codebase faster. As long as the debugging 
checks are transparent (i.e. they have no outrageous false positive rate 
and they do not spam the logs), distros and testers will happily enable 
it and you'll get much more feedback than you ever imagined would be 
possible.

Time is precious, even if you know what you are doing, so the more of 
the debugging you are capable of pushing to the computer, the better. 
Even if some debugging levels are as outrageously expensive as "iterate 
the linear list of all kobjects in the system, for every kobject op", 
people will still be happy to turn it on, if it catches bugs. Attach a 
nice WARN_ON_ONCE() to it and kerneloops.org will pick it up and 
categorize it for you to check. And i think that's an important effort 
for core subsystem maintainers: for example i spent far more time 
writing debugging code for mutexes and other locks than i spent time 
writing the facilities themselves. Preventing a certain category of bugs 
is far more important to users than shaving off 1 cycle from a fastpath.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/