linux-kernel - Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v3)

Message-ID: <48085B73.3070904@firstfloor.org>
Date:	Fri, 18 Apr 2008 10:27:31 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	Mathieu Desnoyers <compudj@...stal.dyndns.org>
CC:	Jeremy Fitzhardinge <jeremy@...p.org>, Ingo Molnar <mingo@...e.hu>,
	akpm@...l.org, "H. Peter Anvin" <hpa@...or.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	"Frank Ch. Eigler" <fche@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v3)

> arch/x86/oprofile/nmi_timer_int.c: profile_timer_exceptions_notify()
> calls
> drivers/oprofile/oprofile_add_sample()
> which calls oprofile_add_ext_sample()
> where
>        if (log_sample(cpu_buf, pc, is_kernel, event))
>                 oprofile_ops.backtrace(regs, backtrace_depth);

A red herring: the notifier setup calls vmalloc_sync_all(), and oprofile
allocates its buffers before registering the notifier.

> First, log_sample writes into the vmalloc'd cpu buffer. That's for one
> possible page fault.

> Then, if a kernel backtrace happens, I am not sure that printk_address
> won't try to read any of the module data, which is vmalloc'd.

Yes, admittedly the backtrace mode was always somewhat flaky. It probably
has other problems too.

The right fix for that is to call vmalloc_sync_all() after module load
whenever any NMI notifiers are registered.
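
A minimal user-space sketch of that idea, not kernel code. Everything
prefixed model_ is hypothetical and exists only for illustration; the real
vmalloc_sync_all() is represented by a counter, and the module-load hook is
an assumed call site rather than an existing one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state: number of currently registered NMI notifiers. */
static int nmi_notifier_count;

/* Counts how often the modelled sync ran, so the behaviour can be checked. */
static int sync_calls;

/*
 * Stand-in for vmalloc_sync_all(). In the kernel, the point of the sync is
 * that later accesses to vmalloc'd memory do not take a page fault; here
 * it only bumps a counter.
 */
static void model_vmalloc_sync_all(void)
{
    sync_calls++;
}

/* Registration syncs first, as the notifier setup is described as doing. */
static void model_register_nmi_notifier(void)
{
    model_vmalloc_sync_all();
    nmi_notifier_count++;
}

static void model_unregister_nmi_notifier(void)
{
    assert(nmi_notifier_count > 0);
    nmi_notifier_count--;
}

/*
 * Assumed hook, run after a module's vmalloc'd image has been mapped.
 * Syncs only when an NMI handler could touch the new mapping.
 * Returns true if a sync was performed.
 */
static bool model_module_loaded(void)
{
    if (nmi_notifier_count == 0)
        return false;
    model_vmalloc_sync_all();
    return true;
}
```

With no notifier registered, model_module_loaded() does nothing; once one
is registered, every module load pays for one extra sync.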

> 
> 
>> NMIs are maybe 5-6 functions all over the kernel.
>>
>> I just don't think it makes any sense to put markers in there.
>> It is a really small part of the kernel that is unlikely
>> to be really useful for anybody. You should rather first solve the
>> problem of tracing the other 99.999999% of the kernel properly.
>>
> 
> The fact is that NMIs are very useful and powerful when it comes to trying
> to understand where code that disables interrupts is stuck, or to get
> performance counter reads periodically

First, there are no truly periodic (as in time) NMIs. The NMI watchdog
is not really periodic: it is delayed arbitrarily whenever the CPU
is in sleep states.

Second, oprofile already does what you describe. Why do we need
another questionable infrastructure to reimplement what is
already there?

> without suffering from IRQ
> latency.

Just from all kinds of other latency caused by non-ticking performance
counters.

> Also, when trying to figure out what is actually happening in
> the kernel timekeeping, having a stable periodic time source can be
> pretty useful.

Haha. You seem to be so deep into nonsense land, it is hard to comprehend.

> That would be one way to do it, except that it would not deal with int3.
> Also, it would have to be taken into account at module load time. To me,
> that looks like an error-prone design. If the problem is at the lower
> end of the architecture, in the interrupt return path, why don't we
> simply fix it there for good?

There are all kinds of problems with NMIs; this is only one of them.
And NMIs are a really, really obscure case.

Frankly, if you spend all your time on fringe cases like this instead
of getting it to work on the 99.99999999999999% case, it doesn't
surprise me that the markers haven't made any progress for years now.

> And yes, boot code is one of the first things embedded system
> developers want to instrument.

Crap. That code runs once. The only interest is correctness, and
if it's not correct you just step through it with a JTAG debugger.

> I wonder if they are used so rarely because the underlying kernel is
> buggy with respect to NMIs or because they are useless.

Lockless programming is just really hard, and not doing it is in most
cases the sanest option.

Anyways I give up. Do what you want.

-Andi