linux-kernel - Re: [BUG] Stack overflow when running perf and function tracer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <877dr8nh6u.fsf@nanos.tec.linutronix.de>
Date:   Fri, 30 Oct 2020 11:26:01 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>, kan.liang@...ux.intel.com,
        like.xu@...ux.intel.com
Subject: Re: [BUG] Stack overflow when running perf and function tracer

On Fri, Oct 30 2020 at 10:00, Peter Zijlstra wrote:
> On Fri, Oct 30, 2020 at 12:27:22AM -0400, Steven Rostedt wrote:
>> I found a bug in the recursion protection that prevented function
>> tracing from running in NMI context. Applying this fix to 5.9 worked
>> fine (tested by running perf record and function tracing at the same
>> time). But when I applied the patch to 5.10-rc1, it blew up with a
>> stack overflow:
>
> So we just blew away our NMI stack, right?

Looks like that:

>>  RSP: 0018:fffffe000003c000 EFLAGS: 00010046

Clearly a page boundary.

>>  RAX: 000000000000001c RBX: ffff928ada27b400 RCX: 0000000000000000
>>  RDX: ffff928ada07b200 RSI: fffffe000003c028 RDI: ffff928ada27b400
>>  RBP: ffff928ada27b4f0 R08: 0000000000000001 R09: 0000000000000000
>>  R10: fffffe000003c440 R11: ffff928a7383cc60 R12: fffffe000003c028
>>  R13: 00000000000003e8 R14: 0000000000000046 R15: 0000000000110001
>>  FS:  00007f25d43cf780(0000) GS:ffff928adaa40000(0000) knlGS:0000000000000000
>>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>  CR2: fffffe000003bff8 CR3: 00000000b52a8005 CR4: 00000000001707e0

and CR2 says it tried below.

>> I bisected it down to:
>> 
>> 35d1ce6bec133679ff16325d335217f108b84871 ("perf/x86/intel/ds: Fix
>> x86_pmu_stop warning for large PEBS")
>> 
>> Which looks to be storing an awful lot on the stack:
>> 
>> static void __intel_pmu_pebs_event(struct perf_event *event,
>> 				   struct pt_regs *iregs,
>> 				   void *base, void *top,
>> 				   int bit, int count,
>> 				   void (*setup_sample)(struct perf_event *,
>> 						struct pt_regs *,
>> 						void *,
>> 						struct perf_sample_data *,
>> 						struct pt_regs *))
>> {
>> 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> 	struct hw_perf_event *hwc = &event->hw;
>> 	struct perf_sample_data data;
>> 	struct x86_perf_regs perf_regs;
>> 	struct pt_regs *regs = &perf_regs.regs;
>> 	void *at = get_next_pebs_record_by_bit(base, top, bit);
>> 	struct pt_regs dummy_iregs;
>
> The only thing I can come up with in a hurry is that that dummy_iregs
> thing really should be static. That's 168 bytes of stack out the window
> right there.

What's worse is perf_sample_data which is 384 bytes and is 64 bytes aligned.

> Still, this seems to suggest (barring some actual issue hidding in those
> 135 lost lines, we're very close to the limit on the NMI stack, which is
> a single 4k page IIRC.

Yes, unless KASAN is enabled

Thanks,

        tglx