Message-ID: <CAEf4BzZNOqxhkjuePzeG=ivqBSHXnWzPLPPHJdoC91c+u-WjAg@mail.gmail.com>
Date: Fri, 30 Jan 2026 12:04:45 -0800
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Tao Chen <chen.dylane@...ux.dev>, mingo@...hat.com, acme@...nel.org, 
	namhyung@...nel.org, mark.rutland@....com, alexander.shishkin@...ux.intel.com, 
	jolsa@...nel.org, irogers@...gle.com, adrian.hunter@...el.com, 
	kan.liang@...ux.intel.com, song@...nel.org, ast@...nel.org, 
	daniel@...earbox.net, andrii@...nel.org, martin.lau@...ux.dev, 
	eddyz87@...il.com, yonghong.song@...ux.dev, john.fastabend@...il.com, 
	kpsingh@...nel.org, sdf@...ichev.me, haoluo@...gle.com, 
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org, 
	bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next v8 2/3] perf: Refactor get_perf_callchain

On Fri, Jan 30, 2026 at 3:31 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Wed, Jan 28, 2026 at 11:12:09AM -0800, Andrii Nakryiko wrote:
> > On Wed, Jan 28, 2026 at 1:10 AM Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > On Mon, Jan 26, 2026 at 03:43:30PM +0800, Tao Chen wrote:
> > > > From BPF stack map, we want to ensure that the callchain buffer
> > > > will not be overwritten by other preemptive tasks and we also aim
> > > > to reduce the preempt disable interval. Based on the suggestions from Peter
> > > > and Andrii, export new API __get_perf_callchain and the usage scenarios
> > > > are as follows from BPF side:
> > > >
> > > > preempt_disable()
> > > > entry = get_callchain_entry()
> > > > preempt_enable()
> > > > __get_perf_callchain(entry)
> > > > put_callchain_entry(entry)
> > >
> > > That makes no sense, this means any other task on that CPU is getting
> > > screwed over.
> >
> > Yes, unfortunately, but unless we dynamically allocate new entry each
> > time and/or keep per-current entry cached there isn't much choice we
> > have here, no?
> >
> > Maybe that's what we have to do, honestly, because
> > get_perf_callchain() usage we have right now from sleepable BPF isn't
> > great no matter with or without changes in this patch set.
>
> All of the perf stuff is based on the fact that if we do it from IRQ/NMI
> context, we can certainly do it with preemption disabled.
>
> Bending the interface this way is just horrible.

Sure, but I'm just trying to help mitigate the issue at hand (way too
long preemption disabled region). I agree that we should do something
better in terms of perf_callchain_entry retrieval and reuse, but maybe
one thing at a time?
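
To make the objection to the get/put sequence quoted above concrete, here is a
minimal userspace model, not kernel code. It assumes the callchain entry is a
per-CPU, per-context slot guarded by a busy flag, which is my reading of the
current code; every model_* name is hypothetical and exists only for this
illustration.

```c
#include <stddef.h>

#define NR_CONTEXTS 4   /* assumption: task, softirq, hardirq, NMI */
#define MAX_DEPTH   8   /* arbitrary depth for the model */

/* Stand-in for a callchain entry: a small array of return addresses. */
struct model_entry {
    unsigned long ip[MAX_DEPTH];
    size_t nr;
};

/* One simulated CPU: one entry and one busy flag per context level. */
struct model_cpu {
    unsigned char in_use[NR_CONTEXTS];
    struct model_entry entries[NR_CONTEXTS];
};

/*
 * Reserve the entry for context level ctx on this simulated CPU.
 * Returns NULL when the slot is already held, which is what a second
 * task would see if the first one was preempted while holding it.
 */
static struct model_entry *model_get_entry(struct model_cpu *cpu, int ctx)
{
    if (ctx < 0 || ctx >= NR_CONTEXTS || cpu->in_use[ctx])
        return NULL;
    cpu->in_use[ctx] = 1;
    cpu->entries[ctx].nr = 0;
    return &cpu->entries[ctx];
}

/* Release the slot so the next user of this context level can take it. */
static void model_put_entry(struct model_cpu *cpu, int ctx)
{
    if (ctx >= 0 && ctx < NR_CONTEXTS)
        cpu->in_use[ctx] = 0;
}
```

In this model, a task that takes the slot and is then scheduled out keeps it
busy for as long as it stays off the CPU, so every other task-context caller
on that CPU gets NULL until the put. Slots for other context levels are
unaffected.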

>
> > > Why are you worried about the preempt_disable() here? If this were an
> > > interrupt context we'd still do that unwind -- but then with IRQs
> > > disabled.
> >
> > Because __bpf_get_stack() from kernel/bpf/stackmap.c can be called
> > from sleepable/faultable context and also we can do a rather expensive
> > build ID resolution (either in sleepable or not, which only changes if
> > build ID parsing logic waits for file backed pages to be paged in or
> > not).
>
> <rant>
> So stack_map_get_build_id_offset() is a piece of crap -- and I've always
> said it was. And I hate that any of that ever got merged -- it's the
> pinnacle of bad engineering and simply refusing to do the right thing in
> the interest of hack now, fix never :/
> </rant>

<hug><nod></hug>

>
> Anyway, as you well know, we now *do* have lockless vma lookups and
> should be able to do this buildid thing much saner.

Yes, probably, and I am aware that the mmap_lock use inside
stack_map_get_build_id_offset() is problematic; we'll need to fix
this as well. One step at a time.

>
> Also, there appears to be no buildid caching whatsoever, surely that
> would help some.

Jiri Olsa proposed caching build ID per file or per inode some time
back, but there was vehement opposition to it. And doing some locked
global resizable hash that might need to be used from NMI sounds
horrible, tbh. So we have what we have today.

>
> (and I'm not sure I've ever understood why the buildid crap needs to be
> in this path in any case)

Yeah, perhaps. I haven't dealt with stack_map_get_build_id_offset()
much, so I will need to go a bit deeper and analyze. As I said, we have
an mmap_lock problem there, so we need to address that. I'll think
about if/how we can improve this.

Do I understand correctly that you'd rather just not touch all this
for now and we should just drop this patch set?
