linux-kernel - Re: [PATCH] objtool: Fix stack overflow in validate

Message-ID: <aTCLevOLZ69EpXNF@gmail.com>
Date: Wed, 3 Dec 2025 20:11:54 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Josh Poimboeuf <jpoimboe@...nel.org>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
	Nathan Chancellor <nathan@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Alexandre Chartre <alexandre.chartre@...cle.com>,
	David Laight <david.laight.linux@...il.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH] objtool: Fix stack overflow in validate_branch()

* Josh Poimboeuf <jpoimboe@...nel.org> wrote:

> On Wed, Dec 03, 2025 at 10:25:34AM +0100, Ingo Molnar wrote:
> > * Josh Poimboeuf <jpoimboe@...nel.org> wrote:
> > > On Tue, Dec 02, 2025 at 05:20:22PM +0100, Ingo Molnar wrote:
> > > > * Josh Poimboeuf <jpoimboe@...nel.org> wrote:
> > > > > On an allmodconfig kernel compiled with Clang, objtool is
> > > > > segfaulting in drivers/scsi/qla2xxx/qla2xxx.o due to a stack
> > > > > overflow in validate_branch().
> > > > >
> > > > > Due in part to KASAN being enabled, the qla2xxx code has a large
> > > > > number of conditional jumps, causing objtool to go quite deep in
> > > > > its recursion.
> > > > >
> > > > > By far the biggest offender of stack usage is the recently added
> > > > > 'prev_state' stack variable in validate_insn(), coming in at 328
> > > > > bytes.
> > > >
> > > > That's weird - how can a user-space tool run into stack limits, are
> > > > they set particularly conservatively?
> > >
> > > On my Fedora system, "ulimit -s" is 8MB.  You'd think that would be
> > > enough :-)
> > >
> > > In this case, objtool had over 20,000 stack frames caused by
> > > recursively following over 7,000(!) conditional jumps in a single
> > > function.
> >
> > BTW., I just instrumented it, and it's even worse: on current upstream,
> > the allmodconfig qla2xxx.o code built with clang-20.1.8 has a worst-case
> > recursion depth of 50,944 (!), for the qla83xx_fw_dump() function.
>
> Is that number of loops or total stack frames?

So I tracked the depth of validate_insn() recursion directly:

                ret = validate_insn(file, func, insn, &state, prev_insn, next_insn,
-                                   &dead_end);
+                                   &dead_end, depth++);

                if (!insn->trace) {
                        if (ret)

And in validate_insn():

	if (depth > max_depth) {
		max_depth = depth;
		printf("# objtool new max depth: %ld for %s()\n", max_depth, func->name);
	}

Actual function recursion depth may be deeper, if any of the helper
functions get uninlined.

> kernel and clang 20.1.8 I'm getting a max recursion depth of 7,165 loops
> (not frames).  See the below patch for how I measured that.

Your patch seems to be similar, except that I passed in 'depth'
directly, because as a kernel developer I don't trust globals :-)

But it should measure the same thing AFAICS, right?

> You may be underestimating the amount of memory usage objtool needs.
> Running objtool on that binary with "/usr/bin/time -v" shows the maximum
> resident set size is 140M.  So the stack usage of 5.5MB is only about
> 4.4% of the total memory usage.

Still, the stack is among the cache-hottest memory in that
workload - and the biggest negative impact of the current
recursion pattern comes from the sparse parsing order, which
hurts even more with a 140MB working set.

> > One relatively simple method to 'straighten out' the parsing flow would
> > be to add an internal 'branch queue' with a limited size of say 16 or 32
> > entries, and defer the parsing of these branch targets and continue with
> > the next instruction, until one of these conditions is true:
> >
> >   - 'branch queue' is full
> >
> >   - JMP, CALL, RET or any other branching/trapping instruction is found
> >
> >   - already validated instruction is found
> >
> >   - end of symbol/section/file/etc.
> >
> > At which point the current 'branch queue' is flushed. (It might even be
> > implemented as a branch-target stack, which may have a bit better
> > locality.)
>
> Objtool tracks a considerable amount of state across branches.  The
> recursion works well for keeping that state at hand.  So there is a
> certain level of dependency there which I have a feeling might be
> difficult to extricate.  I haven't really looked at it though.

I'm not against recursion for branches at all, I'm just suggesting
a change in the order in which the recursion is fed: instead of
parsing the two instruction streams of a branch point in this order:

	verify target recursively
	verify next instruction

(Which is arguably the simplest.)

I suggest the following recursion pattern:

	verify a batch of sequential instructions, saving conditional branch targets (if any)
	verify the saved branch targets, recursively

This change to the recursion pattern should substantially reduce
max recursion depth, and should also give substantially better
cache locality.
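
To make the two orderings concrete, here is a minimal toy sketch. It is not
objtool code: 'struct toy_insn', walk_target_first(), walk_batched(),
measure(), NINSN and BATCH are all hypothetical names, and the branch
pattern (every instruction conditionally jumps two instructions ahead) is an
assumption chosen only to provoke deep recursion. It tracks no instruction
state at all, which is the hard part in the real tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NINSN 4096	/* toy function size (assumption) */
#define BATCH 32	/* deferred-target limit, as in the "16 or 32" idea */

/* Hypothetical toy instruction: branch_to < 0 means no conditional branch. */
struct toy_insn {
	int branch_to;
	bool visited;
};

static struct toy_insn insns[NINSN];
static int max_depth;

static void note_depth(int depth)
{
	if (depth > max_depth)
		max_depth = depth;
}

/* Current order: recurse into the branch target first, then fall through. */
static void walk_target_first(int i, int depth)
{
	note_depth(depth);
	for (; i < NINSN && !insns[i].visited; i++) {
		insns[i].visited = true;
		if (insns[i].branch_to >= 0)
			walk_target_first(insns[i].branch_to, depth + 1);
	}
}

/* Suggested order: walk straight-line code, defer up to BATCH targets. */
static void walk_batched(int i, int depth)
{
	int saved[BATCH], nsaved = 0;

	note_depth(depth);
	while (i < NINSN && !insns[i].visited) {
		/* Serial pass: stop when the batch is full or code ends. */
		for (; i < NINSN && !insns[i].visited && nsaved < BATCH; i++) {
			insns[i].visited = true;
			if (insns[i].branch_to >= 0)
				saved[nsaved++] = insns[i].branch_to;
		}
		/* Flush the deferred targets, most recent first (stack order). */
		while (nsaved > 0)
			walk_batched(saved[--nsaved], depth + 1);
	}
}

/* Build the toy branch pattern, run one walker, return its max depth. */
static int measure(void (*walk)(int, int))
{
	memset(insns, 0, sizeof(insns));
	for (int i = 0; i < NINSN; i++)
		insns[i].branch_to = (i + 2 < NINSN) ? i + 2 : -1;

	max_depth = 0;
	walk(0, 0);

	for (int i = 0; i < NINSN; i++)
		assert(insns[i].visited);	/* both orders cover everything */
	return max_depth;
}
```

Calling measure(walk_target_first) and measure(walk_batched) on this toy
pattern, both walkers visit every instruction, but the batched one should
recurse shallower by roughly a factor of BATCH. How much of that carries
over to real code depends on the actual branch structure.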

Thanks,

	Ingo