netdev - Re: [PATCH bpf-next 1/2] bpf: fix maximum stack depth tracking logic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20171223030328.pcff7lis6um52zs3@ast-mbp>
Date:   Fri, 22 Dec 2017 19:03:30 -0800
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Jann Horn <jannh@...gle.com>
Cc:     Alexei Starovoitov <ast@...nel.org>,
        "David S . Miller" <davem@...emloft.net>,
        Daniel Borkmann <daniel@...earbox.net>,
        Network Development <netdev@...r.kernel.org>,
        kernel-team@...com
Subject: Re: [PATCH bpf-next 1/2] bpf: fix maximum stack depth tracking logic

On Sat, Dec 23, 2017 at 03:26:27AM +0100, Jann Horn wrote:
> On Sat, Dec 23, 2017 at 3:07 AM, Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> > On Sat, Dec 23, 2017 at 02:38:29AM +0100, Jann Horn wrote:
> >> On Fri, Dec 22, 2017 at 10:33 PM, Alexei Starovoitov <ast@...nel.org> wrote:
> >> > instead of computing max stack depth for current call chain only
> >> > track the maximum possible stack depth of any function at given
> >> > frame position. Such algorithm is simple and fast, but conservative,
> >> > since it overestimates amount of stack used. Consider:
> >> > main() // stack 32
> >> > {
> >> >   A();
> >> >   B();
> >> > }
> >> >
> >> > A(){} // stack 256
> >> >
> >> > B()  // stack 64
> >> > {
> >> >   A();
> >> > }
> >> >
> >> > since A() is called at frame[1] and frame[2], the algorithm
> >> > will estimate the max stack depth as 32 + 256 + 256 and will reject
> >> > such program, though real max is 32 + 64 + 256.
> >> >
> >> > Fortunately the algorithm is good enough in practice. The alternative
> >> > would be to track max stack of every function in the fast pass through
> >> > the verifier and then do additional CFG walk just to compute max total.
> >> >
> >> > Fixes: f4d7e40a5b71 ("bpf: introduce function calls (verification)")
> >> > Reported-by: Jann Horn <jannh@...gle.com>
> >> > Signed-off-by: Alexei Starovoitov <ast@...nel.org>
> >>
> >> Does this work in cases where multiple invocations of a function have
> >> different stack access patterns because their inputs have different
> >> bounds?
> >>
> >> Consider this pseudocode example:
> >>
> >> void main(void) {
> >>   func1(0);
> >>   func1(1);
> >>   func2(1);
> >> }
> >> void func1(int alloc_or_recurse) {
> >>   if (alloc_or_recurse) {
> >>     frame_pointer[-300] = 1;
> >>   } else {
> >>     func2(alloc_or_recurse);
> >>   }
> >> }
> >> void func2(int alloc_or_recurse) {
> >>   if (alloc_or_recurse) {
> >>     frame_pointer[-300] = 1;
> >>   }
> >> }
> >>
> >> AFAICS this will work as follows:
> >>
> >> Call to func1->func2 runs without any stack accesses because the
> >> verifier can prove that alloc_or_recurse is 0.
> >
> > argh. right.
> > I guess that ruins my attemp to do the stack check inline
> > with the main verifier pass.
> > Do you see an algorithm that can do it without extra
> > cfg walk at the end?
> 
> A crappy heuristic would be to forbid recursion (calling a function
> that is already present somewhere in the call stack) 

the recursion is already forbidden. It's a 'back-edge' from cfg point
of view. There are 3 tests that cover that in few variants.

> and then sum up
> the maximum stack depths of all functions at the end and see whether
> the sum is bigger than the maximum stack size. While it'd be horribly
> conservative, it might work for now? 512 bytes are a lot of stack.

That's what I tried first while developing bpf calls.
It should have been good enough, but round up to 32-byte makes even
tiny function into sizeable and both *_noinline.c tests fail to load.
Arguably they are artificial stress test that push the limits with
number of calls, argument passing and pointer accesses, but I don't
want to scale them down.

> Or as a more complicated, but slightly less conservative heuristic,
> you could forbid recursion, record the maximum number of stack frames
> (max_stack_frames), then at the end select the top max_stack_frames
> functions with the biggest stack sizes and sum up their sizes?

it's pretty much the same as previous one. In these two tests I'm
building long stack chain out of all functions.

> Anything else I can come up with is probably more complicated than an
> extra cfg walk.

I guess we have to generalize (or remember) cfg anyway for future use,
so will code it up now and see how it looks.

Thank you for the feedback!