netdev - Re: [PATCH bpf-next 1/3] bpf: Introduce bpf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210603015809.l2dez754vzxjueew@ast-mbp.dhcp.thefacebook.com>
Date:   Wed, 2 Jun 2021 18:58:09 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Toke Høiland-Jørgensen <toke@...hat.com>
Cc:     davem@...emloft.net, daniel@...earbox.net, andrii@...nel.org,
        netdev@...r.kernel.org, bpf@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH bpf-next 1/3] bpf: Introduce bpf_timer

On Thu, Jun 03, 2021 at 12:21:05AM +0200, Toke Høiland-Jørgensen wrote:
> Alexei Starovoitov <alexei.starovoitov@...il.com> writes:
> 
> > On Thu, May 27, 2021 at 06:57:17PM +0200, Toke Høiland-Jørgensen wrote:
> >> >     if (val) {
> >> >         bpf_timer_init(&val->timer, timer_cb2, 0);
> >> >         bpf_timer_start(&val->timer, 1000 /* call timer_cb2 in 1 msec */);
> >> 
> >> nit: there are 1M nanoseconds in a millisecond :)
> >
> > oops :)
> >
> >> >     }
> >> > }
> >> >
> >> > This patch adds helper implementations that rely on hrtimers
> >> > to call bpf functions as timers expire.
> >> > The following patch adds necessary safety checks.
> >> >
> >> > Only programs with CAP_BPF are allowed to use bpf_timer.
> >> >
> >> > The amount of timers used by the program is constrained by
> >> > the memcg recorded at map creation time.
> >> >
> >> > The bpf_timer_init() helper is receiving hidden 'map' and 'prog' arguments
> >> > supplied by the verifier. The prog pointer is needed to do refcnting of bpf
> >> > program to make sure that program doesn't get freed while timer is armed.
> >> >
> >> > Signed-off-by: Alexei Starovoitov <ast@...nel.org>
> >> 
> >> Overall this LGTM, and I believe it will be usable for my intended use
> >> case. One question:
> >> 
> >> With this, it will basically be possible to create a BPF daemon, won't
> >> it? I.e., if a program includes a timer and the callback keeps re-arming
> >> itself this will continue indefinitely even if userspace closes all refs
> >> to maps and programs? Not saying this is a problem, just wanted to check
> >> my understanding (i.e., that there's not some hidden requirement on
> >> userspace keeping a ref open that I'm missing)...
> >
> > That is correct.
> > Another option would be to auto-cancel the timer when the last reference
> > to the prog drops. That may feel safer, since forever
> > running bpf daemon is a certainly new concept.
> > The main benefits of doing prog_refcnt++ from bpf_timer_start are ease
> > of use and no surprises.
> > Disappearing timer callback when prog unloads is outside of bpf prog control.
> > For example the tracing bpf prog might collect some data and periodically
> > flush it to user space. The prog would arm the timer and when callback
> > is invoked it would send the data via ring buffer and start another
> > data collection cycle.
> > When the user space part of the service exits it doesn't have
> > an ability to enforce the flush of the last chunk of data.
> > It could do prog_run cmd that will call the timer callback,
> > but it's racy.
> > The solution to this problem could be __init and __fini
> > sections that will be invoked when the prog is loaded
> > and when the last refcnt drops.
> > It's a complementary feature though.
> > The prog_refcnt++ from bpf_timer_start combined with a prog
> > explicitly doing bpf_timer_cancel from __fini
> > would be the most flexible combination.
> > This way the prog can choose to be a daemon or it can choose
> > to cancel its timers and do data flushing when the last prog
> > reference drops.
> > The prog refcnt would be split (similar to uref). The __fini callback
> > will be invoked when refcnt reaches zero, but all increments
> > done by bpf_timer_start will be counted separately.
> > The user space wouldn't need to do the prog_run command.
> > It would detach the prog and close(prog_fd).
> > That will trigger __fini callback that will cancel the timers
> > and the prog will be fully unloaded.
> > That would make bpf progs resemble kernel modules even more.
> 
> I like the idea of a "destructor" that will trigger on refcnt drop to
> zero. And I do think a "bpf daemon" is potentially a useful, if novel,
> concept.

I think so too. Long ago folks requested periodic bpf progs to
do sampling in tracing. All these years attaching bpf prog
to a perf_event was a workaround for such feature request.
perf_event bpf prog can be pinned in perf_event array,
so "bpf daemon" kinda exist today. Just more convoluted.

> The __fini thing kinda supposes a well-behaved program, though, right?
> I.e., it would be fairly trivial to write a program that spins forever
> by repeatedly scheduling the timer with a very short interval (whether
> by malice or bugginess).

It's already possible without bpf_timer.

> So do we need a 'bpfkill' type utility to nuke
> buggy programs, or how would resource constraints be enforced?

That is possible without 'bpfkill'.
bpftool can delete map element that contains bpf_timer and
that will cancel it. I'll add tests to make sure it's the case.