linux-kernel - Re: [PATCH bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <875174b0-c0f1-8a41-ef00-3f0fe0396288@iogearbox.net>
Date:   Thu, 6 May 2021 23:38:38 +0200
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Florent Revest <revest@...omium.org>,
        Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc:     bpf <bpf@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        KP Singh <kpsingh@...nel.org>,
        Brendan Jackman <jackmanb@...gle.com>,
        Stanislav Fomichev <sdf@...gle.com>,
        open list <linux-kernel@...r.kernel.org>,
        syzbot <syzbot@...kaller.appspotmail.com>
Subject: Re: [PATCH bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare

On 5/6/21 10:17 PM, Florent Revest wrote:
> On Thu, May 6, 2021 at 8:52 PM Andrii Nakryiko
> <andrii.nakryiko@...il.com> wrote:
>> On Wed, May 5, 2021 at 3:29 PM Florent Revest <revest@...omium.org> wrote:
>>> On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko
>>> <andrii.nakryiko@...il.com> wrote:
>>>> On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
>>>> <andrii.nakryiko@...il.com> wrote:
>>>>> On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>>>>>> On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
>>>>>>> On Wed, May 5, 2021 at 9:23 AM Florent Revest <revest@...omium.org> wrote:
>>>>>>>>
>>>>>>>> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
>>>>>>>> per-cpu buffer that they use to store temporary data (arguments to
>>>>>>>> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
>>>>>>>> by the end of their scope with bpf_bprintf_cleanup.
>>>>>>>>
>>>>>>>> If one of these helpers gets called within the scope of one of these
>>>>>>>> helpers, for example: a first bpf program gets called, uses
>>>>>>>
>>>>>>> Can we afford having few struct bpf_printf_bufs? They are just 512
>>>>>>> bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
>>>>>>> only situation where this can occur, right? If someone is doing
>>>>>>> bpf_snprintf() and interrupt occurs and we run another BPF program, it
>>>>>>> will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
>>>>>>> second BPF program, etc. We can't eliminate the probability, but
>>>>>>> having a small stack of buffers would make the probability so
>>>>>>> miniscule as to not worry about it at all.
>>>>>>>
>>>>>>> Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
>>>>>>> the changes are minimal. Nestedness property is preserved for
>>>>>>> non-sleepable BPF programs, right? If we want this to work for
>>>>>>> sleepable we'd need to either: 1) disable migration or 2) instead of
>>>>>
>>>>> oh wait, we already disable migration for sleepable BPF progs, so it
>>>>> should be good to do nestedness level only
>>>>
>>>> actually, migrate_disable() might not be enough. Unless it is
>>>> impossible for some reason I miss, worst case it could be that two
>>>> sleepable programs (A and B) can be intermixed on the same CPU: A
>>>> starts&sleeps - B starts&sleeps - A continues&returns - B continues
>>>> and nestedness doesn't work anymore. So something like "reserving a
>>>> slot" would work better.
>>>
>>> Iiuc try_get_fmt_tmp_buf does preempt_enable to avoid that situation ?
>>>
>>>>>>> assuming a stack of buffers, do a loop to find unused one. Should be
>>>>>>> acceptable performance-wise, as it's not the fastest code anyway
>>>>>>> (printf'ing in general).
>>>>>>>
>>>>>>> In any case, re-using the same buffer for sort-of-optional-to-work
>>>>>>> bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
>>>>>>> suboptimal, so seems worth fixing this.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>
>>>>>> Yes, agree, it would otherwise be really hard to debug. I had the same
>>>>>> thought on why not allowing nesting here given users very likely expect
>>>>>> these helpers to just work for all the contexts.
>>>>>>
>>>>>> Thanks,
>>>>>> Daniel
>>>
>>> What would you think of just letting the helpers own these 512 bytes
>>> buffers as local variables on their stacks ? Then bpf_prepare_bprintf
>>> would only need to write there, there would be no acquire semantic
>>> (like try_get_fmt_tmp_buf) and the stack frame would just be freed on
>>> the helper return so there would be no bpf_printf_cleanup either. We
>>> would also not pre-reserve static memory for all CPUs and it becomes
>>> trivial to handle re-entrant helper calls.
>>>
>>> I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
>>> code but I've not been convinced of its necessity.
>>
>> I got the impression that extra 512 bytes on the kernel stack is quite
>> a lot and that's why we have per-cpu buffers. Especially that
>> bpf_trace_printk() can be called from any context, including NMI.
> 
> Ok, I understand.
> 
> What about having one buffer per helper, synchronized with a spinlock?
> Actually, bpf_trace_printk already has that, not for the bprintf
> arguments but for the bprintf output so this wouldn't change much to
> the performance of the helpers anyway:
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/trace/bpf_trace.c?id=9d31d2338950293ec19d9b095fbaa9030899dcb4#n385
> 
> These helpers are not performance sensitive so a per-cpu stack of
> buffers feels over-engineered to me (and is also complexity I feel a
> bit uncomfortable with).

But wouldn't this have same potential of causing a deadlock? Simple example
would be if you have a tracing prog attached to bstr_printf(), and one of
the other helpers using the same lock called from a non-tracing prog. If
it can be avoided fairly easily, I'd also opt for per-cpu buffers as Andrii
mentioned earlier. We've had few prior examples with similar issues [0].

   [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9594dc3c7e71b9f52bee1d7852eb3d4e3aea9e99