lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAD8CoPCfPmqZH6BJCk3Y1-02BLVVsbQ6OeaNOhcfGWmdF0oX8A@mail.gmail.com>
Date:   Mon, 15 May 2023 13:59:59 +0800
From:   Ze Gao <zegao2021@...il.com>
To:     Yonghong Song <yhs@...a.com>
Cc:     Song Liu <song@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <martin.lau@...ux.dev>,
        Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>,
        Stanislav Fomichev <sdf@...gle.com>,
        Hao Luo <haoluo@...gle.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Ze Gao <zegao@...cent.com>, bpf@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH] bpf: reject blacklisted symbols in kprobe_multi to avoid
 recursive trap

Dear all,

On Thu, May 11, 2023 at 9:06 AM Ze Gao <zegao2021@...il.com> wrote:
>
> I'm afraid filtering in user space tools is not enough, cause it's a kernel BUG.
>
> it would 100% trigger a kernel crash if you run cmd like
>
> retsnoop -e 'pick_next_task_fair' -a ':kernel/sched/*.c' -vvv
>
> which is caused by that BPF_LINK_TYPE_KPROBE_MULTI accidentally
> attaches bpf progs
> to preempt_count_{add, sub}, which in turn triggers stackoverflow
> because the handler itself
> calls those functions.

I managed to see the big picture of this problem by digging into the code.

here is what it goes:

rethook_trampoline_handler{
   preempt_disable() {
      preempt_count_add() {  <-- trace
          fprobe_handler() {
            ftrace_test_recursion_trylock
            ...
            ftrace_test_recursion_unlock    <- it fails to detect the
recursion caused by rethook (rethook_trampoline_handler precisely for
this case)  routines.
          }
          ...
          rethook_trampoline_handler {
          [ wash, rinse, repeat, CRASH!!! ]

There are some pitfalls here:
1.  fprobe exit callback should be guarded by ftrace recursion check
as well since user might call any traceable functions
just like kprobe_multi_link_prog_run calls migrate_{disable, enable}.
In this case,  detection in fprobe_handler only is not
enough.
2. rethook_trampoline_handler should use preempt_{disable,
enable}_notrace instead because it's beyond recursion-free
region guarded like 1
3. many rethook helper functions are also used outside the
recursion-free regions and therefore they should be marked
notrace

I've post a new series of patches to resolve cases mentioned above:
  [Link]: https://lkml.org/lkml/2023/5/14/452

In theory, bpf_kprobe as the user of fprobe+ rethook, is spared from
suffering recursion again by applying these. And
  [Link]: https://lore.kernel.org/all/20230513090548.376522-1-zegao@tencent.com/T/#u
becomes optional.

That leaves one final question, whether we need probe blacklist or
bpf_kprobe blacklist depends on how we deal with
user requests if one of its expected hook points fails because of
recursion detected. Do we need to reject them in the first place
by blacklist or  let it fail to execute the callback silently, which
needs your gentle advice.

Regards,
Ze

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ