netdev - Re: BUG: unable to handle kernel paging request in bpf_int_jit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20180624100249.GA9493@gmail.com>
Date:   Sun, 24 Jun 2018 12:02:50 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     David Miller <davem@...emloft.net>
Cc:     tglx@...utronix.de,
        syzbot+a4eb8c7766952a1ca872@...kaller.appspotmail.com,
        ast@...nel.org, daniel@...earbox.net, hpa@...or.com,
        kuznet@....inr.ac.ru, linux-kernel@...r.kernel.org,
        mingo@...hat.com, netdev@...r.kernel.org,
        syzkaller-bugs@...glegroups.com, x86@...nel.org,
        yoshfuji@...ux-ipv6.org, peterz@...radead.org
Subject: Re: BUG: unable to handle kernel paging request in
 bpf_int_jit_compile

* David Miller <davem@...emloft.net> wrote:

> From: Thomas Gleixner <tglx@...utronix.de>
> Date: Sun, 24 Jun 2018 09:09:09 +0200 (CEST)
> 
> > I'm really tempted to make the BPF config switch depend on BROKEN. 
> 
> This really isn't necessary Thomas.
> 
> Whoever wrote the code didn't understand that set ro can legitimately
> fail.

No, that's *NOT* the only thing that happened, according to the Git history.

The first use of set_memory_ro() in include/linux/filter.h was added by
this commit almost four years ago:

  # 2014/09
  60a3b2253c41 ("net: bpf: make eBPF interpreter images read-only")

... and yes, that commit didn't anticipate the (in hindsight) obvious property of 
a function that changes global kernel mappings that if it is used after bootup 
without locking it 'may fail'. So that commit slipping through is 'shit happens' 
and I don't think we ever complained about such things slipping through.

But what happened after that is not so good:

A bit over two years later a crash was found:

    Eric and Willem reported that they recently saw random crashes when
    JIT was in use and bisected this to 74451e66d516 ("bpf: make jited
    programs visible in traces"). Issue was that the consolidation part
    added bpf_jit_binary_unlock_ro() that would unlock previously made
    read-only memory back to read-write.

... but instead of fixing it for real, it was only tinkered with:

  # 2017//02
  9d876e79df6a ("bpf: fix unlocking of jited image when module ronx not set")

... but the problems persisted:

    Improve bpf_{prog,jit_binary}_{un,}lock_ro() by throwing a
    one-time warning in case of an error when the image couldn't
    be set read-only, and also mark struct bpf_prog as locked when
    bpf_prog_lock_ro() was called.

... so the warnings Thomas complained about here were then added a month later:

  # 2017/03
  65869a47f348 ("bpf: improve read-only handling")

It 'improved' nothing of the sort, and the warnings and 'debug code' shows that
the author was aware that these functions could actually fail. To quote the fine
code, introduced a year ago:

                WARN_ON_ONCE(set_memory_rw((unsigned long)fp, fp->pages));
                /* In case set_memory_rw() fails, we want to be the first
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 * to crash here instead of some random place later on.
                 */
                fp->locked = 0;

... and then, this month, it was tweaked *YET ANOTHER TIME*:

    bpf: reject any prog that failed read-only lock

    We currently lock any JITed image as read-only via bpf_jit_binary_lock_ro()
    as well as the BPF image as read-only through bpf_prog_lock_ro(). In
    the case any of these would fail we throw a WARN_ON_ONCE() in order to
    yell loudly to the log. Perhaps, to some extend, this may be comparable
    to an allocation where __GFP_NOWARN is explicitly not set.

  # 2018/06
  9facc336876f ("bpf: reject any prog that failed read-only lock")

The tone of uncertainty of the changelog, combined with the unfixed typo in it, 
suggests that this commit too was just waved through to upstream without any real 
review and without much design thinking behind it.

And yes, this was still not the right fix, as the fuzzer crash reported in this 
thread outlines - we'll probably need a 5th commit?

> So let's correct that instead of flaming a feature.

So accusing Thomas of 'flaming a feature' is a really unfair attack in light of 
all the details above.

Thanks,

	Ingo