netdev - Re: 4-year old off-by-two bug in the BPF verifier's boundary checks?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <9c2407eb-9d67-acd6-a38d-8d390eedfb5a@nvidia.com>
Date:   Fri, 26 Nov 2021 19:07:16 +0200
From:   Maxim Mikityanskiy <maximmi@...dia.com>
To:     Daniel Borkmann <daniel@...earbox.net>,
        Alexei Starovoitov <ast@...nel.org>,
        John Fastabend <john.fastabend@...il.com>
CC:     "David S. Miller" <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Tariq Toukan <tariqt@...dia.com>
Subject: Re: 4-year old off-by-two bug in the BPF verifier's boundary checks?

On 2021-11-25 16:33, Maxim Mikityanskiy wrote:
> On 2021-11-09 13:34, Daniel Borkmann wrote:
>> Hi Maxim,
>>
>> On 11/2/21 4:12 PM, Maxim Mikityanskiy wrote:
>>> Hi guys,
>>>
>>> I think I found cases where the BPF verifier mistakenly rejects valid 
>>> BPF programs when doing pkt_end boundary checks, and the selftests 
>>> for these cases test wrong things as well.
>>>
>>> Daniel's commit fb2a311a31d3 ("bpf: fix off by one for range markings 
>>> with L{T, E} patterns") [1] attempts to fix an off-by-one bug in 
>>> boundary checks, but I think it shifts the index by 1 in a wrong 
>>> direction, so instead of fixing, the bug becomes off-by-two.
>>>
>>> A following commit b37242c773b2 ("bpf: add test cases to bpf 
>>> selftests to cover all access tests") [2] adds unit tests to check 
>>> the new behavior, but the tests look also wrong to me.
>>>
>>> Let me analyze these two tests:
>>>
>>> {
>>>          "XDP pkt read, pkt_data' > pkt_end, good access",
>>>          .insns = {
>>>          BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct 
>>> xdp_md, data)),
>>>          BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
>>>                      offsetof(struct xdp_md, data_end)),
>>>          BPF_MOV64_REG(BPF_REG_1, BPF_REG_2),
>>>          BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8),
>>>          BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_3, 1),
>>>          BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8),
>>>          BPF_MOV64_IMM(BPF_REG_0, 0),
>>>          BPF_EXIT_INSN(),
>>>          },
>>>          .result = ACCEPT,
>>>          .prog_type = BPF_PROG_TYPE_XDP,
>>>          .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
>>> },
>>>
>>> {
>>>          "XDP pkt read, pkt_data' >= pkt_end, bad access 1",
>>>          .insns = {
>>>          BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, offsetof(struct 
>>> xdp_md, data)),
>>>          BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
>>>                      offsetof(struct xdp_md, data_end)),
>>>          BPF_MOV64_REG(BPF_REG_1, BPF_REG_2),
>>>          BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8),
>>>          BPF_JMP_REG(BPF_JGE, BPF_REG_1, BPF_REG_3, 1),
>>>          BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8),
>>>          BPF_MOV64_IMM(BPF_REG_0, 0),
>>>          BPF_EXIT_INSN(),
>>>          },
>>>          .errstr = "R1 offset is outside of the packet",
>>>          .result = REJECT,
>>>          .prog_type = BPF_PROG_TYPE_XDP,
>>>          .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
>>> },
>>>
>>> The first program looks good both to me and the verifier: if data + 8 
>>> > data_end, we bail out, otherwise, if data + 8 <= data_end, we read 
>>> 8 bytes: [data; data+7].
>>>
>>> The second program doesn't pass the verifier, and the test expects it 
>>> to be rejected, but the program itself still looks fine to me: if 
>>> data + 8 >= data_end, we bail out, otherwise, if data + 8 < data_end, 
>>> we read 8 bytes: [data; data+7], and this is fine, because data + 7 
>>> is for sure < data_end. The verifier considers data + 7 to be out of 
>>> bounds, although both data + 7 and data + 8 are still valid offsets, 
>>> hence the off-by-two bug.
>>>
>>> Are my considerations valid, or am I stupidly missing anything?
>>
>> Sorry for my late reply, bit too swamped lately. So we have the two 
>> variants:
>>
>>    r2 = data;
>>    r2 += 8;
>>    if (r2 > data_end) goto <handle exception>
>>      <access okay>
>>
>>    r2 = data;
>>    r2 += 8;
>>    if (r2 >= data_end) goto <handle exception>
>>      <access okay>
>>
>> Technically, the first option is the more correct way to check, 
>> meaning, we have 8 bytes of
>> access in the <access okay> branch. The second one is overly 
>> pessimistic in that if r2 equals
>> data_end we bail out even though we wouldn't have to. So in that case 
>> <access okay> branch
>> would have 9 bytes for access since r2 with offset 8 is already < 
>> data_end.
>>
>> Anyway, please send a fix and updated test cases. Thanks Maxim!
> 
> Just pinging with my status: I'm still on it, I returned from vacation 
> and back to work, but I'm currently struggling with running the BPF 
> selftests.
> 
> I'm using tools/testing/selftests/bpf/vmtest.sh, I've hit a few issues 
> trying to make it work, especially the glibc version issue (I have glibc 
> 2.33 on my host, but the VM image has 2.32 and can't run binaries 
> compiled on the host), for which I applied this workaround to build the 
> test progs statically:
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Fbpf%2Fmsg41647.html&amp;data=04%7C01%7Cmaximmi%40nvidia.com%7C13976fd5b93e4df1a6ca08d9b020eaaf%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637734477702442983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=sklOvJNabJJtUzktnCw1s0M4pLb7UJnd0xezhvvH8os%3D&amp;reserved=0 
> 
> 
> However, the test suite just hangs after:
> 
> ...
> + /etc/rcS.d/S50-startup
> ./test_progs
> [    1.639277] bpf_testmod: loading out-of-tree module taints kernel.
> #1 align:OK
> #2 atomic_bounds:OK
> [    1.824515] tsc: Refined TSC clocksource calibration: 2399.983 MHz
> [    1.826421] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 
> 0x22982765f14, max_idle_ns: 440795222551 ns
> [    1.829486] clocksource: Switched to clocksource tsc
> #3 atomics:OK
> #4 attach_probe:OK
> #5 autoload:OK
> #6 bind_perm:OK
> #7 bloom_filter_map:OK
> #8 bpf_cookie:OK

I figured out that I actually had to run test_verifier, rather than 
test_progs, but I also found the issue with test_progs:

tools/testing/selftests/bpf/prog_tests/bpf_iter.c:

/* Read CMP_BUFFER_SIZE (1kB) from bpf_iter. Read in small chunks
  * to trigger seq_file corner cases. The expected output is much
  * longer than 1kB, so the while loop will terminate.
  */
len = 0;
while (len < CMP_BUFFER_SIZE) {
         err = read_fd_into_buffer(iter_fd, task_vma_output + len,
                                   min(read_size, CMP_BUFFER_SIZE - len));
         if (CHECK(err < 0, "read_iter_fd", "read_iter_fd failed\n"))
                 goto out;
         len += err;
}

The output was actually shorter than 1K, err was 0, and the loop was 
infinite. I think a simple `if (!err) break;` should fix it. I'll submit 
it as well.

> 
> Any hint would be much appreciated. I'm trying to do my debugging too.
> 
> Thanks,
> Max