Message-ID: <7c27726c-9bba-8d7c-55b4-69d7af287382@fb.com>
Date: Wed, 15 Jul 2020 13:59:56 -0700
From: Yonghong Song <yhs@...com>
To: Jakub Sitnicki <jakub@...udflare.com>
CC: <bpf@...r.kernel.org>, <netdev@...r.kernel.org>,
<kernel-team@...udflare.com>, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH bpf] bpf: Shift and mask loads narrower than context field
size
On 7/15/20 12:26 PM, Jakub Sitnicki wrote:
> On Wed, Jul 15, 2020 at 08:44 AM CEST, Yonghong Song wrote:
>> On 7/10/20 10:31 AM, Jakub Sitnicki wrote:
>>> When size of load from context is the same as target field size, but less
>>> than context field size, the verifier does not emit the shift and mask
>>> instructions for loads at non-zero offset.
>>>
>>> This has the unexpected effect of loading the same data no matter what the
>>> offset is, while the expected behavior would be to load zeros for offsets
>>> at or beyond the target field size.
>>>
>>> For instance, a u16 load from a u32 context field backed by a u16 target
>>> field at an offset of 2 bytes results in:
>>>
>>> SEC("sk_reuseport/narrow_half")
>>> int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>>> {
>>>         __u16 *half;
>>>
>>>         half = (__u16 *)&ctx->ip_protocol;
>>>         if (half[0] == 0xaaaa)
>>>                 return SK_DROP;
>>>         if (half[1] == 0xbbbb)
>>>                 return SK_DROP;
>>>         return SK_PASS;
>>> }
>>
>> It would be good if you could include the llvm asm output like below so
>> people can correlate source => asm => xlated code:
>>
>> 0: w0 = 0
>> 1: r2 = *(u16 *)(r1 + 24)
>> 2: if w2 == 43690 goto +4 <LBB0_3>
>> 3: r1 = *(u16 *)(r1 + 26)
>> 4: w0 = 1
>> 5: if w1 != 48059 goto +1 <LBB0_3>
>> 6: w0 = 0
>>
>> 0000000000000038 <LBB0_3>:
>> 7: exit
>
> Sure, not a problem, if it makes reasoning about the problem easier.
> I'm assuming that it is the narrow load at an offset that you wanted to
> confirm with asm output.
>
>>
>>>
>>> int reuseport_narrow_half(struct sk_reuseport_md * ctx):
>>> ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>>> 0: (b4) w0 = 0
>>> ; if (half[0] == 0xaaaa)
>>> 1: (79) r2 = *(u64 *)(r1 +8)
>>> 2: (69) r2 = *(u16 *)(r2 +924)
>>> ; if (half[0] == 0xaaaa)
>>> 3: (16) if w2 == 0xaaaa goto pc+5
>>> ; if (half[1] == 0xbbbb)
>>> 4: (79) r1 = *(u64 *)(r1 +8)
>>> 5: (69) r1 = *(u16 *)(r1 +924)
>>> 6: (b4) w0 = 1
>>> ; if (half[1] == 0xbbbb)
>>> 7: (56) if w1 != 0xbbbb goto pc+1
>>> 8: (b4) w0 = 0
>>> ; }
>>> 9: (95) exit
>>
>> Indeed we have an issue here. Insn 5 is not correct.
>> The original assembly is correct.
>>
>> Internally ip_protocol is backed by 2 bytes in sk_reuseport_kern.
>> The current verifier implementation makes an important assumption:
>> all user load requests are within the size of the kernel internal range.
>> In this case, the verifier actually only correctly supports
>> . one byte from offset 0
>> . one byte from offset 1
>> . two bytes from offset 0
>
> I don't think that's true. For a field that has a target size of 2 bytes,
> like ip_protocol, a 1-byte load at any offset is correctly supported
> because right shifting and masking take place. That is because we hit
> the "size < target_size" condition in this case. Only loads of size >=
> target size at an offset != 0 give surprising results.
Okay, yes, you are right. We rely on the fact that even if the insn loads
2 bytes, we effectively always load 8 bytes for a 64-bit load and 4 bytes
for a 32-bit load. For any other one-byte access, shifting and masking kick
in to make it work, although the value will be 0 for offsets beyond the
kernel field.
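
For reference, the narrow-load fixup in convert_ctx_accesses() is roughly the
following today (a paraphrased sketch, not the exact kernel source), applied
after the prog type's convert_ctx_access() callback has emitted the full-width
load of the kernel field:

        /* sketch: post-processing of a narrow ctx load */
        if (is_narrower_load && size < target_size) {
                /* byte offset of the requested slice within the loaded value */
                u8 shift = bpf_ctx_narrow_access_offset(off, size, size_default) * 8;

                if (ctx_field_size <= 4) {
                        if (shift)
                                insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH, insn->dst_reg, shift);
                        insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
                                                        (1 << size * 8) - 1);
                } else {
                        if (shift)
                                insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH, insn->dst_reg, shift);
                        insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
                                                        (1ULL << size * 8) - 1);
                }
        }

A 2-byte load at offset 2 from the u32 ip_protocol field has size == target_size
(both 2), so it skips this block entirely, which is the bug being discussed.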
>
>>
>> The original assembly code tries to access 2 bytes from offset 2
>> and the verifier did an incorrect transformation.
>>
>> This actually makes sense since any other read is
>> misleading. For example, for ip_protocol, if people want to
>> load 2 bytes from offset 2, what should we return? 0? In this case,
>> the verifier can actually convert it to 0 without doing a load.
>
> Yes, IMHO, if you are loading 2 bytes at an offset of 2 from a 4-byte
> context field that holds an unsigned value, then it should return 0 for
> a field that is backed by a 2-byte kernel field.
>
> I agree that it could be optimized to load an immediate value instead of
> performing a load from memory. It didn't occur to me, so thanks for the
> suggestion.
>
>>> In this case half[0] == half[1] == sk->sk_protocol that backs the
>>> ctx->ip_protocol field.
>>>
>>> Fix it by shifting and masking any load from context that is narrower than
>>> context field size (is_narrower_load = size < ctx_field_size), in addition
>>> to loads that are narrower than target field size.
>>
>> The fix can work around the issue, but I think we should generate better
>> code for such cases.
>
> Not sure I'd go as far as calling it a workaround. After all I
> understand that in BPF loading a half word into a register is well
> defined, and you can rely on upper word being zero. But please correct
> me if not.
This is true. I should have said that "the fix can fix the issue".
>
> You're right, though, that the approach can be smarter here, that is, we
> can emit just a single instruction that doesn't access memory.
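For your example that would mean replacing the converted load plus shift plus
mask for half[1] (insns 5-8 in the xlated dump below) with a single
move-immediate, i.e. something like this hand-written, hypothetical xlated
output:

  5: (b4) w1 = 0

The ctx field size and the kernel field size alone already tell us the result
must be zero, so no memory access is needed at all.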
>
>>> The "size < target_size" check is left in place to cover the case when a
>>> context field is narrower than its target field, even if we might not have
>>> such a case now. (It would have to be a u32 context field backed by a u64
>>> target field, with context fields all being 4 bytes or wider.)
>>>
>>> Going back to the example, with the fix in place, the upper half load from
>>> ctx->ip_protocol yields zero:
>>>
>>> int reuseport_narrow_half(struct sk_reuseport_md * ctx):
>>> ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>>> 0: (b4) w0 = 0
>>> ; if (half[0] == 0xaaaa)
>>> 1: (79) r2 = *(u64 *)(r1 +8)
>>> 2: (69) r2 = *(u16 *)(r2 +924)
>>> 3: (54) w2 &= 65535
>>> ; if (half[0] == 0xaaaa)
>>> 4: (16) if w2 == 0xaaaa goto pc+7
>>> ; if (half[1] == 0xbbbb)
>>> 5: (79) r1 = *(u64 *)(r1 +8)
>>> 6: (69) r1 = *(u16 *)(r1 +924)
>>
>> The load is still 2 bytes from offset 0, with the upper 48 bits as 0.
>
> Yes, this is how narrow loads currently work, right? It is not specific
> to the case I'm fixing.
>
> To give an example: if you do a 1-byte load at offset 1, it will load
> the value from offset 0 and shift it right by 1 byte. So it is expected
> that the load is always from offset 0 with the current implementation.
Yes, the load is always from offset 0. The confusing part is that it loads
2 bytes from offset 0 and then right-shifts by 2 bytes (16 bits) to get
0...
>
> SEC("sk_reuseport/narrow_byte")
> int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
> {
>         __u8 *byte;
>
>         byte = (__u8 *)&ctx->ip_protocol;
>         if (byte[0] == 0xaa)
>                 return SK_DROP;
>         if (byte[1] == 0xbb)
>                 return SK_DROP;
>         if (byte[2] == 0xcc)
>                 return SK_DROP;
>         if (byte[3] == 0xdd)
>                 return SK_DROP;
>         return SK_PASS;
> }
>
> int reuseport_narrow_byte(struct sk_reuseport_md * ctx):
> ; int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
> 0: (b4) w0 = 0
> ; if (byte[0] == 0xaa)
> 1: (79) r2 = *(u64 *)(r1 +8)
> 2: (69) r2 = *(u16 *)(r2 +924)
> 3: (54) w2 &= 255
> ; if (byte[0] == 0xaa)
> 4: (16) if w2 == 0xaa goto pc+17
> ; if (byte[1] == 0xbb)
> 5: (79) r2 = *(u64 *)(r1 +8)
> 6: (69) r2 = *(u16 *)(r2 +924)
> 7: (74) w2 >>= 8
> 8: (54) w2 &= 255
> ; if (byte[1] == 0xbb)
> 9: (16) if w2 == 0xbb goto pc+12
> ; if (byte[2] == 0xcc)
> 10: (79) r2 = *(u64 *)(r1 +8)
> 11: (69) r2 = *(u16 *)(r2 +924)
> 12: (74) w2 >>= 16
> 13: (54) w2 &= 255
> ; if (byte[2] == 0xcc)
> 14: (16) if w2 == 0xcc goto pc+7
> ; if (byte[3] == 0xdd)
> 15: (79) r1 = *(u64 *)(r1 +8)
> 16: (69) r1 = *(u16 *)(r1 +924)
> 17: (74) w1 >>= 24
> 18: (54) w1 &= 255
> 19: (b4) w0 = 1
> ; if (byte[3] == 0xdd)
> 20: (56) if w1 != 0xdd goto pc+1
> 21: (b4) w0 = 0
> ; }
> 22: (95) exit
>
>>
>>> 7: (74) w1 >>= 16
>>
>> w1 will be 0 now, so this will work.
>>
>>> 8: (54) w1 &= 65535
>>
>> For the above insns 5-8, the verifier, based on target information, can
>> directly generate w1 = 0 since:
>> . the target kernel field size is 2, the ctx field size is 4.
>> . the user tries to access offset 2, size 2.
>>
>> Here, we need to decide whether we permit the user to do a partial read
>> beyond the narrow kernel field or not (e.g., this example). I would
>> say yes, but Daniel or Alexei can provide additional comments.
>>
>> If we allow such accesses, I would like the verifier to generate better
>> code, as I illustrated above. This can be implemented in the
>> verifier itself with the target passing the additional kernel field size
>> to the verifier. The target already passes the ctx field size back
>> to the verifier.
>
> Keep in mind that the BPF user is writing their code under the
> assumption that the context field has 4 bytes. IMHO it's reasonable to
> expect that I can load 2 bytes at an offset of 2 from a 4-byte field.
>
> Restricting it now to loads below the target field size, which is
> unknown to the user, would mean rejecting programs that are working
> today, even if they are getting funny values.
>
> I think implementing what you suggest is doable without major
> changes. We have the load size, target field size, and context field size
> at hand in convert_ctx_accesses(), so it seems like a matter of adding an
> 'if' branch to better handle the case when we know the end result must
> be 0. I'll give it a try.
Sounds good. The target_size is returned by convert_ctx_access(), which
is too late, as the verifier has already generated the load instructions.
You need to get it earlier, in is_valid_access().
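
Something along these lines maybe (untested sketch only; target_field_size is
a made-up name for the kernel field size that is_valid_access() would need to
report alongside ctx_field_size, and big endian needs the mirrored offset the
same way bpf_ctx_narrow_access_offset() handles it):

        /* offset of the access within the ctx field, little-endian view */
        u32 field_off = off & (size_default - 1);

        if (is_narrower_load && field_off >= target_field_size) {
                /* The slice lies entirely beyond the kernel field backing
                 * this ctx field, so the result is known to be zero: drop
                 * the converted load and emit a move-immediate instead.
                 */
                cnt = 0;
                insn_buf[cnt++] = BPF_MOV32_IMM(insn->dst_reg, 0);
        } else if (is_narrower_load || size < target_size) {
                /* existing shift + mask rewrite */
        }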
>
> But I do want to emphasize that I still think the fix in its current form
> is correct, or at least not worse than what we already have in place for
> narrow loads.
I did agree that the fix in this patch is correct. It is just that we
could do better to fix this problem.
>
>>
>>> 9: (b4) w0 = 1
>>> ; if (half[1] == 0xbbbb)
>>> 10: (56) if w1 != 0xbbbb goto pc+1
>>> 11: (b4) w0 = 0
>>> ; }
>>> 12: (95) exit
>>>
>>> Fixes: f96da09473b5 ("bpf: simplify narrower ctx access")
>>> Signed-off-by: Jakub Sitnicki <jakub@...udflare.com>
>>> ---
>>> kernel/bpf/verifier.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index 94cead5a43e5..1c4d0e24a5a2 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -9760,7 +9760,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>>>                          return -EINVAL;
>>>                  }
>>>
>>> -                if (is_narrower_load && size < target_size) {
>>> +                if (is_narrower_load || size < target_size) {
>>>                          u8 shift = bpf_ctx_narrow_access_offset(
>>>                                  off, size, size_default) * 8;
>>>                          if (ctx_field_size <= 4) {
>>>