Message-ID: <0cd71ea3-c529-7007-1f93-4442484834df@fb.com>
Date: Thu, 16 Jul 2020 10:38:13 -0700
From: Yonghong Song <yhs@...com>
To: Jakub Sitnicki <jakub@...udflare.com>
CC: <bpf@...r.kernel.org>, <netdev@...r.kernel.org>,
<kernel-team@...udflare.com>, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH bpf] bpf: Shift and mask loads narrower than context field
size
On 7/16/20 4:48 AM, Jakub Sitnicki wrote:
> On Wed, Jul 15, 2020 at 10:59 PM CEST, Yonghong Song wrote:
>> On 7/15/20 12:26 PM, Jakub Sitnicki wrote:
>>> On Wed, Jul 15, 2020 at 08:44 AM CEST, Yonghong Song wrote:
>>>> On 7/10/20 10:31 AM, Jakub Sitnicki wrote:
>
> [...]
>
>>>>> The "size < target_size" check is left in place to cover the case when a
>>>>> context field is narrower than its target field, even if we might not have
>>>>> such case now. (It would have to be a u32 context field backed by a u64
>>>>> target field, with context fields all being 4-bytes or wider.)
>>>>>
>>>>> Going back to the example, with the fix in place, the upper half load from
>>>>> ctx->ip_protocol yields zero:
>>>>>
>>>>> int reuseport_narrow_half(struct sk_reuseport_md * ctx):
>>>>> ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>>>>> 0: (b4) w0 = 0
>>>>> ; if (half[0] == 0xaaaa)
>>>>> 1: (79) r2 = *(u64 *)(r1 +8)
>>>>> 2: (69) r2 = *(u16 *)(r2 +924)
>>>>> 3: (54) w2 &= 65535
>>>>> ; if (half[0] == 0xaaaa)
>>>>> 4: (16) if w2 == 0xaaaa goto pc+7
>>>>> ; if (half[1] == 0xbbbb)
>>>>> 5: (79) r1 = *(u64 *)(r1 +8)
>>>>> 6: (69) r1 = *(u16 *)(r1 +924)
>>>>
>>>> The load is still 2 bytes from offset 0, with the upper 48 bits as 0.
>>>
>>> Yes, this is how narrow loads currently work, right? It is not specific
>>> to the case I'm fixing.
>>>
>>> To give an example: if you do a 1-byte load at offset 1, it will load
>>> the value from offset 0 and shift it right by 1 byte. So with the
>>> current implementation it is expected that the load is always from
>>> offset 0.
>>
>> Yes, the load is always from offset 0. The confusing part is that it
>> loads 2 bytes from offset 0 and then right-shifts by 2 bytes
>> to get 0...
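To spell out what the emitted sequence computes, here is a small
user-space model (purely illustrative, little-endian assumed, names are
made up) of the load + shift + mask that the verifier generates for a
narrow ctx load:

	#include <stdint.h>

	/* field_val: the kernel field value brought in by the emitted
	 * load, already limited to target_size bytes (2 for ip_protocol).
	 * off/size: the narrow access requested by the program.
	 * size_default: the ctx field size (4 for ip_protocol).
	 */
	static uint64_t narrow_ctx_load(uint64_t field_val, uint32_t off,
					uint32_t size, uint32_t size_default)
	{
		uint32_t shift = (off & (size_default - 1)) * 8;
		uint64_t mask = size < 8 ? (1ULL << size * 8) - 1 : ~0ULL;

		/* e.g. off = 2, size = 2 on a 2-byte kernel field:
		 * shift = 16 discards everything the load brought in,
		 * so the result is always 0.
		 */
		return (field_val >> shift) & mask;
	}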
>
> Right, I see how silly the generated instruction sequence is. I guess
> I had accepted how the <prog_type>_convert_ctx_access functions emit
> loads and didn't stop to question this part before.
>
>>> SEC("sk_reuseport/narrow_byte")
>>> int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
>>> {
>>> 	__u8 *byte;
>>>
>>> 	byte = (__u8 *)&ctx->ip_protocol;
>>> 	if (byte[0] == 0xaa)
>>> 		return SK_DROP;
>>> 	if (byte[1] == 0xbb)
>>> 		return SK_DROP;
>>> 	if (byte[2] == 0xcc)
>>> 		return SK_DROP;
>>> 	if (byte[3] == 0xdd)
>>> 		return SK_DROP;
>>> 	return SK_PASS;
>>> }
>>>
>>> int reuseport_narrow_byte(struct sk_reuseport_md * ctx):
>>> ; int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
>>> 0: (b4) w0 = 0
>>> ; if (byte[0] == 0xaa)
>>> 1: (79) r2 = *(u64 *)(r1 +8)
>>> 2: (69) r2 = *(u16 *)(r2 +924)
>>> 3: (54) w2 &= 255
>>> ; if (byte[0] == 0xaa)
>>> 4: (16) if w2 == 0xaa goto pc+17
>>> ; if (byte[1] == 0xbb)
>>> 5: (79) r2 = *(u64 *)(r1 +8)
>>> 6: (69) r2 = *(u16 *)(r2 +924)
>>> 7: (74) w2 >>= 8
>>> 8: (54) w2 &= 255
>>> ; if (byte[1] == 0xbb)
>>> 9: (16) if w2 == 0xbb goto pc+12
>>> ; if (byte[2] == 0xcc)
>>> 10: (79) r2 = *(u64 *)(r1 +8)
>>> 11: (69) r2 = *(u16 *)(r2 +924)
>>> 12: (74) w2 >>= 16
>>> 13: (54) w2 &= 255
>>> ; if (byte[2] == 0xcc)
>>> 14: (16) if w2 == 0xcc goto pc+7
>>> ; if (byte[3] == 0xdd)
>>> 15: (79) r1 = *(u64 *)(r1 +8)
>>> 16: (69) r1 = *(u16 *)(r1 +924)
>>> 17: (74) w1 >>= 24
>>> 18: (54) w1 &= 255
>>> 19: (b4) w0 = 1
>>> ; if (byte[3] == 0xdd)
>>> 20: (56) if w1 != 0xdd goto pc+1
>>> 21: (b4) w0 = 0
>>> ; }
>>> 22: (95) exit
>>>
>>>>
>>>>> 7: (74) w1 >>= 16
>>>>
>>>> w1 will be 0 now, so this will work.
>>>>
>>>>> 8: (54) w1 &= 65535
>>>>
>>>> For the above insns 5-8, the verifier, based on target information,
>>>> can directly generate w1 = 0 since:
>>>> . the target kernel field size is 2, while the ctx field size is 4.
>>>> . the user tries to access offset 2, size 2.
>>>>
>>>> Here, we need to decide whether we permit the user to do a partial
>>>> read beyond the kernel narrow field or not (e.g., this example). I
>>>> would say yes, but Daniel or Alexei can provide additional comments.
>>>>
>>>> If we allow such accesses, I would like the verifier to generate
>>>> better code, as I illustrated above. This can be implemented in the
>>>> verifier itself, with the target passing the additional kernel field
>>>> size to the verifier. The target already passes the ctx field size
>>>> back to the verifier.
>>>
>>> Keep in mind that the BPF user is writing their code under the
>>> assumption that the context field has 4 bytes. IMHO it's reasonable to
>>> expect to be able to load 2 bytes at an offset of 2 from a 4-byte
>>> field.
>>>
>>> Restricting it now to loads below the target field size, which is
>>> unknown to the user, would mean rejecting programs that work today,
>>> even if they are getting funny values.
>>>
>>> I think implementing what you suggest is doable without major
>>> changes. We have the load size, target field size, and context field
>>> size at hand in convert_ctx_accesses(), so it seems like a matter of
>>> adding an 'if' branch to better handle the case where we know the end
>>> result must be 0. I'll give it a try.
>>
>> Sounds good. The target_size is returned by convert_ctx_access(), which
>> is too late, as the verifier has already generated the load
>> instructions. You need to get it earlier, in is_valid_access().
>
> I have a feeling that I'm not following what you have in mind.
>
> True, target_size is only known after convert_ctx_access has generated
> the instructions. At this point, if we want to optimize the narrow
> loads that must return 0, we can pop however many instructions
> convert_ctx_access appended to insn_buf and emit a BPF_MOV32/64_IMM.
>
> However, it sounds a bit more complex than what I hoped for initially,
> so I'm starting to doubt the value. Narrow loads at an offset that
> matches or exceeds the target field size must be a corner case,
> considering that the current "broken" behavior has gone unnoticed so
> far.
>
> I'll need to play with the code and see how it turns out. But for the
> moment, please consider acking/nacking this one as a simple way to fix
> the issue, targeted at the 'bpf' branch and stable kernels.
Ack for the current patch, as it does fix the problem. See the comments
below for a slight change to avoid penalizing the existing common case
like
    __u16 proto = ctx->ip_protocol;
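For reference, here is the sort of aligned access I mean, written in the
style of your narrow_half/narrow_byte reproducers (hypothetical program,
untested):

	SEC("sk_reuseport/narrow_common")
	int reuseport_narrow_common(struct sk_reuseport_md *ctx)
	{
		__u16 *half = (__u16 *)&ctx->ip_protocol;

		/* off 0, size 2: size == target_size and the offset is
		 * aligned to the field, so no shift/mask is needed here.
		 */
		if (half[0] == 0xaaaa)
			return SK_DROP;
		return SK_PASS;
	}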
>
>>
>>>
>>> But I do want to emphasize that I still think the fix in its current
>>> form is correct, or at least not worse than what we already have in
>>> place for narrow loads.
>>
>> I did agree that the fix in this patch is correct. It is just that we
>> could do better to fix this problem.
>
> I agree with your sentiment. Sorry if I got too defensive there.
>
>>
>>>
>>>>
>>>>> 9: (b4) w0 = 1
>>>>> ; if (half[1] == 0xbbbb)
>>>>> 10: (56) if w1 != 0xbbbb goto pc+1
>>>>> 11: (b4) w0 = 0
>>>>> ; }
>>>>> 12: (95) exit
>>>>>
>>>>> Fixes: f96da09473b5 ("bpf: simplify narrower ctx access")
>>>>> Signed-off-by: Jakub Sitnicki <jakub@...udflare.com>
>>>>> ---
>>>>> kernel/bpf/verifier.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>>> index 94cead5a43e5..1c4d0e24a5a2 100644
>>>>> --- a/kernel/bpf/verifier.c
>>>>> +++ b/kernel/bpf/verifier.c
>>>>> @@ -9760,7 +9760,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>>>>>  			return -EINVAL;
>>>>>  		}
>>>>>
>>>>> -		if (is_narrower_load && size < target_size) {
>>>>> +		if (is_narrower_load || size < target_size) {
Maybe:

	if (is_narrower_load &&
	    (size < target_size ||
	     (off & (size_default - 1)) != 0)) {

With the original patch any narrow load will get the shift/mask, which
is not good. For a narrow load, we only need to shift/mask if
- size < target_size, or
- off is not aligned to the ctx field boundary.
I still prefer better xlated code. But if that is too complex, the above
change is also acceptable. I just do not like the generated xlated byte
code here; it is very easy for people to get confused by it.
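To expand on the "better xlated code" option, the kind of rewrite I have
in mind for a load that starts at or past the end of the kernel field is
sketched below (untested, little-endian only, placed after the
convert_ctx_access() call in convert_ctx_accesses(); discarding the
already-emitted instructions by resetting cnt is an assumption about how
this could be wired up):

	if (is_narrower_load && (off & (size_default - 1)) >= target_size) {
		/* The requested bytes lie entirely beyond the kernel
		 * field, so the result is known to be 0.  Discard what
		 * convert_ctx_access() emitted and load the constant
		 * instead of generating load + shift + mask.
		 */
		cnt = 0;
		insn_buf[cnt++] = BPF_MOV64_IMM(insn->dst_reg, 0);
	}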
>>>>>  			u8 shift = bpf_ctx_narrow_access_offset(
>>>>>  				off, size, size_default) * 8;
>>>>>  			if (ctx_field_size <= 4) {
>>>>>