Message-ID: <0cd71ea3-c529-7007-1f93-4442484834df@fb.com>
Date: Thu, 16 Jul 2020 10:38:13 -0700
From: Yonghong Song <yhs@...com>
To: Jakub Sitnicki <jakub@...udflare.com>
CC: <bpf@...r.kernel.org>, <netdev@...r.kernel.org>,
<kernel-team@...udflare.com>, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH bpf] bpf: Shift and mask loads narrower than context field
size
On 7/16/20 4:48 AM, Jakub Sitnicki wrote:
> On Wed, Jul 15, 2020 at 10:59 PM CEST, Yonghong Song wrote:
>> On 7/15/20 12:26 PM, Jakub Sitnicki wrote:
>>> On Wed, Jul 15, 2020 at 08:44 AM CEST, Yonghong Song wrote:
>>>> On 7/10/20 10:31 AM, Jakub Sitnicki wrote:
>
> [...]
>
>>>>> The "size < target_size" check is left in place to cover the case when a
>>>>> context field is narrower than its target field, even if we might not have
>>>>> such case now. (It would have to be a u32 context field backed by a u64
>>>>> target field, with context fields all being 4-bytes or wider.)
>>>>>
>>>>> Going back to the example, with the fix in place, the upper half load from
>>>>> ctx->ip_protocol yields zero:
>>>>>
>>>>> int reuseport_narrow_half(struct sk_reuseport_md * ctx):
>>>>> ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
>>>>> 0: (b4) w0 = 0
>>>>> ; if (half[0] == 0xaaaa)
>>>>> 1: (79) r2 = *(u64 *)(r1 +8)
>>>>> 2: (69) r2 = *(u16 *)(r2 +924)
>>>>> 3: (54) w2 &= 65535
>>>>> ; if (half[0] == 0xaaaa)
>>>>> 4: (16) if w2 == 0xaaaa goto pc+7
>>>>> ; if (half[1] == 0xbbbb)
>>>>> 5: (79) r1 = *(u64 *)(r1 +8)
>>>>> 6: (69) r1 = *(u16 *)(r1 +924)
>>>>
>>>> The load is still 2 bytes from offset 0, with the upper 48 bits as 0.
>>>
>>> Yes, this is how narrow loads currently work, right? It is not specific
>>> to the case I'm fixing.
>>>
>>> To give an example: if you do a 1-byte load at offset 1, it will load
>>> the value from offset 0 and shift it right by 1 byte. So with the
>>> current implementation it is expected that the load is always from
>>> offset 0.
>>
>> Yes, the load is always from offset 0. The confusing part is that it
>> loads 2 bytes from offset 0 and then right-shifts by 2 bytes
>> to get 0...
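To spell out what the emitted sequence computes, here is a small
user-space model (purely illustrative, little-endian assumed, names are
made up) of the load + shift + mask that the verifier generates for a
narrow ctx load:

	#include <stdint.h>

	/* field_val: the kernel field value brought in by the emitted
	 * load, already limited to target_size bytes (2 for ip_protocol).
	 * off/size: the narrow access requested by the program.
	 * size_default: the ctx field size (4 for ip_protocol).
	 */
	static uint64_t narrow_ctx_load(uint64_t field_val, uint32_t off,
					uint32_t size, uint32_t size_default)
	{
		uint32_t shift = (off & (size_default - 1)) * 8;
		uint64_t mask = size < 8 ? (1ULL << size * 8) - 1 : ~0ULL;

		/* e.g. off = 2, size = 2 on a 2-byte kernel field:
		 * shift = 16 discards everything the load brought in,
		 * so the result is always 0.
		 */
		return (field_val >> shift) & mask;
	}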
>
> Right, I see how silly the generated instruction sequence is. I guess
> I had accepted how the <prog_type>_convert_ctx_access functions emit
> loads and didn't stop to question this part before.
>
>>> SEC("sk_reuseport/narrow_byte")
>>> int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
>>> {
>>> 	__u8 *byte;
>>>
>>> 	byte = (__u8 *)&ctx->ip_protocol;
>>> 	if (byte[0] == 0xaa)
>>> 		return SK_DROP;
>>> 	if (byte[1] == 0xbb)
>>> 		return SK_DROP;
>>> 	if (byte[2] == 0xcc)
>>> 		return SK_DROP;
>>> 	if (byte[3] == 0xdd)
>>> 		return SK_DROP;
>>> 	return SK_PASS;
>>> }
>>>
>>> int reuseport_narrow_byte(struct sk_reuseport_md * ctx):
>>> ; int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
>>> 0: (b4) w0 = 0
>>> ; if (byte[0] == 0xaa)
>>> 1: (79) r2 = *(u64 *)(r1 +8)
>>> 2: (69) r2 = *(u16 *)(r2 +924)
>>> 3: (54) w2 &= 255
>>> ; if (byte[0] == 0xaa)
>>> 4: (16) if w2 == 0xaa goto pc+17
>>> ; if (byte[1] == 0xbb)
>>> 5: (79) r2 = *(u64 *)(r1 +8)
>>> 6: (69) r2 = *(u16 *)(r2 +924)
>>> 7: (74) w2 >>= 8
>>> 8: (54) w2 &= 255
>>> ; if (byte[1] == 0xbb)
>>> 9: (16) if w2 == 0xbb goto pc+12
>>> ; if (byte[2] == 0xcc)
>>> 10: (79) r2 = *(u64 *)(r1 +8)
>>> 11: (69) r2 = *(u16 *)(r2 +924)
>>> 12: (74) w2 >>= 16
>>> 13: (54) w2 &= 255
>>> ; if (byte[2] == 0xcc)
>>> 14: (16) if w2 == 0xcc goto pc+7
>>> ; if (byte[3] == 0xdd)
>>> 15: (79) r1 = *(u64 *)(r1 +8)
>>> 16: (69) r1 = *(u16 *)(r1 +924)
>>> 17: (74) w1 >>= 24
>>> 18: (54) w1 &= 255
>>> 19: (b4) w0 = 1
>>> ; if (byte[3] == 0xdd)
>>> 20: (56) if w1 != 0xdd goto pc+1
>>> 21: (b4) w0 = 0
>>> ; }
>>> 22: (95) exit
>>>
>>>>
>>>>> 7: (74) w1 >>= 16
>>>>
>>>> w1 will be 0 now, so this will work.
>>>>
>>>>> 8: (54) w1 &= 65535
>>>>
>>>> For the above insns 5-8, the verifier, based on target information,
>>>> can directly generate w1 = 0 since:
>>>> . the target kernel field size is 2, while the ctx field size is 4.
>>>> . the user tries to access offset 2, size 2.
>>>>
>>>> Here, we need to decide whether we permit the user to do a partial
>>>> read beyond the kernel narrow field or not (e.g., this example). I
>>>> would say yes, but Daniel or Alexei can provide additional comments.
>>>>
>>>> If we allow such accesses, I would like the verifier to generate
>>>> better code, as I illustrated above. This can be implemented in the
>>>> verifier itself, with the target passing the additional kernel field
>>>> size to the verifier. The target already passes the ctx field size
>>>> back to the verifier.
>>>
>>> Keep in mind that the BPF user is writing their code under the
>>> assumption that the context field has 4 bytes. IMHO it's reasonable to
>>> expect to be able to load 2 bytes at an offset of 2 from a 4-byte
>>> field.
>>>
>>> Restricting it now to loads below the target field size, which is
>>> unknown to the user, would mean rejecting programs that work today,
>>> even if they are getting funny values.
>>>
>>> I think implementing what you suggest is doable without major
>>> changes. We have the load size, target field size, and context field
>>> size at hand in convert_ctx_accesses(), so it seems like a matter of
>>> adding an 'if' branch to better handle the case where we know the end
>>> result must be 0. I'll give it a try.
>>
>> Sounds good. The target_size is returned by convert_ctx_access(), which
>> is too late, as the verifier has already generated the load
>> instructions. You need to get it earlier, in is_valid_access().
>
> I have a feeling that I'm not following what you have in mind.
>
> True, target_size is only known after convert_ctx_access has generated
> the instructions. At this point, if we want to optimize the narrow
> loads that must return 0, we can pop however many instructions
> convert_ctx_access appended to insn_buf and emit a BPF_MOV32/64_IMM.
>
> However, it sounds a bit more complex than what I hoped for initially,
> so I'm starting to doubt the value. Narrow loads at an offset that
> matches or exceeds the target field size must be a corner case,
> considering that the current "broken" behavior has gone unnoticed so
> far.
>
> I'll need to play with the code and see how it turns out. But for the
> moment, please consider acking/nacking this one as a simple way to fix
> the issue, targeted at the 'bpf' branch and stable kernels.
Ack for the current patch, as it does fix the problem. See the comments
below for a slight change to avoid penalizing the existing common case
like
    __u16 proto = ctx->ip_protocol;
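For reference, here is the sort of aligned access I mean, written in the
style of your narrow_half/narrow_byte reproducers (hypothetical program,
untested):

	SEC("sk_reuseport/narrow_common")
	int reuseport_narrow_common(struct sk_reuseport_md *ctx)
	{
		__u16 *half = (__u16 *)&ctx->ip_protocol;

		/* off 0, size 2: size == target_size and the offset is
		 * aligned to the field, so no shift/mask is needed here.
		 */
		if (half[0] == 0xaaaa)
			return SK_DROP;
		return SK_PASS;
	}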
>
>>
>>>
>>> But I do want to emphasize that I still think the fix in its current
>>> form is correct, or at least not worse than what we already have in
>>> place for narrow loads.
>>
>> I did agree that the fix in this patch is correct. It is just that we
>> could do better to fix this problem.
>
> I agree with your sentiment. Sorry if I got too defensive there.
>
>>
>>>
>>>>
>>>>> 9: (b4) w0 = 1
>>>>> ; if (half[1] == 0xbbbb)
>>>>> 10: (56) if w1 != 0xbbbb goto pc+1
>>>>> 11: (b4) w0 = 0
>>>>> ; }
>>>>> 12: (95) exit
>>>>>
>>>>> Fixes: f96da09473b5 ("bpf: simplify narrower ctx access")
>>>>> Signed-off-by: Jakub Sitnicki <jakub@...udflare.com>
>>>>> ---
>>>>> kernel/bpf/verifier.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>>> index 94cead5a43e5..1c4d0e24a5a2 100644
>>>>> --- a/kernel/bpf/verifier.c
>>>>> +++ b/kernel/bpf/verifier.c
>>>>> @@ -9760,7 +9760,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>>>>>  			return -EINVAL;
>>>>>  		}
>>>>>
>>>>> -		if (is_narrower_load && size < target_size) {
>>>>> +		if (is_narrower_load || size < target_size) {
Maybe:

	if (is_narrower_load &&
	    (size < target_size ||
	     (off & (size_default - 1)) != 0)) {

With the original patch any narrow load will get the shift/mask, which
is not good. For a narrow load, we only need to shift/mask if
- size < target_size, or
- off is not aligned to the ctx field boundary.
I still prefer better xlated code. But if that is too complex, the above
change is also acceptable. I just do not like the generated xlated byte
code here; it is very easy for people to get confused by it.
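To expand on the "better xlated code" option, the kind of rewrite I have
in mind for a load that starts at or past the end of the kernel field is
sketched below (untested, little-endian only, placed after the
convert_ctx_access() call in convert_ctx_accesses(); discarding the
already-emitted instructions by resetting cnt is an assumption about how
this could be wired up):

	if (is_narrower_load && (off & (size_default - 1)) >= target_size) {
		/* The requested bytes lie entirely beyond the kernel
		 * field, so the result is known to be 0.  Discard what
		 * convert_ctx_access() emitted and load the constant
		 * instead of generating load + shift + mask.
		 */
		cnt = 0;
		insn_buf[cnt++] = BPF_MOV64_IMM(insn->dst_reg, 0);
	}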
>>>>>  			u8 shift = bpf_ctx_narrow_access_offset(
>>>>>  				off, size, size_default) * 8;
>>>>>  			if (ctx_field_size <= 4) {
>>>>>