[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <93f4eef1-959c-959d-320c-7d7112863c6d@gmail.com>
Date: Fri, 16 Mar 2018 09:47:05 -0700
From: John Fastabend <john.fastabend@...il.com>
To: Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: davem@...emloft.net, ast@...nel.org, davejwatson@...com,
netdev@...r.kernel.org
Subject: Re: [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to
monitor socket TX/RX data
On 03/15/2018 05:37 PM, Daniel Borkmann wrote:
> On 03/16/2018 12:06 AM, Alexei Starovoitov wrote:
>> On Thu, Mar 15, 2018 at 11:55:39PM +0100, Daniel Borkmann wrote:
>>> On 03/15/2018 11:20 PM, Alexei Starovoitov wrote:
>>>> On Thu, Mar 15, 2018 at 11:17:12PM +0100, Daniel Borkmann wrote:
>>>>> On 03/15/2018 10:59 PM, Alexei Starovoitov wrote:
>>>>>> On Mon, Mar 12, 2018 at 12:23:29PM -0700, John Fastabend wrote:
>>>>>>>
>>>>>>> +/* User return codes for SK_MSG prog type. */
>>>>>>> +enum sk_msg_action {
>>>>>>> + SK_MSG_DROP = 0,
>>>>>>> + SK_MSG_PASS,
>>>>>>> +};
>>>>>>
>>>>>> do we really need new enum here?
>>>>>> It's the same as 'enum sk_action' and SK_DROP == SK_MSG_DROP
>>>>>> and there will be only drop/pass in both enums.
>>>>>> Also I don't see where these two new SK_MSG_* are used...
>>>>>>
>>>>>>> +
>>>>>>> +/* user accessible metadata for SK_MSG packet hook, new fields must
>>>>>>> + * be added to the end of this structure
>>>>>>> + */
>>>>>>> +struct sk_msg_md {
>>>>>>> + __u32 data;
>>>>>>> + __u32 data_end;
>>>>>>> +};
>>>>>>
>>>>>> I think it's time for me to ask for forgiveness :)
>>>>>
>>>>> :-)
>>>>>
>>>>>> I used __u32 for data and data_end only because all other fields
>>>>>> in __sk_buff were __u32 at the time and I couldn't easily figure out
>>>>>> how to teach verifier to recognize 8-byte rewrites.
>>>>>> Unfortunately my mistake stuck and was copied over into xdp.
>>>>>> Since this is new struct let's do it right and add
>>>>>> 'void *data, *data_end' here,
>>>>>> since bpf prog will use them as 'void *' pointers.
>>>>>> There are no compat issues here, since bpf is always 64-bit.
>>>>>
>>>>> But at least offset-wise when you do the ctx rewrite this would then
>>>>> be a bit more tricky when you have 64 bit kernel with 32 bit user
>>>>> space since void * members are in each cases at different offset. So
>>>>> unless I'm missing something, this still should either be __u32 or
>>>>> __u64 instead of void *, no?
>>>>
>>>> there is no 32-bit user space. these structs are seen by bpf progs only
>>>> and bpf is 64-bit only too.
>>>> unless I'm missing your point.
>>>
>>> Ok, so lets say you have 32 bit LLVM binary and compile the prog where
>>> you access md->data_end. Given the void * in the struct will that access
>>> end up being BPF_W at ctx offset 4 or BPF_DW at ctx offset 8 from clang
>>> perspective (iow, is the back end treating this special and always use
>>> fixed BPF_DW in such case)? If not and it would be the first case with
>>> offset 4, then we could have the case that underlying 64 bit kernel is
>>> expecting ctx offset 8 for doing the md ctx conversion.
>>
>> i'm still not quite following.
>> Whether llvm itself is 32-bit binary or it's arm32 or sprac32 binary
>> doesn't matter. It will produce the same 64-bit bpf code.
>> It will see 'void *' deref from this struct and will emit DW.
>> May be confusion is from newly added -mattr=+alu32 flag?
>> That option doesn't change that sizeof(void*)==8.
>> It only allows backend to emit 32-bit alu insns.
>
> Ok, so conclusion we had is that while BPF target is unconditionally 64 bit,
> it depends which clang front end you use for compilation wrt structs. E.g.
> on 32 bit native (e.g. arm) clang front end it would compile the ctx void *
> pointers as 4 byte while using clang -target bpf it would compile it as 8
> byte. The native clang front end is needed in case of tracing when accessing
> pt_regs for walking data structures, but not for networking use case, so
> always using -target bpf there is proper way. Meaning there would be no
> confusion on the void * since size will always be 8 regardless of underlying
> arch being 32 or 64 bit or clang/llvm binary being 32 bit on 64 bit kernel.
> Thus, sticking to void * would be fine, but definitely samples/sockmap/Makefile
> must be fixed as well, such that people don't copy it wrongly.
>
> Cheers,
> Danie
I'll send a fix for sockmap/Makefile then as a separate series. And
go ahead and change this series to use 'void *'.
Thanks for the follow-up on this.
Powered by blists - more mailing lists