netdev - Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Sun, 12 Nov 2017 06:38:21 +0800
From:   Alexei Starovoitov <ast@...com>
To:     Daniel Borkmann <daniel@...earbox.net>,
        Vlad Dumitrescu <vlad@...itrescu.ro>, <davem@...emloft.net>,
        <brakmo@...com>
CC:     <netdev@...r.kernel.org>, <kraigatgoog@...il.com>
Subject: Re: [PATCH net-next] bpf: expose sk_priority through struct
 bpf_sock_ops

On 11/12/17 4:46 AM, Daniel Borkmann wrote:
> On 11/11/2017 05:06 AM, Alexei Starovoitov wrote:
>> On 11/11/17 6:07 AM, Daniel Borkmann wrote:
>>> On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote:
>>>> From: Vlad Dumitrescu <vladum@...gle.com>
>>>>
>>>> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>>>>
>>>> Signed-off-by: Vlad Dumitrescu <vladum@...gle.com>
>>>> ---
>>>>   include/uapi/linux/bpf.h       |  1 +
>>>>   net/core/filter.c              | 11 +++++++++++
>>>>   tools/include/uapi/linux/bpf.h |  1 +
>>>>   3 files changed, 13 insertions(+)
>>>>
>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>> index e880ae6434ee..9757a2002513 100644
>>>> --- a/include/uapi/linux/bpf.h
>>>> +++ b/include/uapi/linux/bpf.h
>>>> @@ -947,6 +947,7 @@ struct bpf_sock_ops {
>>>>       __u32 local_ip6[4];    /* Stored in network byte order */
>>>>       __u32 remote_port;    /* Stored in network byte order */
>>>>       __u32 local_port;    /* stored in host byte order */
>>>> +    __u32 priority;
>>>>   };
>>>>     /* List of known BPF sock_ops operators.
>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>> index 61c791f9f628..a6329642d047 100644
>>>> --- a/net/core/filter.c
>>>> +++ b/net/core/filter.c
>>>> @@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum
>>>> bpf_access_type type,
>>>>           *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
>>>>                         offsetof(struct sock_common, skc_num));
>>>>           break;
>>>> +
>>>> +    case offsetof(struct bpf_sock_ops, priority):
>>>> +        BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
>>>> +
>>>> +        *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
>>>> +                        struct bpf_sock_ops_kern, sk),
>>>> +                      si->dst_reg, si->src_reg,
>>>> +                      offsetof(struct bpf_sock_ops_kern, sk));
>>>> +        *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
>>>> +                      offsetof(struct sock, sk_priority));
>>>> +        break;
>>>
>>> Hm, I don't think this would work, I actually think your initial patch
>>> was ok.
>>> bpf_setsockopt() as well as bpf_getsockopt() check for sk_fullsock(sk)
>>> right
>>> before accessing options on either socket or TCP level, and bail out
>>> with error
>>> otherwise; in such cases we'd read something else here and assume it's
>>> sk_priority.
>>
>> even if it's not fullsock, it will just read zero, no? what's a problem
>> with that?
>> In non-fullsock hooks like BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
>> the program author will know that it's meaningless to read sk_priority,
>> so returning zero with minimal checks is fine.
>> While adding extra runtime if (sk_fullsock(sk)) is unnecessary,
>> since the safety is not compromised.
>
> Hm, on my kernel, struct sock has the 4 bytes sk_priority at offset 440,
> struct request_sock itself is only 232 byte long in total, and the struct
> inet_timewait_sock is 208 byte long, so you'd be accessing out of bounds
> that way, so it cannot be ignored and assumed zero.

I thought we always pass fully allocated sock but technically not 
fullsock yet. My mistake. We do: tcp_timeout_init((struct sock *)req))
so yeah ctx rewrite approach won't work.
Let's go back to access via helper.