netdev - Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4997dac4-c80a-8f99-c053-ee9499cf5caf@iogearbox.net>
Date:   Mon, 13 Nov 2017 21:20:45 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Lawrence Brakmo <brakmo@...com>,
        Vlad Dumitrescu <vlad@...itrescu.ro>,
        Alexei Starovoitov <ast@...com>,
        "davem@...emloft.net" <davem@...emloft.net>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Craig Gallek <kraigatgoog@...il.com>
Subject: Re: [PATCH net-next] bpf: expose sk_priority through struct
 bpf_sock_ops

On 11/13/2017 09:09 PM, Lawrence Brakmo wrote:
> On 11/13/17, 11:01 AM, "Vlad Dumitrescu" <vlad@...itrescu.ro> wrote:
> 
>     On Sat, Nov 11, 2017 at 2:38 PM, Alexei Starovoitov <ast@...com> wrote:
>     >
>     > On 11/12/17 4:46 AM, Daniel Borkmann wrote:
>     >>
>     >> On 11/11/2017 05:06 AM, Alexei Starovoitov wrote:
>     >>>
>     >>> On 11/11/17 6:07 AM, Daniel Borkmann wrote:
>     >>>>
>     >>>> On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote:
>     >>>>>
>     >>>>> From: Vlad Dumitrescu <vladum@...gle.com>
>     >>>>>
>     >>>>> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>     >>>>>
>     >>>>> Signed-off-by: Vlad Dumitrescu <vladum@...gle.com>
>     >>>>> ---
>     >>>>>   include/uapi/linux/bpf.h       |  1 +
>     >>>>>   net/core/filter.c              | 11 +++++++++++
>     >>>>>   tools/include/uapi/linux/bpf.h |  1 +
>     >>>>>   3 files changed, 13 insertions(+)
>     >>>>>
>     >>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>     >>>>> index e880ae6434ee..9757a2002513 100644
>     >>>>> --- a/include/uapi/linux/bpf.h
>     >>>>> +++ b/include/uapi/linux/bpf.h
>     >>>>> @@ -947,6 +947,7 @@ struct bpf_sock_ops {
>     >>>>>       __u32 local_ip6[4];    /* Stored in network byte order */
>     >>>>>       __u32 remote_port;    /* Stored in network byte order */
>     >>>>>       __u32 local_port;    /* stored in host byte order */
>     >>>>> +    __u32 priority;
>     >>>>>   };
>     >>>>>     /* List of known BPF sock_ops operators.
>     >>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>     >>>>> index 61c791f9f628..a6329642d047 100644
>     >>>>> --- a/net/core/filter.c
>     >>>>> +++ b/net/core/filter.c
>     >>>>> @@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum
>     >>>>> bpf_access_type type,
>     >>>>>           *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
>     >>>>>                         offsetof(struct sock_common, skc_num));
>     >>>>>           break;
>     >>>>> +
>     >>>>> +    case offsetof(struct bpf_sock_ops, priority):
>     >>>>> +        BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
>     >>>>> +
>     >>>>> +        *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
>     >>>>> +                        struct bpf_sock_ops_kern, sk),
>     >>>>> +                      si->dst_reg, si->src_reg,
>     >>>>> +                      offsetof(struct bpf_sock_ops_kern, sk));
>     >>>>> +        *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
>     >>>>> +                      offsetof(struct sock, sk_priority));
>     >>>>> +        break;
>     >>>>
>     >>>>
>     >>>> Hm, I don't think this would work, I actually think your initial patch
>     >>>> was ok.
>     >>>> bpf_setsockopt() as well as bpf_getsockopt() check for sk_fullsock(sk)
>     >>>> right
>     >>>> before accessing options on either socket or TCP level, and bail out
>     >>>> with error
>     >>>> otherwise; in such cases we'd read something else here and assume it's
>     >>>> sk_priority.
>     >>>
>     >>>
>     >>> even if it's not fullsock, it will just read zero, no? what's a problem
>     >>> with that?
>     >>> In non-fullsock hooks like BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
>     >>> the program author will know that it's meaningless to read sk_priority,
>     >>> so returning zero with minimal checks is fine.
>     >>> While adding extra runtime if (sk_fullsock(sk)) is unnecessary,
>     >>> since the safety is not compromised.
>     >>
>     >>
>     >> Hm, on my kernel, struct sock has the 4 bytes sk_priority at offset 440,
>     >> struct request_sock itself is only 232 byte long in total, and the struct
>     >> inet_timewait_sock is 208 byte long, so you'd be accessing out of bounds
>     >> that way, so it cannot be ignored and assumed zero.
>     >
>     >
>     > I thought we always pass fully allocated sock but technically not fullsock yet. My mistake. We do: tcp_timeout_init((struct sock *)req))
>     > so yeah ctx rewrite approach won't work.
>     > Let's go back to access via helper.
>     >
>     
>     TIL. Thanks!
>     
>     Is there anything else needed from me to get the helper approach accepted?
> 
> I plan to add access to TCP state variables (cwnd, rtt, etc.) and I have been thinking
> about this issue. I think it is possible to access it directly as long as we use a value
> like 0xffffffff to represent an invalid value (e.g. not fullsock). The ctx rewrite just
> needs to add a conditional to determine what to return. I would probably add a
> field into the internal kernel struct to indicate if it is fullsock or not (we should
> know when we call tcp_call_bpf whether it is a fullsock or not based on context).
> 
> Let me do a sample patch that I can send for review and get feedback from
> Alexi and Daniel.

Agree, if the mov op from the ctx rewrite to read(/write) a sk member, for
example, is just a BPF_W, then we know upper reg bits are zero anyway for the
success case, so we might be able to utilize this for writing a signed error
back to the user if !fullsk.