netdev - Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b453462a-3d98-8d0f-9cc0-543032de5a5f@gmail.com>
Date:   Tue, 25 Apr 2023 10:59:36 -0700
From:   Kui-Feng Lee <sinquersw@...il.com>
To:     Stanislav Fomichev <sdf@...gle.com>,
        Martin KaFai Lau <martin.lau@...ux.dev>
Cc:     Eric Dumazet <edumazet@...gle.com>, davem@...emloft.net,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        daniel@...earbox.net, Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Leon Romanovsky <leon@...nel.org>,
        David Ahern <dsahern@...nel.org>,
        Arnd Bergmann <arnd@...db.de>,
        Kees Cook <keescook@...omium.org>,
        Christian Brauner <brauner@...nel.org>,
        Kuniyuki Iwashima <kuniyu@...zon.com>,
        Lennart Poettering <mzxreary@...inter.de>,
        linux-arch@...r.kernel.org,
        Aleksandr Mikhalitsyn <aleksandr.mikhalitsyn@...onical.com>,
        bpf <bpf@...r.kernel.org>
Subject: Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH
 net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)



On 4/18/23 09:47, Stanislav Fomichev wrote:
> On 04/17, Martin KaFai Lau wrote:
>> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
>>> On 04/13, Stanislav Fomichev wrote:
>>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
>>>> <aleksandr.mikhalitsyn@...onical.com> wrote:
>>>>>
>>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@...gle.com> wrote:
>>>>>>
>>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
>>>>>> <aleksandr.mikhalitsyn@...onical.com> wrote:
>>>>>>>
>>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
>>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
>>>>>>> install FDs into the process fdtable.
>>>>>>>
>>>>>>> After some offlist discussion it was proposed to add a blacklist of
>>>>>>
>>>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
>>>>>
>>>>> Hi Eric,
>>>>>
>>>>> Oh, I'm sorry about that. :( Sure.
>>>>>
>>>>>>
>>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
>>>>>>>
>>>>>>
>>>>>> Can we find the appropriate Fixes: tag to help stable teams ?
>>>>>
>>>>> Sure, I will add next time.
>>>>>
>>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
>>>>>
>>>>> I think it's better to add Stanislav Fomichev to CC.
>>>>
>>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
>>>> use it for tcp zerocopy, I'm assuming it should work in this case as
>>>> well?
>>>
>>> Jakub reminded me of the other things I wanted to ask here bug forgot:
>>>
>>> - setsockopt is probably not needed, right? setsockopt hook triggers
>>>     before the kernel and shouldn't leak anything
>>> - for getsockopt, instead of bypassing bpf completely, should we instead
>>>     ignore the error from the bpf program? that would still preserve
>>>     the observability aspect
>>
>> stealing this thread to discuss the optlen issue which may make sense to
>> bypass also.
>>
>> There has been issue with optlen. Other than this older post related to
>> optlen > PAGE_SIZE:
>> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
>> the recent one related to optlen that we have seen is
>> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
>> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
>> intention is to learn the expected optlen. This makes 'ctx.optlen >
>> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
>> -EFAULT to the userspace even the bpf prog has not changed anything.
> 
> (ignoring -EFAULT issue) this seems like it needs to be
> 
> 	if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
> 		/* error */
> 	}
> 
> ?
> 
>> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
>> max_optlen' for now (and this can use a separate patch which as usual
>> requires a bpf selftests)?
> 
> Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
> seems like the way to go. It caused too much trouble already :-(
> 
> Should I prepare a patch or do you want to take a stab at it?
> 
>> In the future, does it make sense to have a specific cgroup-bpf-prog (a
>> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
>> such that it can enforce read-only for some optname and potentially also
>> track if bpf-prog has written a new optval? The bpf-prog can only return 1
>> (OK) and only allows using bpf_set_retval() instead. Likely there is still
>> holes but could be a seed of thought to continue polishing the idea.
> 
> Ack, let's think about it.
> 
> Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
> as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
> and we have a mostly working bpf_getsockopt (after your refactoring),
> I don't see why we need to continue the current scheme of triggering
> after the kernel?

Since a sleepable hook would cause some restrictions, perhaps, we could 
introduce something like the promise pattern.  In our case here, BPF 
program call an async version of copy_from_user()/copy_to_user() to 
return a promise.

> 
>>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
>>>     gets called whenever bpf returns an error to make sure protocols have
>>>     a chance to handle that condition (and free the fd)
>>>
>>
>>