Date:   Wed, 20 Jan 2021 17:30:09 +0100
From:   Toke Høiland-Jørgensen <toke@...hat.com>
To:     Björn Töpel <bjorn.topel@...el.com>,
        Björn Töpel <bjorn.topel@...il.com>,
        ast@...nel.org, daniel@...earbox.net, netdev@...r.kernel.org,
        bpf@...r.kernel.org
Cc:     magnus.karlsson@...el.com, maciej.fijalkowski@...el.com,
        kuba@...nel.org, jonathan.lemon@...il.com, maximmi@...dia.com,
        davem@...emloft.net, hawk@...nel.org, john.fastabend@...il.com,
        ciara.loftus@...el.com, weqaar.a.janjua@...el.com
Subject: Re: [PATCH bpf-next v2 1/8] xdp: restructure redirect actions

Björn Töpel <bjorn.topel@...el.com> writes:

> On 2021-01-20 15:52, Toke Høiland-Jørgensen wrote:
>> Björn Töpel <bjorn.topel@...el.com> writes:
>> 
>>> On 2021-01-20 13:44, Toke Høiland-Jørgensen wrote:
>>>> Björn Töpel <bjorn.topel@...il.com> writes:
>>>>
>>>>> From: Björn Töpel <bjorn.topel@...el.com>
>>>>>
>>>>> The XDP_REDIRECT implementations for maps and non-maps are fairly
>>>>> similar, but obviously need to take different code paths depending on
>>>>> if the target is using a map or not. Today, a redirect target for
>>>>> XDP either uses a map, or is based on ifindex.
>>>>>
>>>>> Future commits will introduce yet another redirect target via a
>>>>> new helper, bpf_redirect_xsk(). To pave the way for that, we introduce
>>>>> an explicit redirect type to bpf_redirect_info. This makes the code
>>>>> easier to follow, and makes it easier to add new redirect targets.
>>>>>
>>>>> Further, using an explicit type in bpf_redirect_info has a slight
>>>>> positive performance impact by avoiding a pointer indirection for the
>>>>> map type lookup, and instead using the hot cacheline for
>>>>> bpf_redirect_info.
>>>>>
>>>>> The bpf_redirect_info flags member is not used by XDP, and not
>>>>> read/written any more. The map member is only written to when
>>>>> required/used, and not unconditionally.
>>>>
>>>> I like the simplification. However, the handling of map clearing becomes
>>>> a bit murky with this change:
>>>>
>>>> You're not changing anything in bpf_clear_redirect_map(), and you're
>>>> removing most of the reads and writes of ri->map. Instead,
>>>> bpf_xdp_redirect_map() will store the bpf_dtab_netdev pointer in
>>>> ri->tgt_value, which xdp_do_redirect() will just read and use without
>>>> checking. But if the map element (or the entire map) has been freed in
>>>> the meantime that will be a dangling pointer. I *think* the RCU callback
>>>> in dev_map_delete_elem() and the rcu_barrier() in dev_map_free()
>>>> protects against this, but that is by no means obvious. So confirming
>>>> this, and explaining it in a comment would be good.
>>>>
>>>
>>> Yes, *most* of the READ_ONCE(ri->map) are removed, it's pretty much only
>>> the bpf_redirect_map(), and as you write, the tracepoints.
>>>
>>> The content/element of the map is RCU protected, and actually even the
>>> map will be around until the XDP processing is complete. Note the
>>> synchronize_rcu() that follows every bpf_clear_redirect_map() call.
>>>
>>> I'll try to make it clearer in the commit message! Thanks for pointing
>>> that out!
>>>
>>>> Also, as far as I can tell after this, ri->map is only used for the
>>>> tracepoint. So how about just storing the map ID and getting rid of the
>>>> READ/WRITE_ONCE() entirely?
>>>>
>>>
>>> ...and the bpf_redirect_map() helper. Don't you think the current
>>> READ_ONCE(ri->map) scheme is more obvious/clear?
>> 
>> Yeah, after your patch we WRITE_ONCE() the pointer in
>> bpf_redirect_map(), but the only place it is actually *read* is in the
>> tracepoint. So the only purpose of bpf_clear_redirect_map() is to ensure
>> that an invalid pointer is not read in the tracepoint function. Which
>> seems a bit excessive when we could just store the map ID for direct use
>> in the tracepoint and get rid of bpf_clear_redirect_map() entirely, no?
>> 
>> Besides, from a UX point of view, having the tracepoint display the map
>> ID even if that map ID is no longer valid seems to me like it makes more
>> sense than just displaying a map ID of 0 and leaving it up to the user
>> to figure out that this is because the map was cleared. I mean, at the
>> time the redirect was made, that *was* the map ID that was used...
>>
>
> Convinced! Getting rid of bpf_clear_redirect_map() would be good! I'll
> take a stab at this for v3!

Cool!
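
For illustration, the map ID scheme could be sketched roughly as below.
This is a user-space mock, not the kernel code and not the eventual v3
patch: mock_map, mock_redirect_info, mock_redirect_map() and
mock_trace_map_id() are hypothetical names made up for the example. The
only point is that the redirect state keeps a copy of the ID instead of a
pointer, so nothing needs clearing when the map goes away.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a BPF map; only the ID matters here. */
struct mock_map {
	uint32_t id;
};

/* Hypothetical stand-in for the per-CPU redirect state. It holds a
 * copy of the map ID rather than a pointer to the map, so there is
 * no pointer that could dangle once the map is torn down.
 */
struct mock_redirect_info {
	uint32_t map_id;
	uint32_t tgt_index;
};

/* Helper side: record which map and index were used for the redirect. */
static void mock_redirect_map(struct mock_redirect_info *ri,
			      const struct mock_map *map, uint32_t index)
{
	ri->map_id = map->id;
	ri->tgt_index = index;
}

/* Tracepoint side: report the ID as it was at redirect time. This
 * never dereferences the map, so it is safe even if the map is gone.
 */
static uint32_t mock_trace_map_id(const struct mock_redirect_info *ri)
{
	return ri->map_id;
}
```

With something along these lines the tracepoint reports the ID that was
in use when the redirect was issued, even if that ID is stale by the
time it is read.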

>> Oh, and as you say due to the synchronize_rcu() call in dev_map_free() I
>> think this whole discussion is superfluous anyway, since it can't
>> actually happen that the map gets freed between the setting and reading
>> of ri->map, no?
>>
>
> It can't be freed, but ri->map can be cleared via
> bpf_clear_redirect_map(). So, between the helper (setting) and the
> tracepoint in xdp_do_redirect() it can be cleared (say if the XDP
> program is swapped out, prior to running xdp_do_redirect()).

But xdp_do_redirect() should be called on driver flush before exiting
the NAPI cycle, so how can the XDP program be swapped out?

> Moving to the scheme you suggested does make the discussion
> superfluous. :-)

Yup, awesome :)

-Toke
