[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKOZueu5VdzCEkEv_nR4D0CtYi5r9XOdKOfpKvYWDGPnQGqMHQ@mail.gmail.com>
Date: Tue, 31 Jul 2018 02:36:39 -0700
From: Daniel Colascione <dancol@...gle.com>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: Jakub Kicinski <jakub.kicinski@...ronome.com>,
Joel Fernandes <joelaf@...gle.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Tim Murray <timmurray@...gle.com>,
netdev <netdev@...r.kernel.org>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Lorenzo Colitti <lorenzo@...gle.com>,
Chenbo Feng <fengc@...gle.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Alexei Starovoitov <ast@...com>
Subject: Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
On Tue, Jul 31, 2018 at 1:34 AM, Daniel Borkmann <daniel@...earbox.net> wrote:
> On 07/31/2018 02:33 AM, Daniel Colascione wrote:
>> On Mon, Jul 30, 2018 at 5:26 PM, Jakub Kicinski
>> <jakub.kicinski@...ronome.com> wrote:
>>> On Mon, 30 Jul 2018 03:25:43 -0700, Daniel Colascione wrote:
>>>> On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann <daniel@...earbox.net> wrote:
>>>>> Hmm, I don't think such UAPI as above is future-proof. In case we would want
>>>>> a similar mechanism in future for other maps, we would need a whole new bpf
>>>>> command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though
>>>>> the underlying map may not even be a map-to-map. Additionally, we don't have
>>>>> any map object at hand in the above, so we couldn't make any finer grained
>>>>> decisions either. Something like below would be more suitable and leaves room
>>>>> for extending this further in future.
>>>>
>>>> YAGNI. Your proposed mechanism doesn't add anything under the current
>>>> implementation.
>>>
>>> FWIW in case of HW offload targeting a particular map may allow users
>>> to avoid a potentially slow sync with all the devices on the system.
>>
>> Sure. But such a thing doesn't exist right now (right?), and we can
>> add that more-efficient-in-that-one-case BPF interface when it lands.
>> I'd rather keep things simple for now.
>
> I don't see a reason why that is even more complicated.
Both the API and the implementation are much more complicated in the
per-map ops version: just look at the patch size. The size argument
isn't necessarily a dealbreaker, but I still don't see what the extra
code size and complexity is buying.
> An API command name
> such as BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is simply non-generic, and
> exposes specific map details (here: map-in-map) into the UAPI whereas it
> should reside within a specific implementation instead similar to other ops
> we have for maps.
But synchronize isn't conceptually a command that applies to a
specific map. It waits on all references. Did you address my point
about your proposed map-specific interface requiring redundant
synchronize_rcu calls in the case where we swap multiple maps and want
to wait for all the references to drain? Under my proposal, you'd just
BPF_SYNCHRONIZE_WHATEVER and call schedule_rcu once. Under your
proposal, we'd make it a per-map operation, so we'd issue one
synchronize_rcu per map.
> If in future other maps would be added that would have
> similar mechanisms of inner objects they return to the BPF program, we'll
> be adding yet another command just for this.
And that's why my personal preference is to just calling this thing
BPF_SYNCHRONIZE, which I'd define to wait for all such "inner
objects". Alexei is the one who asked for the very specific naming, I
believe.
Anyway, we have a very simple patch that we could apply today. It
addresses a real need, and it doesn't preclude adding something more
specific later, when we know we need it. Besides, it's not as if
adding a BPF command is particularly expensive.
> Also, union bpf_attr is extensible,
> e.g. additional members could be added in future whenever needed for this
> subcommand instead of forcing it to NULL as done here.
We fail with EINVAL when attr != NULL now, which means that we can
safely accept a non-NULL attr-based subcommand later without breaking
anyone. The interface is already extensible.
> All I'm saying is to
> keep it generic so it can be extended later.
Sure, but no more extensible than it has to be. Prematurely-added
extension points tend to cause trouble later.
Powered by blists - more mailing lists