[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM_iQpU4ULPqo60o7CuZqqqdrybkqNd5GNufep57UhBpmMGuPg@mail.gmail.com>
Date: Wed, 16 Dec 2020 21:06:43 -0800
From: Cong Wang <xiyou.wangcong@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii.nakryiko@...il.com>,
Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Cong Wang <cong.wang@...edance.com>,
Alexei Starovoitov <ast@...nel.org>,
Dongdong Wang <wangdongdong.6@...edance.com>
Subject: Re: [Patch bpf-next v2 2/5] bpf: introduce timeout map
On Tue, Dec 15, 2020 at 6:35 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Tue, Dec 15, 2020 at 6:10 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> >
> > Sure, people also implement CT on native hash map too and timeout
> > with user-space timers. ;)
>
> exactly. what's wrong with that?
> Perfectly fine way to do CT.
Seriously? When we have 8 millions of entries in a hash map, it is
definitely seriously wrong to purge entries one by one from user-space.
In case you don't believe me, take a look at what cilium CT GC does,
which is precisely expires entries one by one in user-space:
https://github.com/cilium/cilium/blob/0f57292c0037ee23ba1ca2f9abb113f36a664645/pkg/bpf/map_linux.go#L728
https://github.com/cilium/cilium/blob/master/pkg/maps/ctmap/ctmap.go#L398
and of course what people complained:
https://github.com/cilium/cilium/issues/5048
>
> > > Anything extra can be added on top from user space
> > > which can easily copy with 1 sec granularity.
> >
> > The problem is never about granularity, it is about how efficient we can
> > GC. User-space has to scan the whole table one by one, while the kernel
> > can just do this behind the scene with a much lower overhead.
> >
> > Let's say we arm a timer for each entry in user-space, it requires a syscall
> > and locking buckets each time for each entry. Kernel could do it without
> > any additional syscall and batching. Like I said above, we could have
> > millions of entries, so the overhead would be big in this scenario.
>
> and the user space can pick any other implementation instead
> of trivial entry by entry gc with timer.
Unless they don't have to, right? With timeout implementation in kernel,
user space does not need to invent any wheel.
>
> > > Say the kernel does GC and deletes htab entries.
> > > How user space will know that it's gone? There would need to be
> >
> > By a lookup.
> >
> > > an event sent to user space when entry is being deleted by the kernel.
> > > But then such event will be racy. Instead when timers and expirations
> > > are done by user space everything is in sync.
> >
> > Why there has to be an event?
>
> because when any production worthy implementation moves
> past the prototype stage there is something that user space needs to keep
> as well. Sometimes the bpf map in the kernel is alone.
> But a lot of times there is a user space mirror of the map in c++ or golang
> with the same key where user space keeps extra data.
So... what event does LRU map send when it deletes a different entry
when the map is full?
Thanks.
Powered by blists - more mailing lists