Message-ID: <20180502023326.zl3cxudmz4nl4slc@ast-mbp>
Date: Tue, 1 May 2018 19:33:28 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Lorenzo Colitti <lorenzo@...gle.com>
Cc: Chenbo Feng <fengc@...gle.com>, netdev@...r.kernel.org,
Daniel Borkmann <daniel@...earbox.net>,
Joel Fernandes <joelaf@...gle.com>
Subject: Re: Suggestions on iterating eBPF maps
On Wed, May 02, 2018 at 11:05:19AM +0900, Lorenzo Colitti wrote:
> On Sat, Apr 28, 2018 at 10:04 AM, Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> > Another approach could be to use map-in-map and have almost atomic
> > replace of the whole map with new potentially empty map. The prog
> > can continue using the new map, while user space walks no longer
> > accessed old map.
>
> That sounds like a promising approach. I assume this would be
> functionally equivalent to an approach where there is a map containing
> a boolean that says whether to write to map A or map B? We'd then do
> the following:
>
> 0. Kernel program is writing to map A.
> 1. Userspace pushes config that says to write to map B.
> 2. Kernel program starts to write to map B.
> 3. Userspace scans map A, collecting stats and deleting everything it finds.
>
> One problem with this is: if the effects of #1 are not immediately
> visible to the programs running on all cores, the program could still
> be writing to map A and the deletes in #3 would result in loss of
> data. Are there any guarantees around this? I know that hash map
> writes are atomic, but I'm not aware of any other guarantees here. Are
> there memory barriers around map writes and reads?
>
> In the absence of guarantees, userspace could put a sleep between #1
> and #3 and things would be correct Most Of The Time(TM), but if the
> kernel is busy doing other things that might not be sufficient.
> Thoughts?
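
A minimal sketch of the flag-based A/B scheme described in steps 0-3 of the quoted
text above, assuming BTF-style libbpf map definitions (which postdate this thread);
the map names (selector, stats_a, stats_b), the cgroup_skb/egress attach point and
the use of the socket UID as key are illustrative assumptions, not anything stated
in this thread:

/* Hypothetical illustration of the quoted steps 0-3: a one-slot "selector"
 * array tells the program whether to account into stats_a or stats_b. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u32);           /* 0: write to map A, 1: write to map B */
} selector SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, __u64);
        __type(value, __u64);
} stats_a SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, __u64);
        __type(value, __u64);
} stats_b SEC(".maps");

SEC("cgroup_skb/egress")
int account(struct __sk_buff *skb)
{
        __u32 zero = 0;
        __u64 key = bpf_get_socket_uid(skb);    /* stand-in for the real key */
        __u64 one = 1, *val;
        __u32 *which;
        void *map;

        /* step 0/2: pick the currently active map on every packet */
        which = bpf_map_lookup_elem(&selector, &zero);
        if (which && *which)
                map = &stats_b;
        else
                map = &stats_a;

        val = bpf_map_lookup_elem(map, &key);
        if (val)
                __sync_fetch_and_add(val, 1);
        else
                bpf_map_update_elem(map, &key, &one, BPF_ANY);
        return 1;
}

char _license[] SEC("license") = "GPL";

The visibility question raised above concerns the read of *which: nothing in this
sketch orders the userspace write to selector against program runs already in
flight, which is the gap the map-in-map plus sys_membarrier() sequence in the
reply below is meant to close.
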
If you use map-in-map you don't need the extra boolean map.
0. the bpf prog can do
   inner_map = bpf_map_lookup_elem(&map_in_map, &zero);
   if (inner_map)
       bpf_map_lookup_elem(inner_map, &your_real_key);
1. user space writes the FD of a new (empty) map into map_in_map[0]
2. for a short while some cpus are using the old inner map and some the new one
3. user space does sys_membarrier(CMD_GLOBAL), which does synchronize_sched();
   on CONFIG_PREEMPT_NONE=y servers that is the same as synchronize_rcu(),
   which guarantees that all in-flight progs have finished
4. scan the old inner map
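
A sketch of the whole sequence, using BTF-defined map-in-map declarations from
newer libbpf (which postdate this thread); the map and function names, sizes,
key choice and the cgroup_skb/egress attach point are illustrative assumptions:

/* BPF side, step 0: every lookup goes through map_in_map[0]. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct inner_map {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, __u64);
        __type(value, __u64);
} inner_map0 SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
        __uint(max_entries, 1);
        __type(key, __u32);
        __array(values, struct inner_map);
} map_in_map SEC(".maps") = {
        .values = { [0] = &inner_map0 },
};

SEC("cgroup_skb/egress")
int count(struct __sk_buff *skb)
{
        __u32 slot = 0;
        __u64 key = bpf_get_socket_uid(skb);    /* "your_real_key" */
        __u64 one = 1, *val;
        void *inner;

        inner = bpf_map_lookup_elem(&map_in_map, &slot);
        if (!inner)
                return 1;

        val = bpf_map_lookup_elem(inner, &key);
        if (val)
                __sync_fetch_and_add(val, 1);
        else
                bpf_map_update_elem(inner, &key, &one, BPF_ANY);
        return 1;
}

char _license[] SEC("license") = "GPL";

and the userspace half, steps 1-4 (error handling omitted; userspace must keep
an FD to the old inner map so it stays alive after being replaced in the outer
map):

/* Userspace side: rotate the inner map, wait, then drain the old one. */
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>
#include <bpf/bpf.h>

static void rotate_and_drain(int outer_fd, int old_inner_fd)
{
        __u32 slot = 0;
        __u64 key, next_key, val;
        int err;

        /* 1. point map_in_map[0] at a fresh, empty inner map */
        int new_inner_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "inner",
                                          sizeof(__u64), sizeof(__u64),
                                          10240, NULL);
        bpf_map_update_elem(outer_fd, &slot, &new_inner_fd, BPF_ANY);

        /* 2./3. some CPUs may still be running progs that looked up the
         * old inner map; wait until all in-flight progs have finished. */
        syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0);

        /* 4. no prog references the old inner map any more: walk it,
         * collect stats, and delete entries without racing any writer. */
        err = bpf_map_get_next_key(old_inner_fd, NULL, &key);
        while (!err) {
                bpf_map_lookup_elem(old_inner_fd, &key, &val);
                /* ... accumulate val into user-space stats ... */
                err = bpf_map_get_next_key(old_inner_fd, &key, &next_key);
                bpf_map_delete_elem(old_inner_fd, &key);
                key = next_key;
        }
}
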