Message-ID: <20180502023326.zl3cxudmz4nl4slc@ast-mbp>
Date: Tue, 1 May 2018 19:33:28 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Lorenzo Colitti <lorenzo@...gle.com>
Cc: Chenbo Feng <fengc@...gle.com>, netdev@...r.kernel.org,
Daniel Borkmann <daniel@...earbox.net>,
Joel Fernandes <joelaf@...gle.com>
Subject: Re: Suggestions on iterating eBPF maps
On Wed, May 02, 2018 at 11:05:19AM +0900, Lorenzo Colitti wrote:
> On Sat, Apr 28, 2018 at 10:04 AM, Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> > Another approach could be to use map-in-map and have almost atomic
> > replace of the whole map with new potentially empty map. The prog
> > can continue using the new map, while user space walks no longer
> > accessed old map.
>
> That sounds like a promising approach. I assume this would be
> functionally equivalent to an approach where there is a map containing
> a boolean that says whether to write to map A or map B? We'd then do
> the following:
>
> 0. Kernel program is writing to map A.
> 1. Userspace pushes config that says to write to map B.
> 2. Kernel program starts to write to map B.
> 3. Userspace scans map A, collecting stats and deleting everything it finds.
>
> One problem with this is: if the effects of #1 are not immediately
> visible to the programs running on all cores, the program could still
> be writing to map A and the deletes in #3 would result in loss of
> data. Are there any guarantees around this? I know that hash map
> writes are atomic, but I'm not aware of any other guarantees here. Are
> there memory barriers around map writes and reads?
>
> In the absence of guarantees, userspace could put a sleep between #1
> and #3 and things would be correct Most Of The Time(TM), but if the
> kernel is busy doing other things that might not be sufficient.
> Thoughts?
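
A minimal sketch of the flag-based A/B scheme described in steps 0-3 of the quoted
text above, assuming BTF-style libbpf map definitions (which postdate this thread);
the map names (selector, stats_a, stats_b), the cgroup_skb/egress attach point and
the use of the socket UID as key are illustrative assumptions, not anything stated
in this thread:

/* Hypothetical illustration of the quoted steps 0-3: a one-slot "selector"
 * array tells the program whether to account into stats_a or stats_b. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u32);           /* 0: write to map A, 1: write to map B */
} selector SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, __u64);
        __type(value, __u64);
} stats_a SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, __u64);
        __type(value, __u64);
} stats_b SEC(".maps");

SEC("cgroup_skb/egress")
int account(struct __sk_buff *skb)
{
        __u32 zero = 0;
        __u64 key = bpf_get_socket_uid(skb);    /* stand-in for the real key */
        __u64 one = 1, *val;
        __u32 *which;
        void *map;

        /* step 0/2: pick the currently active map on every packet */
        which = bpf_map_lookup_elem(&selector, &zero);
        if (which && *which)
                map = &stats_b;
        else
                map = &stats_a;

        val = bpf_map_lookup_elem(map, &key);
        if (val)
                __sync_fetch_and_add(val, 1);
        else
                bpf_map_update_elem(map, &key, &one, BPF_ANY);
        return 1;
}

char _license[] SEC("license") = "GPL";

The visibility question raised above concerns the read of *which: nothing in this
sketch orders the userspace write to selector against program runs already in
flight, which is the gap the map-in-map plus sys_membarrier() sequence in the
reply below is meant to close.
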
If you use map-in-map you don't need the extra boolean map.
0. the bpf prog can do
   inner_map = bpf_map_lookup_elem(&map_in_map, &zero);
   if (inner_map)
       bpf_map_lookup_elem(inner_map, &your_real_key);
1. user space writes the FD of a new (empty) map into map_in_map[0]
2. for a short while some cpus are using the old inner map and some the new one
3. user space does sys_membarrier(CMD_GLOBAL), which does synchronize_sched();
   on CONFIG_PREEMPT_NONE=y servers that is the same as synchronize_rcu(),
   which guarantees that all in-flight progs have finished
4. scan the old inner map
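
A sketch of the whole sequence, using BTF-defined map-in-map declarations from
newer libbpf (which postdate this thread); the map and function names, sizes,
key choice and the cgroup_skb/egress attach point are illustrative assumptions:

/* BPF side, step 0: every lookup goes through map_in_map[0]. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct inner_map {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, __u64);
        __type(value, __u64);
} inner_map0 SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
        __uint(max_entries, 1);
        __type(key, __u32);
        __array(values, struct inner_map);
} map_in_map SEC(".maps") = {
        .values = { [0] = &inner_map0 },
};

SEC("cgroup_skb/egress")
int count(struct __sk_buff *skb)
{
        __u32 slot = 0;
        __u64 key = bpf_get_socket_uid(skb);    /* "your_real_key" */
        __u64 one = 1, *val;
        void *inner;

        inner = bpf_map_lookup_elem(&map_in_map, &slot);
        if (!inner)
                return 1;

        val = bpf_map_lookup_elem(inner, &key);
        if (val)
                __sync_fetch_and_add(val, 1);
        else
                bpf_map_update_elem(inner, &key, &one, BPF_ANY);
        return 1;
}

char _license[] SEC("license") = "GPL";

and the userspace half, steps 1-4 (error handling omitted; userspace must keep
an FD to the old inner map so it stays alive after being replaced in the outer
map):

/* Userspace side: rotate the inner map, wait, then drain the old one. */
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>
#include <bpf/bpf.h>

static void rotate_and_drain(int outer_fd, int old_inner_fd)
{
        __u32 slot = 0;
        __u64 key, next_key, val;
        int err;

        /* 1. point map_in_map[0] at a fresh, empty inner map */
        int new_inner_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "inner",
                                          sizeof(__u64), sizeof(__u64),
                                          10240, NULL);
        bpf_map_update_elem(outer_fd, &slot, &new_inner_fd, BPF_ANY);

        /* 2./3. some CPUs may still be running progs that looked up the
         * old inner map; wait until all in-flight progs have finished. */
        syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0);

        /* 4. no prog references the old inner map any more: walk it,
         * collect stats, and delete entries without racing any writer. */
        err = bpf_map_get_next_key(old_inner_fd, NULL, &key);
        while (!err) {
                bpf_map_lookup_elem(old_inner_fd, &key, &val);
                /* ... accumulate val into user-space stats ... */
                err = bpf_map_get_next_key(old_inner_fd, &key, &next_key);
                bpf_map_delete_elem(old_inner_fd, &key);
                key = next_key;
        }
}
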