Date: Tue, 15 Dec 2020 16:22:21 -0800
From: Cong Wang <xiyou.wangcong@...il.com>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: Andrii Nakryiko <andrii.nakryiko@...il.com>,
	Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
	Cong Wang <cong.wang@...edance.com>,
	Alexei Starovoitov <ast@...nel.org>,
	Dongdong Wang <wangdongdong.6@...edance.com>
Subject: Re: [Patch bpf-next v2 2/5] bpf: introduce timeout map

On Tue, Dec 15, 2020 at 3:23 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> On 12/15/20 11:03 PM, Andrii Nakryiko wrote:
> > On Tue, Dec 15, 2020 at 12:06 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> >>
> >> On Tue, Dec 15, 2020 at 11:27 AM Andrii Nakryiko
> >> <andrii.nakryiko@...il.com> wrote:
> >>>
> >>> On Mon, Dec 14, 2020 at 12:17 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> >>>>
> >>>> From: Cong Wang <cong.wang@...edance.com>
> >>>>
> >>>> This borrows the idea from conntrack and will be used for conntrack in
> >>>> bpf too. Each element in a timeout map has a user-specified timeout
> >>>> in secs; after it expires, it will be automatically removed from the map.
> [...]
> >>>>         char key[] __aligned(8);
> >>>> };
> >>>>
> >>>> @@ -143,6 +151,7 @@ static void htab_init_buckets(struct bpf_htab *htab)
> >>>>
> >>>>        for (i = 0; i < htab->n_buckets; i++) {
> >>>>                INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i);
> >>>> +              atomic_set(&htab->buckets[i].pending, 0);
> >>>>                if (htab_use_raw_lock(htab)) {
> >>>>                        raw_spin_lock_init(&htab->buckets[i].raw_lock);
> >>>>                        lockdep_set_class(&htab->buckets[i].raw_lock,
> >>>> @@ -431,6 +440,14 @@ static int htab_map_alloc_check(union bpf_attr *attr)
> >>>>        return 0;
> >>>> }
> >>>>
> >>>> +static void htab_sched_gc(struct bpf_htab *htab, struct bucket *b)
> >>>> +{
> >>>> +       if (atomic_fetch_or(1, &b->pending))
> >>>> +               return;
> >>>> +       llist_add(&b->gc_node, &htab->gc_list);
> >>>> +       queue_work(system_unbound_wq, &htab->gc_work);
> >>>> +}
> >>>
> >>> I'm concerned about each bucket being scheduled individually... And
> >>> similarly concerned that each instance of TIMEOUT_HASH will do its own
> >>> scheduling independently. Can you think about a way to have a
> >>> "global" gc/purging logic, and just make sure that buckets that need
> >>> processing are internally chained together? Then the purging
> >>> routine would iterate all the scheduled hashmaps, and within each it
> >>> would have a linked list of buckets that need processing. And all of that
> >>> is done just once each GC period, not N times for N maps or N*M times
> >>> for N maps with M buckets in each.
> >>
> >> Our internal discussion actually went the opposite way: people here argued
> >> that one work item is not sufficient for a hashtable, because there could be
> >> millions of entries (max_entries, which is also the number of buckets). ;)
> >
> > I was hoping that it's possible to expire elements without iterating
> > the entire hash table every single time, only the items that need to be
> > processed. A hashed timing wheel is one way to do something like this;
> > the kernel has to solve similar problems with timeouts as well, so why not
> > take inspiration there?
>
> Couldn't this map be coupled with the LRU map, for example through a flag on map
> creation, so that the different LRU map flavors can be used with it? For the BPF
> CT use case we do rely on the LRU map to purge 'inactive' entries once full. I
> wonder if for that case you then still need to schedule a GC at all.. e.g.
> if you hit the condition time_after_eq64(now, entry->expires), you'd just
> re-link the expired element from the public htab to e.g. the LRU's local
> CPU's free/pending-list instead.

I doubt we can use size as a limit to kick off GC or LRU; it has to be
time-based. And in the idle case, there still has to be an async GC, right?

Thanks.
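
For readers following the thread, here is a rough sketch of how the work item
queued by htab_sched_gc() above might drain the pending buckets. This is not
code from the posted series: the elem->expires field, the clock source, the use
of the raw bucket lock, and the free_htab_elem() call are assumptions made for
illustration only (the iteration helpers mirror what kernel/bpf/hashtab.c
already uses for bulk deletion).

/* Sketch only: drain buckets chained on htab->gc_list by htab_sched_gc()
 * and drop expired elements. Field names and helpers are assumed.
 */
static void htab_gc_work(struct work_struct *work)
{
        struct bpf_htab *htab = container_of(work, struct bpf_htab, gc_work);
        struct llist_node *pending = llist_del_all(&htab->gc_list);
        u64 now = ktime_get_ns();        /* clock choice is illustrative */
        struct bucket *b, *btmp;

        llist_for_each_entry_safe(b, btmp, pending, gc_node) {
                struct hlist_nulls_node *n;
                struct htab_elem *elem;
                unsigned long flags;

                /* Clear the flag before scanning so an expiration observed
                 * during the walk simply re-queues the bucket via
                 * htab_sched_gc().
                 */
                atomic_set(&b->pending, 0);

                raw_spin_lock_irqsave(&b->raw_lock, flags);
                hlist_nulls_for_each_entry_safe(elem, n, &b->head, hash_node) {
                        if (time_after_eq64(now, elem->expires)) {
                                hlist_nulls_del_rcu(&elem->hash_node);
                                /* A real implementation would likely defer the
                                 * free (e.g. via RCU) instead of doing it under
                                 * the bucket lock; free_htab_elem() is assumed.
                                 */
                                free_htab_elem(htab, elem);
                        }
                }
                raw_spin_unlock_irqrestore(&b->raw_lock, flags);
        }
}

In this shape a GC pass only touches buckets that actually saw an expiration,
since only those are chained on htab->gc_list, but the scheduling is still per
map rather than global across maps, which is the part Andrii is questioning.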