[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM0EoM=2ObA1yrasNWRFoSzB+JZ0su2TKrXH-D0k+Pth=aOUxg@mail.gmail.com>
Date: Mon, 9 Oct 2023 10:02:11 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Vlad Buslov <vladbu@...dia.com>
Cc: netdev@...r.kernel.org, deb.chatterjee@...el.com, anjali.singhai@...el.com,
namrata.limaye@...el.com, tom@...anda.io, mleitner@...hat.com,
Mahesh.Shirshyad@....com, tomasz.osinski@...el.com, jiri@...nulli.us,
xiyou.wangcong@...il.com, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, pabeni@...hat.com, horms@...nel.org, kernel@...atatu.com,
khalidm@...dia.com, toke@...hat.com, mattyk@...dia.com
Subject: Re: [PATCH RFC v6 net-next 13/17] p4tc: add table entry create,
update, get, delete, flush and dump
Hi Vlad,
On Sun, Oct 8, 2023 at 12:36 PM Vlad Buslov <vladbu@...dia.com> wrote:
>
> On Sat 30 Sep 2023 at 10:35, Jamal Hadi Salim <jhs@...atatu.com> wrote:
[..trimmed...]
> > +/* Invoked from both control and data path */
> > +static int __p4tc_table_entry_update(struct p4tc_pipeline *pipeline,
> > + struct p4tc_table *table,
> > + struct p4tc_table_entry *entry,
> > + struct p4tc_table_entry_mask *mask,
> > + u16 whodunnit, bool from_control)
> > +__must_hold(RCU)
> > +{
> > + struct p4tc_table_entry_mask *mask_found = NULL;
> > + struct p4tc_table_entry_work *entry_work;
> > + struct p4tc_table_entry_value *value_old;
> > + struct p4tc_table_entry_value *value;
> > + struct p4tc_table_entry *entry_old;
> > + struct p4tc_table_entry_tm *tm_old;
> > + struct p4tc_table_entry_tm *tm;
> > + int ret;
> > +
> > + value = p4tc_table_entry_value(entry);
> > + /* We set it to zero on create an update to avoid having entry
> > + * deletion in parallel before we report to user space.
> > + */
> > + refcount_set(&value->entries_ref, 0);
>
> TBH I already commented on one of the previous versions of this series
> that it is very hard to understand and review tons of different atomic
> reference counters, especially when they are modified with functions
> like refcount_dec_not_one() or unconditional set like in this place.
>
> I chose specifically this function because __must_hold(RCU) makes it
> look like it can be accessed concurrently from datapath, which is not
> obvious on multiple previous usages of reference counters in the series.
True, tables can be manipulated from control plane/user space,
datapath as well as timers (mostly for delete).
Would using wrappers around these incr/decr help? i mean meaningful
inlines that will provide clarity as to what an incr/decr is?
> So what happens here if entries_ref was 0 to begin with? Or is possible
> for this function to be executed concurrently by multiple tasks, in
> which case all of them set entries_ref to 0, but first one that finishes
> resets the counter back to 1 at which point I assume it can be deleted
> in parallel by control path while some concurrent
> __p4tc_table_entry_update() are still running (at least that is what the
> comment here indicates)?
It's rtnl-lockless, so you can imagine what would happen there ;->
Multiple concurent user space, kernel, timers all contending for this.
Exactly what you said: its zero in this case because some entity could
delete it in parallel.
See comment further down which says "In case of parallel update, the
thread that arrives here first will..."
Consider it a poor man's lock. Does that help? Perhaps we could have
more discussion at the monthly tc meetup..
We have been testing this code a lot for concurrency and wrote some
user space tooling to catch such issues.
>
> > +
> > + if (table->tbl_type != P4TC_TABLE_TYPE_EXACT) {
> > + mask_found = p4tc_table_entry_mask_add(table, entry, mask);
> > + if (IS_ERR(mask_found)) {
> > + ret = PTR_ERR(mask_found);
> > + goto out;
> > + }
> > + }
> > +
> > + p4tc_table_entry_build_key(table, &entry->key, mask_found);
> > +
> > + entry_old = p4tc_entry_lookup(table, &entry->key, value->prio);
> > + if (!entry_old) {
> > + ret = -ENOENT;
> > + goto rm_masks_idr;
> > + }
> > +
> > + /* In case of parallel update, the thread that arrives here first will
> > + * get the right to update.
> > + *
> > + * In case of a parallel get/update, whoever is second will fail appropriately
> > + */
> > + value_old = p4tc_table_entry_value(entry_old);
> > + if (!p4tc_tbl_entry_put(value_old)) {
> > + ret = -EAGAIN;
> > + goto rm_masks_idr;
> > + }
> > +
> > + if (from_control) {
> > + if (!p4tc_ctrl_update_ok(value_old->permissions)) {
> > + ret = -EPERM;
> > + goto set_entries_refcount;
> > + }
> > + } else {
> > + if (!p4tc_data_update_ok(value_old->permissions)) {
> > + ret = -EPERM;
> > + goto set_entries_refcount;
> > + }
> > + }
> > +
> > + tm = kzalloc(sizeof(*tm), GFP_ATOMIC);
> > + if (unlikely(!tm)) {
> > + ret = -ENOMEM;
> > + goto set_entries_refcount;
> > + }
> > +
> > + tm_old = rcu_dereference_protected(value_old->tm, 1);
> > + *tm = *tm_old;
> > +
> > + tm->lastused = jiffies;
> > + tm->who_updated = whodunnit;
> > +
> > + if (value->permissions == P4TC_PERMISSIONS_UNINIT)
> > + value->permissions = value_old->permissions;
> > +
> > + rcu_assign_pointer(value->tm, tm);
> > +
> > + entry_work = kzalloc(sizeof(*(entry_work)), GFP_ATOMIC);
> > + if (unlikely(!entry_work)) {
> > + ret = -ENOMEM;
> > + goto free_tm;
> > + }
> > +
> > + entry_work->pipeline = pipeline;
> > + entry_work->table = table;
> > + entry_work->entry = entry;
> > + value->entry_work = entry_work;
> > + if (!value->is_dyn)
> > + value->is_dyn = value_old->is_dyn;
> > +
> > + if (value->is_dyn) {
> > + /* Only use old entry value if user didn't specify new one */
> > + value->aging_ms = value->aging_ms ?: value_old->aging_ms;
> > +
> > + hrtimer_init(&value->entry_timer, CLOCK_MONOTONIC,
> > + HRTIMER_MODE_REL);
> > + value->entry_timer.function = &entry_timer_handle;
> > +
> > + hrtimer_start(&value->entry_timer, ms_to_ktime(value->aging_ms),
> > + HRTIMER_MODE_REL);
> > + }
> > +
> > + INIT_WORK(&entry_work->work, p4tc_table_entry_del_work);
> > +
> > + if (rhltable_insert(&table->tbl_entries, &entry->ht_node,
> > + entry_hlt_params) < 0) {
> > + ret = -EEXIST;
> > + goto free_entry_work;
> > + }
> > +
> > + p4tc_table_entry_destroy_noida(table, entry_old);
> > +
> > + if (!from_control)
> > + p4tc_tbl_entry_emit_event(entry_work, RTM_P4TC_UPDATE,
> > + GFP_ATOMIC);
> > +
> > + return 0;
> > +
> > +free_entry_work:
> > + kfree(entry_work);
> > +
> > +free_tm:
> > + kfree(tm);
> > +
> > +set_entries_refcount:
> > + refcount_set(&value_old->entries_ref, 1);
> > +
> > +rm_masks_idr:
> > + if (table->tbl_type != P4TC_TABLE_TYPE_EXACT)
> > + p4tc_table_entry_mask_del(table, entry);
> > +
> > +out:
> > + return ret;
> > +}
> > +
[..trimmed..]
cheers,
jamal
Powered by blists - more mailing lists