[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAM0EoMnz-vFOizxqwgRocqiQtzwJ_6uk1fDvGHZce0yDpnRAGA@mail.gmail.com>
Date: Mon, 9 Oct 2023 11:39:04 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Vlad Buslov <vladbu@...dia.com>
Cc: netdev@...r.kernel.org, deb.chatterjee@...el.com, anjali.singhai@...el.com,
namrata.limaye@...el.com, tom@...anda.io, mleitner@...hat.com,
Mahesh.Shirshyad@....com, tomasz.osinski@...el.com, jiri@...nulli.us,
xiyou.wangcong@...il.com, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, pabeni@...hat.com, horms@...nel.org, kernel@...atatu.com,
khalidm@...dia.com, toke@...hat.com, mattyk@...dia.com
Subject: Re: [PATCH RFC v6 net-next 13/17] p4tc: add table entry create,
update, get, delete, flush and dump
On Mon, Oct 9, 2023 at 10:59 AM Vlad Buslov <vladbu@...dia.com> wrote:
>
>
> On Mon 09 Oct 2023 at 10:02, Jamal Hadi Salim <jhs@...atatu.com> wrote:
> > Hi Vlad,
> >
> > On Sun, Oct 8, 2023 at 12:36 PM Vlad Buslov <vladbu@...dia.com> wrote:
> >>
> >> On Sat 30 Sep 2023 at 10:35, Jamal Hadi Salim <jhs@...atatu.com> wrote:
> >
> > [..trimmed...]
> >
> >> > +/* Invoked from both control and data path */
> >> > +static int __p4tc_table_entry_update(struct p4tc_pipeline *pipeline,
> >> > + struct p4tc_table *table,
> >> > + struct p4tc_table_entry *entry,
> >> > + struct p4tc_table_entry_mask *mask,
> >> > + u16 whodunnit, bool from_control)
> >> > +__must_hold(RCU)
Mark B - comment further down.
> >> > +{
> >> > + struct p4tc_table_entry_mask *mask_found = NULL;
> >> > + struct p4tc_table_entry_work *entry_work;
> >> > + struct p4tc_table_entry_value *value_old;
> >> > + struct p4tc_table_entry_value *value;
> >> > + struct p4tc_table_entry *entry_old;
> >> > + struct p4tc_table_entry_tm *tm_old;
> >> > + struct p4tc_table_entry_tm *tm;
> >> > + int ret;
> >> > +
> >> > + value = p4tc_table_entry_value(entry);
> >> > + /* We set it to zero on create an update to avoid having entry
> >> > + * deletion in parallel before we report to user space.
> >> > + */
> >> > + refcount_set(&value->entries_ref, 0);
> >>
> >> TBH I already commented on one of the previous versions of this series
> >> that it is very hard to understand and review tons of different atomic
> >> reference counters, especially when they are modified with functions
> >> like refcount_dec_not_one() or unconditional set like in this place.
> >>
> >> I chose specifically this function because __must_hold(RCU) makes it
> >> look like it can be accessed concurrently from datapath, which is not
> >> obvious on multiple previous usages of reference counters in the series.
> >
> > True, tables can be manipulated from control plane/user space,
> > datapath as well as timers (mostly for delete).
> > Would using wrappers around these incr/decr help? i mean meaningful
> > inlines that will provide clarity as to what an incr/decr is?
>
> Considering existing wrappers I don't believe adding more would help.
> Looking at wrappers for entries_ref:
>
> static inline int p4tc_tbl_entry_get(struct p4tc_table_entry_value *value)
> {
> return refcount_inc_not_zero(&value->entries_ref);
> }
>
> static inline bool p4tc_tbl_entry_put(struct p4tc_table_entry_value *value)
> {
> return refcount_dec_if_one(&value->entries_ref);
> }
>
> static inline bool p4tc_tbl_entry_put_ref(struct p4tc_table_entry_value *value)
> {
> return refcount_dec_not_one(&value->entries_ref);
> }
>
> it isn't obvious to me why 'put' is dec_if_one and put_ref is dec_not
> one.
>
Ok, so maybe answers to your questions below will help to clarify.
Also: Would it help if we also put comments in the data structs to say
what each refcnt, atomic and ref is for (they are clear to us but
we've been staring at this for too long).
> >> So what happens here if entries_ref was 0 to begin with? Or is possible
> >> for this function to be executed concurrently by multiple tasks, in
> >> which case all of them set entries_ref to 0, but first one that finishes
> >> resets the counter back to 1 at which point I assume it can be deleted
> >> in parallel by control path while some concurrent
> >> __p4tc_table_entry_update() are still running (at least that is what the
> >> comment here indicates)?
> >
> > It's rtnl-lockless, so you can imagine what would happen there ;->
> > Multiple concurent user space, kernel, timers all contending for this.
> > Exactly what you said: its zero in this case because some entity could
> > delete it in parallel.
> > See comment further down which says "In case of parallel update, the
> > thread that arrives here first will..."
>
> See below.
>
> > Consider it a poor man's lock. Does that help? Perhaps we could have
> > more discussion at the monthly tc meetup..
> > We have been testing this code a lot for concurrency and wrote some
> > user space tooling to catch such issues.
>
> I'm not arguing that it is somehow broken or not well-tested, just
> saying I read this and previous versions several times and still have no
> idea what is going on here.
Fair enough - hopefully we can rectify this.
> >
> >>
> >> > +
> >> > + if (table->tbl_type != P4TC_TABLE_TYPE_EXACT) {
> >> > + mask_found = p4tc_table_entry_mask_add(table, entry, mask);
> >> > + if (IS_ERR(mask_found)) {
> >> > + ret = PTR_ERR(mask_found);
> >> > + goto out;
> >> > + }
> >> > + }
> >> > +
>
> Mark A, see next comment.
>
> >> > + p4tc_table_entry_build_key(table, &entry->key, mask_found);
> >> > +
> >> > + entry_old = p4tc_entry_lookup(table, &entry->key, value->prio);
> >> > + if (!entry_old) {
> >> > + ret = -ENOENT;
> >> > + goto rm_masks_idr;
> >> > + }
> >> > +
> >> > + /* In case of parallel update, the thread that arrives here first will
> >> > + * get the right to update.
> >> > + *
> >> > + * In case of a parallel get/update, whoever is second will fail appropriately
> >> > + */
> >> > + value_old = p4tc_table_entry_value(entry_old);
> >> > + if (!p4tc_tbl_entry_put(value_old)) {
> >> > + ret = -EAGAIN;
>
> I get that this prevents you from updating the old entry in parallel,
> but I was asking what would happen if concurrent update already finished
> by this time, set value->entries_ref to 1 and concurrent delete
> deallocated the new entry? Isn't use after free possible when calling
> p4tc_table_entry_build_key() and p4tc_entry_lookup() if all those events
> happen before mark A? (for example, if task spent a lot of time
> allocating new mask in p4tc_table_entry_mask_add())
The table entry is RCU and this function can only be called with RCU
read lock(See "Mark B" above), so a use-after-free scenario shouldn't
be possible.
I think we actually have a test case for the scenario you are
describing - but sometimes with these things it is based on luck to
hit the exact scenario; we could add delays at strategic code paths to
make it more deterministic to hit the scenario. Code review always
helps as you are doing right now (if you cant find issues in this
area, noone else could ;->).
cheers,
jamal
> >> > + goto rm_masks_idr;
> >> > + }
> >> > +
> >> > + if (from_control) {
> >> > + if (!p4tc_ctrl_update_ok(value_old->permissions)) {
> >> > + ret = -EPERM;
> >> > + goto set_entries_refcount;
> >> > + }
> >> > + } else {
> >> > + if (!p4tc_data_update_ok(value_old->permissions)) {
> >> > + ret = -EPERM;
> >> > + goto set_entries_refcount;
> >> > + }
> >> > + }
> >> > +
> >> > + tm = kzalloc(sizeof(*tm), GFP_ATOMIC);
> >> > + if (unlikely(!tm)) {
> >> > + ret = -ENOMEM;
> >> > + goto set_entries_refcount;
> >> > + }
> >> > +
> >> > + tm_old = rcu_dereference_protected(value_old->tm, 1);
> >> > + *tm = *tm_old;
> >> > +
> >> > + tm->lastused = jiffies;
> >> > + tm->who_updated = whodunnit;
> >> > +
> >> > + if (value->permissions == P4TC_PERMISSIONS_UNINIT)
> >> > + value->permissions = value_old->permissions;
> >> > +
> >> > + rcu_assign_pointer(value->tm, tm);
> >> > +
> >> > + entry_work = kzalloc(sizeof(*(entry_work)), GFP_ATOMIC);
> >> > + if (unlikely(!entry_work)) {
> >> > + ret = -ENOMEM;
> >> > + goto free_tm;
> >> > + }
> >> > +
> >> > + entry_work->pipeline = pipeline;
> >> > + entry_work->table = table;
> >> > + entry_work->entry = entry;
> >> > + value->entry_work = entry_work;
> >> > + if (!value->is_dyn)
> >> > + value->is_dyn = value_old->is_dyn;
> >> > +
> >> > + if (value->is_dyn) {
> >> > + /* Only use old entry value if user didn't specify new one */
> >> > + value->aging_ms = value->aging_ms ?: value_old->aging_ms;
> >> > +
> >> > + hrtimer_init(&value->entry_timer, CLOCK_MONOTONIC,
> >> > + HRTIMER_MODE_REL);
> >> > + value->entry_timer.function = &entry_timer_handle;
> >> > +
> >> > + hrtimer_start(&value->entry_timer, ms_to_ktime(value->aging_ms),
> >> > + HRTIMER_MODE_REL);
> >> > + }
> >> > +
> >> > + INIT_WORK(&entry_work->work, p4tc_table_entry_del_work);
> >> > +
> >> > + if (rhltable_insert(&table->tbl_entries, &entry->ht_node,
> >> > + entry_hlt_params) < 0) {
> >> > + ret = -EEXIST;
> >> > + goto free_entry_work;
> >> > + }
> >> > +
> >> > + p4tc_table_entry_destroy_noida(table, entry_old);
> >> > +
> >> > + if (!from_control)
> >> > + p4tc_tbl_entry_emit_event(entry_work, RTM_P4TC_UPDATE,
> >> > + GFP_ATOMIC);
> >> > +
> >> > + return 0;
> >> > +
> >> > +free_entry_work:
> >> > + kfree(entry_work);
> >> > +
> >> > +free_tm:
> >> > + kfree(tm);
> >> > +
> >> > +set_entries_refcount:
> >> > + refcount_set(&value_old->entries_ref, 1);
> >> > +
> >> > +rm_masks_idr:
> >> > + if (table->tbl_type != P4TC_TABLE_TYPE_EXACT)
> >> > + p4tc_table_entry_mask_del(table, entry);
> >> > +
> >> > +out:
> >> > + return ret;
> >> > +}
> >> > +
> >
> > [..trimmed..]
> >
> > cheers,
> > jamal
>
Powered by blists - more mailing lists