[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.20.1705122151001.2835@ja.home.ssi.bg>
Date: Sat, 13 May 2017 00:27:52 +0300 (EEST)
From: Julian Anastasov <ja@....bg>
To: Cong Wang <xiyou.wangcong@...il.com>
cc: Eric Dumazet <eric.dumazet@...il.com>,
David Miller <davem@...emloft.net>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Andrey Konovalov <andreyknvl@...gle.com>,
Eric Dumazet <edumazet@...gle.com>
Subject: Re: [Patch net] ipv4: restore rt->fi for reference counting
Hello,
On Fri, 12 May 2017, Cong Wang wrote:
> On Thu, May 11, 2017 at 11:39 PM, Julian Anastasov <ja@....bg> wrote:
> >
> > fib_flush will unlink the FIB infos at NETDEV_UNREGISTER
> > time, so we can not see them in any hash tables later on
> > NETDEV_UNREGISTER_FINAL. fib_put_nh_devs() can not work
> > except if moving NHs in another hash table but that should
> > not be needed.
>
> Ah, I did miss the hlist_del() in fib_table_flush(), so we just need some
> way to link those fib_nh together for NETDEV_UNREGISTER_FINAL,
> a linked list should be sufficient, but requires adding list_head to fib_nh.
It could be a fib_info_devhash_dead hash table, similar
to fib_info_devhash where we can hash NHs for dead fib infos
but then we will need a fib_clntref reference or otherwise last
user can drop the fib info before NETDEV_UNREGISTER_FINAL is called.
It will add more code in fib_sync_down_dev to know when
to call fib_info_hold. But OTOH, it will reduce the work
needed for careful release of the references in fib_release_info.
So, we have 2 possible cases to consider:
1. route deleted, fib_release_info called, fi held by socket,
NETDEV_UNREGISTER can not see the NHs because they are
unlinked in fib_release_info. dev unreg delayed for long time...
2. NETDEV_UNREGISTER called first, before the NHs are
unlinked.
Now the main question: is FIB_LOOKUP_NOREF used
everywhere in IPv4? I guess so. If not, it means
someone can walk its res->fi NHs which is bad. I think,
this will delay the unregistration for long time and we
can not solve the problem.
If yes, free_fib_info() should not use call_rcu.
Instead, fib_release_info() will start RCU callback to
drop everything via a common function for fib_release_info
and free_fib_info. As result, the last fib_info_put will
just need to free fi->fib_metrics and fi.
Something like:
/* Long living data */
fib_info_destroy:
{
if (fi->fib_dead == 0) {
pr_warn("Freeing alive fib_info %p\n", fi);
return;
}
if (fi->fib_metrics != (u32 *) dst_default_metrics)
kfree(fi->fib_metrics);
kfree(fi);
}
/* Do not imply RCU grace period anymore, last user is last */
fib_info_put():
{
if (atomic_dec_and_test(&fi->fib_clntref))
fib_info_destroy(fi);
}
/* Release everything except long living fields (fib_metrics) */
fib_info_release():
{
change_nexthops(fi) {
if (nexthop_nh->nh_dev)
dev_put(nexthop_nh->nh_dev);
lwtstate_put(nexthop_nh->nh_lwtstate);
free_nh_exceptions(nexthop_nh);
rt_fibinfo_free_cpus(nexthop_nh->nh_pcpu_rth_output);
rt_fibinfo_free(&nexthop_nh->nh_rth_input);
} endfor_nexthops(fi);
}
/* RCU grace period after unlink */
fib_release_info_rcu():
{
struct fib_info *fi = container_of(head, struct fib_info, rcu);
fib_info_release(fi);
fib_info_put(fi);
}
/* Called just on error */
free_fib_info():
{
fib_info_release(fi);
fib_info_put(fi);
}
fib_release_info():
{
...
fi->fib_dead = 1;
- fib_info_put(fi);
+ call_rcu(&fi->rcu, fib_release_info_rcu);
}
> > I'm thinking for the following solution which
> > is a bit hackish:
> >
> > - on NETDEV_UNREGISTER we want to put the nh_dev references,
> > so fib_release_info is a good place. But as fib_release_info
> > is not always called, we will have two places that put
> > references. We can use such hack:
> >
> > - for example, use nh_oif to know if dev_put is
> > already called
> >
> > - fib_release_info() should set nh_oif to 0 because
> > it will now call dev_put without clearing nh_dev
>
> Are you sure we are safe to call dev_put() in fib_release_info()
> for _all_ paths, especially non-unregister paths? See:
Yep, dev_put is safe there...
> commit e49cc0da7283088c5e03d475ffe2fdcb24a6d5b1
> Author: Yanmin Zhang <yanmin_zhang@...ux.intel.com>
> Date: Wed May 23 15:39:45 2012 +0000
>
> ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
...as long as we do not set nh_dev to NULL
> But, hmm, very interesting, I always thought dev_put() triggers a
> kfree() potentially, but here your suggestion actually leverages the fact
> that it is merely a pcpu counter decrease, so for unregister path,
> this is just giving refcnt back, which means it is safe as long as
> we don't have any parallel unregister? We should because of RTNL.
Yes, we are in the context of unregistration and
there is a rcu_barrier() that prevents device to be
destroyed while there are RCU users. Refcnt reaching 0 is
not enough to free dev, RCU users must be considered,
they do not get reference.
> I see why you say this is hackish, really it is. ;) We have to ensure
> the evil dev_put() is only called on unregister path.
Or after a RCU grace period but not later...
> > - fib_dev/nh_dev != NULL checks are not needed, so this addresses
> > Eric's concerns. BTW, fib_route_seq_show actually checks
> > for fi->fib_dev, may be not in a safe way (lockless_dereference
> > needed?) but as we don't set nh_dev to NULL this is not
> > needed.
> >
>
> I think Eric was complaining about the lack of rcu_dereference()
> there.
Not needed if we don't set nh_dev to NULL.
> > What more? What about nh_pcpu_rth_output and
> > nh_rth_input holding routes? We should think about
> > them too. I should think more if nh_oif trick can work
> > for them, may be not because nh_oif is optional...
> > May be we should purge them somehow?
> >
>
> Or maybe don't touch nh_oif but using a new flag in
> nh_flags?? We only need to know if we should call
> dev_put() in free_fib_info_rcu().
fib_dead is a better option if we decide to use
such solution: 1=not called by fib_release_info,
2=called by fib_release_info.
Regards
--
Julian Anastasov <ja@....bg>
Powered by blists - more mailing lists