[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171204165304.GC28667@breakpoint.cc>
Date: Mon, 4 Dec 2017 17:53:04 +0100
From: Florian Westphal <fw@...len.de>
To: David Miller <davem@...emloft.net>
Cc: fw@...len.de, netdev@...r.kernel.org
Subject: Re: [PATCH next-next 0/4] rtnetlink: rework handler (un)registering
David Miller <davem@...emloft.net> wrote:
> From: Florian Westphal <fw@...len.de>
> Date: Sat, 2 Dec 2017 21:44:04 +0100
>
> > Peter Zijlstra reported (referring to commit 019a316992ee0d983,
> > "rtnetlink: add reference counting to prevent module unload while dump is in progress"):
> >
> > 1) it not in fact a refcount, so using refcount_t is silly
> > 2) there is a distinct lack of memory barriers, so we can easily
> > observe the decrement while the msg_handler is still in progress.
> > 3) waiting with a schedule()/yield() loop is complete crap and subject
> > life-locks, imagine doing that rtnl_unregister_all() from a RT task.
> >
> > In ancient times rtnetlink exposed a statically-sized table with
> > preset doit/dumpit handlers to be called for a protocol/type pair.
> >
> > Later the rtnl_register interface was added and the table was allocated
> > on demand. Eventually these were also used by modules.
> >
> > Problem is that nothing prevents module unload while a netlink dump
> > is in progress. netlink dumps can be span multiple recv calls and
> > netlink core saves the to-be-repeated dumper address for later invocation.
> >
> > To prevent rmmod the netlink core expects callers to pass in the owning
> > module so a reference can be taken.
> >
> > So far rtnetlink wasn't doing this, add new interface to pass THIS_MODULE.
> > Moreover, when converting parts of the rtnetlink handling to rcu this code
> > gained way too many READ_ONCE spots, remove them and the extra refcounting.
> >
> > Take a module reference when running dumpit and doit callbacks
> > and never alter content of rtnl_link structures after they have been
> > published via rcu_assign_pointer.
> >
> > Based partially on earlier patch from Peter.
>
> Series applied, thanks Florian.
>
> BTW, why did you use rtnl_register_module() for most of ipv6 but
> specifically not for the net/ipv6/addrconf.c stuff?
Right, originally I did not convert ipv6 at all, as even if its
a module it cannot be unloaded.
Then I thought it might be good to also get rid of of panic() in
rtnl_register (which made __rtnl_register redundant).
I switched to _module() where error handling wasn't too hard
(i.e., where we can make module insertion fail without adding too many
error unwinding stunts).
The only difference is that theoretically some addrlabel rtnetlink
ops are not available if registration fails, but thats not really worse
than before as this used to panic() instead :-)
I can send a followup patch, to convert all of ipv6, should be easy
to propagate errors from addrlabel initialisation to its callsite.
Powered by blists - more mailing lists