Message-ID: <5e173fa42e1d5_35982ae92e9d45bc4b@john-XPS-13-9370.notmuch>
Date: Thu, 09 Jan 2020 06:58:44 -0800
From: John Fastabend <john.fastabend@...il.com>
To: Toshiaki Makita <toshiaki.makita1@...il.com>,
John Fastabend <john.fastabend@...il.com>,
bjorn.topel@...il.com, bpf@...r.kernel.org, toke@...hat.com,
jbrouer@...hat.com
Cc: netdev@...r.kernel.org, ast@...nel.org, daniel@...earbox.net
Subject: Re: [bpf PATCH 2/2] bpf: xdp, remove no longer required
rcu_read_{un}lock()
Toshiaki Makita wrote:
> On 2020/01/09 15:35, John Fastabend wrote:
> > Toshiaki Makita wrote:
> >> On 2020/01/09 6:35, John Fastabend wrote:
> >>> Now that we depend on call_rcu() and synchronize_rcu() to also wait
> >>> for preempt_disabled regions to complete, the rcu read critical section
> >>> in __dev_map_flush() is no longer relevant.
> >>>
> >>> These originally ensured the map reference was safe while a map was
> >>> also being free'd. But flush by new rules can only be called from
> >>> preempt-disabled NAPI context. The synchronize_rcu from the map free
> >>> path and the call_rcu from the delete path will ensure the reference
> >>> here is safe. So let's remove the rcu_read_lock and rcu_read_unlock
> >>> pair to avoid any confusion around how this is being protected.
> >>>
> >>> If the rcu_read_lock was required it would mean errors in the above
> >>> logic and the original patch would also be wrong.
> >>>
> >>> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> >>> Signed-off-by: John Fastabend <john.fastabend@...il.com>
> >>> ---
> >>> kernel/bpf/devmap.c | 2 --
> >>> 1 file changed, 2 deletions(-)
> >>>
> >>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> >>> index f0bf525..0129d4a 100644
> >>> --- a/kernel/bpf/devmap.c
> >>> +++ b/kernel/bpf/devmap.c
> >>> @@ -378,10 +378,8 @@ void __dev_map_flush(void)
> >>> struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
> >>> struct xdp_bulk_queue *bq, *tmp;
> >>>
> >>> - rcu_read_lock();
> >>> list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
> >>> bq_xmit_all(bq, XDP_XMIT_FLUSH);
> >>> - rcu_read_unlock();
> >>
> >> I introduced this lock because some drivers have assumption that
> >> .ndo_xdp_xmit() is called under RCU. (commit 86723c864063)
> >>
> >> Maybe devmap deletion logic does not need this anymore, but is it
> >> OK to drivers?
> >
> > Ah OK, thanks for catching this. So it's a strange requirement from
> > virtio_net to need the read lock like this. I quickly scanned the drivers
> > and it seems to be the only one.
> >
> > I think the best path forward is to fix virtio_net so it doesn't
> > need rcu_read_lock() here then the locking is much cleaner IMO.
>
> Actually veth is calling rcu_dereference in .ndo_xdp_xmit() so it needs
> the same treatment. In the reference I sent in another mail, Jesper
> said mlx5 also has some RCU dependency.

So veth, virtio_net and tun seem to need rcu_read_lock/unlock because
they use an rcu_dereference() in the xdp_xmit path. I'll audit the
rest today.

@Jesper, do you recall why mlx5 would require an rcu_read_lock()/unlock()
pair? I just looked at mlx5_xdp_xmit and I'm not seeing an
rcu_dereference in there, so if it is required I would want
to understand why.

>
> > I'll send a v2 and either move the xdp enabled check (the piece
> > using the rcu_read_lock) into a bitmask flag or push the
> > rcu_read_lock() into virtio_net so it's clear that this is a detail
> > of virtio_net and not a general thing. FWIW I don't think the
> > rcu_read_lock is actually needed in the virtio_net case anymore
> > either but pretty sure the rcu_dereference will cause an rcu
> > splat. Maybe there is another annotation we can use. I'll dig
> > into it tomorrow. Thanks
>
> I'm thinking we can just move the rcu_lock to wrap around only ndo_xdp_xmit.
> But as you suggest if we can identify all drivers which depend on RCU and move the
> rcu_lock into the drivers (or remove the dependency) it's better.

I think we are working in the bpf-next tree here, so it would be best
to identify the minimal set of drivers that require the read_lock
and push that into the driver. I prefer these things to be precise
so it's easy to understand when reading the code. Otherwise it's
really not clear, without grepping through the code or walking
the git history, why we wrapped this in a rcu_read_lock/unlock.
At minimum we would want a comment, but that feels like a big hammer
that is not needed.

Most drivers should not care about the rcu_read_lock; it seems
to matter only in special cases in the software devices.
veth, for example, is dereferencing the peer netdev. tun is dereferencing
the tun_file. The virtio_net usage seems arbitrary to me and
is simply used to decide if xdp is enabled, but we can do that
in a cleaner way.

I'll put a v2 together today and send it out for review.
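
As a rough illustration of pushing the read lock into the driver, here is
a user-space mock, not kernel code. Every mock_* name is hypothetical and
stands in for the kernel primitive it resembles; the "splat" counter only
imitates the kind of warning an rcu_dereference() outside a read-side
section would be expected to raise, and the peer pointer loosely mirrors
the veth case mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Mock RCU state (hypothetical): read-side nesting depth and a count of
 * dereferences performed outside any read-side section. */
static int mock_rcu_depth;
static int mock_splats;

static void mock_rcu_read_lock(void)
{
	mock_rcu_depth++;
}

static void mock_rcu_read_unlock(void)
{
	assert(mock_rcu_depth > 0);	/* unbalanced unlock is a bug */
	mock_rcu_depth--;
}

/* Stand-in for rcu_dereference(): records a "splat" when no read-side
 * section is active, then returns the pointer unchanged. */
static void *mock_rcu_dereference(void *p)
{
	if (mock_rcu_depth == 0)
		mock_splats++;
	return p;
}

struct mock_dev {
	struct mock_dev *peer;	/* treated as RCU-protected in this mock */
	int rx_frames;
};

/* Driver xmit that relies on the caller to hold the read lock. */
static int mock_xmit_unlocked(struct mock_dev *dev, int n)
{
	struct mock_dev *peer = mock_rcu_dereference(dev->peer);

	if (!peer)
		return -1;
	peer->rx_frames += n;
	return n;
}

/* Driver xmit that takes the read lock itself, so the dependency is
 * visible in the driver rather than hidden in the flush path. */
static int mock_xmit_locked(struct mock_dev *dev, int n)
{
	int ret;

	mock_rcu_read_lock();
	ret = mock_xmit_unlocked(dev, n);
	mock_rcu_read_unlock();
	return ret;
}

/* Flush path with no read lock around the driver call, as in the patch. */
static int mock_flush(struct mock_dev *dev, int n,
		      int (*xmit)(struct mock_dev *, int))
{
	return xmit(dev, n);
}
```

In this mock, calling mock_flush() with mock_xmit_unlocked bumps the splat
counter, while calling it with mock_xmit_locked does not. It says nothing
about whether the real lock is still needed for correctness, only where
the annotation would live.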