[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZCR+5p8TGvS+GQsP@corigine.com>
Date: Wed, 29 Mar 2023 20:09:42 +0200
From: Simon Horman <simon.horman@...igine.com>
To: Liang He <windhl@....com>
Cc: Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net,
edumazet@...gle.com, pabeni@...hat.com, netdev@...r.kernel.org
Subject: Re: [PATCH] rionet: Fix refcounting bugs
On Wed, Mar 29, 2023 at 02:01:30PM +0800, Liang He wrote:
>
> Hi, Jakub,
>
> First, thanks for you reviewing our patch again.
>
> At 2023-03-29 10:10:51, "Jakub Kicinski" <kuba@...nel.org> wrote:
> >On Tue, 28 Mar 2023 12:50:06 +0800 Liang He wrote:
> >> In rionet_start_xmit(), we should put the refcount_inc()
> >> before we add *skb* into the queue, otherwise it may cause
> >> the consumer to prematurely call refcount_dec().
> >
> >Are you sure the race can happen? Look around the code, please.
>
> We commit this patch based on the pattern we learned from these commits,
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc4&id=bb765d1c331f62b59049d35607ed2e365802bef9
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc4&id=47a017f33943278570c072bc71681809b2567b3a
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc4&id=b780d7415aacec855e2f2370cbf98f918b224903
>
> If it is indeed in different context which makes our pattern failed, please kindly pointing out it.
The difference is that these are not start_xmit callbacks.
See below.
> >> Besides, before the next rionet_queue_tx_msg() when we
> >> meet the 'RIONET_MAC_MATCH', we should also call
> >> refcount_inc() before the skb is added into the queue.
> >
> >And why is that?
>
> We think it should be better to keep the consistent refcounting-operation
> when we put the *skb* into the queue.
It's subjective.
Due to the locking of the code in question there is no race.
The code you modify is protected by &rnet->tx_lock.
As is the code that will decrement (and free) the skb.
> >As far as I can tell your patch reorders something that doesn't matter
> >and then adds a bug :|
>
> If there is any bug, can you kindly tell us?
We are dealing with a start_xmit NDO callback.
To simplify things, let us only consider cases where
NETDEV_TX_OK is returned, which is the case for the code you modify.
In such cases the callback should consume the skb, which already
has a reference taken on it. This is done in the driver by a call
to dev_kfree_skb_irq() which is executed indirectly via
rionet_queue_tx_msg().
So currently the reference count handling is correct.
By taking an extra reference, it will never reach zero,
and thus the resources of the skb will be leaked.
Powered by blists - more mailing lists