Message-ID: <1683861904.528041-1-xuanzhuo@linux.alibaba.com>
Date: Fri, 12 May 2023 11:25:04 +0800
From: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
To: Feng Liu <feliu@...dia.com>
Cc: "Michael S . Tsirkin" <mst@...hat.com>,
Simon Horman <simon.horman@...igine.com>,
Bodong Wang <bodong@...dia.com>,
William Tu <witu@...dia.com>,
Parav Pandit <parav@...dia.com>,
virtualization@...ts.linux-foundation.org,
netdev@...r.kernel.org,
linux-kernel@...r.kernel.org,
bpf@...r.kernel.org,
Jason Wang <jasowang@...hat.com>
Subject: Re: [PATCH net v3] virtio_net: Fix error unwinding of XDP initialization
On Thu, 11 May 2023 21:54:40 -0400, Feng Liu <feliu@...dia.com> wrote:
>
>
> On 2023-05-10 a.m.1:00, Jason Wang wrote:
> >
> >
> > On 2023/5/9 09:43, Xuan Zhuo wrote:
> >> On Mon, 8 May 2023 11:00:10 -0400, Feng Liu <feliu@...dia.com> wrote:
> >>>
> >>> On 2023-05-07 p.m.9:45, Xuan Zhuo wrote:
> >>>>
> >>>>
> >>>> On Sat, 6 May 2023 08:08:02 -0400, Feng Liu <feliu@...dia.com> wrote:
> >>>>>
> >>>>> On 2023-05-05 p.m.10:33, Xuan Zhuo wrote:
> >>>>>>
> >>>>>>
> >>>>>> On Tue, 2 May 2023 20:35:25 -0400, Feng Liu <feliu@...dia.com> wrote:
> >>>>>>> When initializing XDP in virtnet_open(), XDP initialization of some
> >>>>>>> rq may hit an error, causing the net device open to fail. However,
> >>>>>>> the previous rqs have already initialized XDP and enabled NAPI, which
> >>>>>>> is not the expected behavior. Roll back the previous rq initialization
> >>>>>>> to avoid leaks in the error unwinding of the init code.
> >>>>>>>
> >>>>>>> Also extract a helper function to disable queue pairs, and use the
> >>>>>>> newly introduced helper in error unwinding and in virtnet_close().
> >>>>>>>
> >>>>>>> Issue: 3383038
> >>>>>>> Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info")
> >>>>>>> Signed-off-by: Feng Liu <feliu@...dia.com>
> >>>>>>> Reviewed-by: William Tu <witu@...dia.com>
> >>>>>>> Reviewed-by: Parav Pandit <parav@...dia.com>
> >>>>>>> Reviewed-by: Simon Horman <simon.horman@...igine.com>
> >>>>>>> Acked-by: Michael S. Tsirkin <mst@...hat.com>
> >>>>>>> Change-Id: Ib4c6a97cb7b837cfa484c593dd43a435c47ea68f
> >>>>>>> ---
> >>>>>>> drivers/net/virtio_net.c | 30 ++++++++++++++++++++----------
> >>>>>>> 1 file changed, 20 insertions(+), 10 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>>>>>> index 8d8038538fc4..3737cf120cb7 100644
> >>>>>>> --- a/drivers/net/virtio_net.c
> >>>>>>> +++ b/drivers/net/virtio_net.c
> >>>>>>> @@ -1868,6 +1868,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
> >>>>>>> return received;
> >>>>>>> }
> >>>>>>>
> >>>>>>> +static void virtnet_disable_qp(struct virtnet_info *vi, int qp_index)
> >>>>>>> +{
> >>>>>>> + virtnet_napi_tx_disable(&vi->sq[qp_index].napi);
> >>>>>>> + napi_disable(&vi->rq[qp_index].napi);
> >>>>>>> + xdp_rxq_info_unreg(&vi->rq[qp_index].xdp_rxq);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> static int virtnet_open(struct net_device *dev)
> >>>>>>> {
> >>>>>>> struct virtnet_info *vi = netdev_priv(dev);
> >>>>>>> @@ -1883,20 +1890,26 @@ static int virtnet_open(struct net_device *dev)
> >>>>>>>
> >>>>>>> err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, vi->rq[i].napi.napi_id);
> >>>>>>> if (err < 0)
> >>>>>>> - return err;
> >>>>>>> + goto err_xdp_info_reg;
> >>>>>>>
> >>>>>>> err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq,
> >>>>>>> MEM_TYPE_PAGE_SHARED, NULL);
> >>>>>>> - if (err < 0) {
> >>>>>>> - xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >>>>>>> - return err;
> >>>>>>> - }
> >>>>>>> + if (err < 0)
> >>>>>>> + goto err_xdp_reg_mem_model;
> >>>>>>>
> >>>>>>> virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
> >>>>>>> virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
> >>>>>>> }
> >>>>>>>
> >>>>>>> return 0;
> >>>>>>> +
> >>>>>>> +err_xdp_reg_mem_model:
> >>>>>>> + xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >>>>>>> +err_xdp_info_reg:
> >>>>>>> + for (i = i - 1; i >= 0; i--)
> >>>>>>> + virtnet_disable_qp(vi, i);
> >>>>>>
> >>>>>> I would like to know whether we should handle these:
> >>>>>>
> >>>>>> disable_delayed_refill(vi);
> >>>>>> cancel_delayed_work_sync(&vi->refill);
> >>>>>>
> >>>>>>
> >>>>>> Maybe we should call virtnet_close() with "i" directly.
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>>
> >>>>> I can't use i directly here: if xdp_rxq_info_reg() fails, NAPI has
> >>>>> not yet been enabled for the current qp, so I should roll back only
> >>>>> the queue pairs where NAPI was enabled before it (i - 1 down to 0);
> >>>>> otherwise it will hang in the NAPI disable API.
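Feng's point about where the rollback must start can be checked with a small userspace model. This is a sketch under stated assumptions, not the driver code: `struct qp_model`, `qp_model_open()`, `qp_model_reset()` and `bad_disables` are hypothetical names, the real napi_disable() waits forever on a never-enabled instance instead of incrementing a counter, and the two goto labels only mirror the ones in the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_QP 4 /* hypothetical queue-pair count for the model */

/* Hypothetical per-queue-pair state; not the driver's real structures. */
struct qp_model {
    bool xdp_registered; /* models xdp_rxq_info_reg() having succeeded */
    bool napi_enabled;   /* models virtnet_napi_enable() having run */
};

struct qp_model qps[MAX_QP];
int bad_disables; /* NAPI "disables" issued on never-enabled queues */

void qp_model_reset(void)
{
    for (int i = 0; i < MAX_QP; i++)
        qps[i] = (struct qp_model){ false, false };
    bad_disables = 0;
}

/* Models napi_disable(): the kernel call would hang on a queue whose NAPI
 * was never enabled; the model records that case instead. */
static void model_napi_disable(struct qp_model *qp)
{
    if (!qp->napi_enabled)
        bad_disables++;
    qp->napi_enabled = false;
}

/* Models virtnet_disable_qp(): disable NAPI, then unregister XDP info. */
static void model_disable_qp(int i)
{
    model_napi_disable(&qps[i]);
    qps[i].xdp_registered = false;
}

/* fail_at: queue whose setup fails (-1 for none).
 * fail_mem_model: true if the failure is in the mem-model step,
 * false if it is in the XDP info registration itself. */
int qp_model_open(int fail_at, bool fail_mem_model)
{
    int i;

    assert(fail_at >= -1 && fail_at < MAX_QP);
    for (i = 0; i < MAX_QP; i++) {
        if (i == fail_at && !fail_mem_model)
            goto err_xdp_info_reg;
        qps[i].xdp_registered = true;
        if (i == fail_at && fail_mem_model)
            goto err_xdp_reg_mem_model;
        qps[i].napi_enabled = true;
    }
    return 0;

err_xdp_reg_mem_model:
    /* Current queue: XDP info registered, NAPI never enabled. */
    qps[i].xdp_registered = false;
err_xdp_info_reg:
    /* Only queues 0..i-1 have NAPI enabled, so start at i - 1. */
    for (i = i - 1; i >= 0; i--)
        model_disable_qp(i);
    return -1;
}
```

For example, after `qp_model_reset()`, `qp_model_open(2, true)` returns -1 with every queue back in its initial state and `bad_disables` still 0. Starting the unwind loop at `i` instead of `i - 1` would also disable queue 2, whose NAPI was never enabled, which is exactly the hang Feng describes.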
> >>>> That is not the point; the key question is whether we should handle:
> >>>>
> >>>> disable_delayed_refill(vi);
> >>>> cancel_delayed_work_sync(&vi->refill);
> >>>>
> >>>> Thanks.
> >>>>
> >>>>
> >>> OK, I get the point. Thanks for your careful review. I checked the
> >>> code again.
> >>>
> >>> There are two points that I need to explain:
> >>>
> >>> 1. All refill delayed-work calls (vi->refill, vi->refill_enabled)
> >>> assume that the virtio interface was opened successfully, e.g. in
> >>> virtnet_receive, virtnet_rx_resize, _virtnet_set_queues, etc. If XDP
> >>> registration fails here, none of these subsequent functions is
> >>> triggered, so there is no need to call disable_delayed_refill() and
> >>> cancel_delayed_work_sync().
> >> Maybe something is wrong here. I think these lines may schedule the
> >> delayed work:
> >>
> >> static int virtnet_open(struct net_device *dev)
> >> {
> >> struct virtnet_info *vi = netdev_priv(dev);
> >> int i, err;
> >>
> >> enable_delayed_refill(vi);
> >>
> >> for (i = 0; i < vi->max_queue_pairs; i++) {
> >> if (i < vi->curr_queue_pairs)
> >> /* Make sure we have some buffers: if oom use wq. */
> >> --> if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> >> --> schedule_delayed_work(&vi->refill, 0);
> >>
> >> err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, vi->rq[i].napi.napi_id);
> >> if (err < 0)
> >> return err;
> >>
> >> err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq,
> >> MEM_TYPE_PAGE_SHARED, NULL);
> >> if (err < 0) {
> >> xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >> return err;
> >> }
> >>
> >> virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
> >> virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
> >> }
> >>
> >> return 0;
> >> }
> >>
> >>
> >> And I think that if virtnet_open() returns an error, the state of
> >> virtnet should be the same as the state after virtnet_close().
> >>
> >> Or does someone have another opinion?
> >
> >
> > I agree, we need to disable and sync with the refill work.
> >
> > Thanks
> >
> >
> Hi, Jason & Xuan
>
> I will modify the patch according to the comments.
>
> But I cannot call virtnet_close(), since virtnet_close() cannot start
> disabling queue pairs from the specific queue that failed, so I still
> need to use the disable helper function. As mentioned in the previous
> email, we need to roll back from the queue that failed; otherwise,
> queue pairs on which NAPI has not been enabled will hang in the NAPI
> disable API.
>
> Following the comments, I will call disable_delayed_refill() and
> cancel_delayed_work_sync() during error unwinding, and then call the
> disable helper function one by one for the queue pairs before the
> failing one.
>
> Do you have any other comments about these?
LGTM
Thanks.
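The unwinding order agreed on in this thread (disable and cancel the refill work, then disable only the queue pairs before the failing one) can be sketched as another userspace model, checking Xuan's requirement that a failed virtnet_open() leave the device in the same state as virtnet_close(). This is a hypothetical model, not the driver code: `struct dev_model`, `dev_model_open()`, `dev_model_close()` and `dev_model_same()` are invented names, clearing `refill_pending` stands in for cancel_delayed_work_sync() (which in the kernel also waits for a running work item), and the `curr_queue_pairs` check around try_fill_recv() is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define NQP 3 /* hypothetical number of queue pairs */

/* Hypothetical device state loosely mirroring fields named in the thread. */
struct dev_model {
    bool refill_enabled;      /* models vi->refill_enabled */
    bool refill_pending;      /* models vi->refill being scheduled */
    bool napi_enabled[NQP];
    bool xdp_registered[NQP];
};

static void dev_model_disable_qp(struct dev_model *d, int i)
{
    d->napi_enabled[i] = false;
    d->xdp_registered[i] = false;
}

/* fill_ok: whether each try_fill_recv() "succeeds";
 * fail_at: queue whose XDP registration fails (-1 for none). */
int dev_model_open(struct dev_model *d, bool fill_ok, int fail_at)
{
    int i;

    assert(fail_at >= -1 && fail_at < NQP);
    d->refill_enabled = true;               /* enable_delayed_refill() */
    for (i = 0; i < NQP; i++) {
        if (!fill_ok)
            d->refill_pending = true;       /* schedule_delayed_work() */
        if (i == fail_at)
            goto err_xdp_info_reg;
        d->xdp_registered[i] = true;
        d->napi_enabled[i] = true;
    }
    return 0;

err_xdp_info_reg:
    d->refill_enabled = false;              /* disable_delayed_refill() */
    d->refill_pending = false;              /* cancel_delayed_work_sync() */
    for (i = i - 1; i >= 0; i--)
        dev_model_disable_qp(d, i);
    return -1;
}

void dev_model_close(struct dev_model *d)
{
    d->refill_enabled = false;
    d->refill_pending = false;
    for (int i = 0; i < NQP; i++)
        dev_model_disable_qp(d, i);
}

/* Field-by-field comparison of two model states. */
bool dev_model_same(const struct dev_model *a, const struct dev_model *b)
{
    if (a->refill_enabled != b->refill_enabled ||
        a->refill_pending != b->refill_pending)
        return false;
    for (int i = 0; i < NQP; i++)
        if (a->napi_enabled[i] != b->napi_enabled[i] ||
            a->xdp_registered[i] != b->xdp_registered[i])
            return false;
    return true;
}
```

With `fill_ok` false and `fail_at` 2, the refill work is scheduled on the first iteration, before registration fails; the error path clears it, and the resulting state compares equal to an open-then-close sequence. Dropping the two refill lines from the error path would leave `refill_pending` set, which is the case Xuan raised.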
>
> Thanks
>
> >>
> >> Thanks.
> >>
> >>> The logic here is different from that of virtnet_close().
> >>> virtnet_close() assumes that virtnet_open() succeeded and that TX and
> >>> RX have been running normally. For error unwinding, only disabling the
> >>> qps is needed. I also encapsulated a helper function to disable a qp,
> >>> which is used in both error unwinding and virtnet_close().
> >>> 2. For the current failing qp, whose NAPI has not been enabled, we can
> >>> only call XDP unreg and must not call the NAPI disable interface;
> >>> otherwise the kernel will get stuck. That is why disable qp is called
> >>> only on the previous queues, starting from i - 1.
> >>>
> >>> Thanks
> >>>
> >>>>>>> +
> >>>>>>> + return err;
> >>>>>>> }
> >>>>>>>
> >>>>>>> static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> >>>>>>> @@ -2305,11 +2318,8 @@ static int virtnet_close(struct net_device *dev)
> >>>>>>> /* Make sure refill_work doesn't re-enable napi! */
> >>>>>>> cancel_delayed_work_sync(&vi->refill);
> >>>>>>>
> >>>>>>> - for (i = 0; i < vi->max_queue_pairs; i++) {
> >>>>>>> - virtnet_napi_tx_disable(&vi->sq[i].napi);
> >>>>>>> - napi_disable(&vi->rq[i].napi);
> >>>>>>> - xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >>>>>>> - }
> >>>>>>> + for (i = 0; i < vi->max_queue_pairs; i++)
> >>>>>>> + virtnet_disable_qp(vi, i);
> >>>>>>>
> >>>>>>> return 0;
> >>>>>>> }
> >>>>>>> --
> >>>>>>> 2.37.1 (Apple Git-137.1)
> >>>>>>>
> >