[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <38ccfaea-0a65-a6f3-c19a-e6f9c0d4ef76@oracle.com>
Date: Sun, 29 Jan 2017 18:09:21 -0500
From: Boris Ostrovsky <boris.ostrovsky@...cle.com>
To: Vineeth Remanan Pillai <vineethp@...zon.com>,
David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, Wei Liu <wei.liu2@...rix.com>,
Paul Durrant <paul.durrant@...rix.com>,
xen-devel <xen-devel@...ts.xen.org>
Subject: Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and
OOM
On 01/19/2017 11:35 AM, Vineeth Remanan Pillai wrote:
> From: Vineeth Remanan Pillai <vineethp@...zon.com>
>
> During an OOM scenario, request slots could not be created as skb
> allocation fails. So the netback cannot pass in packets and netfront
> wrongly assumes that there is no more work to be done and it disables
> polling. This causes Rx to stall.
>
> The issue is with the retry logic which schedules the timer if the
> created slots are less than NET_RX_SLOTS_MIN. The count of new request
> slots to be pushed are calculated as a difference between new req_prod
> and rsp_cons which could be more than the actual slots, if there are
> unconsumed responses.
>
> The fix is to calculate the count of newly created slots as the
> difference between new req_prod and old req_prod.
>
> Signed-off-by: Vineeth Remanan Pillai <vineethp@...zon.com>
> Reviewed-by: Juergen Gross <jgross@...e.com>
> ---
> Changes in v2:
> - Removed the old implementation of enabling polling on
> skb allocation error.
> - Corrected the refill timer logic to schedule when newly
> created slots since last push is less than NET_RX_SLOTS_MIN.
There are couple of problems with this patch.
1. The 'if' clause now evaluates to true on pretty much every call to
xennet_alloc_rx_buffers().
2. It tickles a latent bug during resume where the timer triggers before
we re-connect. The trouble is that we now try to dereference
queue->rx.sring which is NULL since we disconnect in netfront_resume().
(Curiously, I only observe it with 32-bit guests)
I'll send a patch later that will delete the timer since it looks like a
bug to me in any case but the first problem seems to be more serious
than the problem that this patch addresses.
-boris
>
> drivers/net/xen-netfront.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 40f26b6..2c7c29f 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -321,7 +321,7 @@ static void xennet_alloc_rx_buffers(struct
> netfront_queue *queue)
> queue->rx.req_prod_pvt = req_prod;
>
> /* Not enough requests? Try again later. */
> - if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
> + if (req_prod - queue->rx.sring->req_prod < NET_RX_SLOTS_MIN) {
> mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));
> return;
> }
Powered by blists - more mailing lists