lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sun, 29 Jan 2017 18:09:21 -0500
From:   Boris Ostrovsky <>
To:     Vineeth Remanan Pillai <>,
        David Miller <>,,, Wei Liu <>,
        Paul Durrant <>,
        xen-devel <>
Subject: Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and

On 01/19/2017 11:35 AM, Vineeth Remanan Pillai wrote:
> From: Vineeth Remanan Pillai <>
> During an OOM scenario, request slots could not be created as skb
> allocation fails. So the netback cannot pass in packets and netfront
> wrongly assumes that there is no more work to be done and it disables
> polling. This causes Rx to stall.
> The issue is with the retry logic which schedules the timer if the
> created slots are less than NET_RX_SLOTS_MIN. The count of new request
> slots to be pushed are calculated as a difference between new req_prod
> and rsp_cons which could be more than the actual slots, if there are
> unconsumed responses.
> The fix is to calculate the count of newly created slots as the
> difference between new req_prod and old req_prod.
> Signed-off-by: Vineeth Remanan Pillai <>
> Reviewed-by: Juergen Gross <>
> ---
> Changes in v2:
>     - Removed the old implementation of enabling polling on
>       skb allocation error.
>     - Corrected the refill timer logic to schedule when newly
>       created slots since last push is less than NET_RX_SLOTS_MIN.

There are couple of problems with this patch.
1. The 'if' clause now evaluates to true on pretty much every call to 
2. It tickles a latent bug during resume where the timer triggers before 
we re-connect. The trouble is that we now try to dereference 
queue->rx.sring which is NULL since we disconnect in netfront_resume(). 
(Curiously, I only observe it with 32-bit guests)

I'll send a patch later that will delete the timer since it looks like a 
bug to me in any case but the first problem seems to be more serious 
than the problem that this patch addresses.


>  drivers/net/xen-netfront.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 40f26b6..2c7c29f 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -321,7 +321,7 @@ static void xennet_alloc_rx_buffers(struct
> netfront_queue *queue)
>      queue->rx.req_prod_pvt = req_prod;
>      /* Not enough requests? Try again later. */
> -    if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
> +    if (req_prod - queue->rx.sring->req_prod < NET_RX_SLOTS_MIN) {
>          mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));
>          return;
>      }

Powered by blists - more mailing lists