linux-kernel - Re: [PATCH] xen-netfront: Fix Rx stall during network stress and OOM


Message-ID: <0cb06b48-cb3c-47aa-2ae6-3a70197a5b64@amazon.com>
Date:   Thu, 12 Jan 2017 15:09:43 -0800
From:   Vineeth Remanan Pillai <vineethp@...zon.com>
To:     David Miller <davem@...emloft.net>
CC:     <boris.ostrovsky@...cle.com>, <jgross@...e.com>,
        <xen-devel@...ts.xenproject.org>, <netdev@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <kamatam@...zon.com>,
        <aliguori@...zon.com>
Subject: Re: [PATCH] xen-netfront: Fix Rx stall during network stress and OOM



On 01/12/2017 12:17 PM, David Miller wrote:
> From: Vineeth Remanan Pillai <vineethp@...zon.com>
> Date: Wed, 11 Jan 2017 23:17:17 +0000
>
>> @@ -1054,7 +1059,11 @@ static int xennet_poll(struct napi_struct *napi, int budget)
>>   		napi_complete(napi);
>>   
>>   		RING_FINAL_CHECK_FOR_RESPONSES(&queue->rx, more_to_do);
>> -		if (more_to_do)
>> +
>> +		/* If there is more work to do or could not allocate
>> +		 * rx buffers, re-enable polling.
>> +		 */
>> +		if (more_to_do || err != 0)
>>   			napi_schedule(napi);
> Just polling endlessly in a loop retrying the SKB allocation over and over
> again until it succeeds is not very nice behavior.
>
> You already have that refill timer, so please use that to retry instead
> of wasting cpu cycles looping in NAPI poll.
Thanks, Dave, for the input.
On a closer look, I think I can fix this much more simply by correcting
the test condition for the minimum number of slots required before
pushing requests. The existing test looks like this:

<snip>
        /* Not enough requests? Try again later. */
        if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
                mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));
                return;
        }
</snip>

The check above counts more than the newly created request slots, because
it counts from rsp_cons. The count should instead be the difference
between the new req_prod and the old req_prod (the value already in the
queue's shared ring, queue->rx.sring->req_prod). If skbs cannot be
allocated, this count stays small and we would therefore schedule the
timer. So the fix could be:

         /* Not enough requests? Try again later. */
-       if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
+       if (req_prod - queue->rx.sring->req_prod < NET_RX_SLOTS_MIN) {
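
To illustrate how the two tests differ, here is a minimal user-space sketch
in C. It is not the driver code: ring_idx_t, struct rx_ring_model,
RX_SLOTS_MIN (with an arbitrarily chosen value of 8) and both helper
functions are hypothetical stand-ins for RING_IDX, the netfront queue,
NET_RX_SLOTS_MIN and the inline condition. It only models the index
arithmetic, assuming the indices are free-running unsigned counters that
may wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
typedef uint32_t ring_idx_t;   /* models RING_IDX: unsigned, free-running */
#define RX_SLOTS_MIN 8u        /* illustrative value, not NET_RX_SLOTS_MIN */

struct rx_ring_model {
    ring_idx_t rsp_cons;        /* responses already consumed by the frontend */
    ring_idx_t shared_req_prod; /* models sring->req_prod: last published index */
};

/*
 * Existing test: measures from rsp_cons, so it also counts requests that
 * were posted earlier and are still outstanding.
 */
static bool needs_refill_timer_old(const struct rx_ring_model *r,
                                   ring_idx_t req_prod)
{
    return (ring_idx_t)(req_prod - r->rsp_cons) < RX_SLOTS_MIN;
}

/*
 * Proposed test: measures only the slots created in this refill pass,
 * i.e. new req_prod minus the req_prod already published in the ring.
 */
static bool needs_refill_timer_new(const struct rx_ring_model *r,
                                   ring_idx_t req_prod)
{
    return (ring_idx_t)(req_prod - r->shared_req_prod) < RX_SLOTS_MIN;
}
```

For example, with rsp_cons = 100, a previously published req_prod of 130,
and an allocation failure after only two new slots (req_prod = 132), the
existing test sees 32 slots and does not arm the timer, while the proposed
test sees 2 new slots and does.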


I have done some initial testing to verify the fix. I will send out a v2
patch after a couple more rounds of testing.

Thanks,
Vineeth