netdev - Re: [PATCH] xen-netback: fix race between napi

Message-Id: <20140326.163302.1619665103723572971.davem@davemloft.net>
Date:	Wed, 26 Mar 2014 16:33:02 -0400 (EDT)
From:	David Miller <davem@...emloft.net>
To:	wei.liu2@...rix.com
Cc:	david.vrabel@...rix.com, netdev@...r.kernel.org,
	xen-devel@...ts.xenproject.org, ian.campbell@...rix.com
Subject: Re: [PATCH] xen-netback: fix race between napi_complete() and
 interrupt handler

From: Wei Liu <wei.liu2@...rix.com>
Date: Tue, 25 Mar 2014 14:50:21 +0000

> You forgot to target this patch to "net" tree in subject line.
> 
> On Tue, Mar 25, 2014 at 02:08:25PM +0000, David Vrabel wrote:
>> When the NAPI budget was not all used, xenvif_poll() would call
>> napi_complete() /after/ enabling the interrupt.  This resulted in a
>> race between the napi_complete() and the napi_schedule() in the
>> interrupt handler.  The use of local_irq_save/restore() avoided the
>> race iff the handler was running on the same CPU but not if it was
>> running on a different CPU.
>> 
> 
> OK, I understand this issue now. You mentioned it in the other email
> which made me a bit confused.
> 
> Just curious, how do you trigger this? By re-binding the interrupt to
> another CPU when xenvif_poll is running? I used to run irqbalance (the
> one that works with xen virtual interrupt) but could not trigger a race.
> Probably the race window is too small to trigger?
> 
>> Fix this properly by calling napi_complete() before reenabling
>> interrupts (in the xenvif_check_rx_xenvif() call).
>> 
>> Signed-off-by: David Vrabel <david.vrabel@...rix.com>
>> ---
>>  drivers/net/xen-netback/interface.c |   28 ++--------------------------
>>  1 files changed, 2 insertions(+), 26 deletions(-)
>> 
>> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
>> index 7669d49..ee322d9 100644
>> --- a/drivers/net/xen-netback/interface.c
>> +++ b/drivers/net/xen-netback/interface.c
>> @@ -65,32 +65,8 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
>>  	work_done = xenvif_tx_action(vif, budget);
>>  
>>  	if (work_done < budget) {
>> -		int more_to_do = 0;
>> -		unsigned long flags;
>> -
>> -		/* It is necessary to disable IRQ before calling
>> -		 * RING_HAS_UNCONSUMED_REQUESTS. Otherwise we might
>> -		 * lose event from the frontend.
>> -		 *
>> -		 * Consider:
>> -		 *   RING_HAS_UNCONSUMED_REQUESTS
>> -		 *   <frontend generates event to trigger napi_schedule>
>> -		 *   __napi_complete
>> -		 *
>> -		 * This handler is still in scheduled state so the
>> -		 * event has no effect at all. After __napi_complete
>> -		 * this handler is descheduled and cannot get
>> -		 * scheduled again. We lose event in this case and the ring
>> -		 * will be completely stalled.
>> -		 */
>> -
>> -		local_irq_save(flags);
>> -
>> -		RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
>> -		if (!more_to_do)
>> -			__napi_complete(napi);
>> -
>> -		local_irq_restore(flags);
>> +		napi_complete(napi);
> 
> You need to add a comment here to say the interrupt is in fact "disabled"
> before this point, and "enabled" by xenvif_check_rx_xenvif().

Agreed.
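
For readers of the archive, the ordering problem described in the quoted
commit message can be illustrated with a small userspace model. This is
only a sketch and not the driver code: struct model_napi, model_schedule(),
model_complete() and the two *_order_loses_event() helpers are hypothetical
names invented for illustration. The model assumes that a schedule request
is ignored while the "scheduled" flag is still set, which is the behaviour
the removed comment in the patch describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_napi {
	bool scheduled;    /* stands in for the NAPI "scheduled" state */
	int polls_queued;  /* polls queued by model_schedule() */
};

/* Models napi_schedule(): a no-op while a poll is already scheduled. */
static bool model_schedule(struct model_napi *n)
{
	if (n->scheduled)
		return false;
	n->scheduled = true;
	n->polls_queued++;
	return true;
}

/* Models napi_complete(): only valid while a poll is scheduled. */
static void model_complete(struct model_napi *n)
{
	assert(n->scheduled);
	n->scheduled = false;
}

/* Old ordering: events are re-enabled first, an interrupt handled on
 * another CPU tries to schedule, and only then is the poll completed.
 * Returns true if the event was lost.
 */
static bool old_order_loses_event(void)
{
	struct model_napi n = { .scheduled = true, .polls_queued = 0 };
	bool irq_took_effect;

	irq_took_effect = model_schedule(&n); /* ignored: still scheduled */
	model_complete(&n);                   /* poll descheduled afterwards */

	return !irq_took_effect && !n.scheduled;
}

/* New ordering: complete first, re-enable events afterwards, so a
 * later interrupt can schedule the poll again.
 * Returns true if the event was lost.
 */
static bool new_order_loses_event(void)
{
	struct model_napi n = { .scheduled = true, .polls_queued = 0 };
	bool irq_took_effect;

	model_complete(&n);
	irq_took_effect = model_schedule(&n); /* takes effect: poll re-queued */

	return !irq_took_effect;
}
```

In this model old_order_loses_event() returns true and
new_order_loses_event() returns false. The model is single-threaded and
deterministic; it fixes the interleaving by hand rather than reproducing
the cross-CPU race itself.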