netdev - Re: [PATCH net-next] xen-netback: Rework rx_work

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20140115161011.GQ5698@zion.uk.xensource.com>
Date:	Wed, 15 Jan 2014 16:10:11 +0000
From:	Wei Liu <wei.liu2@...rix.com>
To:	Zoltan Kiss <zoltan.kiss@...rix.com>
CC:	Wei Liu <wei.liu2@...rix.com>, <ian.campbell@...rix.com>,
	<xen-devel@...ts.xenproject.org>, <netdev@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <jonathan.davies@...rix.com>
Subject: Re: [PATCH net-next] xen-netback: Rework rx_work_todo

On Wed, Jan 15, 2014 at 03:08:12PM +0000, Zoltan Kiss wrote:
> On 15/01/14 14:59, Wei Liu wrote:
> >On Wed, Jan 15, 2014 at 02:52:41PM +0000, Zoltan Kiss wrote:
> >>On 15/01/14 14:45, Wei Liu wrote:
> >>>>>>The recent patch to fix receive side flow control (11b57f) solved the spinning
> >>>>>>>>>thread problem, however caused an another one. The receive side can stall, if:
> >>>>>>>>>- xenvif_rx_action sets rx_queue_stopped to false
> >>>>>>>>>- interrupt happens, and sets rx_event to true
> >>>>>>>>>- then xenvif_kthread sets rx_event to false
> >>>>>>>>>
> >>>>>>>
> >>>>>>>If you mean "rx_work_todo" returns false.
> >>>>>>>
> >>>>>>>In this case
> >>>>>>>
> >>>>>>>(!skb_queue_empty(&vif->rx_queue) && !vif->rx_queue_stopped) || vif->rx_event;
> >>>>>>>
> >>>>>>>can still be true, can't it?
> >>>>>Sorry, I should wrote rx_queue_stopped to true
> >>>>>
> >>>In this case, if rx_queue_stopped is true, then we're expecting frontend
> >>>to notify us, right?
> >>>
> >>>rx_queue_stopped is set to true if we cannot make any progress to queue
> >>>packet into the ring. In that situation we can expect frontend will send
> >>>notification to backend after it goes through the backlog in the ring.
> >>>That means rx_event is set to true, and rx_work_todo is true again. So
> >>>the ring is actually not stalled in this case as well. Did I miss
> >>>something?
> >>>
> >>
> >>Yes, we expect the guest to notify us, and it does, and we set
> >>rx_event to true (see second point), but then the thread set it to
> >>false (see third point). Talking with Paul, another solution could
> >>be to set rx_event false before calling xenvif_rx_action. But using
> >>rx_last_skb_slots makes it quicker for the thread to see if it
> >>doesn't have to do anything.
> >>
> >
> >OK, this is a better explaination. So actually there's no bug in the
> >original implementation and your patch is sort of an improvement.
> >
> >Could you send a new version of this patch with relevant information in
> >commit message? Talking to people offline is faster, but I would like to
> >have public discussion and relevant information archived in a searchable
> >form. Thanks.
> 
> No, there is a bug in the original implementation:
> - [THREAD] xenvif_rx_action sets rx_queue_stopped to true
> - [INTERRUPT] interrupt happens, and sets rx_event to true
> - [THREAD] then xenvif_kthread sets rx_event to false
> - [THREAD] rx_work_todo never returns true anymore
> 

I see what you mean. The interrupt is "lost", that's why it's stalled.

> I will update the explanation and send in the patch again.
> 

Thanks.

Wei.

> Zoli
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html