linux-kernel - Re: [patch] xenfb: fix xenfb suspend/resume race

Date:	Thu, 6 Jan 2011 08:02:04 +0000
From:	Ian Campbell <Ian.Campbell@...citrix.com>
To:	Joe Jin <joe.jin@...cle.com>
CC:	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	"jeremy@...p.org" <jeremy@...p.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"linux-fbdev@...r.kernel.org" <linux-fbdev@...r.kernel.org>,
	"xen-devel@...ts.xensource.com" <xen-devel@...ts.xensource.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"gurudas.pai@...cle.com" <gurudas.pai@...cle.com>,
	"greg.marsden@...cle.com" <greg.marsden@...cle.com>,
	"guru.anbalagane@...cle.com" <guru.anbalagane@...cle.com>
Subject: Re: [patch] xenfb: fix xenfb suspend/resume race

On Thu, 2011-01-06 at 07:14 +0000, Joe Jin wrote: 
> On 01/04/11 19:15, Ian Campbell wrote:
> > On Thu, 2010-12-30 at 16:40 +0000, Konrad Rzeszutek Wilk wrote:
> >> On Thu, Dec 30, 2010 at 08:56:16PM +0800, Joe Jin wrote:
> >>> Hi,
> >>
> >> Joe,
> >>
> >> Patch looks good, however..
> >>
> >> I am unclear from your description whether the patch fixes
> >> the problem (I would presume so). Or does it take a long time
> >> to hit this race?
> > 
> > I also don't see how the patch relates to the stack trace.
> > 
> > Is the issue that xenfb_send_event is called between xenfb_resume
> > (which tears down the state, including evtchn->irq binding) and the
> > probe/connect of the new fb?
> 
> Yes. When we hit this issue, a debugging kernel showed that irq was invalid (-1).

But why is it -1? I really don't think you have identified the root
cause here. If you really have identified the root cause then your
changelog needs to go into much greater depth regarding your analysis.

> Checking whether irq is valid will fix this issue.

No, it papers over the issue. The code should never have been allowed to
get this far if the connection to the backend is not yet fully resumed
(i.e. when irq == -1).

The call to xenfb_send_event should have been gated further up the call
chain, AFAICT by the check of info->update_wanted in xenfb_refresh. This
suggests that the correct fix is to set info->update_wanted = 0 in
xenfb_resume.
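
To make the gating concrete, here is a minimal user-space sketch of the
idea. It is not driver code: struct fb_model and the model_* functions are
hypothetical stand-ins for struct xenfb_info, xenfb_send_event,
xenfb_refresh and xenfb_resume, and it assumes the flag is set again only
once the backend is reconnected and asks for updates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; NOT the real struct xenfb_info. */
struct fb_model {
	int irq;            /* -1 while no event channel is bound */
	bool update_wanted; /* true once the backend has asked for updates */
	int events_sent;    /* counter, for illustration only */
};

/* Stand-in for xenfb_send_event: only legal with a bound irq. */
static void model_send_event(struct fb_model *fb)
{
	/* Invariant: callers must never get here while irq == -1. */
	assert(fb->irq >= 0);
	fb->events_sent++;
}

/* Stand-in for xenfb_refresh: the gate lives here, up the call chain. */
static void model_refresh(struct fb_model *fb)
{
	if (!fb->update_wanted)
		return; /* backend not (re)connected, nothing to send */
	model_send_event(fb);
}

/* Stand-in for xenfb_resume: tear down and close the gate. */
static void model_resume(struct fb_model *fb)
{
	fb->update_wanted = false; /* the suggested one-line change */
	fb->irq = -1;              /* old evtchn->irq binding is gone */
}

/* Assumed reconnect step: new irq bound, backend requests updates. */
static void model_backend_connected(struct fb_model *fb, int irq)
{
	fb->irq = irq;
	fb->update_wanted = true;
}
```

With this shape, a refresh that arrives between model_resume and
model_backend_connected returns early instead of reaching the send path, so
no irq validity check is needed lower down.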

I said all this in my previous mail and you ignored it. Did you try this
approach?

> Also, when connecting to the backend fails, we need to release the resources.

So the changes to xenfb_connect_backend are independent of the irq == -1
issue? In that case this part, which seems like a reasonable and valid
fix, should be split into a separate patch.
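
For reference, the cleanup pattern that part of the patch is aiming for can
be sketched in plain user-space C. The names below (struct res_model,
model_connect and its two failure flags) are hypothetical and only model
"event channel allocated" and "irq bound" as flags; the sketch shows the
unwind order, not the real Xen calls.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the two resources taken during connect. */
struct res_model {
	int evtchn_live; /* 1 while the event channel is allocated */
	int irq_live;    /* 1 while the irq handler is bound */
};

/*
 * Acquire resources in order; on failure, release whatever was
 * already acquired in reverse order through a chain of labels.
 */
static int model_connect(struct res_model *r, bool bind_fails,
			 bool xenstore_fails)
{
	int ret;

	r->evtchn_live = 1; /* step 1: allocate the event channel */

	if (bind_fails) {   /* step 2: bind it to an irq handler */
		ret = -EBUSY;
		goto free_evtchn;
	}
	r->irq_live = 1;

	if (xenstore_fails) { /* step 3: publish details to the backend */
		ret = -EIO;
		goto unbind_irq;
	}
	return 0;

unbind_irq:
	r->irq_live = 0;
free_evtchn:
	r->evtchn_live = 0;
	return ret;
}
```

Every failure after step 1 leaves both flags cleared, which is the property
a standalone resource-release patch would need to demonstrate.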

> Please review the new patch for this issue.

Nacked-by: Ian Campbell <ian.campbell@...rix.com>

Ian.

> Thanks,
> Joe
> 
> 
> Signed-off-by: Joe Jin <joe.jin@...cle.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
> Cc: Ian Campbell <ian.campbell@...rix.com>
> Cc: Jeremy Fitzhardinge <jeremy@...p.org>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> 
> ---
>  video/xen-fbfront.c |   19 +++++++++++--------
>  xen/events.c        |    4 ++++
>  2 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/video/xen-fbfront.c b/drivers/video/xen-fbfront.c
> index dc72563..367fb1c 100644
> --- a/drivers/video/xen-fbfront.c
> +++ b/drivers/video/xen-fbfront.c
> @@ -561,26 +561,24 @@ static void xenfb_init_shared_page(struct xenfb_info *info,
>  static int xenfb_connect_backend(struct xenbus_device *dev,
>  				 struct xenfb_info *info)
>  {
> -	int ret, evtchn;
> +	int ret, evtchn, irq;
>  	struct xenbus_transaction xbt;
>  
>  	ret = xenbus_alloc_evtchn(dev, &evtchn);
>  	if (ret)
>  		return ret;
> -	ret = bind_evtchn_to_irqhandler(evtchn, xenfb_event_handler,
> +	irq = bind_evtchn_to_irqhandler(evtchn, xenfb_event_handler,
>  					0, dev->devicetype, info);
> -	if (ret < 0) {
> +	if (irq < 0) {
>  		xenbus_free_evtchn(dev, evtchn);
>  		xenbus_dev_fatal(dev, ret, "bind_evtchn_to_irqhandler");
> -		return ret;
> +		return irq;
>  	}
> -	info->irq = ret;
> -
>   again:
>  	ret = xenbus_transaction_start(&xbt);
>  	if (ret) {
>  		xenbus_dev_fatal(dev, ret, "starting transaction");
> -		return ret;
> +		goto unbind_irq;
>  	}
>  	ret = xenbus_printf(xbt, dev->nodename, "page-ref", "%lu",
>  			    virt_to_mfn(info->page));
> @@ -602,15 +600,20 @@ static int xenfb_connect_backend(struct xenbus_device *dev,
>  		if (ret == -EAGAIN)
>  			goto again;
>  		xenbus_dev_fatal(dev, ret, "completing transaction");
> -		return ret;
> +		goto unbind_irq;
>  	}
>  
>  	xenbus_switch_state(dev, XenbusStateInitialised);
> +	info->irq = irq;
>  	return 0;
>  
>   error_xenbus:
>  	xenbus_transaction_end(xbt, 1);
>  	xenbus_dev_fatal(dev, ret, "writing xenstore");
> + unbind_irq:
> +	printk(KERN_ERR "xenfb_connect_backend failed!\n");
> +	unbind_from_irqhandler(irq, info);
> +	xenbus_free_evtchn(dev, evtchn);
>  	return ret;
>  }
>  
> diff --git a/drivers/xen/events.c b/drivers/xen/events.c
> index ac7b42f..4028704 100644
> --- a/drivers/xen/events.c
> +++ b/drivers/xen/events.c
> @@ -175,6 +175,10 @@ static struct irq_info *info_for_irq(unsigned irq)
>  
>  static unsigned int evtchn_from_irq(unsigned irq)
>  {
> +	if (unlikely(irq < 0 || irq >= nr_irqs)) {
> +		WARN_ON(1, "[%s]: Invalid irq(%d)!\n", __func__, irq);
> +		return 0;
> +	}
>  	return info_for_irq(irq)->evtchn;
>  }
>  

