Message-ID: <alpine.DEB.2.02.1305212242540.4799@kaball.uk.xensource.com>
Date:	Tue, 21 May 2013 22:50:09 +0100
From:	Stefano Stabellini <stefano.stabellini@...citrix.com>
To:	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
CC:	Stefano Stabellini <stefano.stabellini@...citrix.com>,
	David Vrabel <david.vrabel@...rix.com>,
	"xen-devel@...ts.xensource.com" <xen-devel@...ts.xensource.com>,
	Feng Jin <joe.jin@...cle.com>,
	Zhenzhong Duan <zhenzhong.duan@...cle.com>,
	Yuval Shaia <yuval.shaia@...cle.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Chien Yen <chien.yen@...cle.com>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [Xen-devel] [PATCH] xen: reuse the same pirq allocated when
 driver load first time

On Tue, 21 May 2013, Konrad Rzeszutek Wilk wrote:
> On Tue, May 21, 2013 at 05:51:02PM +0100, Stefano Stabellini wrote:
> > On Tue, 21 May 2013, Konrad Rzeszutek Wilk wrote:
> > > > Looking at the hypervisor code I couldn't see anything obviously wrong.
> > > 
> > > I think the culprit is "physdev_unmap_pirq":
> > > 
> > >    if ( is_hvm_domain(d) )                                                     
> > >     {                                                                           
> > >         spin_lock(&d->event_lock);                                              
> > >         gdprintk(XENLOG_WARNING,"d%d, pirq: %d is %x %s, irq: %d\n",            
> > >             d->domain_id, pirq, domain_pirq_to_emuirq(d, pirq),                 
> > >             domain_pirq_to_emuirq(d, pirq) == IRQ_UNBOUND ? "unbound" : "",        
> > >             domain_pirq_to_irq(d, pirq));                                       
> > >                                                                                 
> > >         if ( domain_pirq_to_emuirq(d, pirq) != IRQ_UNBOUND )                    
> > >             ret = unmap_domain_pirq_emuirq(d, pirq);                            
> > >         spin_unlock(&d->event_lock);                                            
> > >         if ( domid == DOMID_SELF || ret )                                       
> > >             goto free_domain;                                             
> > > 
> > > It always tells me unbound:
> > > 
> > > (XEN) physdev.c:237:d14 14, pirq: 54 is ffffffff
> > > (XEN) irq.c:1873:d14 14, nr_pirqs: 56
> > > (XEN) physdev.c:237:d14 14, pirq: 53 is ffffffff
> > > (XEN) irq.c:1873:d14 14, nr_pirqs: 56
> > > (XEN) physdev.c:237:d14 14, pirq: 52 is ffffffff
> > > (XEN) irq.c:1873:d14 14, nr_pirqs: 56
> > > (XEN) physdev.c:237:d14 14, pirq: 51 is ffffffff
> > > (XEN) irq.c:1873:d14 14, nr_pirqs: 56
> > > (XEN) physdev.c:237:d14 14, pirq: 50 is ffffffff
> > > (XEN) irq.c:1873:d14 14, nr_pirqs: 56
> > > (a bit older debug code, so the 'unbound' does not show up here).
> > > 
> > > Which means that the call to unmap_domain_pirq_emuirq does not happen.
> > > The checks in unmap_domain_pirq_emuirq also look to depend
> > > on the code being IRQ_UNBOUND.
> > > 
> > > In other words, all of that code looks to only clear things when
> > > they are !IRQ_UNBOUND.
> > > 
> > > But the other logic (IRQ_UNBOUND) looks to be missing a removal
> > > in the radix tree:
> > > 
> > >   if ( emuirq != IRQ_PT )                                                     
> > >         radix_tree_delete(&d->arch.hvm_domain.emuirq_pirq, emuirq);             
> > >                                                                         
> > > And I think that is what is causing the leak - the radix tree
> > > needs to be pruned? Or perhaps the allocate_pirq should check
> > > the radix tree for IRQ_UNBOUND ones and re-use them?
> > 
> > I think that you are looking in the wrong place.
> > The issue is that QEMU doesn't call pt_msi_disable in
> > pt_msgctrl_reg_write if (!(val & PCI_MSI_FLAGS_ENABLE)).
> 
> In my test-case I am not even calling QEMU. I am just doing two
> hypercalls: get_free_pirq and unmap.
> > 
> > The code above is correct as is because it is trying to handle emulated
> > IRQs and MSIs, not real passthrough MSIs. The latter are not added to
> > that radix tree, see physdev_hvm_map_pirq and physdev_map_pirq.
> 
> The bug is in the hypervisor. This little patch solves the test-case
> (I haven't tried PCI passthrough yet).
> 
> 
> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> index b0b0c65..b78717a 100644
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1851,8 +1851,8 @@ static int pirq_guest_force_unbind(struct domain *d, struct pirq *pirq)
>  static inline bool_t is_free_pirq(const struct domain *d,
>                                    const struct pirq *pirq)
>  {
> -    return !pirq || (!pirq->arch.irq && (!is_hvm_domain(d) ||
> -        pirq->arch.hvm.emuirq == IRQ_UNBOUND));
> +    return !pirq || ((pirq->arch.irq == 0 || (pirq->arch.irq == PIRQ_ALLOCATED)) &&
> +           (!is_hvm_domain(d) || pirq->arch.hvm.emuirq == IRQ_UNBOUND));
>  }
>  
>  int get_free_pirq(struct domain *d, int type)
> 
> 
> The reason is that PHYSDEVOP_get_free_pirq changes pirq->arch.irq
> from zero to -1 (PIRQ_ALLOCATED). Then in map_domain_pirq
> we check it first:
> 
>     old_irq = domain_pirq_to_irq(d, pirq);
> .. snip..
>     if ( (old_irq > 0 && (old_irq != irq) ) ||
> 
> and since 'old_irq' is -1 (or zero) and the irq passed in
> is different, all the checks pass and the value is overwritten:
> 
>     set_domain_irq_pirq(d, irq, info);
> 
> And that is it.


We have to be careful about this: the point of PHYSDEVOP_get_free_pirq is
that Linux can know for sure the pirq that is going to be used to map the
MSI by QEMU. If you modify is_free_pirq that way, suddenly the pirq
could be allocated for something else after Linux called
PHYSDEVOP_get_free_pirq and before QEMU called xc_physdev_map_pirq_msi.
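
As a rough illustration of that window, here is a toy model in C. It is not
Xen code: the names toy_domain, toy_get_free_pirq, is_free_original,
is_free_patched and reservation_collides are made up for this sketch, and the
top-down scan is just an assumption to keep it short. The only things taken
from the thread are that a free pirq has irq == 0 and that a reserved one has
irq == PIRQ_ALLOCATED (-1).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not Xen code: each slot holds the irq bound to one pirq. */
#define NR_PIRQS        8
#define PIRQ_FREE       0    /* never handed out */
#define PIRQ_ALLOCATED (-1)  /* reserved by get_free_pirq, not yet mapped */

struct toy_domain {
    int irq[NR_PIRQS];
};

/* Original predicate: only slots with irq == 0 count as free. */
static bool is_free_original(const struct toy_domain *d, int pirq)
{
    return d->irq[pirq] == PIRQ_FREE;
}

/* Predicate with the proposed change: reserved slots count as free too. */
static bool is_free_patched(const struct toy_domain *d, int pirq)
{
    return d->irq[pirq] == PIRQ_FREE || d->irq[pirq] == PIRQ_ALLOCATED;
}

/* Hand out the highest free pirq and mark it reserved; -1 if none is left. */
static int toy_get_free_pirq(struct toy_domain *d, bool patched)
{
    for (int pirq = NR_PIRQS - 1; pirq >= 1; pirq--) {
        bool is_free = patched ? is_free_patched(d, pirq)
                               : is_free_original(d, pirq);
        if (is_free) {
            d->irq[pirq] = PIRQ_ALLOCATED;
            return pirq;
        }
    }
    return -1;
}

/* True if two back-to-back reservations are given the same pirq. */
bool reservation_collides(bool patched)
{
    struct toy_domain d = { {0} };
    int first = toy_get_free_pirq(&d, patched);
    int second = toy_get_free_pirq(&d, patched);

    assert(first > 0 && second > 0);
    return first == second;
}
```

With the original predicate the second caller gets a different pirq. With the
changed predicate it gets the one the first caller still believes it owns,
which is the case I am worried about between get_free_pirq and the map.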

Right now the unmap is supposed to be done by QEMU, not Linux. So I
think that it is "normal" (although counterintuitive) that your little
test works that way.

A pirq allocated via PHYSDEVOP_get_free_pirq should be passed to QEMU,
mapped by QEMU, unmapped by QEMU and eventually freed by QEMU.

This is not the bestest interface ever written of course but that's how
it works now.