linux-kernel - Re: [Xen-devel] [PATCH 1/4] xen/events: Clear cpu_evtchn

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <55410CD6.2000708@oracle.com>
Date:	Wed, 29 Apr 2015 12:54:46 -0400
From:	Boris Ostrovsky <boris.ostrovsky@...cle.com>
To:	David Vrabel <david.vrabel@...rix.com>, konrad.wilk@...cle.com
CC:	xen-devel@...ts.xenproject.org, annie.li@...cle.com,
	linux-kernel@...r.kernel.org
Subject: Re: [Xen-devel] [PATCH 1/4] xen/events: Clear cpu_evtchn_mask before
 resuming

On 04/29/2015 12:32 PM, David Vrabel wrote:
> On 28/04/15 19:29, Boris Ostrovsky wrote:
>> On 04/28/2015 12:28 PM, David Vrabel wrote:
>>> On 28/04/15 16:52, Boris Ostrovsky wrote:
>>>> When a guest is resumed, the hypervisor may change event channel
>>>> assignments. If this happens and the guest uses 2-level events it
>>>> is possible for the interrupt to be claimed by wrong VCPU since
>>>> cpu_evtchn_mask bits may be stale. This can happen even though
>>>> evtchn_2l_bind_to_cpu() attempts to clear old bits: irq_info that
>>>> is passed in is not necessarily the original one (from pre-migration
>>>> times) but instead is freshly allocated during resume and so any
>>>> information about which CPU the channel was bound to is lost.
>>>>
>>>> Thus we should clear the mask during resume.
>>>>
>>>> We also need to make sure that bits for xenstore and console channels
>>>> are set when these two subsystems are resumed. While rebind_evtchn_irq()
>>>> (which is invoked for both of them on a resume) calls
>>>> irq_set_affinity(),
>>>> the latter will in fact postpone setting affinity until handling the
>>>> interrupt. But because cpu_evtchn_mask will have bits for these two
>>>> cleared
>>>> we won't be able to take the interrupt.
>>>>
>>>> Setting IRQ_MOVE_PCNTXT flag for the two irqs avoids this problem by
>>>> allowing to set affinity immediately, which is safe for
>>>> event-channel-based
>>>> interrupts.
>>> [...]
>>>> --- a/drivers/tty/hvc/hvc_xen.c
>>>> +++ b/drivers/tty/hvc/hvc_xen.c
>>>> @@ -533,6 +533,7 @@ static int __init xen_hvc_init(void)
>>>>              info = vtermno_to_xencons(HVC_COOKIE);
>>>>            info->irq = bind_evtchn_to_irq(info->evtchn);
>>>> +        irq_set_status_flags(info->irq, IRQ_MOVE_PCNTXT);
>>>>        }
>>>>        if (info->irq < 0)
>>>>            info->irq = 0; /* NO_IRQ */
>>>> diff --git a/drivers/xen/events/events_2l.c
>>>> b/drivers/xen/events/events_2l.c
>>>> index 5db43fc..7dd4631 100644
>>>> --- a/drivers/xen/events/events_2l.c
>>>> +++ b/drivers/xen/events/events_2l.c
>>>> @@ -345,6 +345,15 @@ irqreturn_t xen_debug_interrupt(int irq, void
>>>> *dev_id)
>>>>        return IRQ_HANDLED;
>>>>    }
>>>>    +static void evtchn_2l_resume(void)
>>>> +{
>>>> +    int i;
>>>> +
>>>> +    for_each_online_cpu(i)
>>>> +        memset(per_cpu(cpu_evtchn_mask, i), 0, sizeof(xen_ulong_t) *
>>>> +                EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD);
>>>> +}
>>>> +
>>>>    static const struct evtchn_ops evtchn_ops_2l = {
>>>>        .max_channels      = evtchn_2l_max_channels,
>>>>        .nr_channels       = evtchn_2l_max_channels,
>>>> @@ -356,6 +365,7 @@ static const struct evtchn_ops evtchn_ops_2l = {
>>>>        .mask              = evtchn_2l_mask,
>>>>        .unmask            = evtchn_2l_unmask,
>>>>        .handle_events     = evtchn_2l_handle_events,
>>>> +    .resume               = evtchn_2l_resume,
>>>>    };
>>>>      void __init xen_evtchn_2l_init(void)
>>>> diff --git a/drivers/xen/xenbus/xenbus_comms.c
>>>> b/drivers/xen/xenbus/xenbus_comms.c
>>>> index fdb0f33..30203d1 100644
>>>> --- a/drivers/xen/xenbus/xenbus_comms.c
>>>> +++ b/drivers/xen/xenbus/xenbus_comms.c
>>>> @@ -231,6 +231,7 @@ int xb_init_comms(void)
>>>>            }
>>>>              xenbus_irq = err;
>>>> +        irq_set_status_flags(xenbus_irq, IRQ_MOVE_PCNTXT);
>>> IRQ_MOVE_PCNTXT means "Interrupt can be migrated from process context"
>>> which doesn't really sound relevant to me here?
>>>
>>> Thomas Glexnier is really not happy with mis-use of IRQ APIs.
>> Yes, this is somewhat expanding definition of IRQ_MOVE_PCNTXT but then
>> there are other devices that do the same (HPET, for one).
> Ok.  But why set it on only these two event channels and not every one?

On resume in xen_irq_resume() we do

         for_each_possible_cpu(cpu) {
                 restore_cpu_virqs(cpu);
                 restore_cpu_ipis(cpu);
         }
         restore_pirqs();

And console and xenstore are neither of the those three types. 
rebind_irq_to_cpu() essentially plays the role of these restore_*() 
routines, except it does not make the hypercall to let the hypervisor 
assign notify_vcpu_id and it doesn't do explicit bind_evtchn_to_cpu().

Which makes me think that perhaps we should modify rebind_irq_to_cpu() 
to in fact do these two things. We may still need to keep 
irq_set_affinity() so that generic affinity data is properly set up but 
this will be deferred until the first interrupt.

>
>>>   From the commit log the evtchn_2l_resume() fucntion that's added sounds
>>> like it fixes the problem on its own?
>> It in fact makes this problem worse since now that cpu_evtchn_mask is
>> cleared during resume we cannot process the interrupt anymore in
>> evtchn_2l_handle_events(): irqs have to be bound to a cpu in order for
>> an interrupt to be processed.
> Perhaps evtchn_2l_resume() should set the local cpu mask for any bound
> event channels? And then you wouldn't need IRQ_MOVE_PCNTX.

But then (at least in 2-level case) more than one VCPUs may pick the 
same interrupt, won't they? Because the local cpu mask is what tells a 
VCPU that it is allowed to claim an interrupt.

-boris

>
>> We already have this issue even without clearing the mask if, say,
>> xenstore channel changes or if it gets bound to cpu other than 0 before
>> suspend. It's just that we rarely (if ever) see this happen.
>>
>> We can explicitly call events_base.c:set_affinity_irq() from
>> rebind_evtchn_irq() after irq_set_affinity() if we want to avoid using
>> IRQ_MOVE_PCNTXT flag but that will also looks kind of strange.
> David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/