linux-kernel - Re: [PATCH v5 3/4] kvm: Create kvm_clear

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120717223150.GL1868@redhat.com>
Date:	Wed, 18 Jul 2012 01:31:50 +0300
From:	"Michael S. Tsirkin" <mst@...hat.com>
To:	Alex Williamson <alex.williamson@...hat.com>
Cc:	avi@...hat.com, gleb@...hat.com, kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org, jan.kiszka@...mens.com
Subject: Re: [PATCH v5 3/4] kvm: Create kvm_clear_irq()

On Tue, Jul 17, 2012 at 04:22:10PM -0600, Alex Williamson wrote:
> On Wed, 2012-07-18 at 01:05 +0300, Michael S. Tsirkin wrote:
> > On Tue, Jul 17, 2012 at 04:01:40PM -0600, Alex Williamson wrote:
> > > On Wed, 2012-07-18 at 00:05 +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Jul 17, 2012 at 01:51:27PM -0600, Alex Williamson wrote:
> > > > > On Tue, 2012-07-17 at 21:55 +0300, Michael S. Tsirkin wrote:
> > > > > > On Tue, Jul 17, 2012 at 10:45:52AM -0600, Alex Williamson wrote:
> > > > > > > On Tue, 2012-07-17 at 19:21 +0300, Michael S. Tsirkin wrote:
> > > > > > > > On Tue, Jul 17, 2012 at 10:17:03AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > > > > >   And current code looks buggy if yes we need to fix it somehow.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Which to me seems to indicate this should be handled as a separate
> > > > > > > > > > > > > > > effort.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > A separate patchset, sure. But likely a prerequisite: we still need to
> > > > > > > > > > > > > > look at all the code. Let's not copy bugs, need to fix them.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > This looks tangential to me unless you can come up with an actual reason
> > > > > > > > > > > > > the above spinlock usage is incorrect or insufficient.
> > > > > > > > > > > > 
> > > > > > > > > > > > You copy the same pattern that seems racy. So you double the
> > > > > > > > > > > > amount of code that woul need to be fixed.
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > _Seems_ racy, or _is_ racy?  Please identify the race.
> > > > > > > > > > 
> > > > > > > > > > Look at this:
> > > > > > > > > > 
> > > > > > > > > > static inline int kvm_irq_line_state(unsigned long *irq_state,
> > > > > > > > > >                                      int irq_source_id, int level)
> > > > > > > > > > {
> > > > > > > > > >         /* Logical OR for level trig interrupt */
> > > > > > > > > >         if (level)
> > > > > > > > > >                 set_bit(irq_source_id, irq_state);
> > > > > > > > > >         else
> > > > > > > > > >                 clear_bit(irq_source_id, irq_state);
> > > > > > > > > > 
> > > > > > > > > >         return !!(*irq_state);
> > > > > > > > > > }
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Now:
> > > > > > > > > > If other CPU changes some other bit after the atomic change,
> > > > > > > > > > it looks like !!(*irq_state) might return a stale value.
> > > > > > > > > > 
> > > > > > > > > > CPU 0 clears bit 0. CPU 1 sets bit 1. CPU 1 sets level to 1.
> > > > > > > > > > If CPU 0 sees a stale value now it will return 0 here
> > > > > > > > > > and interrupt will get cleared.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Maybe this is not a problem. But in that case IMO it needs
> > > > > > > > > > a comment explaining why and why it's not a problem in
> > > > > > > > > > your code.
> > > > > > > > > 
> > > > > > > > > So you want to close the door on anything that uses kvm_set_irq until
> > > > > > > > > this gets fixed... that's insane.
> > > > > > > > 
> > > > > > > > What does kvm_set_irq use have to do with it?  You posted this patch:
> > > > > > > > 
> > > > > > > > +static int kvm_clear_pic_irq(struct kvm_kernel_irq_routing_entry *e,
> > > > > > > > +                            struct kvm *kvm, int irq_source_id)
> > > > > > > > +{
> > > > > > > > +#ifdef CONFIG_X86
> > > > > > > > +       struct kvm_pic *pic = pic_irqchip(kvm);
> > > > > > > > +       int level =
> > > > > > > > kvm_clear_irq_line_state(&pic->irq_states[e->irqchip.pin],
> > > > > > > > +                                            irq_source_id);
> > > > > > > > +       if (level)
> > > > > > > > +               kvm_pic_set_irq(pic, e->irqchip.pin,
> > > > > > > > +                               !!pic->irq_states[e->irqchip.pin]);
> > > > > > > > +       return level;
> > > > > > > > +#else
> > > > > > > > +       return -1;
> > > > > > > > +#endif
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > 
> > > > > > > > it seems racy in the same way.
> > > > > > > 
> > > > > > > Now you're just misrepresenting how we got here, which was:
> > > > > > > 
> > > > > > > > > > > > > IMHO, we're going off into the weeds again with these last
> > > > > > > > > > > > > two patches.  It may be a valid optimization, but it really has no
> > > > > > > > > > > > > bearing on the meat of the series (and afaict, no significant
> > > > > > > > > > > > > performance difference either).
> > > > > > > > > > > > 
> > > > > > > > > > > > For me it's not a performance thing. IMO code is cleaner without this locking:
> > > > > > > > > > > > we add a lock but only use it in some cases, so the rules become really
> > > > > > > > > > > > complex.
> > > > > > > 
> > > > > > > So I'm happy to drop the last 2 patches, which were done at your request
> > > > > > > anyway, but you've failed to show how the locking in patches 1&2 is
> > > > > > > messy, inconsistent, or complex and now you're asking to block all
> > > > > > > progress.
> > > > > > 
> > > > > > I'm asking for bugs to get fixed and not duplicated. Adding more bugs is
> > > > > > not progress. Or maybe there is no bug. Let's see why and add a comment.
> > > > > > 
> > > > > > >  Those patches are just users of kvm_set_irq.
> > > > > > 
> > > > > > 
> > > > > > Well these add calls to kvm_set_irq which scans all vcpus under
> > > > > > spinlock. In the past Avi thought this is not a good idea too.
> > > > > > Maybe things changed.
> > > > > 
> > > > > We can drop the spinlock if we don't care about spurious EOIs, which is
> > > > > only a theoretical scalability problem anyway.
> > > > 
> > > > Not theoretical at all IMO. We see the problem with virtio with old
> > > > guests today.
> > > 
> > > And how are you injecting level interrupts with virtio today w/o this
> > > interface?
> > 
> > Not well at all. Bad performance with interrupt sharing.
> > 
> > > > > We're talking about
> > > > > level interrupts here, how scalable do we need to be?
> > > > > 
> > > > 
> > > > The reason we are moving them into kernel at all is for speed, no?
> > > 
> > > Come on, if we take that approach why aren't we writing all of this in
> > > assembly for speed?!  All I'm suggesting is there's a limit to return on
> > > investment at some point.  Maybe it's here.
> > 
> > Well I am just warning about known problems: don't wake up (typically)
> > many eoifds on a single interrupt or you will have scalability
> > problems that we already see with emulated devices.
> 
> Well, that's why we don't want to bounce back to userspace.  All we need
> to do in VFIO for each callback is note that the interrupt wasn't
> previously masked and do nothing.  So we're talking about acquiring a
> spinlock and a few data references.  Sure, I'd like to avoid doing that,
> but not if it means blocking this patch series until some unknown number
> of patches fixes all the tangential problems you find.

I'd like Avi's take on whether kvm_set_irq under a spinlock here
is OK. He nacked this in the past.

Meanwhile fixing the tangential problems is time well spent too.
If there are races in kvm irq handling it seems more important
to fix them than add an optimization of level interrupts.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/