Date:	Mon, 1 Dec 2014 23:36:11 +0100
From:	"Luis R. Rodriguez" <mcgrof@...e.com>
To:	David Vrabel <david.vrabel@...rix.com>
Cc:	Juergen Gross <jgross@...e.com>, Joerg Roedel <jroedel@...e.de>,
	kvm@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
	x86@...nel.org, Oleg Nesterov <oleg@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Davidlohr Bueso <dbueso@...e.de>,
	Jan Beulich <JBeulich@...e.com>,
	xen-devel@...ts.xenproject.org, boris.ostrovsky@...cle.com,
	Borislav Petkov <bp@...e.de>, Olaf Hering <ohering@...e.de>,
	Ingo Molnar <mingo@...nel.org>
Subject: Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private
	hypercall when non CONFIG_PREEMPT

On Mon, Dec 01, 2014 at 06:16:28PM +0000, David Vrabel wrote:
> On 01/12/14 16:19, Luis R. Rodriguez wrote:
> > On Mon, Dec 01, 2014 at 03:54:24PM +0000, David Vrabel wrote:
> >> On 01/12/14 15:44, Luis R. Rodriguez wrote:
> >>> On Mon, Dec 1, 2014 at 10:18 AM, David Vrabel <david.vrabel@...rix.com> wrote:
> >>>> On 01/12/14 15:05, Luis R. Rodriguez wrote:
> >>>>> On Mon, Dec 01, 2014 at 11:11:43AM +0000, David Vrabel wrote:
> >>>>>> On 27/11/14 18:36, Luis R. Rodriguez wrote:
> >>>>>>> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
> >>>>>>>> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
> >>>>>>>>> From: "Luis R. Rodriguez" <mcgrof@...e.com>
> >>>>>>>>>
> >>>>>>>>> Some folks reported that some Xen hypercalls take a long time to
> >>>>>>>>> complete when issued from the userspace privcmd ioctl mechanism.
> >>>>>>>>> This can happen with hypercalls that have many sub-operations,
> >>>>>>>>> for instance hypercalls that use
> >>>>>> [...]
> >>>>>>>>> --- a/drivers/xen/privcmd.c
> >>>>>>>>> +++ b/drivers/xen/privcmd.c
> >>>>>>>>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
> >>>>>>>>>                              hypercall.arg[0], hypercall.arg[1],
> >>>>>>>>>                              hypercall.arg[2], hypercall.arg[3],
> >>>>>>>>>                              hypercall.arg[4]);
> >>>>>>>>> +#ifndef CONFIG_PREEMPT
> >>>>>>>>> +	schedule();
> >>>>>>>>> +#endif
> >>>>>>
> >>>>>> As Juergen points out, this does nothing.  You need to schedule while in
> >>>>>> the middle of the hypercall.
> >>>>>>
> >>>>>> Remember that Xen's hypercall preemption only preempts the hypercall to
> >>>>>> run interrupts in the guest.
> >>>>>
> >>>>> How is it ensured that, when the kernel preempts on this code path on a
> >>>>> CONFIG_PREEMPT=n kernel, only interrupts in the guest are run?
> >>>>
> >>>> Sorry, I really didn't describe this very well.
> >>>>
> >>>> If a hypercall needs a continuation, Xen returns to the guest with the
> >>>> IP set to the hypercall instruction, and on the way back to the guest
> >>>> Xen may schedule a different VCPU or it will do any upcalls (as per normal).
> >>>>
> >>>> The guest is free to return from the upcall to the original task
> >>>> (continuing the hypercall) or to a different one.
> >>>
> >>> OK, so that addresses what Xen will do with continuations and hypercall
> >>> preemption. My concern here was that using preempt_schedule_irq() on
> >>> CONFIG_PREEMPT=n kernels, in the middle of a hypercall on the return from
> >>> an interrupt (e.g., the timer interrupt), would still let the kernel
> >>> preempt to tasks other than those related to Xen.
> >>
> >> Um.  Why would that be a problem?  We do want to switch to any task the
> >> Linux scheduler thinks is best.
> > 
> > It's safe, but it technically is doing kernel preemption, unless we want
> > to adjust the definition of CONFIG_PREEMPT=n to exclude hypercalls. This
> > was my original concern with the use of preempt_schedule_irq() to do this.
> > I am wary of setting a precedent without being clear about it and without
> > wider review and acceptance.
> 
> It's voluntary preemption at a well defined point. 

It's voluntarily preempting the kernel even for CONFIG_PREEMPT=n kernels...

> It's no different to a cond_resched() call.

Then I agree it's a fair analogy (and, given how widespread cond_resched()
is, it is odd that we have no equivalent for IRQ context). Why not avoid the
special check then and use this all the time in the middle of a hypercall on
the return from an interrupt (e.g., the timer interrupt)?
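
For comparison, the usual voluntary preemption pattern in process context
looks roughly like this (process_item(), struct item and nr_items are made
up here, just to illustrate the pattern):

	/* Illustrative only: process_item() and struct item are hypothetical. */
	static void process_all(struct item *items, unsigned long nr_items)
	{
		unsigned long i;

		for (i = 0; i < nr_items; i++) {
			process_item(&items[i]);
			cond_resched();	/* voluntary preemption point, may sleep */
		}
	}

What we are missing is an equivalent for the point where an interrupt lands
in the middle of a hypercall, where cond_resched() cannot be used. Something
along these lines: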

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5e344bb..e60b5a1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2759,6 +2759,12 @@ static inline int signal_pending_state(long state, struct task_struct *p)
  */
 extern int _cond_resched(void);
 
+/*
+ * Voluntarily preempt the kernel even on CONFIG_PREEMPT=n kernels; to be
+ * used only under very special circumstances.
+ */
+extern int cond_resched_irq(void);
+
 #define cond_resched() ({			\
 	__might_sleep(__FILE__, __LINE__, 0);	\
 	_cond_resched();			\
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 240157c..1c4d443 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4264,6 +4264,16 @@ int __sched _cond_resched(void)
 }
 EXPORT_SYMBOL(_cond_resched);
 
+int __sched cond_resched_irq(void)
+{
+	if (should_resched()) {
+		preempt_schedule_irq();
+		return 1;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(cond_resched_irq);
+
 /*
  * __cond_resched_lock() - if a reschedule is pending, drop the given lock,
  * call schedule, and on return reacquire the lock.
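
To make the intended use concrete, for privcmd I am thinking of something
along these lines (only a sketch; xen_in_preemptible_hcall is a made-up
per-cpu flag for marking the region, not an existing interface):

	/* Sketch only: xen_in_preemptible_hcall is hypothetical. */
	static DEFINE_PER_CPU(bool, xen_in_preemptible_hcall);

	/* In privcmd_ioctl_hypercall(), around the hypercall itself: */
	this_cpu_write(xen_in_preemptible_hcall, true);
	ret = privcmd_call(hypercall.op,
			   hypercall.arg[0], hypercall.arg[1],
			   hypercall.arg[2], hypercall.arg[3],
			   hypercall.arg[4]);
	this_cpu_write(xen_in_preemptible_hcall, false);

	/*
	 * And on the return-from-interrupt path (regs being the pt_regs of
	 * the interrupted context), with interrupts still disabled, only
	 * when we interrupted such a hypercall in kernel mode:
	 */
	if (!user_mode_vm(regs) && this_cpu_read(xen_in_preemptible_hcall))
		cond_resched_irq();

That would keep the rescheduling limited to the window where we know we are
stuck in a long hypercall, instead of doing it on every interrupt return.
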
  Luis