linux-kernel - Re: [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

Message-ID: <20141201175224.GJ25677@wotan.suse.de>
Date:	Mon, 1 Dec 2014 18:52:24 +0100
From:	"Luis R. Rodriguez" <mcgrof@...e.com>
To:	Juergen Gross <jgross@...e.com>
Cc:	David Vrabel <david.vrabel@...rix.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	boris.ostrovsky@...cle.com, xen-devel@...ts.xenproject.org,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	x86@...nel.org, kvm@...r.kernel.org,
	Davidlohr Bueso <dbueso@...e.de>,
	Joerg Roedel <jroedel@...e.de>, Borislav Petkov <bp@...e.de>,
	Jan Beulich <JBeulich@...e.com>, Olaf Hering <ohering@...e.de>
Subject: Re: [PATCH] xen: privcmd: schedule() after private hypercall when
	non CONFIG_PREEMPT

On Mon, Dec 01, 2014 at 06:07:48PM +0100, Juergen Gross wrote:
> On 12/01/2014 05:19 PM, Luis R. Rodriguez wrote:
>> On Mon, Dec 01, 2014 at 03:54:24PM +0000, David Vrabel wrote:
>>> On 01/12/14 15:44, Luis R. Rodriguez wrote:
>>>> On Mon, Dec 1, 2014 at 10:18 AM, David Vrabel <david.vrabel@...rix.com> wrote:
>>>>> On 01/12/14 15:05, Luis R. Rodriguez wrote:
>>>>>> On Mon, Dec 01, 2014 at 11:11:43AM +0000, David Vrabel wrote:
>>>>>>> On 27/11/14 18:36, Luis R. Rodriguez wrote:
>>>>>>>> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
>>>>>>>>> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
>>>>>>>>>> From: "Luis R. Rodriguez" <mcgrof@...e.com>
>>>>>>>>>>
>>>>>>>>>> Some folks have reported that some Xen hypercalls take a long time
>>>>>>>>>> to complete when issued from the userspace private ioctl mechanism.
>>>>>>>>>> This can happen, for instance, with hypercalls that have many
>>>>>>>>>> sub-operations, such as hypercalls that use
>>>>>>> [...]
>>>>>>>>>> --- a/drivers/xen/privcmd.c
>>>>>>>>>> +++ b/drivers/xen/privcmd.c
>>>>>>>>>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
>>>>>>>>>>                               hypercall.arg[0], hypercall.arg[1],
>>>>>>>>>>                               hypercall.arg[2], hypercall.arg[3],
>>>>>>>>>>                               hypercall.arg[4]);
>>>>>>>>>> +#ifndef CONFIG_PREEMPT
>>>>>>>>>> + schedule();
>>>>>>>>>> +#endif
>>>>>>>
>>>>>>> As Juergen points out, this does nothing.  You need to schedule while in
>>>>>>> the middle of the hypercall.
>>>>>>>
>>>>>>> Remember that Xen's hypercall preemption only preempts the hypercall to
>>>>>>> run interrupts in the guest.
>>>>>>
>>>>>> How is it ensured that, when the kernel preempts on this code path on a
>>>>>> CONFIG_PREEMPT=n kernel, only interrupts in the guest are run?
>>>>>
>>>>> Sorry, I really didn't describe this very well.
>>>>>
>>>>> If a hypercall needs a continuation, Xen returns to the guest with the
>>>>> IP set to the hypercall instruction, and on the way back to the guest
>>>>> Xen may schedule a different VCPU or it will do any upcalls (as per normal).
>>>>>
>>>>> The guest is free to return from the upcall to the original task
>>>>> (continuing the hypercall) or to a different one.
>>>>
>>>> OK, so that addresses what Xen will do when using continuation and
>>>> hypercall preemption, my concern here was that using
>>>> preempt_schedule_irq() on CONFIG_PREEMPT=n kernels in the middle of a
>>>> hypercall on the return from an interrupt (e.g., the timer interrupt)
>>>> would still let the kernel preempt to tasks other than those related
>>>> to Xen.
>>>
>>> Um.  Why would that be a problem?  We do want to switch to any task the
>>> Linux scheduler thinks is best.
>>
>> It's safe, but it technically is doing kernel preemption, unless we want
>> to adjust the definition of CONFIG_PREEMPT=n to exclude hypercalls. This
>> was my original concern with the use of preempt_schedule_irq() to do this.
>> I am afraid of setting precedents without a clear definition and wider
>> review and acceptance.
>
> I wonder whether it would be more acceptable to add (or completely
> switch to) another preemption model: PREEMPT_SWITCHABLE. This would be
> similar to CONFIG_PREEMPT, but the "normal" value of __preempt_count
> would be settable via a kernel parameter (default 2):
>
> 0: preempt
> 1: preempt_voluntary
> 2: preempt_none
>
> The kernel would run with preemption enabled. cond_resched() would
> reschedule if __preempt_count <= 1. In the case of long-running kernel
> activities (like the hypercall case or other code requiring schedule()
> calls to avoid hangups), we would just set __preempt_count to 0 during
> these periods and restore the old value afterwards.
>
> This would be a rather intrusive but clean change IMO.
>
> Any thoughts?

I like the idea of dynamically changing the preemption model at run time and
personally find it reasonable; however, I am not certain whether it would
introduce a series of hard-to-address issues. Thoughts from others who linger
deep in the cold, lonely scheduler caves?

  Luis