[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <511681386.5786.1432235282365.JavaMail.zimbra@efficios.com>
Date: Thu, 21 May 2015 19:08:02 +0000 (UTC)
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Paul Turner <pjt@...gle.com>, Andrew Hunter <ahh@...gle.com>,
Ben Maurer <bmaurer@...com>, linux-kernel@...r.kernel.org,
Ingo Molnar <mingo@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Josh Triplett <josh@...htriplett.org>,
Lai Jiangshan <laijs@...fujitsu.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC PATCH] percpu system call: fast userspace percpu critical
sections
----- Original Message -----
> On Thu, May 21, 2015 at 10:44:47AM -0400, Mathieu Desnoyers wrote:
>
> > +struct thread_percpu_user {
> > + int32_t nesting;
> > + int32_t signal_sent;
> > + int32_t signo;
> > + int32_t current_cpu;
> > +};
>
> I would require this thing be naturally aligned, such that it does not
> cross cacheline boundaries.
Good point. Adding a comment into the code to that effect.
>
> > +
> > +static void percpu_user_sched_in(struct preempt_notifier *notifier, int
> > cpu)
> > +{
> > + struct thread_percpu_user __user *tpu_user;
> > + struct thread_percpu_user tpu;
> > + struct task_struct *t = current;
> > +
> > + tpu_user = t->percpu_user;
> > + if (tpu_user == NULL)
> > + return;
> > + if (unlikely(t->flags & PF_EXITING))
> > + return;
> > + /*
> > + * access_ok() of tpu_user has already been checked by sys_percpu().
> > + */
> > + if (__put_user(smp_processor_id(), &tpu_user->current_cpu)) {
> > + WARN_ON_ONCE(1);
> > + return;
> > + }
>
> This seems a waste; you already read the number unconditionally, might
> as well double check and avoid the store.
>
> > + if (__copy_from_user(&tpu, tpu_user, sizeof(tpu))) {
> > + WARN_ON_ONCE(1);
> > + return;
> > + }
>
> if (tpu.current_cpu != smp_processor_id())
> __put_user();
Yep, and I could even use the "cpu" parameter received by the
function rather than smp_processor_id().
>
>
>
> > + if (!tpu.nesting || tpu.signal_sent)
> > + return;
> > + if (do_send_sig_info(tpu.signo, SEND_SIG_PRIV, t, 0)) {
> > + WARN_ON_ONCE(1);
> > + return;
> > + }
> > + tpu.signal_sent = 1;
> > + if (__copy_to_user(tpu_user, &tpu, sizeof(tpu))) {
> > + WARN_ON_ONCE(1);
> > + return;
> > + }
> > +}
>
> Please do not use preempt notifiers for this.
Do you recommend we issue a function call from the scheduler
finish_task_switch() ?
>
> Second, this all is done with preemption disabled, this means that all
> that user access can fail.
OK, this is one part I was worried about.
>
> You print useless WARNs and misbehave. If you detect a desire to fault,
> you could delay until return to userspace and try again there. But it
> all adds complexity.
We could keep a flag, and then call the function again if we detect a
desire to fault.
>
> The big advantage pjt's scheme had is that we have the instruction
> pointer, we do not need to go read userspace memory that might not be
> there. And it being limited to a single range, while inconvenient,
> simplifies the entire kernel side to:
>
> if ((unsigned long)(ip - offset) < size)
> do_magic();
>
> Which is still simpler than the above.
There is one big aspect of pjt's approach that I still don't grasp
after all this time that makes me worry. How does it interact with
the following scenario ?
Userspace thread
- within the code region that needs to be restarted
- signal handler nested on top
- running within the signal handler code
- preempted by kernel
- checking instruction pointer misses the userspace stack
underneath the signal handler.
Given this scenario, is the kernel code really as simple as a pointer check
on pt_regs, or do we need a stack walk over all signal frames ? Another way
would be to check for the pt_regs instruction pointer whenever we receive
a signal, but then it would require per-architectures modifications, and
suddenly becomes less straightforward.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists