[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20141126110504.511b733a@thinkpad-w530>
Date: Wed, 26 Nov 2014 11:05:04 +0100
From: David Hildenbrand <dahi@...ux.vnet.ibm.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: linuxppc-dev@...ts.ozlabs.org, linux-arch@...r.kernel.org,
linux-kernel@...r.kernel.org, benh@...nel.crashing.org,
paulus@...ba.org, akpm@...ux-foundation.org,
heiko.carstens@...ibm.com, schwidefsky@...ibm.com,
borntraeger@...ibm.com, mingo@...nel.org
Subject: Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when
atomic
Hi Michael,
thanks for your reply! some general thought:
This change was introduced mid 2013 but we don't have a single user relying
on this code change yet, right?
Disabling might_sleep() for functions that clearly state that they may sleep to
get some special case running feels wrong to me.
> On Tue, Nov 25, 2014 at 12:43:24PM +0100, David Hildenbrand wrote:
> > I recently discovered that commit 662bbcb2747c2422cf98d3d97619509379eee466
> > removed/skipped all might_sleep checks for might_fault() calls when in atomic.
>
> Yes. You can add e.g. might_sleep in your code if, for some reason, it is
> illegal to call it in an atomic context.
> But faults are legal in atomic if you handle the possible
> errors, and faults do not necessary cause caller to sleep, so might_fault
> should not call might_sleep.
My point is that and (almost at) everywhere where we use pagefault_disable, we
are using the inatomic variants. Also the documentation of copy_to_user()
clearly states at various points that this function "may sleep":
-> git grep "This function may sleep" yields
"Context: User context only. This function may sleep." e.g. s390, x86, mips
Patching out the might_sleep() from these functions seems more to be a hack to
solve another problem - not using the inatomic variants or finding them not to
be sufficient for a task?
So calling might_sleep() in these functions seems very right to me.
>
> > Reason was to allow calling copy_(to|from)_user while in pagefault_disabled(),
> > because otherwise, CONFIG_DEBUG_ATOMIC_SLEEP would find false positives.
>
> That wasn't the only reason BTW.
> Andi Kleen also showed that compiler did a terrible job optimizing
> get/put user when might_sleep was called.
> See e.g. this thread:
> https://lkml.org/lkml/2013/8/14/652
> There was even an lwn.net article about it, that I don't seem to be
> able to locate.
Thanks, I'll try to look it up! but:
might_sleep() will only be called when lock debugging / sleep in atomic is in,
so this doesn't seem to be a problem for me in a production environment. or am
I missing something?
> So might_sleep is not appropriate for lightweight operations like __get_user,
> which people literally expect to be a single instruction.
I agree that __.* variants should not call might_fault() (like I mentioned
after the table below).
>
>
> I also have a project going which handles very short packets by copying
> them into guest memory directly without waking up a thread.
> I do it by calling recvmsg on a socket, then switching to
> a thread if I get back EFAULT.
> Not yet clean enough to upstream but it does seem to cut
> the latency down quite a bit, so I'd like to have the option.
>
>
> Generally, a caller that does something like this under a spinlock:
> preempt_disable
> pagefault_disable
> error = copy_to_user
So why can't we use the inatomic variant? This seems to be
atomic environment to me. Calling a function that states that it may sleep
doesn't feel right to me.
> pagefault_enable
> preempt_enable_no_resched
>
> is not doing anything wrong and should not get a warning,
> as long as error is handled correctly later.
>
> You can also find the discussion around the patches
> educational:
> http://article.gmane.org/gmane.comp.emulators.kvm.devel/109928
>
> > However
> > we have the inatomic variants of these function for this purpose.
>
> Does inatomic install fixups now?
If not, I think this would rather be the way to go.
>
> Last I checked, it didn't so it did not satisfy this purpose.
> See this comment from x86:
>
> * Copy data from kernel space to user space. Caller must check
> * the specified block with access_ok() before calling this function.
> * The caller should also make sure he pins the user space address
> * so that we don't result in page fault and sleep.
>
>
> Also - switching to inatomic would scatter if (atomic) all
> over code. It's much better to simply call the same
> function (e.g. recvmsg) and have it fail gracefully:
> after all, we have code to handle get/put user errors
> anyway.
>
> > The result of this change was that all guest access (that correctly uses
> > might_fault()) doesn't perform atomic checks when CONFIG_DEBUG_ATOMIC_SLEEP is
> > enabled. We lost a mighty debugging feature for user access.
>
> What's the path you are trying to debug?
Well, we had a problem where we held a spin_lock and called
copy_(from|to)_user(). We experienced very random deadlocks that took some guy
almost a week to debug. The simple might_sleep() check would have showed this
error immediately.
I would have said that in 99,9% of all copy_to_user() calls we want to check
might_sleep(). That pagefault_disable() is a special case that should be
handled differently - in my opinion.
>
> If your code can faults, then it's safe to call
> from atomic context.
> If it can't, it must pin the page. You can not do access_ok checks and
> then assume access won't fault.
>
> > As I wasn't able to come up with any other reason why this should be
> > necessary, I suggest turning the might_sleep() checks on again in the
> > might_fault() code.
>
> Faults triggered with pagefault_disabled do not cause
> caller to sleep, merely to fail and get an error,
> so might_sleep is simply wrong.
>
> >
> > pagefault_disable/pagefault_enable seems to be used mainly for futex, perf event
> > and kmap.
> >
> > Going over all changes since that commit, it seems like most code already uses the
> > inatomic variants of copy_(to|from)_user. Code relying on (get|put)_user during
> > pagefault_disable() don't make use of any might_fault() in their (get|put)_user
> > implementation. Examples:
> > - arch/m68k/include/asm/futex.h
> > - arch/parisc/include/asm/futex.h
> > - arch/sh/include/asm/futex-irq.h
> > - arch/tile/include/asm/futex.h
> > So changing might_fault() back to trigger might_sleep() won't change a thing for
> > them. Hope I haven't missed anything.
>
> Did you check the generated code?
Nope not yet. But as I said, if lock debugging is not enabled, this should
remain as is - without any might_sleep() checks.
> On x86 it seems to me this patchset is definitely going to
> slow things down, in fact putting back in a performance regression
> that Andi found.
>
>
> > I only identified one might_fault() that would get triggered by get_user() on
> > powerpc and fixed it by using the inatomic variant (not tested). I am not sure
> > if we need some non-sleeping access_ok() prior to __get_user_inatomic().
> >
> > By looking at the code I was wondering where the correct place for might_fault()
> > calls is? Doesn't seem to be very consistent. Examples:
> >
> > | asm-generic | arm | arm64 | frv | m32r | x86 and s390
> > ---------------------------------------------------------------------------
> > get_user() | Yes | Yes | Yes | No | Yes | Yes
> > __get_user() | No | Yes | No | No | Yes | No
> > put_user() | Yes | Yes | Yes | No | Yes | Yes
> > __put_user() | No | Yes | No | No | Yes | No
> > copy_to_user() | Yes | No | No | Yes | Yes | Yes
> > __copy_to_user() | No | No | No | Yes | No | No
> > copy_from_user() | Yes | No | No | Yes | Yes | Yes
> > __copy_from_user() | No | No | No | Yes | No | No
> >
>
> I think it would be a mistake to make this change.
>
> Most call-sites handle faults in atomic just fine by
> returning error to caller.
>
> > So I would have assume that the way asm-generic, x86 and s390 (+ propably
> > others) do this is the right way?
> > So we can speed up multiple calls to e.g. copy_to_user() by doing the access
> > check manually (and also the might_fault() if relevant), then calling
> > __copy_to_user().
> >
> > So in general, I conclude that the concept is:
> > 1. __.* variants perform no checking and don't call might_fault()
> > 2. [a-z].* variants perform access checking (access_ok() if implemented)
> > 3. [a-z].* variants call might_fault()
> > 4. .*_inatomic variants usually don't perform access checks
> > 5. .*_inatomic variants don't call might_fault()
> > 6. If common code uses the __.* variants, it has to trigger access_ok() and
> > call might_fault()
> > 7. For pagefault_disable(), the inatomic variants are to be used
>
> inatomic variants don't seem to handle faults, so you
> must pin any memory you pass to them.
>
Would that be an option for your special case?
>
> > Comments? Opinions?
> >
>
> If the same address is accessed multiple times, access_ok + __
> variant can be used to speed access up a bit.
> This is rarely the case, but this is the case for e.g. vhost.
> But access_ok does not guarantee that no fault will trigger:
> there's really no way to do that ATM except pinning the page.
>
>
> > David Hildenbrand (2):
> > powerpc/fsl-pci: atomic get_user when pagefault_disabled
> > mm, sched: trigger might_sleep() in might_fault() when atomic
> >
> > arch/powerpc/sysdev/fsl_pci.c | 2 +-
> > include/linux/kernel.h | 8 ++++++--
> > mm/memory.c | 11 ++++-------
> > 3 files changed, 11 insertions(+), 10 deletions(-)
> >
> > --
> > 1.8.5.5
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists