linux-kernel - Re: [PATCH RESEND v4] sched/fair: Add advisory flag for borrowing a timeslice

Message-ID: <54949BF0.8030403@oracle.com>
Date:	Fri, 19 Dec 2014 14:43:12 -0700
From:	Khalid Aziz <khalid.aziz@...cle.com>
To:	Thomas Gleixner <tglx@...utronix.de>
CC:	Peter Zijlstra <peterz@...radead.org>, corbet@....net,
	mingo@...hat.com, hpa@...or.com, riel@...hat.com,
	akpm@...ux-foundation.org, rientjes@...gle.com, ak@...ux.intel.com,
	mgorman@...e.de, raistlin@...ux.it,
	kirill.shutemov@...ux.intel.com, atomlin@...hat.com,
	avagin@...nvz.org, gorcunov@...nvz.org, serge.hallyn@...onical.com,
	athorlton@....com, oleg@...hat.com, vdavydov@...allels.com,
	daeseok.youn@...il.com, keescook@...omium.org,
	yangds.fnst@...fujitsu.com, sbauer@....utah.edu,
	vishnu.ps@...sung.com, axboe@...com, paulmck@...ux.vnet.ibm.com,
	linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
	linux-api@...r.kernel.org
Subject: Re: [PATCH RESEND v4] sched/fair: Add advisory flag for borrowing
 a timeslice

On 12/18/2014 05:27 PM, Thomas Gleixner wrote:
> On Thu, 18 Dec 2014, Khalid Aziz wrote:
>> On 12/18/2014 04:02 PM, Thomas Gleixner wrote:
>>> If we can solve it with a properly designed and well thought out
>>> functionality in the kernel based on a futex-like mechanism, why can't
>>> Java and databases switch over to that and simply use it?
>>>
>>> You need to modify user space anyway, so it does not matter whether
>>> you modify it in a sane or in a hacky way.
>>
>> Actually userspace does not need to be modified. The code to use this
>> functionality is already present in database code since this same
>> functionality exists on other OSs (the API is a little different but those
>> details can be handled with a simple header file in userspace). Userspace code
>> has already been tested and debugged thoroughly on the OSs that support this
>> functionality and that has significant impact on testing effort. So for
>> userspace it is simply a matter of enabling that code on Linux as well and
>> recompiling. This would be a multi-platform solution for database/java as
>> opposed to a Linux specific solution.
>
> Bullshit. If you turn that option on, it's a modification from the QA
> point of view and you need to run a full validation no matter
> what. Anything else is just QA by crystal ball.
>
> Of course you carefully avoided (again) answering the real question:
>
>> But it's simpler to hack crap into the scheduler than coming up with a
>> proper solution to the problem, right?
>
> I can answer it for you: Yes, it is simpler.
>
> But as you might have figured out it's not really popular and therefore
> not simpler to be accepted by the people who actually care about sane
> designs. I can whip you up special purpose hacks for that which will
> give you way more guarantees with way less lines of horrible code, but
> that does not mean that such hacks are an acceptable solution. You can
> carry those hacks in your private tree and ship it to your customers,
> but do not expect that any sane maintainer will care about it.
>
> Now the very same maintainers asked you several times to answer the
> question why this can't be done with proper futex-like spin
> mechanisms, which would solve a bunch of related problems as well.
>
>   You never even tried to answer that question simply because you never
>   tried to think about it for real. Your only answer is that you want A
>   because A is already used on other OSs and therefore solution B is not
>   an option.
>
>   But if solution B would gain 4% performance, then according to your
>   previous argumentation it would become suddenly very interesting,
>   right?
>
> So unless you even show any sign of thinking about different
> approaches and technically arguing why they cannot deliver the same
> value you won't get anywhere with this and I can tell you why.
>
> You create a new user space ABI
>
>   That forces the kernel to support it forever, which in consequence
>   imposes restrictions on the kernel scheduler forever.
>
>   We have enough restrictions by misdesigned ABIs (e.g. sched_yield())
>   already, so we really do not need more of that.
>
> You ignore any request to prove why a properly designed spin futex
> interface would not be a sensible solution for the problem.
>
>   Of course you are free to ignore that (as you are free to ignore
>   important review comments), but you don't have to be surprised when
>   the responsible maintainers ignore any further attempt from you to
>   get this merged.
>
> Aside from that, you still fail to provide a proper test case which is
> publicly usable for the people involved in this to reproduce your 3%
> gain and analyze the problem at hand properly. The provided:
>
>        enable_hack();
>        while (/* some condition */) {
>                /* bla */
>                /* blub */
>                /* blurb */
>                /* yay! */
>        }
>        disable_hack();
>
> is beyond useless.
>
> Thanks,
>
> 	tglx
>

Fair enough. Implications of a new userspace ABI can be significant and 
I can accept not introducing a new one in the kernel.

The queuing problem is real nevertheless: a task takes a contended lock
just before its current timeslice is up, which the userspace app cannot
know about, and other tasks then queue up behind it while it is
preempted. My patch attempts to avoid the contention in the first place.
A futex with adaptive spinning is a post-contention solution: it tries
to minimize the cost of contention but does nothing to avoid it. Solving
this problem with a futex also helps only if the userspace lock is built
on a futex.
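
For readers unfamiliar with the term, a post-contention lock of this
kind can be sketched roughly as below. This is only an illustration and
not code from the patch or from any proposal in this thread: the names
spin_futex_lock, sfl_futex, sfl_lock, sfl_unlock and SPIN_LIMIT are
hypothetical, and a fixed spin bound stands in for real adaptive
spinning, which would keep spinning only while the lock owner is
running on a CPU (something plain userspace cannot observe).

```c
/* Sketch only: a spin-then-sleep userspace lock on top of the Linux
 * futex syscall. All names below are hypothetical. */
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#define SPIN_LIMIT 100 /* arbitrary bound, chosen for illustration */

/* state: 0 = unlocked, 1 = locked, 2 = locked, waiters may be asleep */
typedef struct {
	atomic_int state;
} spin_futex_lock;

/* Thin wrapper; glibc does not export a futex() function. The cast
 * assumes atomic_int has the same layout as int, as on Linux/gcc. */
long sfl_futex(atomic_int *addr, int op, int val)
{
	return syscall(SYS_futex, (int *)addr, op, val, NULL, NULL, 0);
}

void sfl_lock(spin_futex_lock *l)
{
	int c;

	/* Phase 1: bounded spin, hoping the holder releases soon. */
	for (int i = 0; i < SPIN_LIMIT; i++) {
		c = 0;
		if (atomic_compare_exchange_strong(&l->state, &c, 1))
			return;
	}

	/* Phase 2: mark the lock contended and sleep in the kernel
	 * until an unlock wakes us, then retry. */
	c = atomic_exchange(&l->state, 2);
	while (c != 0) {
		sfl_futex(&l->state, FUTEX_WAIT_PRIVATE, 2);
		c = atomic_exchange(&l->state, 2);
	}
}

void sfl_unlock(spin_futex_lock *l)
{
	int prev = atomic_exchange(&l->state, 0);

	assert(prev != 0); /* unlocking a lock that was not held */
	if (prev == 2)
		sfl_futex(&l->state, FUTEX_WAKE_PRIVATE, 1);
}
```

Nothing in this sketch stops the holder from being preempted between
sfl_lock() and sfl_unlock(); the lock only decides how waiters behave
once that has already happened.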

I have looked at solving this problem in userspace using a
priority-inheritance semaphore but ran into many problems. I will go
back and take another look at it.

I appreciate your feedback.

Thanks,
Khalid