linux-kernel - Re: [revert] mysql+oltp regression

Message-ID: <20080811124857.GD10082@elte.hu>
Date:	Mon, 11 Aug 2008 14:48:57 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Gregory Haskins <ghaskins@...ell.com>
Cc:	Mike Galbraith <efault@....de>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [revert] mysql+oltp regression

* Gregory Haskins <ghaskins@...ell.com> wrote:

> Ingo Molnar wrote:
>> * Mike Galbraith <efault@....de> wrote:
>>
>>   
>>> Greetings,
>>>
>>> During regression testing of tip/sched/clock fixes, a regression in  
>>> low client count throughput turned up, which I traced back to 
>>> the commit below.  I don't see anything wrong with it, but suspect 
>>> that it is preventing client/server pairs from staying together on 
>>> the same CPU as buddies, which mysql definitely likes quite a lot.  
>>> (I suspect that this is the case, because I've seen this same 
>>> performance curve while tinkering with wakeup affinity and breaking 
>>> it all to pieces;)
>>>
>>> Changelog and test results below in case nobody sees a problem with  
>>> the commit itself.
>>>     
>>
>> i've applied your fix to tip/sched/urgent for the time being, thanks  
>> Mike for tracking it down. We can re-try newer iterations of Greg's  
>> patch in tip/sched/devel.
>>
>>   
>
> Hmm..  The patch still looks correct afaict.  I fear we are just 
> papering over some other issue by reverting it, but I will try to see 
> if I can track this down.  We will, of course, now be skipping trying 
> to balance the (effectively random) last task in the queue which may 
> or may not result in better performance on sheer luck instead of 
> algorithmic intelligence.  This makes me nervous.

yeah - but we had that behavior for quite some time.

This is how the patch cycle works normally: we had a fair chance to 
discover this problem in your testing then in -tip testing and then in 
linux-next or -mm but we didn't find it at any stage.

Now we are in the upstream release cycle so unless there's some 
immediate fix available (or there are _really_ strong reasons against 
the revert) doing the revert is the right approach.

A revert is not necessarily an indicator of the quality of the change 
in question; it is a tester-driven exception event that guarantees that 
the kernel improves in a monotonic way (for all testers who opt to help 
us in doing so).

And given that the problem was readily reproducible for Mike, it should 
be reproducible for you as well - so we don't actually make the bug 
harder to fix by doing the revert.

Perhaps we should introduce the notion of "Defer-to-next-release" 
reverts - which this really is - in contrast to "Revert-because-bad", 
which your change definitely is not.

> Speaking of this: Another patch I submitted to you Ingo (had to do 
> with updating the load_weight inside task_setprio) seems to also have 
> this phenomenon: i.e. it's technically correct but further testing has 
> revealed negative repercussions elsewhere.  So please ignore that 
> patch (or revert if you already pulled in, but I don't think you 
> have).  I'll try to look into this issue as well.

ok, under which thread/subject is that? Not queued in tip/sched/* yet, 
correct?

	Ingo