Message-ID: <20190509021144.GA24577@aaronlu>
Date: Thu, 9 May 2019 10:11:44 +0800
From: Aaron Lu <aaron.lu@...ux.alibaba.com>
To: Julien Desfossez <jdesfossez@...italocean.com>
Cc: Vineeth Remanan Pillai <vpillai@...italocean.com>,
Phil Auld <pauld@...hat.com>,
Nishanth Aravamudan <naravamudan@...italocean.com>,
Peter Zijlstra <peterz@...radead.org>,
Tim Chen <tim.c.chen@...ux.intel.com>, mingo@...nel.org,
tglx@...utronix.de, pjt@...gle.com, torvalds@...ux-foundation.org,
linux-kernel@...r.kernel.org, subhra.mazumdar@...cle.com,
fweisbec@...il.com, keescook@...omium.org, kerrnel@...gle.com,
Aaron Lu <aaron.lwe@...il.com>,
Aubrey Li <aubrey.intel@...il.com>,
Valentin Schneider <valentin.schneider@....com>,
Mel Gorman <mgorman@...hsingularity.net>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [RFC PATCH v2 00/17] Core scheduling v2
On Wed, May 08, 2019 at 01:49:09PM -0400, Julien Desfossez wrote:
> On 08-May-2019 10:30:09 AM, Aaron Lu wrote:
> > On Mon, May 06, 2019 at 03:39:37PM -0400, Julien Desfossez wrote:
> > > On 29-Apr-2019 11:53:21 AM, Aaron Lu wrote:
> > > > This is what I have used to make sure no two unmatched tasks are
> > > > scheduled on the same core: (on top of v1, I think it's easier to just
> > > > show the diff instead of commenting on various places of the patches :-)
> > >
> > > We imported this fix in v2 and made some small changes and optimizations
> > > (with and without Peter’s fix from https://lkml.org/lkml/2019/4/26/658)
> > > and in both cases, the performance problem where the core can end up
> >
> > By 'core', do you mean a logical CPU (hyperthread) or the entire core?
> No, I really meant the entire core.
>
> I’m sorry, I should have added a little bit more context. This relates
> to a performance issue we saw in v1 and discussed here:
> https://lore.kernel.org/lkml/20190410150116.GI2490@worktop.programming.kicks-ass.net/T/#mb9f1f54a99bac468fc5c55b06a9da306ff48e90b
>
> We proposed a fix that solved this, Peter came up with a better one
> (https://lkml.org/lkml/2019/4/26/658), but if we add your isolation fix
> as posted above, the same problem reappears. Hope this clarifies your
> ask.
It's clear now, thanks.
I don't immediately see how my isolation fix would make your fix stop
working; I will need to check, but I'm busy with other stuff so it will
take a while.
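
For reference, the isolation rule I posted is conceptually just a cookie
comparison: two tasks may occupy the SMT siblings of a core at the same
time only if they carry the same cookie, otherwise the sibling is forced
idle. A rough sketch of the idea (field and helper names here are
simplified for illustration, not the exact diff):

/*
 * Illustrative only: when a sibling selects its next task, it either
 * picks one whose cookie matches the task chosen on the other sibling,
 * or it forces idle rather than co-scheduling an unmatched task.
 */
static inline bool cookies_match(struct task_struct *a,
				 struct task_struct *b)
{
	return a->core_cookie == b->core_cookie;
}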
>
> I hope that we did not miss anything crucial while integrating your fix
> on top of v2 + Peter’s fix. The changes are conceptually similar, but we
> refactored it slightly to make the logic clearer. Please have a look and
> let us know.
I suppose you already have a branch that has all the bits there? Could
you share that branch somewhere so I can start working on top of it, to
make sure we are on the same page?
Also, it would be good if you could share the workload, cmdline options,
how many workers to start, etc., so I can reproduce this issue.
Thanks.