Date:   Tue, 18 Jan 2022 09:16:59 -0800
From:   Peter Oskolkov <posk@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     mingo@...hat.com, tglx@...utronix.de, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, linux-api@...r.kernel.org, x86@...nel.org,
        pjt@...gle.com, avagin@...gle.com, jannh@...gle.com,
        tdelisle@...terloo.ca, posk@...k.io
Subject: Re: [RFC PATCH v2 4/5] sched: UMCG: add a blocked worker list

On Mon, Jan 17, 2022 at 1:19 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Thu, Jan 13, 2022 at 03:39:39PM -0800, Peter Oskolkov wrote:

[...]

> >
> > So this change basically decouples block/wake detection from
> > M:N threading, in the sense that the number of servers no longer
> > has to be M or N, but is instead driven by the scalability needs
> > of the userspace application.
>
> So I don't object to having this blocking list, we had that early on in
> the discussions.
>
> *However*, combined with WF_CURRENT_CPU this 1:N userspace model doesn't
> really make sense, also combined with Proxy-Exec (if we ever get that
> sorted) it will fundamentally not work.
>
> More consideration is needed I think...

I was not very clear here. The intent of this change is not to make
1:N a good general approach, but to make "several running workers per
single server" a viable option.

My guess, based on numbers/benchmarks from another project, is that
having a single server/runqueue per four or eight running workers,
properly aligned with (= affined to) an AMD chiplet, will be the most
performant option, compared to both a runqueue per single running
worker and a global runqueue. On Intel this will probably look like a
single runqueue per core (two running workers, one per HT sibling).

So in this model a "server" represents a runqueue.
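
For example, the server and its workers could all pin themselves to
the chiplet's CPUs with plain sched_setaffinity(); the chiplet
boundaries would come from topology discovery that is not shown here
(again just a sketch, not something from this patch set):

#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to CPUs [first_cpu, first_cpu + nr_cpus),
 * i.e. to one chiplet; the server and each of its workers call this. */
static int pin_to_chiplet(int first_cpu, int nr_cpus)
{
        cpu_set_t set;
        int cpu;

        CPU_ZERO(&set);
        for (cpu = first_cpu; cpu < first_cpu + nr_cpus; cpu++)
                CPU_SET(cpu, &set);

        return sched_setaffinity(0, sizeof(set), &set); /* 0 == this thread */
}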

I'll reply to other active umcg discussions shortly.
