Message-ID: <Y/NNKdAtSZv631Z2@hirez.programming.kicks-ass.net>
Date:   Mon, 20 Feb 2023 11:36:25 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Zhang Rui <rui.zhang@...el.com>
Cc:     tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com, x86@...nel.org,
        linux-kernel@...r.kernel.org, zhang.jia@...ux.alibaba.com,
        len.brown@...el.com
Subject: Re: [RFC PATCH V2 0/1] x86: cpu topology fix and question on
 x86_max_cores


On Mon, Feb 20, 2023 at 11:28:55AM +0800, Zhang Rui wrote:

> Solution to fix smp_num_siblings
> --------------------------------
> 
> Patch 1/1 ensures that smp_num_siblings represents the system-wide maximum
> number of siblings by only ever increasing its value; it is never allowed
> to decrease.
> 
> It is sufficient to make the problem go away.
> 
> However, there is a potential problem left. When the boot CPU is an
> E-core CPU, smp_num_siblings is set to 1 during the BSP probe, and the
> kernel disables SMT support by setting cpu_smt_control to
> CPU_SMT_NOT_SUPPORTED in
> start_kernel()->check_bugs()->cpu_smt_check_topology().
> So far, we don't have such platforms.

This is the oft-recurring problem of the boot CPU not having access to
the system topology.

Instead of fixing that, Intel seems to work at making it worse. At some
point, we're just going to have to give up and move to DT or something
:/

Please communicate (again) that only knowing the topology/setup of the
system once all the CPUs are online is crap. Once you start bringing up
APs, some things are fixed -- if we guessed wrong, we're hosed.

Specific examples of this that we've run into in the past are:

 - does the machine have SMT
 - is the machine hybrid
   (and if so, how many different core types will we have)

Specifically, things like determining the number of GP event counters on
a PMU sometimes depend on HT being active, but we want the PMU
initialized really early because it also serves the watchdog, and you
want splats when something goes sideways.

The end result is that we have to make things complicated and
dynamically re-adjust when system resources come online.

So far we've managed -- just, but *PLEASE*, don't make it worse!!!
