linux-kernel - Re: [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers


Message-ID: <dfd2fb43-2a19-545a-fea8-f793a685ef30@intel.com>
Date:   Mon, 24 Oct 2022 08:42:30 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Zhang Rui <rui.zhang@...el.com>, Feng Tang <feng.tang@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H . Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Cc:     tim.c.chen@...el.com, Xiongfeng Wang <wangxiongfeng2@...wei.com>,
        liaoyu15@...wei.com
Subject: Re: [PATCH v1 1/2] x86/tsc: use logical_package as a better
 estimation of socket numbers

On 10/22/22 09:12, Zhang Rui wrote:
>>> I'm not sure if we have a perfect solution here.
>> Are the implementations fixable?
> currently, I don't have any idea.
> 
>>   Or, at least tolerable?

That would be great to figure out before we start throwing more patches
around.

>> For instance, I can live with the implementation being a bit goofy
>> when kernel commandlines are in play.  We can pr_info() about those cases.
> My understanding is that the CPUs in the last package may still have
> small CPU ID values. This means that 'logical_packages' is hard to
> break unless we boot with a very small CPU count and happen to disable
> all CPUs in one or more packages. Feng is experimenting with this and
> may have some update later.
> 
> If this is the case, is this a valid case that we need to take care of?

Well, let's talk through it a bit.

What is the triggering event and what's the fallout?

Is the user on a truly TSC stable system or not?

What kind of maxcpus= argument do they need to specify?  Is it something
that's likely to get used in production or is it most likely just for
debugging?

What is the maxcpus= fallout?  Does it overestimate or underestimate
the number of logical packages?
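
To make that question concrete, here is a toy Python model. It is not
kernel code: the function name, the two CPU-to-package tables, and the
assumption that maxcpus=N simply brings up CPU ids 0..N-1 are all made
up for illustration. It counts the distinct packages seen among the
CPUs that were booted:

```python
def count_logical_packages(cpu_to_package, maxcpus=None):
    """Count distinct packages among the CPUs that were brought up.

    cpu_to_package[i] is the package id of CPU i (hypothetical table).
    maxcpus is assumed to be >= 1 and to keep only CPU ids 0..maxcpus-1.
    """
    if maxcpus is None:
        booted = cpu_to_package
    else:
        booted = cpu_to_package[:maxcpus]
    return len(set(booted))


# Two hypothetical enumerations of a 2-package, 8-CPU machine.
blocked = [0, 0, 0, 0, 1, 1, 1, 1]      # CPU ids 0-3 in package 0, 4-7 in package 1
interleaved = [0, 1, 0, 1, 0, 1, 0, 1]  # last package also owns small CPU ids

if __name__ == "__main__":
    for name, table in (("blocked", blocked), ("interleaved", interleaved)):
        for limit in (None, 4, 2, 1):
            print(name, "maxcpus =", limit, "->",
                  count_logical_packages(table, limit), "package(s)")
```

In this model the count can only come out too low, never too high, and
only when every CPU of some package falls past the maxcpus= cutoff.
Whether real enumeration behaves this way is exactly what needs to be
confirmed.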

How many cases outside of maxcpus= do we know of that lead to an
imprecise "logical packages" calculation?

Does this lead to the TSC being mistakenly marked stable when it is not,
or *not* being marked stable when it is?

Let's get all of that info in one place and make sure we are all agreed
on the *problem* before we go to the solution space.