linux-kernel - Re: [PATCH RFC 1/1] kernel/cpu: to track which CPUHP callback is failed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210406161032.iwmlfszrfcpjss5e@e107158-lin>
Date:   Tue, 6 Apr 2021 17:10:32 +0100
From:   Qais Yousef <qais.yousef@....com>
To:     Dongli Zhang <dongli.zhang@...cle.com>
Cc:     linux-kernel@...r.kernel.org, tglx@...utronix.de,
        peterz@...radead.org, mpe@...erman.id.au,
        aneesh.kumar@...ux.ibm.com, ethp@...com, npiggin@...il.com,
        joe.jin@...cle.com
Subject: Re: [PATCH RFC 1/1] kernel/cpu: to track which CPUHP callback is
 failed

On 04/05/21 14:22, Dongli Zhang wrote:
> May I have if there is any feedback on this? To pr_err_once() here helps narrow
> down what is the root cause of cpu online failure.
> 
> 
> The issue fixed by d7eb79c6290c ("KVM: kvmclock: Fix vCPUs > 64 can't be
> online/hotpluged") is able to demonstrate how this pr_err_once() helps.
> 
> Due to VM kernel issue, at most 64 VCPUs can online if host clocksource is
> switched to hpet.
> 
> # echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
> 
> 
> By default, we have no idea why only 64 out of 100 VCPUs are online in VM. We
> need to customize kernel to debug why some CPUs are not able to online.
> 
> 
> We will have below and know the root cause is with kvmclock, if we add the
> pr_err_once().

Isn't pr_debug() more appropriate here? Don't you want one on the
cpuhp_down_callbacks() too?

I *think* it's common now to have CONFIG_DYNAMIC_DEBUG enabled so one can
enable that line when they need to

	https://www.kernel.org/doc/html/latest/admin-guide/dynamic-debug-howto.html

I can see the appeal of pr_err_once() but I think this can fail for legitimate
reasons and is not necessarily strictly always an error?

Thanks

--
Qais Yousef