linux-kernel - Re: [PATCH RESEND] x86/tsc: print some log if calibrated tsc freq deviates from original too much

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CAKcXpBz0wsabXFLK3uSgXn2sti11VcZeZz1c6G=7P6RoiExqjw@mail.gmail.com>
Date: Wed, 10 Apr 2024 12:52:55 +0800
From: Lei Chen <lei.chen@...rtx.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, 
	"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RESEND] x86/tsc: print some log if calibrated tsc freq
 deviates from original too much

On Tue, Apr 9, 2024 at 4:05 PM Ingo Molnar <mingo@...nel.org> wrote:
>
>
> * Lei Chen <lei.chen@...rtx.com> wrote:
>
> > In most cases, tsc_khz is refined by hpet on boot. But in a few
> > production-level nodes, the refinement fails because calibrated
> > freq diviates from origin tsc freq more than 1%. Printing some
> > logs will help get this info.
> >
> > Signed-off-by: Lei Chen <lei.chen@...rtx.com>
> > ---
> >  arch/x86/kernel/tsc.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index 15f97c0abc9d..a68b16e72df1 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1435,8 +1435,15 @@ static void tsc_refine_calibration_work(struct work_struct *work)
> >       }
> >
> >       /* Make sure we're within 1% */
> > -     if (abs(tsc_khz - freq) > tsc_khz/100)
> > +     if (abs(tsc_khz - freq) > tsc_khz/100) {
> > +             pr_warn("Warning: TSC freq calibrated by [%s]: %lu.%03lu MHz deviates too much from original freq: %lu.%03lu MHz\n",
>
> Yeah, so it wouldn't cost us anything to more precisely define 'too much':
>
>  s/deviates too much from
>   /deviates by more than 1% from
>
> Right?

Thanks for your suggestion, I'll change it.


>
> > +                     hpet ? "HPET" : "PM_TIMER",
> > +                     (unsigned long)freq / 1000,
> > +                     (unsigned long)freq % 1000,
> > +                     (unsigned long)tsc_khz / 1000,
> > +                     (unsigned long)tsc_khz % 1000);
> >               goto out;
> > +     }
>
> The warning makes sense I suppose, if it's one per system and once per
> bootup [right?], but I think pr_info() would be plenty enough priority for
> this condition - especially as we didn't have the warning before and don't
> know how frequent it is?
>
I'm afraid tsc frequency deviates from HPET more than 1% is a system risk.
It means either HPET or tsc might be not stable. Users should be cautious of
the risk and turn to ntp for help. In our production nodes, it indeed
causes more
time skew.  So maybe warning is a better way to expose the risk ?