linux-kernel - [PATCH v2 0/2] Speed MTRR programming up when we can

Message-Id: <1561689337-19390-1-git-send-email-ricardo.neri-calderon@linux.intel.com>
Date:   Thu, 27 Jun 2019 19:35:35 -0700
From:   Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...e.de>
Cc:     Alan Cox <alan.cox@...el.com>, Tony Luck <tony.luck@...el.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Andi Kleen <andi.kleen@...el.com>,
        Hans de Goede <hdegoede@...hat.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Jordan Borgner <mail@...dan-borgner.de>,
        "Ravi V. Shankar" <ravi.v.shankar@...el.com>,
        Mohammad Etemadi <mohammad.etemadi@...el.com>,
        Ricardo Neri <ricardo.neri@...el.com>,
        linux-kernel@...r.kernel.org, x86@...nel.org,
        Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
Subject: [PATCH v2 0/2] Speed MTRR programming up when we can

This is the second iteration of this patchset. The first iteration can be
viewed here [1]. 

Programming MTRR registers in multi-processor systems is a lengthy
process: all processors must program these registers in lockstep with
interrupts disabled, and the procedure flushes caches and TLBs twice. As
a result, it can take a considerable amount of time.

On some platforms, this can lead to a large skew of the refined-jiffies
clock source. Early during boot, if no other clock source is available
(e.g., when booting with hpet=disabled), the refined-jiffies clock source
is used as the watchdog for the TSC clock source. If the skew of
refined-jiffies is too large, Linux wrongly concludes that the TSC is
unstable:

     clocksource: timekeeping watchdog on CPU1: Marking clocksource
          'tsc-early' as unstable because the skew is too large:
     clocksource: 'refined-jiffies' wd_now: fffedc10 wd_last:
          fffedb90 mask: ffffffff
     clocksource: 'tsc-early' cs_now: 5eccfddebc cs_last: 5e7e3303d4
          mask: ffffffffffffffff
     tsc: Marking TSC unstable due to clocksource watchdog

According to my measurements, around 98% of the time needed to program
MTRRs in multi-processor systems is spent flushing caches with wbinvd().
As stated in Section 11.11.8 of the Intel 64 and IA-32 Architectures
Software Developer's Manual, it is not necessary to flush caches if the
CPU supports cache self-snooping. Thus, skipping the cache flushes can
substantially reduce the time needed to program the MTRR registers (by
112 ms and 1409 ms on the two platforms measured below).

However, there exist CPU models with errata that affect their self-
snooping capabilities. Such errata may cause unpredictable behavior,
machine check errors, or hangs. For instance:

     "Where two different logical processors have the same code page
      mapped with two different memory types Specifically if one code
      page is mapped by one logical processor as write back and by
      another as uncacheable and certain instruction timing conditions
      occur the system may experience unpredictable behaviour." [2].

Similar errata are reported in other processors as well [3], [4], [5],
[6], and [7].

Thus, to confidently leverage self-snooping in the MTRR programming
algorithm, we must first clear this feature on models with known errata.

I measured the execution time of mtrr_aps_init(), from which the MTRRs of
all CPUs are programmed in lockstep at boot, and found the following
savings in the time required to program MTRRs:

Platform                     With wbinvd() (ms)   Without wbinvd() (ms)
104-core (208 LP) Skylake           1437                   28
2-core (4 LP) Haswell                114                    2

LP = Logical Processor

Thanks and BR,
Ricardo

Changes since v1:

 * Relocated the comment on the usefulness of cache self-snooping from
   check_memory_type_self_snoop_errata() to the prepare_set() function
   of the generic MTRR programming ops (Thomas Gleixner).
 * In early_init_intel(), moved check_memory_type_self_snoop_errata()
   next to check_mpx_erratum() for improved readability.

[1] https://lkml.org/lkml/2019/6/27/828
[2] Erratum BF52,
    https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-3600-specification-update.pdf
[3] Erratum BK47,
    https://www.mouser.com/pdfdocs/2ndgencorefamilymobilespecificationupdate.pdf
[4] Erratum AAO54,
    https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-c5500-c3500-spec-update.pdf
[5] Errata AZ39, AZ42,
    https://www.intel.com/content/dam/support/us/en/documents/processors/mobile/celeron/sb/320121.pdf
[6] Errata AQ51, AQ102, AQ104,
    https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-dual-core-desktop-e2000-specification-update.pdf
[7] Errata AN107, AN109,
    https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-dual-core-specification-update.pdf

Ricardo Neri (2):
  x86/cpu/intel: Clear cache self-snoop capability in CPUs with known
    errata
  x86, mtrr: generic: Skip cache flushes on CPUs with cache
    self-snooping

 arch/x86/kernel/cpu/intel.c        | 27 +++++++++++++++++++++++++++
 arch/x86/kernel/cpu/mtrr/generic.c | 15 +++++++++++++--
 2 files changed, 40 insertions(+), 2 deletions(-)

-- 
2.17.1