Message-Id: <20200928072256.13098-1-gshan@redhat.com>
Date:   Mon, 28 Sep 2020 17:22:54 +1000
From:   Gavin Shan <gshan@...hat.com>
To:     linux-arm-kernel@...ts.infradead.org
Cc:     linux-kernel@...r.kernel.org, mark.rutland@....com,
        anshuman.khandual@....com, robin.murphy@....com,
        catalin.marinas@....com, will@...nel.org, shan.gavin@...il.com
Subject: [PATCH v3 0/2] arm64/mm: Enable color zero pages

Color zero pages aren't enabled on arm64, so all read-only (anonymous)
VM areas are backed by the same zero page. Reading from those areas
puts pressure on the data cache. In the extreme case, a single data
cache set comes under heavy pressure and thrashes. This series enables
color zero pages to resolve the issue.
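
To illustrate what "color zero pages" means, the sketch below picks one
of several zero pages from the virtual address, so that neighbouring
virtual pages are backed by different physical zero pages. It is a model
under stated assumptions, not code from this series: the helper name
zero_page_index and the power-of-two page count are hypothetical.

```python
PAGE_SHIFT = 12  # CONFIG_ARM64_PAGE_SHIFT=12, i.e. 4KB pages


def zero_page_index(vaddr, nr_zero_pages):
    """Return which of nr_zero_pages zero pages would back vaddr.

    Hypothetical helper: nr_zero_pages must be a power of two so the
    virtual page number can be masked instead of divided.
    """
    if nr_zero_pages <= 0 or nr_zero_pages & (nr_zero_pages - 1):
        raise ValueError("nr_zero_pages must be a power of two")
    # Low bits of the virtual page number select the zero page (the color).
    return (vaddr >> PAGE_SHIFT) & (nr_zero_pages - 1)


if __name__ == "__main__":
    for vaddr in (0x0000, 0x1000, 0x7000, 0x8000):
        print(hex(vaddr), "-> zero page", zero_page_index(vaddr, 8))
```

With a single zero page (nr_zero_pages=1) every address maps to index 0,
which is the situation this series sets out to change.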

PATCH[1/2] decouples the zero PGD table from the zero page
PATCH[2/2] allocates the needed zero pages according to L1 cache size
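
One plausible rule for sizing the set of zero pages is the number of page
colors in the cache: the capacity of one way divided by the page size.
The sketch below applies that rule to the cache topologies listed under
Testing. The rule and the function names are assumptions made for
illustration; the patch itself may compute the count differently.

```python
def cache_way_size(line_size, nr_sets):
    """Capacity of one cache way in bytes."""
    return line_size * nr_sets


def zero_pages_needed(line_size, nr_sets, page_size=4096):
    """Number of page colors, and so zero pages, under the assumed rule.

    At least one zero page is always needed, even when a way is no
    larger than a page.
    """
    return max(1, cache_way_size(line_size, nr_sets) // page_size)


if __name__ == "__main__":
    # 64-byte lines, 64 sets: one way holds 4KB
    print("64 sets :", zero_pages_needed(64, 64))
    # 64-byte lines, 512 sets: one way holds 32KB
    print("512 sets:", zero_pages_needed(64, 512))
```

Under this rule a cache whose way is no larger than one 4KB page has a
single color, so extra zero pages would not change which sets are hit.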

Testing
=======
[1] The experiment shows how heavily (L1) data cache misses impact
    overall application performance. The L1 data cache topology of the
    test machine and the host kernel configuration are listed below.

    The test case allocates contiguous page frames through HugeTLBfs
    and reads 4 bytes of data from the same offset (0x0) in each of
    these (N) contiguous page frames. N is 8 in the first test case
    and 9 in the second. This is repeated one million times.

    Note that 8 is the number of L1 data cache ways. The experiment is
    meant to cause L1 cache thrashing on one particular set.

    Host:      CONFIG_ARM64_PAGE_SHIFT=12
               DEFAULT_HUGE_PAGE_SIZE=2MB
    L1 dcache: cache-line-size=64
               number-of-sets=64
               number-of-ways=8

                            N=8           N=9
    ------------------------------------------------------------------
    cache-misses:           43,429        9,038,460
    L1-dcache-load-misses:  43,429        9,038,460
    seconds time elapsed:   0.299206372   0.722253140   (2.41 times)
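
The gap between N=8 and N=9 is what an idealized model of one cache set
predicts. The sketch below simulates the access pattern against the L1
topology above, assuming true LRU replacement, physical indexing and no
other traffic in the set. Real hardware may use a different replacement
policy, so this is a model rather than a description of the test machine.

```python
from collections import OrderedDict

LINE_SIZE = 64    # cache-line-size
NR_SETS = 64      # number-of-sets
NR_WAYS = 8       # number-of-ways
PAGE_SIZE = 4096  # CONFIG_ARM64_PAGE_SHIFT=12


def count_misses(n_pages, iterations, offset=0):
    """Count misses when reading `offset` in each of n_pages contiguous pages."""
    sets = [OrderedDict() for _ in range(NR_SETS)]
    misses = 0
    for _ in range(iterations):
        for page in range(n_pages):
            addr = page * PAGE_SIZE + offset
            index = (addr // LINE_SIZE) % NR_SETS
            tag = addr // (LINE_SIZE * NR_SETS)
            lines = sets[index]
            if tag in lines:
                lines.move_to_end(tag)         # hit: mark most recently used
            else:
                misses += 1
                if len(lines) >= NR_WAYS:
                    lines.popitem(last=False)  # evict least recently used
                lines[tag] = True
    return misses


if __name__ == "__main__":
    print("N=8:", count_misses(8, 1000))
    print("N=9:", count_misses(9, 1000))
```

In this model N=8 only takes the 8 cold misses, while N=9 misses on every
access: 9 misses per iteration, or 9,000,000 over one million iterations,
which is close to the measured 9,038,460.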

[2] The experiment should have been carried out on a machine where the
    L1 data cache capacity of one way is larger than 4KB. However, I
    was unable to find such a machine, so I had to evaluate the
    performance impact caused by L2 data cache thrashing instead. The
    experiment is carried out on a machine with the following L1/L2
    data cache topology. The host kernel configuration is the same as
    in [1].

    The corresponding test program allocates contiguous page frames
    through hugeTLBfs and builds VMAs backed by zero pages. The
    contiguous pages are read sequentially from a fixed offset (0) in
    steps of 32KB, 8 times. After that, the VMAs backed by zero pages
    are read sequentially in steps of 4KB, once. This is repeated 8
    million times.

    Note that 32KB is the capacity of one L2 data cache way and 8 is
    the number of L2 data cache ways. This experiment is meant to
    cause L2 data cache thrashing on one particular set.

    L1 dcache:  <same as [1]>
    L2 dcache:  cache-line-size=64
                number-of-sets=512
                number-of-ways=8

    -----------------------------------------------------------------------
    cache-references:       1,427,213,737    1,421,394,472
    cache-misses:              35,804,552       42,636,698
    L1-dcache-load-misses:     35,804,552       42,636,698
    seconds time elapsed:   2.602511671      2.098198172      (+19.3%)
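
The choice of strides in [2] follows from the L2 set-index arithmetic.
The sketch below lists which L2 sets a strided read sequence touches,
assuming a physically indexed L2 and offsets measured from a base that is
aligned to the 32KB way size (a 2MB huge page satisfies this). The
function name sets_touched is hypothetical.

```python
def sets_touched(stride, count, line_size=64, nr_sets=512):
    """Return the sorted L2 set indices hit by `count` reads `stride` bytes apart."""
    return sorted({((i * stride) // line_size) % nr_sets for i in range(count)})


if __name__ == "__main__":
    KB = 1024
    # Reads 32KB apart all land in the same set and compete for its ways.
    print("32KB stride:", sets_touched(32 * KB, 8))
    # Reads 4KB apart spread over different sets.
    print(" 4KB stride:", sets_touched(4 * KB, 8))
```

A 32KB stride equals the capacity of one way (512 sets x 64 bytes), so
every such read maps to one set, whereas a 4KB stride moves the set index
by 64 on each step.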

Changes since v2:

   * Rebased to latest upstream kernel (5.9-rc6)           (Gavin)
   * Improved commit log                                   (Gavin)
   * Provide performance data in the cover letter          (Catalin)


Gavin Shan (2):
  arm64/mm: Introduce zero PGD table
  arm64/mm: Enable color zero pages

 arch/arm64/include/asm/cache.h       |  3 ++
 arch/arm64/include/asm/mmu_context.h |  6 +--
 arch/arm64/include/asm/pgtable.h     | 11 ++++-
 arch/arm64/kernel/cacheinfo.c        | 67 ++++++++++++++++++++++++++++
 arch/arm64/kernel/setup.c            |  2 +-
 arch/arm64/kernel/vmlinux.lds.S      |  4 ++
 arch/arm64/mm/init.c                 | 37 +++++++++++++++
 arch/arm64/mm/mmu.c                  |  7 ---
 arch/arm64/mm/proc.S                 |  2 +-
 drivers/base/cacheinfo.c             |  3 +-
 include/linux/cacheinfo.h            |  6 +++
 11 files changed, 132 insertions(+), 16 deletions(-)

-- 
2.23.0
