lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201221162519.GA22504@open-light-1.localdomain>
Date:   Mon, 21 Dec 2020 11:25:22 -0500
From:   Liang Li <liliang.opensource@...il.com>
To:     Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Dan Williams <dan.j.williams@...el.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        David Hildenbrand <david@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Michal Hocko <mhocko@...e.com>,
        Liang Li <liliangleo@...iglobal.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        virtualization@...ts.linux-foundation.org
Subject: [RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO

The first version can be found at: https://lkml.org/lkml/2020/4/12/42

Zero out the page content usually happens when allocating pages with
the flag of __GFP_ZERO, this is a time consuming operation, it makes
the population of a large vma area very slowly. This patch introduce
a new feature for zero out pages before page allocation, it can help
to speed up page allocation with __GFP_ZERO.

My original intention for adding this feature is to shorten VM
creation time when SR-IOV devicde is attached, it works good and the
VM creation time is reduced by about 90%.

Creating a VM [64G RAM, 32 CPUs] with GPU passthrough
=====================================================
QEMU use 4K pages, THP is off
                  round1      round2      round3
w/o this patch:    23.5s       24.7s       24.6s
w/ this patch:     10.2s       10.3s       11.2s

QEMU use 4K pages, THP is on
                  round1      round2      round3
w/o this patch:    17.9s       14.8s       14.9s
w/ this patch:     1.9s        1.8s        1.9s
=====================================================

Obviously, it can do more than this. We can benefit from this feature
in the flowing case:

Interactive sence
=================
Shorten application lunch time on desktop or mobile phone, it can help
to improve the user experience. Test shows on a
server [Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz], zero out 1GB RAM by
the kernel will take about 200ms, while some mainly used application
like Firefox browser, Office will consume 100 ~ 300 MB RAM just after
launch, by pre zero out free pages, it means the application launch
time will be reduced about 20~60ms (can be visual sensed?). May be
we can make use of this feature to speed up the launch of Andorid APP
(I didn't do any test for Android).

Virtulization
=============
Speed up VM creation and shorten guest boot time, especially for PCI
SR-IOV device passthrough scenario. Compared with some of the para
vitalization solutions, it is easy to deploy because it’s transparent
to guest and can handle DMA properly in BIOS stage, while the para
virtualization solution can’t handle it well.

Improve guest performance when use VIRTIO_BALLOON_F_REPORTING for memory
overcommit. The VIRTIO_BALLOON_F_REPORTING feature will report guest page
to the VMM, VMM will unmap the corresponding host page for reclaim,
when guest allocate a page just reclaimed, host will allocate a new page
and zero it out for guest, in this case pre zero out free page will help
to speed up the proccess of fault in and reduce the performance impaction.

Speed up kernel routine
=======================
This can’t be guaranteed because we don’t pre zero out all the free pages,
but is true for most case. It can help to speed up some important system
call just like fork, which will allocate zero pages for building page
table. And speed up the process of page fault, especially for huge page
fault. The POC of Hugetlb free page pre zero out has been done.

Security
========
This is a weak version of "introduce init_on_alloc=1 and init_on_free=1
boot options", which zero out page in a asynchronous way. For users can't
tolerate the impaction of 'init_on_alloc=1' or 'init_on_free=1' brings,
this feauture provide another choice.

For the feedback of the first version, cache pollution is the main concern
of the mm guys, On the other hand, this feature is really helpful for
some use case. May be we should let the user decide wether to use it.
So a switch is added in the /sys files, users who don’t like it can turn
off the switch, or by configuring a large batch size to reduce cache
pollution.

To make the whole function works, support of pre zero out free huge pages
should be added for hugetlbfs, I will send another patch for it.

Liang Li (4):
  mm: let user decide page reporting option
  mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
  mm: make page reporing worker works better for low order page
  mm: Add batch size for free page reporting

 drivers/virtio/virtio_balloon.c |   3 +
 include/linux/highmem.h         |  31 +++-
 include/linux/page-flags.h      |  16 +-
 include/linux/page_reporting.h  |   3 +
 include/trace/events/mmflags.h  |   7 +
 mm/Kconfig                      |  10 ++
 mm/Makefile                     |   1 +
 mm/huge_memory.c                |   3 +-
 mm/page_alloc.c                 |   4 +
 mm/page_prezero.c               | 266 ++++++++++++++++++++++++++++++++
 mm/page_prezero.h               |  13 ++
 mm/page_reporting.c             |  49 +++++-
 mm/page_reporting.h             |  16 +-
 13 files changed, 405 insertions(+), 17 deletions(-)
 create mode 100644 mm/page_prezero.c
 create mode 100644 mm/page_prezero.h

Cc: Alexander Duyck <alexander.h.duyck@...ux.intel.com>
Cc: Mel Gorman <mgorman@...hsingularity.net>
Cc: Andrea Arcangeli <aarcange@...hat.com>
Cc: Dan Williams <dan.j.williams@...el.com>
Cc: Dave Hansen <dave.hansen@...el.com>
Cc: David Hildenbrand <david@...hat.com>
Cc: Michal Hocko <mhocko@...e.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>
Cc: Alex Williamson <alex.williamson@...hat.com>
Cc: Michael S. Tsirkin <mst@...hat.com>
Signed-off-by: Liang Li <liliang324@...il.com>
-- 
2.18.2

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ