Date:   Wed, 19 Apr 2017 12:13:18 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     "Huang, Ying" <ying.huang@...el.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH -mm -v9 2/3] mm, THP, swap: Check whether THP can be
 split firstly

On Wed, Apr 19, 2017 at 03:06:24PM +0800, Huang, Ying wrote:
> From: Huang Ying <ying.huang@...el.com>
> 
> To swap out THP (Transparent Huge Page), before splitting the THP,
> the swap cluster will be allocated and the THP will be added into the
> swap cache.  But it is possible that the THP cannot be split, so that
> we must delete the THP from the swap cache and free the swap cluster.
> To avoid that, in this patch, whether the THP can be split is checked
> first.  The check can only be done racily, but it is good enough for
> most cases.
> 
> With the patchset, the swap out throughput improves 3.6% (from about
> 4.16GB/s to about 4.31GB/s) in the vm-scalability swap-w-seq test case
> with 8 processes.  The test is done on a Xeon E5 v3 system.  The swap
> device used is a RAM-simulated PMEM (persistent memory) device.  To
> test the sequential swapping out, the test case creates 8 processes,
> which sequentially allocate and write to the anonymous pages until the
> RAM and part of the swap device are used up.
> 
> Cc: Johannes Weiner <hannes@...xchg.org>
> Signed-off-by: "Huang, Ying" <ying.huang@...el.com>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com> [for can_split_huge_page()]
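
The two orderings described in the patch can be modelled in a few lines. This is a hypothetical user-space sketch, not kernel code: `struct thp_model`, `model_alloc_cluster()`, `swap_out_alloc_first()` and the other names are invented for illustration, and the counter merely stands in for `si->free_clusters`. It only counts allocate-then-undo cycles; it says nothing about their real cost.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the two orderings. None of these names
 * are kernel APIs; the real code works on struct page, the swap cache and
 * split_huge_page().
 */
#define NR_CLUSTERS 4

struct thp_model {
    bool splittable;                     /* what a split attempt would return */
};

static int clusters_free = NR_CLUSTERS;  /* stands in for si->free_clusters */
static int rollbacks;                    /* allocate-then-undo cycles seen */

static void model_reset(void)
{
    clusters_free = NR_CLUSTERS;
    rollbacks = 0;
}

static bool model_alloc_cluster(void)
{
    if (clusters_free == 0)
        return false;
    clusters_free--;                     /* O(1) pop from the free list */
    return true;
}

static void model_free_cluster(void)
{
    assert(clusters_free < NR_CLUSTERS);
    clusters_free++;                     /* O(1) push back onto the free list */
}

/* Before the patch: allocate and add to the swap cache, then try to split. */
static bool swap_out_alloc_first(const struct thp_model *thp)
{
    if (!model_alloc_cluster())
        return false;
    /* ... add the THP to the swap cache ... */
    if (!thp->splittable) {
        /* ... delete the THP from the swap cache ... */
        model_free_cluster();
        rollbacks++;
        return false;
    }
    return true;
}

/* With the patch: a racy pre-check returns before touching the free list. */
static bool swap_out_check_first(const struct thp_model *thp)
{
    if (!thp->splittable)                /* may be stale, so only a hint */
        return false;
    /* the split can still fail after the check, so the rollback path stays */
    return swap_out_alloc_first(thp);
}
```

For a THP that cannot be split, `swap_out_alloc_first()` records one rollback and leaves the free count unchanged, while `swap_out_check_first()` returns without allocating. The saving is one pair of constant-time list operations per failed split, paid for with an extra branch on every swap-out, which is the trade-off questioned in the reply.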

How often does this actually happen in practice? Because all that this
protects us from is trying to allocate a swap cluster - which with the
si->free_clusters list really isn't all that expensive - and return it
again. Unless this happens all the time in practice, this optimization
seems misplaced.

It's especially a little strange because in the other email I asked
about the need for unlikely() annotations, yet this patch is adding
branches and checks for what seems to be an unlikely condition into
the THP hot path.

I'd suggest you drop both these optimization attempts unless there is
real data proving that they have a measurable impact.
