Date:   Thu, 20 Apr 2017 08:50:43 +0800
From:   "Huang, Ying" <ying.huang@...el.com>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     "Huang, Ying" <ying.huang@...el.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH -mm -v9 2/3] mm, THP, swap: Check whether THP can be split firstly

Johannes Weiner <hannes@...xchg.org> writes:

> On Wed, Apr 19, 2017 at 03:06:24PM +0800, Huang, Ying wrote:
>> From: Huang Ying <ying.huang@...el.com>
>> 
>> To swap out THP (Transparent Huge Page), before splitting the THP,
>> the swap cluster will be allocated and the THP will be added into the
>> swap cache.  But it is possible that the THP cannot be split, so that
>> we must delete the THP from the swap cache and free the swap cluster.
>> To avoid that, this patch first checks whether the THP can be split.
>> The check is inherently racy, but it is good enough for most cases.
>> 
>> With the patchset, the swap out throughput improves 3.6% (from about
>> 4.16GB/s to about 4.31GB/s) in the vm-scalability swap-w-seq test case
>> with 8 processes.  The test is done on a Xeon E5 v3 system.  The swap
>> device used is a RAM simulated PMEM (persistent memory) device.  To
>> test the sequential swapping out, the test case creates 8 processes,
>> which sequentially allocate and write to the anonymous pages until the
>> RAM and part of the swap device is used up.
>> 
>> Cc: Johannes Weiner <hannes@...xchg.org>
>> Signed-off-by: "Huang, Ying" <ying.huang@...el.com>
>> Acked-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com> [for can_split_huge_page()]
>
> How often does this actually happen in practice? Because all that this
> protects us from is trying to allocate a swap cluster - which with the
> si->free_clusters list really isn't all that expensive - and return it
> again. Unless this happens all the time in practice, this optimization
> seems misplaced.

In addition to allocating and freeing the swap cluster, the THP is also
added to and then deleted from the swap cache.
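
To make the saved work concrete, here is a rough userspace sketch of the
two orderings. It is only a model, not the kernel code: struct sim_thp,
struct sim_counters and both functions are hypothetical names invented
for illustration, and the can_split flag merely stands in for the result
of a can_split_huge_page()-style check. The real check is racy, so a
later split can still fail even after it passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a THP; not a kernel structure. */
struct sim_thp {
	bool can_split;	/* models the outcome of the early split check */
};

/* Counts the operations that the early check is meant to avoid. */
struct sim_counters {
	int cluster_allocs;
	int cluster_frees;
	int cache_adds;
	int cache_dels;
};

/* Old ordering: allocate and add first, undo if the split fails. */
bool swap_out_without_check(const struct sim_thp *thp,
			    struct sim_counters *c)
{
	assert(thp != NULL && c != NULL);

	c->cluster_allocs++;
	c->cache_adds++;
	if (!thp->can_split) {
		/* Split failed: roll back both steps. */
		c->cache_dels++;
		c->cluster_frees++;
		return false;
	}
	return true;
}

/* New ordering: bail out before touching the cluster or the cache. */
bool swap_out_with_check(const struct sim_thp *thp,
			 struct sim_counters *c)
{
	assert(thp != NULL && c != NULL);

	if (!thp->can_split)
		return false;
	c->cluster_allocs++;
	c->cache_adds++;
	return true;
}
```

For a THP that cannot be split, the first function performs four
operations (allocate, add, delete, free) while the second performs none.
For a THP that can be split, both do the same work, plus the cost of the
extra check in the second case.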

> It's especially a little strange because in the other email I asked
> about the need for unlikely() annotations, yet this patch is adding
> branches and checks for what seems to be an unlikely condition into
> the THP hot path.
>
> I'd suggest you drop both these optimization attempts unless there is
> real data proving that they have a measurable impact.

To my surprise as well, this patch has a measurable impact in my test.
The swap out throughput improves 3.6% in the vm-scalability swap-w-seq
test case with 8 processes.  Details are in the original patch
description.
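
As a quick sanity check of that figure, the relative improvement can be
recomputed from the two throughput numbers in the patch description
(about 4.16GB/s before and about 4.31GB/s after). The helper name below
is hypothetical and exists only for this illustration. Since the inputs
are rounded, the result is approximate.

```c
#include <assert.h>

/*
 * Relative improvement of "after" over "before", in percent.
 * Hypothetical helper for illustration only.
 */
double improvement_percent(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

Calling improvement_percent(4.16, 4.31) gives a value close to 3.6,
which is consistent with the reported improvement.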

Best Regards,
Huang, Ying
