Date:   Wed, 27 Mar 2019 11:00:32 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Zi Yan <ziy@...dia.com>
Cc:     Keith Busch <kbusch@...nel.org>,
        Yang Shi <yang.shi@...ux.alibaba.com>, mhocko@...e.com,
        mgorman@...hsingularity.net, riel@...riel.com, hannes@...xchg.org,
        akpm@...ux-foundation.org, "Busch, Keith" <keith.busch@...el.com>,
        "Williams, Dan J" <dan.j.williams@...el.com>,
        "Wu, Fengguang" <fengguang.wu@...el.com>,
        "Du, Fan" <fan.du@...el.com>, "Huang, Ying" <ying.huang@...el.com>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 06/10] mm: vmscan: demote anon DRAM pages to PMEM node

On 3/27/19 10:48 AM, Zi Yan wrote:
> For 40MB/s vs 750MB/s, they were using sys_migrate_pages(). Sorry
> about the confusion there. If I measure only migrate_pages() in the
> kernel, the throughput becomes: migrating one 4KB page: 0.312GB/s vs
> migrating 512 4KB pages: 0.854GB/s. That is still a >2x difference.
> 
> Furthermore, if we only consider migrate_page_copy() in
> mm/migrate.c, which only calls copy_highpage() and
> migrate_page_states(), the throughput becomes: migrating one 4KB
> page: 1.385GB/s vs migrating 512 4KB pages: 1.983GB/s. The gap is
> smaller, but migrating 512 4KB pages still achieves 40% more
> throughput.
> 
> Do these numbers make sense to you?

Yes.  It would be very interesting to batch the migrations in the
kernel and see how it affects the code.  A 50% boost is interesting,
but not if it's only in microbenchmarks and takes 2k lines of code.

50% is *very* interesting if it happens in the real world and we can
do it in 10 lines of code.

So, let's see what the code looks like.
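The speedups implied by the quoted throughput figures can be checked with a few lines of arithmetic. This is a minimal sketch: it only reuses the GB/s numbers given in the quoted measurements, and the names `MEASUREMENTS` and `speedup` are illustrative, not taken from any kernel code.

```python
# Throughput figures (GB/s) from the quoted measurements:
# (migrating one 4KB page, migrating 512 4KB pages).
MEASUREMENTS = {
    "migrate_pages()": (0.312, 0.854),
    "migrate_page_copy()": (1.385, 1.983),
}


def speedup(single_gbps: float, batched_gbps: float) -> float:
    """Ratio of batched (512-page) to single-page migration throughput."""
    return batched_gbps / single_gbps


for scope, (single, batched) in MEASUREMENTS.items():
    ratio = speedup(single, batched)
    # Print the ratio and the equivalent percentage gain.
    print(f"{scope:22} {ratio:.2f}x ({(ratio - 1) * 100:.0f}% more throughput)")
```

The migrate_pages() ratio comes out near 2.74x, and the copy-only ratio near 1.43x, which matches the ">2x" and "40% more" figures in the quoted text.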
