Message-ID: <20231031023704.GA39338@system.software.com>
Date:   Tue, 31 Oct 2023 11:37:04 +0900
From:   Byungchul Park <byungchul@...com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        kernel_team@...ynix.com, akpm@...ux-foundation.org,
        ying.huang@...el.com, namit@...are.com, xhao@...ux.alibaba.com,
        mgorman@...hsingularity.net, hughd@...gle.com, willy@...radead.org,
        david@...hat.com, peterz@...radead.org, luto@...nel.org,
        tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        dave.hansen@...ux.intel.com
Subject: Re: [v3 0/3] Reduce TLB flushes under some specific conditions

On Mon, Oct 30, 2023 at 10:55:07AM -0700, Dave Hansen wrote:
> On 10/30/23 00:25, Byungchul Park wrote:
> > I'm suggesting a mechanism to reduce TLB flushes by keeping the
> > source and destination folios participating in migrations intact
> > until all the required TLB flushes have been done, but only if none
> > of those folios are mapped by PTE entries with write permission. The
> > work is based on v6.6-rc5.
> 
> There's a lot of common overhead here, on top of the complexity in general:
> 
>  * A new page flag
>  * A new cpumask_t in task_struct
>  * A new zone list
>  * Extra (temporary) memory consumption
> 
> and the benefits are ... "performance improved a little bit" on one
> workload.  That doesn't seem like a good overall tradeoff to me.

I tested it under restricted conditions to get stable results, e.g.
with hyper-threading disabled, CPU time dedicated to the test, and so
on. However, I'm convinced that this patch set is more worth developing
than you think it is. Let me share the results I've just got after
changing the number of CPUs participating in the test from 16 to 80,
on a system with 80 CPUs. This is just for your information - the
numbers are not that stable though.
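
To make the mechanism above concrete, it boils down to something like
the following much-simplified sketch (struct and function names here
are illustrative only, not the actual patch):

#include <linux/cpumask.h>
#include <linux/list.h>
#include <linux/mm.h>

/* Illustrative bookkeeping for deferred flushes - not the real code. */
struct migrc_pending {
	struct list_head folios;	/* source folios kept alive */
	struct cpumask cpus;		/* CPUs that may cache stale PTEs */
};

/*
 * Instead of flushing per folio: when every mapping of the migration
 * source folio was read-only, keep the source folio alive so any stale
 * TLB entry still translates to identical data, and just record which
 * CPUs may need a flush later.
 */
static void migrc_defer(struct migrc_pending *p, struct folio *src,
			struct mm_struct *mm)
{
	list_add(&src->lru, &p->folios);
	cpumask_or(&p->cpus, &p->cpus, mm_cpumask(mm));
}

/* One batched flush; only then can the kept source folios be freed. */
static void migrc_flush_and_free(struct migrc_pending *p)
{
	struct folio *f, *t;

	flush_tlb_all();	/* the real code targets only p->cpus */
	cpumask_clear(&p->cpus);

	list_for_each_entry_safe(f, t, &p->folios, lru) {
		list_del(&f->lru);
		folio_put(f);
	}
}

Batching this way turns many per-folio IPIs into a single flush, which
is consistent with the drops in the nr_tlb_remote_flush* counters below.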

	Byungchul

---

Architecture - x86_64                                               
QEMU - kvm enabled, host cpu                                        
NUMA - 2 nodes (node 0: 80 CPUs, 1GB; node 1: no CPUs, 8GB)
Linux Kernel - v6.6-rc5, numa balancing tiering on, demotion enabled
Benchmark - XSBench -p 50000000 (-p option makes the runtime longer)
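
The counters below are /proc/vmstat events (the nr_tlb_* ones are only
exposed with CONFIG_DEBUG_TLBFLUSH enabled); a minimal userspace helper
to snapshot them before and after a run could look like this
(illustrative, not what was actually used):

#include <stdio.h>
#include <string.h>

int main(void)
{
	/* the counters reported in the tables below */
	static const char *keys[] = {
		"numa_pages_migrated", "pgmigrate_success",
		"nr_tlb_remote_flush", "nr_tlb_remote_flush_received",
		"nr_tlb_local_flush_all", "nr_tlb_local_flush_one",
	};
	char line[256];
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f) {
		perror("/proc/vmstat");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		for (size_t i = 0; i < sizeof(keys) / sizeof(keys[0]); i++) {
			size_t n = strlen(keys[i]);

			/* match "key value" lines exactly, not prefixes */
			if (!strncmp(line, keys[i], n) && line[n] == ' ')
				fputs(line, stdout);
		}
	}
	fclose(f);
	return 0;
}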

mainline kernel
===============

   The 1st try)
   =====================================
   Threads:     64                      
   Runtime:     233.118 seconds         
   =====================================
   numa_pages_migrated 758334           
   pgmigrate_success 1724964            
   nr_tlb_remote_flush 305706           
   nr_tlb_remote_flush_received 18598543
   nr_tlb_local_flush_all 19092         
   nr_tlb_local_flush_one 4518717       
   
   The 2nd try)
   =====================================
   Threads:     64                      
   Runtime:     221.725 seconds         
   =====================================
   numa_pages_migrated 633209           
   pgmigrate_success 2156509            
   nr_tlb_remote_flush 261977           
   nr_tlb_remote_flush_received 14289256
   nr_tlb_local_flush_all 11738         
   nr_tlb_local_flush_one 4520317       
   
mainline kernel + migrc
=======================

   The 1st try)
   =====================================
   Threads:     64
   Runtime:     212.522 seconds
   =====================================
   numa_pages_migrated 901264
   pgmigrate_success 1990814
   nr_tlb_remote_flush 151280
   nr_tlb_remote_flush_received 9031376
   nr_tlb_local_flush_all 21208
   nr_tlb_local_flush_one 4519595

   The 2nd try)
   =====================================
   Threads:     64
   Runtime:     204.410 seconds
   =====================================
   numa_pages_migrated 929260
   pgmigrate_success 2729868
   nr_tlb_remote_flush 166722
   nr_tlb_remote_flush_received 8238273
   nr_tlb_local_flush_all 13717
   nr_tlb_local_flush_one 4519582
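
For reference, averaging the two runs of each kernel:

   runtime:  (233.118 + 221.725) / 2 = 227.42 s  (mainline)
             (212.522 + 204.410) / 2 = 208.47 s  (migrc)
             -> ~8.3% faster

   nr_tlb_remote_flush:          283842 -> 159001   (~44% fewer)
   nr_tlb_remote_flush_received: 16.44M -> 8.63M    (~47% fewer)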
