Message-ID: <d00aea15-cf08-1980-dcdf-bf24334e6848@linux.alibaba.com>
Date: Mon, 17 Sep 2018 13:00:58 -0700
From: Yang Shi <yang.shi@...ux.alibaba.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: mhocko@...nel.org, ldufour@...ux.vnet.ibm.com, vbabka@...e.cz,
kirill@...temov.name, akpm@...ux-foundation.org,
dave.hansen@...el.com, oleg@...hat.com, srikar@...ux.vnet.ibm.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC v10 PATCH 0/3] mm: zap pages with read mmap_sem in munmap
for large mapping
On 9/15/18 3:10 AM, Matthew Wilcox wrote:
> On Sat, Sep 15, 2018 at 04:34:56AM +0800, Yang Shi wrote:
>> Regression and performance data:
>> Did the below regression test with setting thresh to 4K manually in the code:
>> * Full LTP
>> * Trinity (munmap/all vm syscalls)
>> * Stress-ng: mmap/mmapfork/mmapfixed/mmapaddr/mmapmany/vm
>> * mm-tests: kernbench, phpbench, sysbench-mariadb, will-it-scale
>> * vm-scalability
>>
>> With the patches, the exclusive mmap_sem hold time when munmapping an
>> 80GB address space on a machine with 32 cores of E5-2680 @ 2.70GHz
>> dropped from the second level to the microsecond level.
>>
>> munmap_test-15002 [008] 594.380138: funcgraph_entry: | __vm_munmap {
>> munmap_test-15002 [008] 594.380146: funcgraph_entry: !2485684 us | unmap_region();
>> munmap_test-15002 [008] 596.865836: funcgraph_exit: !2485692 us | }
>>
>> Here the execution time of unmap_region() is used to estimate the time
>> spent holding the read mmap_sem; the remaining time is the time spent
>> holding the exclusive lock.
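
For reference, the test program is essentially mmap + touch + munmap of an
80GB anonymous region. The sketch below is an approximation for
illustration, not the exact munmap_test source:

	#include <string.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len = 80UL << 30;	/* 80GB */
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			return 1;
		/* Fault in the pages so unmap_region() has real work to do. */
		memset(p, 1, len);
		munmap(p, len);
		return 0;
	}
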
> Something I've been wondering about for a while is whether we should "sort"
> the readers together, i.e. if the acquirers look like this:
>
> A write
> B read
> C read
> D write
> E read
> F read
> G write
>
> then we should grant the lock to A, BCEF, D, G rather than A, BC, D, EF, G.
I'm not sure how much this would help real-world workloads. Typically,
multiple threads contend for one mmap_sem, which means they are trying
to read/write the same address space, and there may be dependencies or
synchronization among them. Might sorting the readers together break
those dependencies?
Thanks,
Yang
> A quick way to test this is, in __rwsem_down_read_failed_common, to do
> something like:
>
> -	if (list_empty(&sem->wait_list))
> +	if (list_empty(&sem->wait_list)) {
>  		adjustment += RWSEM_WAITING_BIAS;
> +		list_add(&waiter.list, &sem->wait_list);
> +	} else {
> +		struct rwsem_waiter *first = list_first_entry(&sem->wait_list,
> +				struct rwsem_waiter, list);
> +		if (first->type == RWSEM_WAITING_FOR_READ)
> +			list_add(&waiter.list, &sem->wait_list);
> +		else
> +			list_add_tail(&waiter.list, &sem->wait_list);
> +	}
> -	list_add_tail(&waiter.list, &sem->wait_list);
>
> It'd be interesting to know if this makes any difference with your tests.
>
> (this isn't perfect, of course; it'll fail to sort readers together if there's
> a writer at the head of the queue; e.g.:
>
> A write
> B write
> C read
> D write
> E read
> F write
> G read
>
> but it won't do any worse than we have at the moment).
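
As an aside, a fuller policy that still groups readers behind a writer at
the head of the queue could scan the wait list for the last queued reader
and insert the new reader right after it. This is an untested sketch, not
part of the patch above, and the wakeup paths would need matching changes:

	struct rwsem_waiter *w, *last_reader = NULL;

	if (list_empty(&sem->wait_list))
		adjustment += RWSEM_WAITING_BIAS;

	/* Find the last reader already queued, if any. */
	list_for_each_entry(w, &sem->wait_list, list)
		if (w->type == RWSEM_WAITING_FOR_READ)
			last_reader = w;

	/* Group the new reader with the existing readers. */
	if (last_reader)
		list_add(&waiter.list, &last_reader->list);
	else
		list_add_tail(&waiter.list, &sem->wait_list);

This trades strict FIFO fairness for reader batching, so a writer queued
between reader runs could see a longer wait.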