linux-kernel - Re: Very slow unlockall()

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <36e72f8c-3e23-ec48-d8c5-402dc8cfb9c9@gmail.com>
Date:   Sun, 31 Jan 2021 18:22:31 +0100
From:   Milan Broz <gmazyland@...il.com>
To:     linux-mm@...ck.org
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Mikulas Patocka <mpatocka@...hat.com>,
        Michal Hocko <mhocko@...e.com>
Subject: Re: Very slow unlockall()

On 08/01/2021 15:39, Milan Broz wrote:
> On 08/01/2021 14:41, Michal Hocko wrote:
>> On Wed 06-01-21 16:20:15, Milan Broz wrote:
>>> Hi,
>>>
>>> we use mlockall(MCL_CURRENT | MCL_FUTURE) / munlockall() in cryptsetup code
>>> and someone tried to use it with hardened memory allocator library.
>>>
>>> Execution time was increased to extreme (minutes) and as we found, the problem
>>> is in munlockall().
>>>
>>> Here is a plain reproducer for the core without any external code - it takes
>>> unlocking on Fedora rawhide kernel more than 30 seconds!
>>> I can reproduce it on 5.10 kernels and Linus' git.
>>>
>>> The reproducer below tries to mmap large amount memory with PROT_NONE (later never used).
>>> The real code of course does something more useful but the problem is the same.
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <fcntl.h>
>>> #include <sys/mman.h>
>>>
>>> int main (int argc, char *argv[])
>>> {
>>>         void *p  = mmap(NULL, 1UL << 41, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>
>>>         if (p == MAP_FAILED) return 1;
>>>
>>>         if (mlockall(MCL_CURRENT | MCL_FUTURE)) return 1;
>>>         printf("locked\n");
>>>
>>>         if (munlockall()) return 1;
>>>         printf("unlocked\n");
>>>
>>>         return 0;
>>> }
>>>
>>> In traceback I see that time is spent in munlock_vma_pages_range.
>>>
>>> [ 2962.006813] Call Trace:
>>> [ 2962.006814]  ? munlock_vma_pages_range+0xe7/0x4b0
>>> [ 2962.006814]  ? vma_merge+0xf3/0x3c0
>>> [ 2962.006815]  ? mlock_fixup+0x111/0x190
>>> [ 2962.006815]  ? apply_mlockall_flags+0xa7/0x110
>>> [ 2962.006816]  ? __do_sys_munlockall+0x2e/0x60
>>> [ 2962.006816]  ? do_syscall_64+0x33/0x40
>>> ...
>>>
>>> Or with perf, I see
>>>
>>> # Overhead  Command  Shared Object      Symbol                               
>>> # ........  .......  .................  .....................................
>>> #
>>>     48.18%  lock     [kernel.kallsyms]  [k] lock_is_held_type
>>>     11.67%  lock     [kernel.kallsyms]  [k] ___might_sleep
>>>     10.65%  lock     [kernel.kallsyms]  [k] follow_page_mask
>>>      9.17%  lock     [kernel.kallsyms]  [k] debug_lockdep_rcu_enabled
>>>      6.73%  lock     [kernel.kallsyms]  [k] munlock_vma_pages_range
>>> ...
>>>
>>>
>>> Could please anyone check what's wrong here with the memory locking code?
>>> Running it on my notebook I can effectively DoS the system :)
>>>
>>> Original report is https://gitlab.com/cryptsetup/cryptsetup/-/issues/617
>>> but this is apparently a kernel issue, just amplified by usage of munlockall().
>>
>> Which kernel version do you see this with? Have older releases worked
>> better?
> 
> Hi,
> 
> I tried 5.10 stable and randomly few kernels I have built on testing VM (5.3 was the oldest),
> it seems to be very similar run time, so the problem is apparently old...(I can test some specific kernel version if it make any sense).
> 
> For mainline (reproducer above):
> 
> With 5.11.0-0.rc2.20210106git36bbbd0e234d.117.fc34.x86_64 (latest Fedora rawhide kernel build - many debug options are on)
> 
> # time ./lock 
> locked
> unlocked
> 
> real    0m32.287s
> user    0m0.001s
> sys     0m32.126s
> 
> 
> Today's Linus git - 5.11.0-rc2+ in my testing x86_64 VM (no extensive kernel debug options):
> 
> # time ./lock
> locked
> unlocked
> 
> real    0m4.172s
> user    0m0.000s
> sys     0m4.172s
> 
> m.

Hi,

so because there is no response, is this expected behavior of memory management subsystem then?

Thanks,
Milan