Date: Mon, 29 Jan 2024 10:21:36 +0800
From: zhiguojiang <justinjiang@...o.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, opensource.kernel@...o.com
Subject: Re: [PATCH] mm:vmscan: shrink skip folios in the exiting task



On 2024/1/28 14:35, Matthew Wilcox wrote:
> On Thu, Jan 25, 2024 at 09:34:53AM +0800, zhiguojiang wrote:
>>>> In the scenarios of the low memory system and mutil backed-applications,
>>>> the time-consuming problem caused by shrinking the exiting task's folios
>>>> will be more severe.
>>> What testing have you done of this patch?  How often does it happen?
>>> Are there particular workloads that benefit from this?  (I'm not sure
>>> what "mutil backed-applications" are?)
>> 1 Yes, this patch has been tested.
>>
>> 2 When an exiting task and shrink_inactive_list occur at the same time,
>>    the folios which shrink_inactive_list reclaims may be the exiting
>>    task's folios in the lruvecs. And when the system is low on memory,
>>    this is more likely to occur, because more backend applications will
>>    be killed.
>>    shrink_inactive_list reclaims the exiting task's folios in the lruvecs
>>    and turns the exiting task's anon folios into swap memory, which
>>    increases the load of the exiting task.
> Ah, we're talking about an OOM scenario.  OK, that makes some sense.
> You should have mentioned that in the patch description.
Hi,

1 Ok, I will provide a more comprehensive description in the next version.

2 I think this issue can occur not only in the OOM scenario, but also in
   the normal task-exit scenario. So:

1) if (unlikely(!atomic_read(&vma->vm_mm->mm_users))) represents
   the scenario where the task exits normally.

2) if (test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) represents the
   OOM reaper scenario.

3 MMF_OOM_SKIP can only represent the OOM scenario and cannot cover the
   normal task-exit scenario: by the time MMF_OOM_SKIP is set in the
   normal exit path, the task's memory folios have already been released.
   shrink_inactive_list should recognize these lru folios in the exiting
   task before the task releases them. The relevant code in exit_mmap():

     tlb_gather_mmu_fullmm(&tlb, mm);
     /* update_hiwater_rss(mm) here? but nobody should be looking */
     /* Use ULONG_MAX here to ensure all VMAs in the mm are unmapped */
     unmap_vmas(&tlb, &mas, vma, 0, ULONG_MAX, ULONG_MAX, false);
     mmap_read_unlock(mm);

     /*
      * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
      * because the memory has been already freed.
      */
     set_bit(MMF_OOM_SKIP, &mm->flags);
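
For reference, the ordering during a normal task exit is roughly as
follows (a sketch of the call chain, not verbatim source), which is why
the mm_users check fires while the exiting task's folios are still on
the LRU:

     do_exit()
       exit_mm()
         mmput(mm)              /* drops the last mm_users reference */
           __mmput(mm)
             exit_mmap(mm)      /* mm_users is already 0 here */
               unmap_vmas(...)  /* folios being freed; reclaim can still
                                   see them on the LRU until this runs */
               set_bit(MMF_OOM_SKIP, &mm->flags)  /* only after freeing */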
>> 3 This patch can alleviate the load of the task-exit process. It lets
>>    the exiting task release its anon folios faster, instead of
>>    shrink_inactive_list swapping those anon folios out and the exit
>>    path then having to release the resulting swap memory.
>>
>> 4 "mutil backed-applications" means that there are a large number of
>>    backend applications in the system.
>>
>> Thanks
>>> And I do mean specifically of this patch, because to my eyes it
>>> shouldn't even compile.
>> Has been tested.
> That's odd.  I thought GCC used to complain about
>
> 	long x = 0x100000000;
>
> but it does the right thing.  Except on 32-bit where it'll say
> "warning: integer constant out of range".
>
> In any case, the number you chose is not going to work on 32-bit systems
> or on ARM or x86.  It conflicts with protection keys on x86 and MTE on
> ARM.
You're right, I overlooked the 32-bit case.
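
Concretely, bit 32 is already claimed on architectures that use the high
VMA flag bits (definitions from include/linux/mm.h, quoted from memory,
so worth double-checking against the current tree):

     #define VM_HIGH_ARCH_BIT_0  32  /* bit only usable on 64-bit arch */
     #define VM_HIGH_ARCH_0      BIT(VM_HIGH_ARCH_BIT_0)
     /* x86: VM_PKEY_BIT0 is VM_HIGH_ARCH_0 under CONFIG_ARCH_HAS_PKEYS */
     /* arm64: VM_MTE is VM_HIGH_ARCH_0 under CONFIG_ARM64_MTE */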
> We can do it without any new magic numbers.  I'm not sure this is a
> great approach, but this should work:
>
> +++ b/mm/rmap.c
> @@ -840,6 +840,12 @@ static bool folio_referenced_one(struct folio *folio,
>          int referenced = 0;
>          unsigned long start = address, ptes = 0;
>
> +       /* Skip this folio if it's mapped by an exiting task */
> +       if (test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) {
> +               pra->referenced = -1;
> +               return false;
> +       }
> +
>          while (page_vma_mapped_walk(&pvmw)) {
>                  address = pvmw.address;
>
I agree with your point of view. I think this is currently the best
solution, but I think it also needs the additional check:
if (unlikely(!atomic_read(&vma->vm_mm->mm_users)))
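
That is, the combined check in folio_referenced_one() would look
something like this (a sketch only, untested):

     /* Skip this folio if it's mapped by an exiting task */
     if (unlikely(!atomic_read(&vma->vm_mm->mm_users)) ||
         test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) {
             pra->referenced = -1;
             return false;
     }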

Please help review it again.

Best Regards.


