linux-kernel - Re: [PATCH v2 01/11] mm/vmstat: remove remote node draining

Message-ID: <3329f63e-5671-1500-0730-cd46ba461d04@redhat.com>
Date:   Thu, 2 Mar 2023 11:10:03 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Marcelo Tosatti <mtosatti@...hat.com>
Cc:     Christoph Lameter <cl@...ux.com>,
        Aaron Tomlin <atomlin@...mlin.com>,
        Frederic Weisbecker <frederic@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Mel Gorman <mgorman@...e.de>, Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [PATCH v2 01/11] mm/vmstat: remove remote node draining

[...]

> 
>> (2) drain_zone_pages() documents that we're draining the PCP
>>      (bulk-freeing them) of the current CPU on remote nodes. That bulk-
>>      freeing will properly adjust free memory counters. What exactly is
>>      the impact when no longer doing that? Won't the "snapshot" of some
>>      counters eventually be wrong? Do we care?
> 
> Don't see why the snapshot of counters will be wrong.
> 
> Instead of freeing pages on pcp list of remote nodes after they are
> considered idle ("3 seconds idle till flush"), what will happen is that
> drain_all_pages() will free those pcps, for example after an allocation
> fails on direct reclaim:
> 
>          page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
> 
>          /*
>           * If an allocation failed after direct reclaim, it could be because
>           * pages are pinned on the per-cpu lists or in high alloc reserves.
>           * Shrink them and try again
>           */
>          if (!page && !drained) {
>                  unreserve_highatomic_pageblock(ac, false);
>                  drain_all_pages(NULL);
>                  drained = true;
>                  goto retry;
>          }
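
As a rough illustration of the drain-and-retry flow quoted above, here is a minimal user-space sketch. It is not kernel code: struct sketch_zone, SKETCH_NR_CPUS and every sketch_* function are hypothetical stand-ins invented for this example, and the real slowpath does considerably more (reclaim, highatomic reserves, locking).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4	/* hypothetical CPU count for the toy model */

/*
 * Hypothetical toy zone: pages are either visible to the allocator
 * (nr_free) or parked on one of the per-CPU lists (pcp[]).
 */
struct sketch_zone {
	unsigned long nr_free;
	unsigned long pcp[SKETCH_NR_CPUS];
};

/* Stand-in for drain_all_pages(): move every parked page back. */
static void sketch_drain_all(struct sketch_zone *z)
{
	for (size_t cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		z->nr_free += z->pcp[cpu];
		z->pcp[cpu] = 0;
	}
}

/* Stand-in for get_page_from_freelist(): only nr_free can satisfy it. */
static bool sketch_try_alloc(struct sketch_zone *z, unsigned long nr)
{
	if (z->nr_free < nr)
		return false;
	z->nr_free -= nr;
	return true;
}

/* Try to allocate nr pages; drain the per-CPU lists once before failing. */
static bool sketch_alloc_with_retry(struct sketch_zone *z, unsigned long nr)
{
	bool drained = false;

	assert(z != NULL);
retry:
	if (sketch_try_alloc(z, nr))
		return true;
	if (!drained) {
		sketch_drain_all(z);
		drained = true;
		goto retry;
	}
	return false;
}
```

In this model, a zone with nr_free = 2 and 8 pages parked on per-CPU lists can still satisfy an 8-page request, because the first failure triggers the drain. A second failure after the drain is final.
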
> 
> In both cases the pages are freed (and counters maintained) here:
> 
> static inline void __free_one_page(struct page *page,
>                  unsigned long pfn,
>                  struct zone *zone, unsigned int order,
>                  int migratetype, fpi_t fpi_flags)
> {
>          struct capture_control *capc = task_capc(zone);
>          unsigned long buddy_pfn = 0;
>          unsigned long combined_pfn;
>          struct page *buddy;
>          bool to_tail;
> 
>          VM_BUG_ON(!zone_is_initialized(zone));
>          VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
> 
>          VM_BUG_ON(migratetype == -1);
>          if (likely(!is_migrate_isolate(migratetype)))
>                  __mod_zone_freepage_state(zone, 1 << order, migratetype);
> 
>          VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
>          VM_BUG_ON_PAGE(bad_range(zone, page), page);
> 
>          while (order < MAX_ORDER - 1) {
>                  if (compaction_capture(capc, page, order, migratetype)) {
>                          __mod_zone_freepage_state(zone, -(1 << order),
>                                                                  migratetype);
>                          return;
>                  }
> 
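
The accounting in the excerpt above can be sketched in isolation as follows. This is a simplified user-space model under stated assumptions: struct sketch_counters and the sketch_* helpers are hypothetical names, the captured flag stands in for compaction_capture() succeeding, and migratetype handling and buddy merging are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the zone's free-page counter. */
struct sketch_counters {
	long nr_free_pages;
};

/* A block of 2^order pages must start on a 2^order-aligned pfn. */
static bool sketch_pfn_aligned(unsigned long pfn, unsigned int order)
{
	return (pfn & ((1UL << order) - 1)) == 0;
}

/*
 * Model of the counter updates at the top of the free path: the block
 * is first accounted as free (+2^order pages); if compaction captures
 * it, the same amount is subtracted again before returning.
 */
static void sketch_free_block(struct sketch_counters *c, unsigned long pfn,
			      unsigned int order, bool captured)
{
	assert(sketch_pfn_aligned(pfn, order));

	c->nr_free_pages += 1L << order;
	if (captured)
		c->nr_free_pages -= 1L << order;
}
```

The point of the sketch is that the counter changes only when a block passes through this function, regardless of which caller triggered the drain.
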
>> Describing the difference between instructed refresh of vmstat and "remotely
>> drain per-cpu lists" in order to move free memory from the pcp to the buddy
>> would be great.
> 
> The difference is that now remote PCPs will be drained on demand, either via
> kcompactd or direct reclaim (through drain_all_pages), when memory is
> low.
> 
> For example, with the following test:
> 
> dd if=/dev/zero of=file bs=1M count=32000 on a tmpfs filesystem:
> 
>        kcompactd0-116     [005] ...1 228232.042873: drain_all_pages <-kcompactd_do_work
>        kcompactd0-116     [005] ...1 228232.042873: __drain_all_pages <-kcompactd_do_work
>                dd-479485  [003] ...1 228232.455130: __drain_all_pages <-__alloc_pages_slowpath.constprop.0
>                dd-479485  [011] ...1 228232.721994: __drain_all_pages <-__alloc_pages_slowpath.constprop.0
>       gnome-shell-3750    [015] ...1 228232.723729: __drain_all_pages <-__alloc_pages_slowpath.constprop.0
> 
> The commit message was indeed incorrect. Updated one:
> 
> "mm/vmstat: remove remote node draining
> 
> Draining of pages from the local pcp for a remote zone should not be
> necessary, since once the system is low on memory (or compaction on a
> zone is in effect), drain_all_pages should be called freeing any unused
> pcps."
> 
> Thanks!

Thanks for the explanation, that makes sense to me. Feel free to add my

Acked-by: David Hildenbrand <david@...hat.com>

... hoping that some others (Mel, Vlastimil?) can have another look.

-- 
Thanks,

David / dhildenb