lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 16 Aug 2021 21:15:03 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Yang Shi <shy828301@...il.com>, naoya.horiguchi@....com,
        osalvador@...e.de, tdmackey@...tter.com, akpm@...ux-foundation.org,
        corbet@....net
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] mm: hwpoison: don't drop slab caches for offlining
 non-LRU page

On 16.08.21 20:09, Yang Shi wrote:
> In the current implementation of soft offline, if non-LRU page is met,
> all the slab caches will be dropped to free the page then offline.  But
> if the page is not slab page all the effort is wasted in vain.  Even
> though it is a slab page, it is not guaranteed the page could be freed
> at all.

... but there is a chance it could be and the current behavior is 
actually helpful in some setups.

[...]

> The lockup made the machine is quite unusable.  And it also made the
> most workingset gone, the reclaimabled slab caches were reduced from 12G
> to 300MB, the page caches were decreased from 17G to 4G.
> 
> But the most disappointing thing is all the effort doesn't make the page
> offline, it just returns:
> 
> soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> 

In your example, yes. I had a look at the introducing commit: 
facb6011f399 ("HWPOISON: Add soft page offline support")

"
     When the page is not free or LRU we try to free pages
     from slab and other caches. The slab freeing is currently
     quite dumb and does not try to focus on the specific slab
     cache which might own the page. This could be potentially
     improved later.
"

I wonder, if instead of removing it altogether, we could actually 
improve it as envisioned.

To be precise, for alloc_contig_range() it would also make sense to be 
able to shrink only in a specific physical memory range; this here seems 
to be a similar thing. (actually, alloc_contig_range(), actual memory 
offlining and hw poisoning/soft-offlining have a lot in common)

Unfortunately, the last time I took a brief look at teaching shrinkers 
to be range-aware, it turned out to be a lot of work ... so maybe this 
is really a long term goal to be mitigated in the meantime by disabling 
it, if it turns out to be more of a problem than actually help.

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ