lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACePvbWpbGa9w3MNsATYHMcTSkzOu6OWw6tdiGS_=PdXYXzH1w@mail.gmail.com>
Date: Tue, 26 Nov 2024 16:17:03 -0800
From: Chris Li <chrisl@...nel.org>
To: Barry Song <21cnbao@...il.com>
Cc: chenridong <chenridong@...wei.com>, Matthew Wilcox <willy@...radead.org>, 
	Chen Ridong <chenridong@...weicloud.com>, akpm@...ux-foundation.org, mhocko@...e.com, 
	hannes@...xchg.org, yosryahmed@...gle.com, yuzhao@...gle.com, 
	david@...hat.com, ryan.roberts@....com, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, wangweiyang2@...wei.com, xieym_ict@...mail.com, 
	Kairui Song <ryncsn@...il.com>
Subject: Re: [RFC PATCH v2 1/1] mm/vmscan: move the written-back folios to the
 tail of LRU after shrinking

On Mon, Nov 18, 2024 at 1:56 AM Barry Song <21cnbao@...il.com> wrote:
>
> On Mon, Nov 18, 2024 at 10:41 PM chenridong <chenridong@...wei.com> wrote:
> >
> >
> >
> > On 2024/11/18 12:14, Barry Song wrote:
> > > On Mon, Nov 18, 2024 at 5:03 PM Matthew Wilcox <willy@...radead.org> wrote:
> > >>
> > >> On Sat, Nov 16, 2024 at 09:16:58AM +0000, Chen Ridong wrote:
> > >>> 2. In shrink_page_list function, if folioN is THP(2M), it may be splited
> > >>>    and added to swap cache folio by folio. After adding to swap cache,
> > >>>    it will submit io to writeback folio to swap, which is asynchronous.
> > >>>    When shrink_page_list is finished, the isolated folios list will be
> > >>>    moved back to the head of inactive lru. The inactive lru may just look
> > >>>    like this, with 512 filioes have been move to the head of inactive lru.
> > >>
> > >> I was hoping that we'd be able to stop splitting the folio when adding
> > >> to the swap cache.  Ideally. we'd add the whole 2MB and write it back
> > >> as a single unit.
> > >
> > > This is already the case: adding to the swapcache doesn’t require splitting
> > > THPs, but failing to allocate 2MB of contiguous swap slots will.
> > >
> > >>
> > >> This is going to become much more important with memdescs.  We'd have to
> > >> allocate 512 struct folios to do this, which would be about 10 4kB pages,
> > >> and if we're trying to swap out memory, we're probably low on memory.
> > >>
> > >> So I don't like this solution you have at all because it doesn't help us
> > >> get to the solution we're going to need in about a year's time.
> > >>
> > >
> > > Ridong might need to clarify why this splitting is occurring. If it’s due to the
> > > failure to allocate swap slots, we still need a solution to address it.
> > >
> > > Thanks
> > > Barry
> >
> > shrink_folio_list
> >   add_to_swap
> >     folio_alloc_swap
> >       get_swap_pages
> >         scan_swap_map_slots
> >         /*
> >         * Swapfile is not block device or not using clusters so unable
> >         * to allocate large entries.
> >         */
> >         if (!(si->flags & SWP_BLKDEV) || !si->cluster_info)
> >           return 0;
> >
> > In my test, I use a file as swap, which is not 'SWP_BLKDEV'. So it
> > failed to get get_swap_pages.
>
> Alright, a proper non-rotating swap block device would be much
> better. In your case, though, cluster allocation isn’t supported.

Ah yes. The later part of the swap allocation series removes the non
cluster allocation code path.
It is not merged to mm-unstable yet. So even a swapfile not block
device will get the cluster allocator.

>
> >
> > I think this is a race issue between 'shrink_folio_list' executing and
> > writing back asynchronously. In my test, 512 folios(THP split) were
> > added to swap, only about 60 folios had not been written back when
> > 'move_folios_to_lru' was invoked after 'shrink_folio_list'. What if
> > writing back faster? Maybe this will happen even 32 folios(without THP)
> > are in the 'folio_list' of shrink_folio_list's inputs.
>
> On a real non-rotate swap device, the race condition would occur only when
> contiguous 2MB swap slots are unavailable.
>
> Hi Chris,
> I recall you mentioned unifying the code for swap devices and swap files, or
> for non-rotating and rotating devices. I assume a swap file (not a block device)
> would also be a practical user case?

I assume you mean non-SSD vs SSD device. In this follow up series of
the swap allocator from Kairui, the old non cluster allocator gets
removed, the cluster allocator will be used all the time.

https://lore.kernel.org/linux-mm/20241022192451.38138-4-ryncsn@gmail.com/

Chris

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ