Date:   Fri, 26 Jun 2020 14:07:07 +0900
From:   Joonsoo Kim <js1304@...il.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Hugh Dickins <hughd@...gle.com>,
        Minchan Kim <minchan@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Mel Gorman <mgorman@...hsingularity.net>, kernel-team@....com
Subject: Re: [PATCH v6 4/6] mm/swapcache: support to handle the exceptional
 entries in swapcache

On Fri, Jun 19, 2020 at 10:33 AM, Joonsoo Kim <js1304@...il.com> wrote:
>
> On Wed, Jun 17, 2020 at 05:17:17AM -0700, Matthew Wilcox wrote:
> > On Wed, Jun 17, 2020 at 02:26:21PM +0900, js1304@...il.com wrote:
> > > From: Joonsoo Kim <iamjoonsoo.kim@....com>
> > >
> > > Swapcache doesn't handle the exceptional entries since there is no case
> >
> > Don't call them exceptional entries.
> >
> > The radix tree has/had the concept of exceptional entries.  The swapcache
> > doesn't use the radix tree any more; it uses the XArray.  The XArray
> > has value entries.
> >
> > But you shouldn't call them value entries either; that's an XArray
> > concept.  The swap cache and indeed page cache use value entries to
> > implement shadow entries (they're also used to implement dax entries and
> > swap entries in the page cache).  So just call them shadow entries here.
> >
> > I know there are still places which use the term 'nrexceptional' in
> > the kernel.  I just haven't got round to replacing them yet.  Please
> > don't add more.
>
> Okay! Thanks for commenting.
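
(For reference, a minimal sketch of the XArray value-entry helpers the
terminology above refers to; xa_mk_value(), xa_is_value() and
xa_to_value() are the standard helpers from include/linux/xarray.h:)

	/*
	 * A value entry encodes a small integer directly in the slot
	 * instead of a pointer; shadow entries in the page cache and
	 * swap cache are built on top of this encoding.
	 */
	void *entry = xa_mk_value(42);

	if (xa_is_value(entry))		/* true: a tagged value, not a pointer */
		pr_info("value: %lu\n", xa_to_value(entry));
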
>
> >
> > > +void clear_shadow_from_swap_cache(int type, unsigned long begin,
> > > +                           unsigned long end)
> > > +{
> > > +   unsigned long curr;
> > > +   void *old;
> > > +   swp_entry_t entry = swp_entry(type, begin);
> > > +   struct address_space *address_space = swap_address_space(entry);
> > > +   XA_STATE(xas, &address_space->i_pages, begin);
> > > +
> > > +retry:
> > > +   xa_lock_irq(&address_space->i_pages);
> > > +   for (curr = begin; curr <= end; curr++) {
> > > +           entry = swp_entry(type, curr);
> > > +           if (swap_address_space(entry) != address_space) {
> > > +                   xa_unlock_irq(&address_space->i_pages);
> > > +                   address_space = swap_address_space(entry);
> > > +                   begin = curr;
> > > +                   xas_set(&xas, begin);
> > > +                   goto retry;
> > > +           }
> > > +
> > > +           old = xas_load(&xas);
> > > +           if (!xa_is_value(old))
> > > +                   continue;
> > > +           xas_store(&xas, NULL);
> > > +           address_space->nrexceptional--;
> > > +           xas_next(&xas);
> > > +   }
> > > +   xa_unlock_irq(&address_space->i_pages);
> > > +}
> >
> > This is a very clunky loop.  I'm not sure it's even right, given that
> > you change address space without changing the xas's address space.  How
> > about this?
>
> You are correct. The xas's address space should be changed.
>
>
> >       for (;;) {
> >               XA_STATE(xas, &address_space->i_pages, begin);
> >               unsigned long nr_shadows = 0;
> >
> >               xas_lock_irq(&xas);
> >               xas_for_each(&xas, entry, end) {
> >                       if (!xa_is_value(entry))
> >                               continue;
> >                       xas_store(&xas, NULL);
> >                       nr_shadows++;
> >               }
> >               address_space->nrexceptional -= nr_shadows;
> >               xas_unlock_irq(&xas);
> >
> >               if (xas.xa_index >= end)
> >                       break;
> >               address_space = swap_address_space(
> >                               swp_entry(type, xas.xa_index));
> >       }
>
> Thanks for suggestion.
>
> I made a patch based on your suggestion. IIUC the XArray, after running
> xas_for_each(), xas.xa_index can be less than end if there are empty
> slots in the last portion of the array. My patch also handles this
> case.
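
To illustrate the boundary step this refers to (the same computation
appears in clear_shadow_from_swap_cache() below), here is a worked
example, assuming SWAP_ADDRESS_SPACE_SHIFT is 14 as in the current tree:

	/*
	 * curr = 16385, SWAP_ADDRESS_SPACE_SHIFT = 14:
	 *   curr >> 14  ->     1   (index of the current address space)
	 *   + 1         ->     2   (index of the next address space)
	 *   << 14       -> 32768   (first offset of the next space)
	 *
	 * curr always jumps to the start of the next swap address space,
	 * independent of where xas_for_each() left xa_index.
	 */
	curr = ((curr >> SWAP_ADDRESS_SPACE_SHIFT) + 1)
					<< SWAP_ADDRESS_SPACE_SHIFT;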

Hello, Matthew.

Could you check whether the following patch (the XArray part) is correct?
Since the patch is based on your suggestion, I'd like to get your review. :)
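
For context, a rough sketch (not part of this patch) of how a later
patch in the series is expected to consume the new shadowp argument,
assuming workingset_refault() as the consumer:

	void *shadow = NULL;

	/* add_to_swap_cache() now reports the shadow entry it replaced */
	if (add_to_swap_cache(page, entry, gfp_mask & GFP_KERNEL, &shadow)) {
		put_swap_page(page, entry);
		goto fail_unlock;
	}
	if (shadow)
		workingset_refault(page, shadow);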

Thanks.

> Thanks.
>
> ------------------->8--------------------------------
> From 72e97600ea294372a13ab8e208ebd3c0e1889408 Mon Sep 17 00:00:00 2001
> From: Joonsoo Kim <iamjoonsoo.kim@....com>
> Date: Fri, 15 Nov 2019 09:48:32 +0900
> Subject: [PATCH v6 4/6] mm/swapcache: support to handle the shadow entries
>
> Workingset detection for anonymous pages will be implemented in the
> following patch, and it requires storing shadow entries in the
> swapcache. This patch implements the infrastructure for storing shadow
> entries in the swapcache.
>
> Acked-by: Johannes Weiner <hannes@...xchg.org>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@....com>
> ---
>  include/linux/swap.h | 17 ++++++++++++----
>  mm/shmem.c           |  3 ++-
>  mm/swap_state.c      | 57 ++++++++++++++++++++++++++++++++++++++++++++++------
>  mm/swapfile.c        |  2 ++
>  mm/vmscan.c          |  2 +-
>  5 files changed, 69 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index f4f5f94..901da54 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -416,9 +416,13 @@ extern struct address_space *swapper_spaces[];
>  extern unsigned long total_swapcache_pages(void);
>  extern void show_swap_cache_info(void);
>  extern int add_to_swap(struct page *page);
> -extern int add_to_swap_cache(struct page *, swp_entry_t, gfp_t);
> -extern void __delete_from_swap_cache(struct page *, swp_entry_t entry);
> +extern int add_to_swap_cache(struct page *page, swp_entry_t entry,
> +                       gfp_t gfp, void **shadowp);
> +extern void __delete_from_swap_cache(struct page *page,
> +                       swp_entry_t entry, void *shadow);
>  extern void delete_from_swap_cache(struct page *);
> +extern void clear_shadow_from_swap_cache(int type, unsigned long begin,
> +                               unsigned long end);
>  extern void free_page_and_swap_cache(struct page *);
>  extern void free_pages_and_swap_cache(struct page **, int);
>  extern struct page *lookup_swap_cache(swp_entry_t entry,
> @@ -572,13 +576,13 @@ static inline int add_to_swap(struct page *page)
>  }
>
>  static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
> -                                                       gfp_t gfp_mask)
> +                                       gfp_t gfp_mask, void **shadowp)
>  {
>         return -1;
>  }
>
>  static inline void __delete_from_swap_cache(struct page *page,
> -                                                       swp_entry_t entry)
> +                                       swp_entry_t entry, void *shadow)
>  {
>  }
>
> @@ -586,6 +590,11 @@ static inline void delete_from_swap_cache(struct page *page)
>  {
>  }
>
> +static inline void clear_shadow_from_swap_cache(int type, unsigned long begin,
> +                               unsigned long end)
> +{
> +}
> +
>  static inline int page_swapcount(struct page *page)
>  {
>         return 0;
> diff --git a/mm/shmem.c b/mm/shmem.c
> index a0dbe62..e9a99a2 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1374,7 +1374,8 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>                 list_add(&info->swaplist, &shmem_swaplist);
>
>         if (add_to_swap_cache(page, swap,
> -                       __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN) == 0) {
> +                       __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
> +                       NULL) == 0) {
>                 spin_lock_irq(&info->lock);
>                 shmem_recalc_inode(inode);
>                 info->swapped++;
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 1050fde..49a66dc 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -110,12 +110,14 @@ void show_swap_cache_info(void)
>   * add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
>   * but sets SwapCache flag and private instead of mapping and index.
>   */
> -int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
> +int add_to_swap_cache(struct page *page, swp_entry_t entry,
> +                       gfp_t gfp, void **shadowp)
>  {
>         struct address_space *address_space = swap_address_space(entry);
>         pgoff_t idx = swp_offset(entry);
>         XA_STATE_ORDER(xas, &address_space->i_pages, idx, compound_order(page));
>         unsigned long i, nr = hpage_nr_pages(page);
> +       void *old;
>
>         VM_BUG_ON_PAGE(!PageLocked(page), page);
>         VM_BUG_ON_PAGE(PageSwapCache(page), page);
> @@ -125,16 +127,25 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
>         SetPageSwapCache(page);
>
>         do {
> +               unsigned long nr_shadows = 0;
> +
>                 xas_lock_irq(&xas);
>                 xas_create_range(&xas);
>                 if (xas_error(&xas))
>                         goto unlock;
>                 for (i = 0; i < nr; i++) {
>                         VM_BUG_ON_PAGE(xas.xa_index != idx + i, page);
> +                       old = xas_load(&xas);
> +                       if (xa_is_value(old)) {
> +                               nr_shadows++;
> +                               if (shadowp)
> +                                       *shadowp = old;
> +                       }
>                         set_page_private(page + i, entry.val + i);
>                         xas_store(&xas, page);
>                         xas_next(&xas);
>                 }
> +               address_space->nrexceptional -= nr_shadows;
>                 address_space->nrpages += nr;
>                 __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
>                 ADD_CACHE_INFO(add_total, nr);
> @@ -154,7 +165,8 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
>   * This must be called only on pages that have
>   * been verified to be in the swap cache.
>   */
> -void __delete_from_swap_cache(struct page *page, swp_entry_t entry)
> +void __delete_from_swap_cache(struct page *page,
> +                       swp_entry_t entry, void *shadow)
>  {
>         struct address_space *address_space = swap_address_space(entry);
>         int i, nr = hpage_nr_pages(page);
> @@ -166,12 +178,14 @@ void __delete_from_swap_cache(struct page *page, swp_entry_t entry)
>         VM_BUG_ON_PAGE(PageWriteback(page), page);
>
>         for (i = 0; i < nr; i++) {
> -               void *entry = xas_store(&xas, NULL);
> +               void *entry = xas_store(&xas, shadow);
>                 VM_BUG_ON_PAGE(entry != page, entry);
>                 set_page_private(page + i, 0);
>                 xas_next(&xas);
>         }
>         ClearPageSwapCache(page);
> +       if (shadow)
> +               address_space->nrexceptional += nr;
>         address_space->nrpages -= nr;
>         __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
>         ADD_CACHE_INFO(del_total, nr);
> @@ -208,7 +222,7 @@ int add_to_swap(struct page *page)
>          * Add it to the swap cache.
>          */
>         err = add_to_swap_cache(page, entry,
> -                       __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN);
> +                       __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN, NULL);
>         if (err)
>                 /*
>                  * add_to_swap_cache() doesn't return -EEXIST, so we can safely
> @@ -246,13 +260,44 @@ void delete_from_swap_cache(struct page *page)
>         struct address_space *address_space = swap_address_space(entry);
>
>         xa_lock_irq(&address_space->i_pages);
> -       __delete_from_swap_cache(page, entry);
> +       __delete_from_swap_cache(page, entry, NULL);
>         xa_unlock_irq(&address_space->i_pages);
>
>         put_swap_page(page, entry);
>         page_ref_sub(page, hpage_nr_pages(page));
>  }
>
> +void clear_shadow_from_swap_cache(int type, unsigned long begin,
> +                               unsigned long end)
> +{
> +       unsigned long curr = begin;
> +       void *old;
> +
> +       for (;;) {
> +               unsigned long nr_shadows = 0;
> +               swp_entry_t entry = swp_entry(type, curr);
> +               struct address_space *address_space = swap_address_space(entry);
> +               XA_STATE(xas, &address_space->i_pages, curr);
> +
> +               xa_lock_irq(&address_space->i_pages);
> +               xas_for_each(&xas, old, end) {
> +                       if (!xa_is_value(old))
> +                               continue;
> +                       xas_store(&xas, NULL);
> +                       nr_shadows++;
> +               }
> +               address_space->nrexceptional -= nr_shadows;
> +               xa_unlock_irq(&address_space->i_pages);
> +
> +               /* search the next swapcache until we meet end */
> +               curr >>= SWAP_ADDRESS_SPACE_SHIFT;
> +               curr++;
> +               curr <<= SWAP_ADDRESS_SPACE_SHIFT;
> +               if (curr > end)
> +                       break;
> +       }
> +}
> +
>  /*
>   * If we are the only user, then try to free up the swap cache.
>   *
> @@ -429,7 +474,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>         __SetPageSwapBacked(page);
>
>         /* May fail (-ENOMEM) if XArray node allocation failed. */
> -       if (add_to_swap_cache(page, entry, gfp_mask & GFP_KERNEL)) {
> +       if (add_to_swap_cache(page, entry, gfp_mask & GFP_KERNEL, NULL)) {
>                 put_swap_page(page, entry);
>                 goto fail_unlock;
>         }
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 38f6433..4630db1 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -696,6 +696,7 @@ static void add_to_avail_list(struct swap_info_struct *p)
>  static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
>                             unsigned int nr_entries)
>  {
> +       unsigned long begin = offset;
>         unsigned long end = offset + nr_entries - 1;
>         void (*swap_slot_free_notify)(struct block_device *, unsigned long);
>
> @@ -721,6 +722,7 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
>                         swap_slot_free_notify(si->bdev, offset);
>                 offset++;
>         }
> +       clear_shadow_from_swap_cache(si->type, begin, end);
>  }
>
>  static void set_cluster_next(struct swap_info_struct *si, unsigned long next)
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 3caa35f..37943bf 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -901,7 +901,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
>         if (PageSwapCache(page)) {
>                 swp_entry_t swap = { .val = page_private(page) };
>                 mem_cgroup_swapout(page, swap);
> -               __delete_from_swap_cache(page, swap);
> +               __delete_from_swap_cache(page, swap, NULL);
>                 xa_unlock_irqrestore(&mapping->i_pages, flags);
>                 put_swap_page(page, swap);
>                 workingset_eviction(page, target_memcg);
> --
> 2.7.4
>
