linux-kernel - Re: [PATCH v2] mm: swap: async free swap slot cache entries

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1fa1da19b0b929efec46bd02a6fc358fef1b9c42.camel@linux.intel.com>
Date: Mon, 05 Feb 2024 10:15:04 -0800
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: Chris Li <chrisl@...nel.org>
Cc: "Huang, Ying" <ying.huang@...el.com>, Andrew Morton
 <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org,  Wei Xu <weixugc@...gle.com>, Yu
 Zhao <yuzhao@...gle.com>, Greg Thelen
 <gthelen@...gle.com>, Chun-Tse Shao <ctshao@...gle.com>, Suren
 Baghdasaryan <surenb@...gle.com>, Yosry
 Ahmed <yosryahmed@...gle.com>,  Brain Geffon
 <bgeffon@...gle.com>, Minchan Kim <minchan@...nel.org>, Michal Hocko
 <mhocko@...e.com>, Mel Gorman <mgorman@...hsingularity.net>, Nhat Pham
 <nphamcs@...il.com>, Johannes Weiner <hannes@...xchg.org>, Kairui Song
 <kasong@...cent.com>, Zhongkun He <hezhongkun.hzk@...edance.com>, Kemeng
 Shi <shikemeng@...weicloud.com>,  Barry Song <v-songbaohua@...o.com>
Subject: Re: [PATCH v2] mm: swap: async free swap slot cache entries

On Sat, 2024-02-03 at 10:12 -0800, Chris Li wrote:
> 
> > > >  {
> > > >     struct swap_slots_cache *cache;
> > > > @@ -282,17 +298,14 @@ void free_swap_slot(swp_entry_t entry)
> > > >                     goto direct_free;
> > > >             }
> > > >             if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE) {
> > > > -                   /*
> > > > -                    * Return slots to global pool.
> > > > -                    * The current swap_map value is SWAP_HAS_CACHE.
> > > > -                    * Set it to 0 to indicate it is available for
> > > > -                    * allocation in global pool
> > > > -                    */
> > > > -                   swapcache_free_entries(cache->slots_ret, cache->n_ret);
> > > > -                   cache->n_ret = 0;
> > > > +                   spin_unlock_irq(&cache->free_lock);
> > > > +                   schedule_work(&cache->async_free);
> > > > +                   goto direct_free;
> > > >             }
> > > >             cache->slots_ret[cache->n_ret++] = entry;
> > > >             spin_unlock_irq(&cache->free_lock);
> > > > +           if (cache->n_ret >= SWAP_SLOTS_CACHE_SIZE)
> > > > +                   schedule_work(&cache->async_free);
> > 
> > 
> > I have some concerns about the current patch with the change above.
> > We could hit the direct_free path very often.
> > 
> > By delaying the freeing of entries in the return
> > cache, we have to do more freeing of swap entry one at a time. When
> > we try to free an entry, we can find the return cache still full, waiting to be freed.
> 
> You are describing the async free is not working. In that case it will always
> hit the direct free path one by one.
> 
> > 
> > So we have fewer batch free of swap entries, resulting in an increase in
> > number of sis->lock acquisitions overall. This could have the
> > effect of reducing swap throughput overall when swap is under heavy
> > operations and sis->lock is contended.
> 
> I  can change the direct free path to free all entries. If the async
> free hasn't freed up the batch by the time the next swap fault comes in.
> The new swap fault will take the hit, just free the whole batch. It will behave
> closer to the original batch free behavior in this path.
> 
Will that negate the benefit you are looking for?

A hack is to double the SWAP_SLOTS_CACHE_SIZE to 128, and trigger the
background reclaim when entries reach 64. This will allow you to avoid
the one by one relcaim direct path and hopefully the delayed job
would have done its job when slots accumulate between 64 and 128.

However, I am unsure how well this hack is under really heavy
swap load.  It means that the background reclaim will need to 
work through a larger backlog and
hold the sis->lock longer. So if you hit the direct path while
the background reclaim is underway, you may have longer tail latency
to acquire the sis->lock. 

Tim