[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20190215131122.GA4525@dhcp22.suse.cz>
Date: Fri, 15 Feb 2019 14:11:22 +0100
From: Michal Hocko <mhocko@...nel.org>
To: "Huang, Ying" <ying.huang@...el.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Hugh Dickins <hughd@...gle.com>,
"Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
Minchan Kim <minchan@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Jérôme Glisse <jglisse@...hat.com>,
Andrea Arcangeli <aarcange@...hat.com>,
David Rientjes <rientjes@...gle.com>,
Rik van Riel <riel@...hat.com>, Jan Kara <jack@...e.cz>,
Dave Jiang <dave.jiang@...el.com>,
Daniel Jordan <daniel.m.jordan@...cle.com>,
Andrea Parri <andrea.parri@...rulasolutions.com>
Subject: Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap
operations
On Fri 15-02-19 15:08:36, Huang, Ying wrote:
> Michal Hocko <mhocko@...nel.org> writes:
>
> > On Mon 11-02-19 16:38:46, Huang, Ying wrote:
> >> From: Huang Ying <ying.huang@...el.com>
> >>
> >> When swapin is performed, after getting the swap entry information from
> >> the page table, system will swap in the swap entry, without any lock held
> >> to prevent the swap device from being swapoff. This may cause the race
> >> like below,
> >>
> >> CPU 1 CPU 2
> >> ----- -----
> >> do_swap_page
> >> swapin_readahead
> >> __read_swap_cache_async
> >> swapoff swapcache_prepare
> >> p->swap_map = NULL __swap_duplicate
> >> p->swap_map[?] /* !!! NULL pointer access */
> >>
> >> Because swapoff is usually done when system shutdown only, the race may
> >> not hit many people in practice. But it is still a race need to be fixed.
> >>
> >> To fix the race, get_swap_device() is added to check whether the specified
> >> swap entry is valid in its swap device. If so, it will keep the swap
> >> entry valid via preventing the swap device from being swapoff, until
> >> put_swap_device() is called.
> >>
> >> Because swapoff() is very rare code path, to make the normal path runs as
> >> fast as possible, disabling preemption + stop_machine() instead of
> >> reference count is used to implement get/put_swap_device(). From
> >> get_swap_device() to put_swap_device(), the preemption is disabled, so
> >> stop_machine() in swapoff() will wait until put_swap_device() is called.
> >>
> >> In addition to swap_map, cluster_info, etc. data structure in the struct
> >> swap_info_struct, the swap cache radix tree will be freed after swapoff,
> >> so this patch fixes the race between swap cache looking up and swapoff
> >> too.
> >>
> >> Races between some other swap cache usages protected via disabling
> >> preemption and swapoff are fixed too via calling stop_machine() between
> >> clearing PageSwapCache() and freeing swap cache data structure.
> >>
> >> Alternative implementation could be replacing disable preemption with
> >> rcu_read_lock_sched and stop_machine() with synchronize_sched().
> >
> > using stop_machine is generally discouraged. It is a gross
> > synchronization.
> >
> > Besides that, since when do we have this problem?
>
> For problem, you mean the race between swapoff and the page fault
> handler?
yes
> The problem is introduced in v4.11 when we avoid to replace
> swap_info_struct->lock with swap_cluster_info->lock in
> __swap_duplicate() if possible to improve the scalability of swap
> operations. But because swapoff is a really rare operation, I don't
> think it's necessary to backport the fix.
Well, a lack of any bug reports would support your theory that this is
unlikely to hit in practice. Fixes tag would be nice to have regardless
though.
Thanks!
--
Michal Hocko
SUSE Labs
Powered by blists - more mailing lists