lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [day] [month] [year] [list]
Date:   Fri, 15 Feb 2019 14:11:22 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     "Huang, Ying" <ying.huang@...el.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Hugh Dickins <hughd@...gle.com>,
        "Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
        Minchan Kim <minchan@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Jérôme Glisse <jglisse@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Rik van Riel <riel@...hat.com>, Jan Kara <jack@...e.cz>,
        Dave Jiang <dave.jiang@...el.com>,
        Daniel Jordan <daniel.m.jordan@...cle.com>,
        Andrea Parri <andrea.parri@...rulasolutions.com>
Subject: Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap
 operations

On Fri 15-02-19 15:08:36, Huang, Ying wrote:
> Michal Hocko <mhocko@...nel.org> writes:
> 
> > On Mon 11-02-19 16:38:46, Huang, Ying wrote:
> >> From: Huang Ying <ying.huang@...el.com>
> >> 
> >> When swapin is performed, after getting the swap entry information from
> >> the page table, system will swap in the swap entry, without any lock held
> >> to prevent the swap device from being swapoff.  This may cause the race
> >> like below,
> >> 
> >> CPU 1				CPU 2
> >> -----				-----
> >> 				do_swap_page
> >> 				  swapin_readahead
> >> 				    __read_swap_cache_async
> >> swapoff				      swapcache_prepare
> >>   p->swap_map = NULL		        __swap_duplicate
> >> 					  p->swap_map[?] /* !!! NULL pointer access */
> >> 
> >> Because swapoff is usually done when system shutdown only, the race may
> >> not hit many people in practice.  But it is still a race need to be fixed.
> >> 
> >> To fix the race, get_swap_device() is added to check whether the specified
> >> swap entry is valid in its swap device.  If so, it will keep the swap
> >> entry valid via preventing the swap device from being swapoff, until
> >> put_swap_device() is called.
> >> 
> >> Because swapoff() is very rare code path, to make the normal path runs as
> >> fast as possible, disabling preemption + stop_machine() instead of
> >> reference count is used to implement get/put_swap_device().  From
> >> get_swap_device() to put_swap_device(), the preemption is disabled, so
> >> stop_machine() in swapoff() will wait until put_swap_device() is called.
> >> 
> >> In addition to swap_map, cluster_info, etc.  data structure in the struct
> >> swap_info_struct, the swap cache radix tree will be freed after swapoff,
> >> so this patch fixes the race between swap cache looking up and swapoff
> >> too.
> >> 
> >> Races between some other swap cache usages protected via disabling
> >> preemption and swapoff are fixed too via calling stop_machine() between
> >> clearing PageSwapCache() and freeing swap cache data structure.
> >> 
> >> Alternative implementation could be replacing disable preemption with
> >> rcu_read_lock_sched and stop_machine() with synchronize_sched().
> >
> > using stop_machine is generally discouraged. It is a gross
> > synchronization.
> >
> > Besides that, since when do we have this problem?
> 
> For problem, you mean the race between swapoff and the page fault
> handler?

yes

> The problem is introduced in v4.11 when we avoid to replace
> swap_info_struct->lock with swap_cluster_info->lock in
> __swap_duplicate() if possible to improve the scalability of swap
> operations.  But because swapoff is a really rare operation, I don't
> think it's necessary to backport the fix.

Well, a lack of any bug reports would support your theory that this is
unlikely to hit in practice. Fixes tag would be nice to have regardless
though.

Thanks!
-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ