linux-kernel - Re: [PATCH 2/5] swap: fix do_swap

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <7684b3de-2824-9b1f-f033-d4bc14f9e195@linux.intel.com>
Date:   Thu, 8 Apr 2021 14:34:30 -0700
From:   Tim Chen <tim.c.chen@...ux.intel.com>
To:     Miaohe Lin <linmiaohe@...wei.com>, akpm@...ux-foundation.org
Cc:     hannes@...xchg.org, mhocko@...e.com, iamjoonsoo.kim@....com,
        vbabka@...e.cz, alex.shi@...ux.alibaba.com, willy@...radead.org,
        minchan@...nel.org, richard.weiyang@...il.com,
        ying.huang@...el.com, hughd@...gle.com,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff

On 4/8/21 6:08 AM, Miaohe Lin wrote:
> When I was investigating the swap code, I found the below possible race
> window:
> 
> CPU 1					CPU 2
> -----					-----
> do_swap_page
>   synchronous swap_readpage
>     alloc_page_vma
> 					swapoff
> 					  release swap_file, bdev, or ...

Perhaps I'm missing something.  The release of swap_file, bdev etc
happens after we have cleared the SWP_VALID bit in si->flags in destroy_swap_extents
if I read the swapoff code correctly.

>       swap_readpage
> 	check sis->flags is ok
> 	  access swap_file, bdev...[oops!]
> 					    si->flags = 0

This happens after we clear the si->flags
					synchronize_rcu()
					release swap_file, bdev, in destroy_swap_extents()

So I think if we have get_swap_device/put_swap_device in do_swap_page,
it should fix the race you've pointed out here.  
Then synchronize_rcu() will wait till we have completed do_swap_page and
call put_swap_device.

> 
> Using current get/put_swap_device() to guard against concurrent swapoff for
> swap_readpage() looks terrible because swap_readpage() may take really long
> time. And this race may not be really pernicious because swapoff is usually
> done when system shutdown only. To reduce the performance overhead on the
> hot-path as much as possible, it appears we can use the percpu_ref to close
> this race window(as suggested by Huang, Ying).

I think it is better to break this patch into two.

One patch is to fix the race in do_swap_page and swapoff
by adding get_swap_device/put_swap_device in do_swap_page.

The second patch is to modify get_swap_device and put_swap_device
with percpu_ref. But swapoff is a relatively rare events.  

I am not sure making percpu_ref change for performance is really beneficial.
Did you encounter a real use case where you see a problem with swapoff?
The delay in swapoff is primarily in try_to_unuse to bring all
the swapped off pages back into memory.  Synchronizing with other
CPU for paging in probably is a small component in overall scheme
of things.

Thanks.

Tim