linux-kernel - Re: [PATCH 2/5] swap: fix do_swap

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87r1jgwa2w.fsf@yhuang6-desk1.ccr.corp.intel.com>
Date:   Mon, 12 Apr 2021 09:44:23 +0800
From:   "Huang, Ying" <ying.huang@...el.com>
To:     Miaohe Lin <linmiaohe@...wei.com>
Cc:     Tim Chen <tim.c.chen@...ux.intel.com>, <akpm@...ux-foundation.org>,
        <hannes@...xchg.org>, <mhocko@...e.com>, <iamjoonsoo.kim@....com>,
        <vbabka@...e.cz>, <alex.shi@...ux.alibaba.com>,
        <willy@...radead.org>, <minchan@...nel.org>,
        <richard.weiyang@...il.com>, <hughd@...gle.com>,
        <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>
Subject: Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff

Miaohe Lin <linmiaohe@...wei.com> writes:

> On 2021/4/10 1:17, Tim Chen wrote:
>> 
>> 
>> On 4/9/21 1:42 AM, Miaohe Lin wrote:
>>> On 2021/4/9 5:34, Tim Chen wrote:
>>>>
>>>>
>>>> On 4/8/21 6:08 AM, Miaohe Lin wrote:
>>>>> When I was investigating the swap code, I found the below possible race
>>>>> window:
>>>>>
>>>>> CPU 1					CPU 2
>>>>> -----					-----
>>>>> do_swap_page
>>>>>   synchronous swap_readpage
>>>>>     alloc_page_vma
>>>>> 					swapoff
>>>>> 					  release swap_file, bdev, or ...
>>>>
>>>
>>> Many thanks for quick review and reply!
>>>
>>>> Perhaps I'm missing something.  The release of swap_file, bdev etc
>>>> happens after we have cleared the SWP_VALID bit in si->flags in destroy_swap_extents
>>>> if I read the swapoff code correctly.
>>> Agree. Let's look this more close:
>>> CPU1								CPU2
>>> -----								-----
>>> swap_readpage
>>>   if (data_race(sis->flags & SWP_FS_OPS)) {
>>> 								swapoff
>>> 								  p->swap_file = NULL;
>>>     struct file *swap_file = sis->swap_file;
>>>     struct address_space *mapping = swap_file->f_mapping;[oops!]
>>> 								  ...
>>> 								  p->flags = 0;
>>>     ...
>>>
>>> Does this make sense for you?
>> 
>> p->swapfile = NULL happens after the 
>> p->flags &= ~SWP_VALID, synchronize_rcu(), destroy_swap_extents() sequence in swapoff().
>> 
>> So I don't think the sequence you illustrated on CPU2 is in the right order.
>> That said, without get_swap_device/put_swap_device in swap_readpage, you could
>> potentially blow pass synchronize_rcu() on CPU2 and causes a problem.  so I think
>> the problematic race looks something like the following:
>> 
>> 
>> CPU1								CPU2
>> -----								-----
>> swap_readpage
>>   if (data_race(sis->flags & SWP_FS_OPS)) {
>> 								swapoff
>> 								  p->flags = &= ~SWP_VALID;
>> 								  ..
>> 								  synchronize_rcu();
>> 								  ..
>> 								  p->swap_file = NULL;
>>     struct file *swap_file = sis->swap_file;
>>     struct address_space *mapping = swap_file->f_mapping;[oops!]
>> 								  ...
>>     ...
>> 
>
> Agree. This is also what I meant to illustrate. And you provide a better one. Many thanks!

For the pages that are swapped in through swap cache.  That isn't an
issue.  Because the page is locked, the swap entry will be marked with
SWAP_HAS_CACHE, so swapoff() cannot proceed until the page has been
unlocked.

So the race is for the fast path as follows,

		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
		    __swap_count(entry) == 1)

I found it in your original patch description.  But please make it more
explicit to reduce the potential confusing.

Best Regards,
Huang, Ying