linux-kernel - Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <997d210f-8706-7dc1-b1bb-abcc2db2ddd1@linux.intel.com>
Date:   Tue, 12 Feb 2019 09:58:11 -0800
From:   Tim Chen <tim.c.chen@...ux.intel.com>
To:     "Huang, Ying" <ying.huang@...el.com>,
        Andrea Parri <andrea.parri@...rulasolutions.com>
Cc:     Daniel Jordan <daniel.m.jordan@...cle.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Hugh Dickins <hughd@...gle.com>,
        "Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
        Minchan Kim <minchan@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Jérôme Glisse <jglisse@...hat.com>,
        Michal Hocko <mhocko@...e.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Rik van Riel <riel@...hat.com>, Jan Kara <jack@...e.cz>,
        Dave Jiang <dave.jiang@...el.com>
Subject: Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap
 operations

On 2/11/19 10:47 PM, Huang, Ying wrote:
> Andrea Parri <andrea.parri@...rulasolutions.com> writes:
> 
>>>> +	if (!si)
>>>> +		goto bad_nofile;
>>>> +
>>>> +	preempt_disable();
>>>> +	if (!(si->flags & SWP_VALID))
>>>> +		goto unlock_out;
>>>
>>> After Hugh alluded to barriers, it seems the read of SWP_VALID could be
>>> reordered with the write in preempt_disable at runtime.  Without smp_mb()
>>> between the two, couldn't this happen, however unlikely a race it is?
>>>
>>> CPU0                                CPU1
>>>
>>> __swap_duplicate()
>>>     get_swap_device()
>>>         // sees SWP_VALID set
>>>                                    swapoff
>>>                                        p->flags &= ~SWP_VALID;
>>>                                        spin_unlock(&p->lock); // pair w/ smp_mb
>>>                                        ...
>>>                                        stop_machine(...)
>>>                                        p->swap_map = NULL;
>>>         preempt_disable()
>>>     read NULL p->swap_map
>>
>>
>> I don't think that that smp_mb() is necessary.  I elaborate:
>>
>> An important piece of information, I think, that is missing in the
>> diagram above is the stopper thread which executes the work queued
>> by stop_machine().  We have two cases to consider, that is,
>>
>>   1) the stopper is "executed before" the preempt-disable section
>>
>> 	CPU0
>>
>> 	cpu_stopper_thread()
>> 	...
>> 	preempt_disable()
>> 	...
>> 	preempt_enable()
>>
>>   2) the stopper is "executed after" the preempt-disable section
>>
>> 	CPU0
>>
>> 	preempt_disable()
>> 	...
>> 	preempt_enable()
>> 	...
>> 	cpu_stopper_thread()
>>
>> Notice that the reads from p->flags and p->swap_map in CPU0 cannot
>> cross cpu_stopper_thread().  The claim is that CPU0 sees SWP_VALID
>> unset in (1) and that it sees a non-NULL p->swap_map in (2).
>>
>> I consider the two cases separately:
>>
>>   1) CPU1 unsets SPW_VALID, it locks the stopper's lock, and it
>>      queues the stopper work; CPU0 locks the stopper's lock, it
>>      dequeues this work, and it reads from p->flags.
>>
>>      Diagrammatically, we have the following MP-like pattern:
>>
>> 	CPU0				CPU1
>>
>> 	lock(stopper->lock)		p->flags &= ~SPW_VALID
>> 	get @work			lock(stopper->lock)
>> 	unlock(stopper->lock)		add @work
>> 	reads p->flags 			unlock(stopper->lock)
>>
>>      where CPU0 must see SPW_VALID unset (if CPU0 sees the work
>>      added by CPU1).
>>
>>   2) CPU0 reads from p->swap_map, it locks the completion lock,
>>      and it signals completion; CPU1 locks the completion lock,
>>      it checks for completion, and it writes to p->swap_map.
>>
>>      (If CPU0 doesn't signal the completion, or CPU1 doesn't see
>>      the completion, then CPU1 will have to iterate the read and
>>      to postpone the control-dependent write to p->swap_map.)
>>
>>      Diagrammatically, we have the following LB-like pattern:
>>
>> 	CPU0				CPU1
>>
>> 	reads p->swap_map		lock(completion)
>> 	lock(completion)		read completion->done
>> 	completion->done++		unlock(completion)
>> 	unlock(completion)		p->swap_map = NULL
>>
>>      where CPU0 must see a non-NULL p->swap_map if CPU1 sees the
>>      completion from CPU0.
>>
>> Does this make sense?
> 
> Thanks a lot for detailed explanation!

This is certainly a non-trivial explanation of why memory barrier is not
needed.  Can we put it in the commit log and mention something in
comments on why we don't need memory barrier?

Thanks.

Tim