lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 11 Feb 2019 13:59:24 +0000
From:   Linux Upstream <linux.upstream@...plus.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Chintan Pandya <chintan.pandya@...plus.com>
CC:     "hughd@...gle.com" <hughd@...gle.com>,
        "jack@...e.cz" <jack@...e.cz>,
        "mawilcox@...rosoft.com" <mawilcox@...rosoft.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [RFC 1/2] page-flags: Make page lock operation atomic



On 11/02/19 7:16 PM, Peter Zijlstra wrote:
> On Mon, Feb 11, 2019 at 12:53:53PM +0000, Chintan Pandya wrote:
>> Currently, page lock operation is non-atomic. This is opening
>> some scope for race condition. For ex, if 2 threads are accessing
>> same page flags, it may happen that our desired thread's page
>> lock bit (PG_locked) might get overwritten by other thread
>> leaving page unlocked. This can cause issues later when some
>> code expects page to be locked but it is not.
>>
>> Make page lock/unlock operation use the atomic version of
>> set_bit API. There are other flag set operations which still
>> uses non-atomic version of set_bit API. Bit, that might be
>> the change for the future.
>>
>> Change-Id: I13bdbedc2b198af014d885e1925c93b83ed6660e
> 
> That doesn't belong in patches.

Sure. That's a miss. Will fix this.

> 
>> Signed-off-by: Chintan Pandya <chintan.pandya@...plus.com>
> 
> NAK.
> 
> This is bound to regress some stuff. Now agreed that using non-atomic
> ops is tricky, but many are in places where we 'know' there can't be
> concurrency.
> 
> If you can show any single one is wrong, we can fix that one, but we're
> not going to blanket remove all this just because.

Not quite familiar with below stack but from crash dump, found that this
was another stack running on some other CPU at the same time which also
updates page cache lru and manipulate locks.

[84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
[84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
[84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
[84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
[84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
[84415.344636] [20190123_21:27:50.786324]@1 
__do_page_cache_readahead+0x16c/0x280
[84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
[84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
[84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88

Not entirely sure if it's racing with the crashing stack or it's simply
overrides the the bit set by case 2 (mentioned in 0/2).
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ