lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6ad329f6-6e1b-411b-a5d7-be1c8fd89d96@redhat.com>
Date: Fri, 16 Feb 2024 17:16:31 +0100
From: David Hildenbrand <david@...hat.com>
To: Kairui Song <ryncsn@...il.com>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
 "Huang, Ying" <ying.huang@...el.com>, Chris Li <chrisl@...nel.org>,
 Minchan Kim <minchan@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
 Barry Song <v-songbaohua@...o.com>, SeongJae Park <sj@...nel.org>,
 Hugh Dickins <hughd@...gle.com>, Johannes Weiner <hannes@...xchg.org>,
 Matthew Wilcox <willy@...radead.org>, Michal Hocko <mhocko@...e.com>,
 Yosry Ahmed <yosryahmed@...gle.com>, stable@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm/swap: fix race when skipping swapcache

>> (4) relock the folio. (we do that already, might not want to fail)
>>
>> (4) take the PTE lock. If the PTE did not change, turn it into a present
>> PTE entry. Otherwise, cleanup.
> 
> Very interesting idea!
> 
> I'm just not sure what actual benefit it brings. The only concern
> about reusing swapcache_prepare so far is repeated page faults that
> may hurt performance or statistics, this issue is basically gone after
> adding a schedule().

I think you know that slapping in random schedule() calls is not a 
proper way to wait for an event to happen :) It's pretty much 
unpredictable how long the schedule() will take and if there even is 
anything to schedule to!

With what I propose, just like with page migration, you really do wait 
for the event (page read and unlocked, only the PTE has to be updated) 
to happen before you try anything else.

Now, the difference is most likely that the case here happens much less 
frequently than page migration. Still, you could have all threads 
faulting one the same page and all would do the same dance here.

> 
> We can't drop all the operations around swap cache and map anyway. It
> doesn't know if it should skip the swapcache until swapcache lookup
> and swap count look up are all done. So I think it can be done more
> naturally here with a special value, making things simpler, robust,
> and improving performance a bit more.
> 

The issue will always be that someone can zap the PTE concurrently, 
which would free up the swap cache. With what I propose, that cannot 
happen in the sync swapin case we are discussing here.

If someone where to zap the PTE in the meantime, it would only remove 
the special non-swap entry, indicating to swapin code that concurrent 
zapping happened. The swapin code would handle the swapcache freeing 
instead, after finishing reading the content.

So the swapcache would not get freed concurrently anymore if I am not wrong.

At least the theory, I didn't prototype it yet.

> And in another series [1] I'm working on making shmem make use of
> cache bypassed swapin as well, following this approach I'll have to
> implement another shmem map based synchronization too.
> 

I'd have to look further into that, if something like that could 
similarly apply to shmem. But yes, it's no using PTEs, so a PTE-based 
sync mechanism does definitely not apply..

> After all it's only a rare race, I think a simpler solution might be better.

I'm not sure that simpler means better here. Simpler might be better for 
a backport, though.

The "use schedule() to wait" looks odd, maybe it's a common primitive 
that I simply didn't stumble over yet. (I doubt it but it could be possible)

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ