[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZHDi8q6N9BPElAMH@x1n>
Date: Fri, 26 May 2023 12:48:50 -0400
From: Peter Xu <peterx@...hat.com>
To: Hugh Dickins <hughd@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
Mike Rapoport <rppt@...nel.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Matthew Wilcox <willy@...radead.org>,
David Hildenbrand <david@...hat.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Qi Zheng <zhengqi.arch@...edance.com>,
Yang Shi <shy828301@...il.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Peter Zijlstra <peterz@...radead.org>,
Will Deacon <will@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
Alistair Popple <apopple@...dia.com>,
Ralph Campbell <rcampbell@...dia.com>,
Ira Weiny <ira.weiny@...el.com>,
Steven Price <steven.price@....com>,
SeongJae Park <sj@...nel.org>,
Naoya Horiguchi <naoya.horiguchi@....com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Zack Rusin <zackr@...are.com>, Jason Gunthorpe <jgg@...pe.ca>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Anshuman Khandual <anshuman.khandual@....com>,
Pasha Tatashin <pasha.tatashin@...een.com>,
Miaohe Lin <linmiaohe@...wei.com>,
Minchan Kim <minchan@...nel.org>,
Christoph Hellwig <hch@...radead.org>,
Song Liu <song@...nel.org>,
Thomas Hellstrom <thomas.hellstrom@...ux.intel.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 01/31] mm: use pmdp_get_lockless() without surplus
barrier()
On Thu, May 25, 2023 at 03:35:01PM -0700, Hugh Dickins wrote:
> On Wed, 24 May 2023, Peter Xu wrote:
> > On Sun, May 21, 2023 at 09:49:45PM -0700, Hugh Dickins wrote:
> > > Use pmdp_get_lockless() in preference to READ_ONCE(*pmdp), to get a more
> > > reliable result with PAE (or READ_ONCE as before without PAE); and remove
> > > the unnecessary extra barrier()s which got left behind in its callers.
> >
> > Pure question: does it mean that some of below path (missing barrier()
> > ones) could have problem when CONFIG_PAE, hence this can be seen as a
> > (potential) bug fix?
>
> I don't think so; or at least, I am not claiming that this fixes any.
>
> It really depends on what use is made of the pmdval afterwards, and
> I've not checked through them. The READ_ONCE()s which were there,
> were good enough to make sure that the compiler did not reevaluate
> the pmdval later on, with perhaps a confusingly different result.
>
> But, at least in the x86 PAE case, they were not good enough to ensure
> that the two halves of the entry match up; and, sad to say, nor is that
> absolutely guaranteed by these conversions to pmdp_get_lockless() -
> because of the "HOWEVER" below. PeterZ's comments in linux/pgtable.h
> are well worth reading through.
Yes exactly - that's one major thing of my confusion on using
{ptep|pmdp}_get_lockless().
In irqoff ctx, AFAICT we can see a totally messed up pte/pmd with present
bit set if extremely unlucky. E.g. it can race with something like
"DONTNEED (contains tlbflush) then a POPULATE_WRITE" so we can have
"present -> present" conversion of pte when reading, so we can read half
pfn1 and then the other half pfn2.
The other confusing thing on this _lockless trick on PAE is, I think it
_might_ go wrong with devmap..
The problem is here we assumed even if high & low may not match, we still
can rely on most pte/pmd checks are done only on low bits (except _none()
check) to guarantee at least the checks are still atomic on low bits.
But it seems to me it's not true anymore if with pmd_trans_huge() after
devmap introduced, e.g.:
static inline int pmd_trans_huge(pmd_t pmd)
{
return (pmd_val(pmd) & (_PAGE_PSE|_PAGE_DEVMAP)) == _PAGE_PSE;
}
#define _PAGE_PSE (_AT(pteval_t, 1) << _PAGE_BIT_PSE)
#define _PAGE_BIT_PSE 7 /* 4 MB (or 2MB) page */
#define _PAGE_DEVMAP (_AT(u64, 1) << _PAGE_BIT_DEVMAP)
#define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4
#define _PAGE_BIT_SOFTW4 58 /* available for programmer */
So after devmap with CONFIG_PAE, pmd_trans_huge() checks more than low bits
but also high bits. I didn't go further to check whether there can be any
real issue but IIUC that's not expected when the low/high trick introduced
(originally introduced in commit e585513b76f7b05d sololy for x86 PAE
fast-gup only).
>
> You might question why I made these changes at all: some days
> I question them too. Better though imperfect? Or deceptive?
I think it's probably a separate topic to address in all cases, so I think
this patch still make it slightly better on barrier() which I agree:
Acked-by: Peter Xu <peterx@...hat.com>
Thanks,
--
Peter Xu
Powered by blists - more mailing lists