lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Tue, 3 Nov 2020 12:03:27 -0500
From:   Peter Xu <peterx@...hat.com>
To:     "Ahmed S. Darwish" <a.darwish@...utronix.de>
Cc:     Jason Gunthorpe <jgg@...dia.com>, linux-kernel@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
        Christoph Hellwig <hch@....de>,
        Hugh Dickins <hughd@...gle.com>, Jan Kara <jack@...e.cz>,
        Jann Horn <jannh@...gle.com>,
        John Hubbard <jhubbard@...dia.com>,
        Kirill Shutemov <kirill@...temov.name>,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        Leon Romanovsky <leonro@...dia.com>,
        Linux-MM <linux-mm@...ck.org>, Michal Hocko <mhocko@...e.com>,
        Oleg Nesterov <oleg@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Sebastian Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during
 fork

On Tue, Nov 03, 2020 at 01:17:12AM +0100, Ahmed S. Darwish wrote:
> > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > index c48f8df6e50268..294c2c3c4fe00d 100644
> > > > +++ b/mm/memory.c
> > > > @@ -1171,6 +1171,12 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
> > > >  		mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
> > > >  					0, src_vma, src_mm, addr, end);
> > > >  		mmu_notifier_invalidate_range_start(&range);
> > > > +		/*
> > > > +		 * The read side doesn't spin, it goes to the mmap_lock, so the
> > > > +		 * raw version is used to avoid disabling preemption here
> > > > +		 */
> > > > +		mmap_assert_write_locked(src_mm);
> > > > +		raw_write_seqcount_t_begin(&src_mm->write_protect_seq);
> > >
> > > Would raw_write_seqcount_begin() be better here?
> >
> > Hum..
> >
> > I felt no because it had the preempt stuff added into it, however it
> > would work - __seqcount_lock_preemptible() == false for the seqcount_t
> > case (see below)
> >
> > Looking more closely, maybe the right API to pick is
> > write_seqcount_t_begin() and write_seqcount_t_end() ??
> >
> 
> No, that's not the right API: it is also internal to seqlock.h.
> 
> Please stick with the official exported API: raw_write_seqcount_begin().
> 
> It should satisfy your needs, and the raw_*() variant is created exactly
> for contexts wishing to avoid the lockdep checks (e.g. NMI handlers
> cannot invoke lockdep, etc.)

Ahmed, Jason, feel free to correct me - but I feel like what Jason wanted here
is indeed the version that does not require disabling of preemption, a.k.a.,
write_seqcount_t_begin() and write_seqcount_t_end(), since it's preempt-safe if
the read side does not retry.  Not sure whether there's no "*_t_*" version of it.

Another idea is that maybe we can use the raw_write_seqcount_begin() version,
instead of in copy_page_range() but move it to copy_pte_range().  That would
not affect normal Linux on preemption I think, since when reach pte level we
should have disabled preemption already after all (by taking the pgtable spin
lock).  But again there could be extra overhead since we'll need to take the
write seqcount very often (rather than once per fork(), so maybe there's some
perf influence), also that means it'll be an extra/real disable_preempt() for
the future RT code if it'll land some day (since again rt_spin_lock should not
need to disable preemption, iiuc, which is used here).  So seems it's still
better to do in copy_page_range() as Jason proposed.

Thanks,

-- 
Peter Xu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ