lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 05 Jun 2008 10:27:19 +0100
From:	Jeremy Fitzhardinge <jeremy@...p.org>
To:	Jan Beulich <jbeulich@...ell.com>
CC:	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org
Subject: Re: operation ordering during pgd_alloc/pgd_free

Jan Beulich wrote:
> At present, pgd_ctor() adds a new pgd to pgd_list solely based on
> !SHARED_KERNEL_PMD. For PAE && !SHARED_KERNEL_PMD (i.e. Xen)
> this doesn't seem correct, as the pgd is still empty, which will confuse
> vmalloc_sync_all(). So in this case, list insertion should only happen at
> the end of pgd_prepopulate_pmd().
>   

How does vmalloc_sync_all() get confused?

> Likewise, pgd_free() calls pgd_mop_up_pmds() *before* pgd_dtor(),
> with the former zeroing pgd entries as it goes and only the latter
> removing the pgd from the list. Just as above this can confuse
> vmalloc_sync_all(), so here I would think that the two calls should just
> be swapped. However, if they get swapped, careful inspection of the
> interaction with save/restore will be needed - 

Yes, I specifically wanted to make sure that the pgd was on the list 
from before it had any entries until after it has any, to make sure that 
no pmds escape visibility from xen_mm_pin_all().  (Note to self: put a 
memory barrier to make sure the list update is complete before/after 
inserting/removing any pmd entries.)

> XenSource's Linux tree
> has a comment specifically to that effect:
>
> /*
>  * After this the pgd should not be pinned for the duration of this
>  * function's execution. We should never sleep and thus never race:
>  *  1. User pmds will not become write-protected under our feet due
>  *     to a concurrent mm_pin_all().
>  *  2. The machine addresses in PGD entries will not become invalid
>  *     due to a concurrent save/restore.
>  */
>
> Since that tree doesn't support preemption, this is perhaps fine, but
> likely going to cause problems in the (preemptable) pv-ops code.
>   

I don't think so.  When saving with preemption enabled, it first puts 
all processes in the freezer before entering stop_machine_run(); a 
process constructing a pagetable should be finished by the time it can 
be frozen.

But I think there's a problem *without* preemption.  
pmd_prepopulate_pgd() allocates new pmds with GFP_KERNEL, and so it can 
block, which undermines the precondition of the comment you quote.  This 
allows an unlisted and unpinned pgd to be missed at save time.  I could 
just use the freezer unconditionally, but there was some concern about 
how much time it would take on a busy system.

Alternatively, a different ordering would fix it:

   1. preallocate - but don't install - the pmds
   2. take pgd_lock
   3. install pmds into pgd
   4. insert pgd onto list
   5. release pgd_lock

Holding pgd_lock will prevent both vmalloc_sync_all() and 
xen_mm_pin_all() from being able to visit the pgd while it is in its 
transitional state.

> The issue with vmalloc_sync_all() would even go unnoticed, since the
> patch to unify the pgd_list mechanism with x86-64 removed the
> BUG_ON() that was meant to trigger on issues like this.
>   

Is there an inherent reason vmalloc_sync_all can't deal with a partially 
constructed pgd?  Couldn't it just skip them, as if it wasn't (or 
rather, not yet) on the list?  In fact, that looks like what it does now.

Thanks for looking at this.

    J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ