linux-kernel - Re: kernel 3.0: BUG: soft lockup: find_get

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANsGZ6Zs0sPMqcToGpw8H1vLcAewyj4aii+iy0V9oyhk7RqzjA@mail.gmail.com>
Date:	Sun, 16 Oct 2011 20:02:16 -0700
From:	Hugh Dickins <hughd@...gle.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Pawel Sikora <pluto@...k.net>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mel Gorman <mgorman@...e.de>,
	Andrea Arcangeli <aarcange@...hat.com>, linux-mm@...ck.org,
	jpiszcz@...idpixels.com, arekm@...-linux.org,
	linux-kernel@...r.kernel.org
Subject: Re: kernel 3.0: BUG: soft lockup: find_get_pages+0x51/0x110

I've not read through and digested Andrea's reply yet, but I'd say
this is not something we need to rush into 3.1 at the last moment,
before it's been fully considered: the bug here is hard to hit,
ancient, made more likely in 2.6.35 by compaction and in 2.6.38 by
THP's reliance on compaction, but not a regression in 3.1 at all - let
it wait until stable.

Hugh

On Sun, Oct 16, 2011 at 3:37 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> What's the status of this thing? Is it stable/3.1 material? Do we have
> ack/nak's for it? Anybody?
>
>                               Linus
>
> On Thu, Oct 13, 2011 at 4:16 PM, Hugh Dickins <hughd@...gle.com> wrote:
>>
>> [PATCH] mm: add anon_vma locking to mremap move
>>
>> I don't usually pay much attention to the stale "? " addresses in
>> stack backtraces, but this lucky report from Pawel Sikora hints that
>> mremap's move_ptes() has inadequate locking against page migration.
>>
>>  3.0 BUG_ON(!PageLocked(p)) in migration_entry_to_page():
>>  kernel BUG at include/linux/swapops.h:105!
>>  RIP: 0010:[<ffffffff81127b76>]  [<ffffffff81127b76>]
>>                       migration_entry_wait+0x156/0x160
>>  [<ffffffff811016a1>] handle_pte_fault+0xae1/0xaf0
>>  [<ffffffff810feee2>] ? __pte_alloc+0x42/0x120
>>  [<ffffffff8112c26b>] ? do_huge_pmd_anonymous_page+0xab/0x310
>>  [<ffffffff81102a31>] handle_mm_fault+0x181/0x310
>>  [<ffffffff81106097>] ? vma_adjust+0x537/0x570
>>  [<ffffffff81424bed>] do_page_fault+0x11d/0x4e0
>>  [<ffffffff81109a05>] ? do_mremap+0x2d5/0x570
>>  [<ffffffff81421d5f>] page_fault+0x1f/0x30
>>
>> mremap's down_write of mmap_sem, together with i_mmap_mutex/lock,
>> and pagetable locks, were good enough before page migration (with its
>> requirement that every migration entry be found) came in; and enough
>> while migration always held mmap_sem.  But not enough nowadays, when
>> there's memory hotremove and compaction: anon_vma lock is also needed,
>> to make sure a migration entry is not dodging around behind our back.
>>
>> It appears that Mel's a8bef8ff6ea1 "mm: migration: avoid race between
>> shift_arg_pages() and rmap_walk() during migration by not migrating
>> temporary stacks" was actually a workaround for this in the special
>> common case of exec's use of move_pagetables(); and we should probably
>> now remove that VM_STACK_INCOMPLETE_SETUP stuff as a separate cleanup.
>>
>> Reported-by: Pawel Sikora <pluto@...k.net>
>> Cc: stable@...nel.org
>> Signed-off-by: Hugh Dickins <hughd@...gle.com>
>> ---
>>
>>  mm/mremap.c |    5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> --- 3.1-rc9/mm/mremap.c 2011-07-21 19:17:23.000000000 -0700
>> +++ linux/mm/mremap.c   2011-10-13 14:36:25.097780974 -0700
>> @@ -77,6 +77,7 @@ static void move_ptes(struct vm_area_str
>>                unsigned long new_addr)
>>  {
>>        struct address_space *mapping = NULL;
>> +       struct anon_vma *anon_vma = vma->anon_vma;
>>        struct mm_struct *mm = vma->vm_mm;
>>        pte_t *old_pte, *new_pte, pte;
>>        spinlock_t *old_ptl, *new_ptl;
>> @@ -95,6 +96,8 @@ static void move_ptes(struct vm_area_str
>>                mapping = vma->vm_file->f_mapping;
>>                mutex_lock(&mapping->i_mmap_mutex);
>>        }
>> +       if (anon_vma)
>> +               anon_vma_lock(anon_vma);
>>
>>        /*
>>         * We don't have to worry about the ordering of src and dst
>> @@ -121,6 +124,8 @@ static void move_ptes(struct vm_area_str
>>                spin_unlock(new_ptl);
>>        pte_unmap(new_pte - 1);
>>        pte_unmap_unlock(old_pte - 1, old_ptl);
>> +       if (anon_vma)
>> +               anon_vma_unlock(anon_vma);
>>        if (mapping)
>>                mutex_unlock(&mapping->i_mmap_mutex);
>>        mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end);
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/