linux-kernel - Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20130815151416.GF27864@dhcp22.suse.cz>
Date:	Thu, 15 Aug 2013 17:14:16 +0200
From:	Michal Hocko <mhocko@...e.cz>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Ben Tebulin <tebulin@...glemail.com>, Mel Gorman <mgorman@...e.de>,
	Johannes Weiner <hannes@...xchg.org>,
	Balbir Singh <bsingharora@...il.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	linux-mm <linux-mm@...ck.org>, Rik van Riel <riel@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert
 53a59fc67!

On Thu 15-08-13 16:53:32, Michal Hocko wrote:
> On Thu 15-08-13 16:46:00, Michal Hocko wrote:
> > On Thu 15-08-13 15:40:31, Michal Hocko wrote:
> > > On Thu 15-08-13 05:02:31, Linus Torvalds wrote:
> > > > On Thu, Aug 15, 2013 at 2:25 AM, Ben Tebulin <tebulin@...glemail.com> wrote:
> > > > >
> > > > > I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
> > > > > Unfortunately this does _not resolve_ my issue (too good to be true) :-(
> > > > 
> > > > Ho humm. I've found at least one other bug, but that one only affects
> > > > hugepages. Do you perhaps have transparent hugepages enabled? But even
> > > > then it looks quite unlikely.
> > > 
> > > __unmap_hugepage_range is hugetlb not THP if you had that one in mind.
> > > And yes, it doesn't set the range which sounds buggy.
> > 
> > Or, did you mean tlb_remove_page called from zap_huge_pmd? That one
> > should be safe as tlb_remove_pmd_tlb_entry sets need_flush and that
> > means that the full range is flushed.
> 
> Dohh... But we need need_flush_all and that is not set here. So this
> really looks buggy.

This is a really dumb attempt to fix this but maybe it is worth trying
to confirm we are really seeing this problem. It still flushes too much
potentially but I am not sure how to find out the proper start...
Will think about it more.
---
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a92012a..a16f452 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1381,7 +1381,11 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			VM_BUG_ON(!PageHead(page));
 			tlb->mm->nr_ptes--;
 			spin_unlock(&tlb->mm->page_table_lock);
-			tlb_remove_page(tlb, page);
+			if (!__tlb_remove_page(tlb, page)) {
+				tlb->start = 0;
+				tlb->end = addr + HPAGE_SIZE;
+				tlb_flush_mmu(tlb);
+			}
 		}
 		pte_free(tlb->mm, pgtable);
 		ret = 1;
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/