linux-kernel - Re: swapoff() runs forever

Message-ID: <alpine.LSU.2.00.1204112241510.28009@eggly.anvils>
Date:	Wed, 11 Apr 2012 23:40:26 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	Richard Weinberger <richard@....at>
cc:	Konstantin Khlebnikov <khlebnikov@...nvz.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"paul.gortmaker@...driver.com" <paul.gortmaker@...driver.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: swapoff() runs forever

On Thu, 12 Apr 2012, Richard Weinberger wrote:
> On 09.04.2012 20:40, Hugh Dickins wrote:
> > I've not seen any such issue in recent months (or years), but
> > I've not been using UML either.  The most likely cause that springs
> > to mind would be corruption of the vmalloc'ed swap map: that would
> > be very likely to cause such a hang.
> 
> It does not look like a swap map corruption.
> If I restart most user space processes swapoff() terminates fine.

Right, thanks, that's very useful info.

> Maybe it is a refcounting problem?

You may prove to be correct; but since killing and restarting
processes fixes it up without (I presume) issuing warnings,
it doesn't sound like a refcounting problem to me.

> 
> > You say "recent Linux kernels": I wonder what "recent" means.
> > Is this something you can reproduce quickly and reliably enough
> > to do a bisection upon?
> > 
> 
> I can reproduce the issue on any UML kernel.
> The oldest I've tested was 2.6.20.
> Therefore, the bug was not introduced by me. B-)

More useful info, thank you.

I think I've spotted two problems in the UML swp_entry_t handling;
but checking if I'm right, and if they're relevant, and how to fix them,
I'll leave to you - it's years since I tried UML and I remember 0.

One, likely to be your problem.  Take a look at unuse_pte_range() in
mm/swapfile.c, where it searches the page table for the swp_pte it's
trying to "unuse".  And take a look at set_pte() in
arch/um/include/asm/pgtable.h, which appears to add a mysterious
_PAGE_NEWPAGE bit into the page table entry.  And UML doesn't provide
an alternative to generic pte_same() in include/asm-generic/pgtable.h.

My guess is that the _PAGE_NEWPAGE bit prevents swapoff from matching the
pte against the swap entry in all or some cases (I didn't look to see
whether _PAGE_NEWPAGE is sometimes cleared later).
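
To make the suspected mismatch concrete, here is a minimal userspace model, not kernel code. It assumes, following the reading above, that UML's set_pte() ORs an extra bit into every stored entry, while swapoff builds its search key without that bit and compares raw values the way the generic pte_same() does. MODEL_PAGE_NEWPAGE, its value 0x002, and the model_* helpers are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model, not arch/um code.
 * MODEL_PAGE_NEWPAGE stands in for _PAGE_NEWPAGE; 0x002 is an
 * assumed value chosen only for illustration. */
#define MODEL_PAGE_NEWPAGE 0x002UL

typedef struct { unsigned long pte; } pte_t;

/* Like the generic pte_same(): compare the raw entry values. */
static bool model_generic_pte_same(pte_t a, pte_t b)
{
	return a.pte == b.pte;
}

/* Models set_pte() as read above: the stored entry gains the
 * extra bit on top of the value the caller asked for. */
static void model_set_pte(pte_t *ptep, pte_t val)
{
	assert(ptep != NULL);
	ptep->pte = val.pte | MODEL_PAGE_NEWPAGE;
}

/* Models the check in swapoff's page-table walk: does this slot
 * hold the swap pte we are trying to unuse? */
static bool model_slot_matches(const pte_t *ptep, pte_t swp_pte)
{
	return model_generic_pte_same(*ptep, swp_pte);
}
```

In this model, storing a swap pte with model_set_pte() and then searching for the same key with model_slot_matches() returns false, so the walk never finds the entry it wants to release, which fits the "runs forever" symptom.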

Probably a good fix to try would be to provide a UML pte_same() which
takes that bit into account; but I don't know what conditionals it should
contain, or whether it would become too inefficient.  Or, if _PAGE_NEWPAGE
is always set in a swap pte, then swp_entry_to_pte() needs to set it.
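
The two fixes suggested above could look roughly like this in the same kind of userspace model. Both functions are hypothetical sketches, not UML's code; option 2 is only valid if the bit really is always set in a stored swap pte, which this model simply assumes. MODEL_PAGE_NEWPAGE and its value remain assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: MODEL_PAGE_NEWPAGE stands in for
 * _PAGE_NEWPAGE, with an assumed value. */
#define MODEL_PAGE_NEWPAGE 0x002UL

typedef struct { unsigned long pte; } pte_t;

/* What the modelled set_pte() leaves in the page table. */
static pte_t model_stored(pte_t val)
{
	pte_t p = { val.pte | MODEL_PAGE_NEWPAGE };
	return p;
}

/* Option 1 (hypothetical UML pte_same()): ignore the bookkeeping
 * bit when comparing two entries. */
static bool sketch_um_pte_same(pte_t a, pte_t b)
{
	return (a.pte & ~MODEL_PAGE_NEWPAGE) == (b.pte & ~MODEL_PAGE_NEWPAGE);
}

/* Option 2 (hypothetical swp_entry_to_pte()): build the search key
 * with the bit already set, so a raw comparison matches. The input
 * is assumed to be already in the arch (pte) bit layout. */
static pte_t sketch_swp_entry_to_pte(unsigned long arch_val)
{
	pte_t p = { arch_val | MODEL_PAGE_NEWPAGE };
	return p;
}

/* Both options should find a stored swap pte built from raw_val. */
static void check_both_options(unsigned long raw_val)
{
	pte_t plain_key = { raw_val };
	assert(sketch_um_pte_same(model_stored(plain_key), plain_key));

	pte_t marked_key = sketch_swp_entry_to_pte(raw_val);
	assert(model_stored(marked_key).pte == marked_key.pte);
}
```

Option 1 adds a mask to every comparison, which is where the efficiency worry comes in; option 2 costs nothing at compare time but depends entirely on the "always set" assumption holding.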

(A word of warning if you're unfamiliar with swap entries: there's the
kernel's internal representation, swp_entry_t, which has the offset in the
low-order bits and the type in the high-order bits, for efficient use with
the radix tree - see include/linux/swapops.h; and then there's the
arch-dependent representation as a page table entry, which rearranges the
bits so as not to be confused with a good present page table entry, and
traditionally has the type below the offset.)
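
A small userspace model of the two representations may help. The generic form follows the idea described above (type in the top bits, offset below it), assuming 5 type bits to match the limit of 32 swapfiles; the arch form uses the UML shifts quoted later in this message (type at bit 4 masked with 0x3f, offset from bit 11). All model_* names are made up for illustration and are not kernel API.

```c
#include <assert.h>
#include <limits.h>

/* Userspace model; names are illustrative, not kernel API.
 * 5 bits of type allows 32 swapfiles (assumed MAX_SWAPFILES limit). */
#define MODEL_TYPE_BITS   5
#define MODEL_LONG_BITS   (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_TYPE_SHIFT  (MODEL_LONG_BITS - MODEL_TYPE_BITS)
#define MODEL_OFFSET_MASK ((1UL << MODEL_TYPE_SHIFT) - 1)

typedef struct { unsigned long val; } swp_entry_t;

/* Generic form: offset in the low-order bits, type in the top bits. */
static swp_entry_t model_swp_entry(unsigned long type, unsigned long offset)
{
	assert(type < (1UL << MODEL_TYPE_BITS));
	swp_entry_t e = { (type << MODEL_TYPE_SHIFT) | (offset & MODEL_OFFSET_MASK) };
	return e;
}
static unsigned long model_swp_type(swp_entry_t e) { return e.val >> MODEL_TYPE_SHIFT; }
static unsigned long model_swp_offset(swp_entry_t e) { return e.val & MODEL_OFFSET_MASK; }

/* Arch (pte) form, using the quoted UML shifts:
 * type at bit 4 (6-bit mask 0x3f), offset from bit 11. */
static unsigned long model_arch_entry(unsigned long type, unsigned long offset)
{
	assert(type <= 0x3f);
	return (type << 4) | (offset << 11);
}
static unsigned long model_arch_type(unsigned long x) { return (x >> 4) & 0x3f; }
static unsigned long model_arch_offset(unsigned long x) { return x >> 11; }

/* Conversions between the two forms. */
static unsigned long model_to_arch(swp_entry_t e)
{
	return model_arch_entry(model_swp_type(e), model_swp_offset(e));
}
static swp_entry_t model_from_arch(unsigned long x)
{
	return model_swp_entry(model_arch_type(x), model_arch_offset(x));
}
```

The same (type, offset) pair therefore has two quite different bit patterns, which is why a stray bit in the pte form matters even when the swp_entry_t side is correct.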

The other thing, which I actually noticed first, is probably not relevant
to the bug you're seeing, since I think you'd have mentioned it if you had
two swapfiles; but the case of two or more swapfiles looks very broken to
me.  _PAGE_PROTNONE is 0x010 but __swp_type(x) is (((x).val >> 4) & 0x3f):
unless I'm confused, a swap entry of type 1 will look just like a PROT_NONE
pte.
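
The collision is easy to check numerically: with the type shifted left by 4, type 1 encodes to exactly 0x010, and in fact every odd type sets that bit. This hypothetical userspace sketch uses only the values quoted above and deliberately ignores _PAGE_NEWPAGE and _PAGE_NEWPROT, which may or may not resolve the clash; MODEL_PAGE_PROTNONE and the helper names are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* 0x010 is the _PAGE_PROTNONE value quoted above; the encoder
 * mirrors the quoted UML __swp_* shifts. Hypothetical userspace
 * model that ignores _PAGE_NEWPAGE and _PAGE_NEWPROT. */
#define MODEL_PAGE_PROTNONE 0x010UL

/* Current layout: type at bit 4, offset from bit 11. */
static unsigned long model_um_swp_entry(unsigned long type, unsigned long offset)
{
	assert(type <= 0x3f);
	return (type << 4) | (offset << 11);
}

/* True if the encoded swap pte has the PROT_NONE bit set. */
static bool model_has_protnone_bit(unsigned long pte)
{
	return (pte & MODEL_PAGE_PROTNONE) != 0;
}

/* Count how many of the 32 possible swap types collide. */
static unsigned model_colliding_types(void)
{
	unsigned count = 0;
	for (unsigned long type = 0; type < 32; type++)
		if (model_has_protnone_bit(model_um_swp_entry(type, 0)))
			count++;
	return count;
}
```

Type 0 is safe, which is consistent with the single-swapfile setup not showing this problem.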

Or maybe that's resolved by the _PAGE_NEWPAGE and _PAGE_NEWPROT bits,
I didn't spend time working out what they're up to.

include/linux/swap.h does not allow MAX_SWAPFILES to exceed 32,
so you can easily change __swp_type(x) to use 5 and 0x1f instead
(with 5 instead of 4 in __swp_entry too, of course).  Though it doesn't
cause an error, I wonder where the 11 in __swp_offset and __swp_entry
comes from: I think you can support larger swap by making it 10.
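
The suggested layout could be sketched as follows, again as a userspace model with hypothetical SKETCH_* names rather than a patch. With the type in bits 5 to 9 and the offset starting at bit 10, no type up to 31 touches bit 4 (0x010), and the offset field gains one bit compared with shifting by 11.

```c
#include <assert.h>

/* Sketch of the suggested layout: type in bits 5..9 (5 bits,
 * enough for 32 swapfiles), offset from bit 10. Bit 4 (0x010,
 * the quoted _PAGE_PROTNONE value) is never used. */
#define SKETCH_TYPE_SHIFT   5
#define SKETCH_TYPE_MASK    0x1fUL
#define SKETCH_OFFSET_SHIFT 10

static unsigned long sketch_swp_entry(unsigned long type, unsigned long offset)
{
	assert(type <= SKETCH_TYPE_MASK);
	return (type << SKETCH_TYPE_SHIFT) | (offset << SKETCH_OFFSET_SHIFT);
}

static unsigned long sketch_swp_type(unsigned long x)
{
	return (x >> SKETCH_TYPE_SHIFT) & SKETCH_TYPE_MASK;
}

static unsigned long sketch_swp_offset(unsigned long x)
{
	return x >> SKETCH_OFFSET_SHIFT;
}
```

Type and offset still round-trip cleanly, so the change only moves bits and does not alter what can be encoded, apart from the extra offset bit.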

Hugh