linux-kernel - Re: [PATCH, RFC/T] Fix handling of write failures to swap devices

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <45483020.9010607@yahoo.com.au>
Date:	Wed, 01 Nov 2006 16:26:56 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Richard Purdie <richard@...nedhand.com>
CC:	kernel list <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...l.org>, Hugh Dickins <hugh@...itas.com>
Subject: Re: [PATCH, RFC/T] Fix handling of write failures to swap devices

Richard Purdie wrote:
> On Sat, 2006-10-28 at 22:10 +1000, Nick Piggin wrote: 
> 
>>Richard Purdie wrote:
>>
>>>I'm open to suggestions as to how to do it. The main problem was that in
>>>in order to mark the page bad you needed to hold an extra reference on
>>>the page so that the "mark bad" code would be the last code to touch the
>>>page. The swapoff code doesn't need this and I don't like the idea of
>>>passing some count value to a common function as that would be
>>>confusing. I guess swapoff could start taking an extra reference but I
>>>can see people objecting to that too as it doesn't need it. 
>>
>>If you have the page locked (which you can't from where you're trying,
>>but we'll tackle that next), and the page is swapcache, then doesn't
>>that pin you a reference to the swap entry as well? So I still don't
>>see why you need that extra reference.... I know there is a
>>try_to_unuse_page/entry hiding in try_to_unuse somewhere ;)
> 
> 
> The final call to swap_free() releases the swap mapping and beyond that
> point you can't tell which swap page you were dealing with.

Right, and the final call to swap_free comes from delete_from_swap_cache
(or in your case, manually, after __delete_from_swap_cache).

You can mark it bad right there, and you don't have to worry about it
any more.

>>That isn't your only problem though, and we simply don't want to do
>>this (potentially expensive) unusing from interrupt context. Noting
>>the error and dealing with it in process context I think is the best
>>way to do it.
> 
> 
> The reasoning was that this circumstance should be extremely rare. If it
> happens, we have a hardware problem. Recovering from that hardware
> problem gracefully is more important than a slightly longer interrupt.
> But yes, process context would be nicer, *if* we can find a way to do
> it. 

s/nicer/required

The fact remains that you simply cannot do this. You are taking locks
which are not interrupt safe, and you are taking sleeping locks too.

>>BTW. what's the driver you're using? It might be useful to have an
>>option to schedule a timer for the completions (at least the ones
>>which generate errors).
> 
> 
> Its a custom driver. I'm sure I can force it into using interrupt
> context for the completions to try and test things although I'm also
> fairly sure the existing patch will break when I do that for the reasons
> you mention :-(. 

That's the general idea ;)

>>I can't for the life of me see how or why this is happening. I don't
>>doubt it is a problem, but it smells like another bug.
> 
> 
> I can't work out the code path it happens in and until I do, I'm not
> sure how I can track it down... 

Is your driver scribbling on the page memory when it encounters a write
error, or is the SIGBUS coming from a subsequent pagefault attempt on
that address? Stick a WARN_ON(1) in the VM_FAULT_SIGBUS case in
arch/arm/mm/fault.c to check.

>>Still, something must be triggering it somewhere.
> 
> 
> Something must be but I wish I knew what/where...

Let's try to find out :)

>>swap_writepage sounds better than . You've even got the page
>>already locked for you, so the job's half done ;)
>>
>>What performance implications did you imagine? The fastpath will
>>just be a single PageError test, and error case slowpath doesn't
>>matter too much beyond having something that actually works.
> 
> 
> If swap_writepage() is to check for an error, it will have to wait until
> the IO is complete with something like wait_on_page_writeback() before
> it can check. The performance implication is the extra
> wait_on_page_writeback() on every call to swap_writepage(). In the
> meantime, it will have to give up the page lock and then reacquire it.
> Unless you're thinking of something different?

Yes, I mean the other side of the writepage, ie. when page reclaim is
about to attempt to swap it out.

The attached (very untested, in need of splitting up) patch attempts to
solve these problems. Note that it is probably not going to prevent your
SIGBUS, so that will have to be found and fixed individually.

In the meantime, I'll run this through some testing when I get half a
chance.

-- 
SUSE Labs, Novell Inc.

View attachment "mm-swap-fail.patch" of type "text/plain" (16786 bytes)