Message-ID: <20180108102234.GA818@jagdpanzerIV>
Date:   Mon, 8 Jan 2018 19:22:34 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        Minchan Kim <minchan@...nel.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: ratelimit end_swap_bio_write() error

On (01/08/18 09:37), Michal Hocko wrote:
[..]
> > the lockup is not the main problem and I'm not really trying to
> > address it here. we simply can fill up the entire kernel logbuf
> > with the same "Write-error on swap-device" errors.
> 
> Your changelog is rather modest on the information.

fair point!

> Could you be more specific on how the problem actually happens how
> likely it is?

ok. so what we have is

	slow_path / swap-out page
	 __zram_bvec_write(page)
	  compressed_page = zcomp_compress(page)
	   zs_malloc(compressed_page)
	    // no available zspage found, need to allocate a new one
	     alloc_zspage()
	     {
		for (i = 0; i < class->pages_per_zspage; i++) {
		    page = alloc_page(gfp);
		    if (!page)
			    return NULL;
		}
	     }

	 return -ENOMEM
	...
	printk("Write-error on swap-device...");


a zspage can consist of up to ->pages_per_zspage normal pages.
if any alloc_page() fails then we can't allocate the entire zspage,
so we can't store the swapped out page, so it remains in ram
and we don't make any progress. so we try to swap out another page
and maybe go through the whole zs_malloc()->alloc_zspage() path
again, maybe not. depending on how bad the OOM situation is, there
can be a few or many "Write-error on swap-device" errors.
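
a minimal userspace sketch of the failure mode described above, not the
actual zsmalloc code. `free_pages`, `fake_alloc_page()`,
`alloc_zspage_sketch()` and `count_write_errors()` are hypothetical names
invented for illustration. the sketch assumes that every swap-out attempt
needs a new zspage (the worst case) and that a failed zspage allocation
gives back the pages it already took.

```c
#include <stdbool.h>

/* hypothetical pool of free pages, standing in for the page allocator */
static int free_pages;

/* stand-in for alloc_page(): fails once the pool is empty */
static bool fake_alloc_page(void)
{
	if (free_pages == 0)
		return false;
	free_pages--;
	return true;
}

/*
 * all-or-nothing allocation: a zspage needs pages_per_zspage pages.
 * if any single page allocation fails, the pages taken so far are
 * returned to the pool and the whole zspage allocation fails.
 */
static bool alloc_zspage_sketch(int pages_per_zspage)
{
	int got = 0;

	for (int i = 0; i < pages_per_zspage; i++) {
		if (!fake_alloc_page()) {
			free_pages += got;	/* roll back partial allocation */
			return false;
		}
		got++;
	}
	return true;
}

/*
 * each failed attempt corresponds to one -ENOMEM, i.e. one
 * "Write-error on swap-device" message in the unthrottled case.
 */
static int count_write_errors(int attempts, int pages_per_zspage)
{
	int errors = 0;

	for (int n = 0; n < attempts; n++)
		if (!alloc_zspage_sketch(pages_per_zspage))
			errors++;
	return errors;
}
```

with 10 free pages and 4 pages per zspage, the first two attempts succeed
and every later attempt fails while 2 pages stay free, so 1000 attempts
produce 998 errors in this sketch.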

> And again, I do not think the throttling is an appropriate counter
> measure. We do want to print those messages when a critical situation
> happens. If we have a fallback then simply do not print at all.

sure, but with the ratelimited printk we still print those messages.
we just don't print one for every single page we failed to write
to the device. the existing error messages can (*sometimes*) be noisy
and not very informative - "Write-error on swap-device (%u:%u:%llu)\n";
it's not like 1000 of those tell us more than 1 or 10.
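
a rough userspace sketch of the interval/burst idea behind a ratelimited
printk, to show that messages still get printed, just not all of them.
this is not the kernel's implementation; `struct ratelimit_sketch`,
`ratelimit_ok()` and `emit_errors()` are hypothetical names, time is a
caller-supplied tick counter, and the printed message is a simplified
stand-in for the real one.

```c
#include <stdbool.h>
#include <stdio.h>

/* hypothetical limiter state: at most 'burst' messages per 'interval' */
struct ratelimit_sketch {
	long interval;		/* window length, in caller-defined ticks */
	int burst;		/* messages allowed per window */
	long begin;		/* start of the current window */
	int printed;		/* messages printed in the current window */
	int suppressed;		/* total messages dropped so far */
};

/* returns true if a message may be printed at time 'now' */
static bool ratelimit_ok(struct ratelimit_sketch *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* window expired: start a new one */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->suppressed++;
	return false;
}

/* try to report 'count' write errors at time 'now'; returns how many got out */
static int emit_errors(struct ratelimit_sketch *rs, int count, long now)
{
	int printed = 0;

	for (int i = 0; i < count; i++) {
		if (ratelimit_ok(rs, now)) {
			printf("Write-error on swap-device (sketch)\n");
			printed++;
		}
	}
	return printed;
}
```

with interval 5 and burst 10, a storm of 1000 errors within one window
prints 10 messages and suppresses 990; once the window expires the next
10 get printed again.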

	-ss
