Date:   Sat, 6 Jan 2018 14:34:17 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        Minchan Kim <minchan@...nel.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: ratelimit end_swap_bio_write() error

On Sat 06-01-18 19:03:13, Sergey Senozhatsky wrote:
> Hello,
> 
> On (01/06/18 10:41), Michal Hocko wrote:
> > On Sat 06-01-18 13:34:07, Sergey Senozhatsky wrote:
> > > Use the ratelimited printk() version for swap-device write error
> > > reporting. We can use ZRAM as a swap-device, and the tricky part
> > > here is that zsmalloc() stores compressed objects in memory, thus
> > > it has to allocate pages during swap-out. If the system is short
> > > on memory, then we begin to flood printk() log buffer with the
> > > same "Write-error on swap-device XXX" error messages and sometimes
> > > simply lock up the system.
> > 
> > Should we print an error in such a situation at all? "Write-error"
> > certainly sounds scary and suggests that something went really wrong.
> > My understanding is that a failed zram swap-out is not critical, so
> > the error message is not really useful.
> 
> I don't mind getting rid of it. Up to you :)

I do not think we can get rid of it for all swap backends.

> > Or what should an admin do when seeing it?
> 
> A zsmalloc allocation failure is just one possibility; an error in
> the compression algorithm is another, though rather unlikely.
> Most likely it's OOM that causes problems. But in any case
> it's unclear what should be done. An error can be a
> temporary one or a fatal one, just like in the __swap_writepage()
> case. So maybe both write-error printk()-s can be dropped.

Then I would suggest starting by sorting out which of those errors are
critical and which are not, and reporting each accordingly. I am sorry
to be fuzzy here, but I am not familiar enough with the code to be more
specific. Anyway, ratelimiting sounds more like papering over the
problem than a real solution. It also sounds quite scary that there can
be enough failures to lock up the system just by printing a message...
-- 
Michal Hocko
SUSE Labs
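
The two ideas discussed in this thread (limiting how often a repeated
message is printed, and reporting critical and non-critical errors
differently) can be illustrated with a small userspace model. This is
only a sketch, not kernel code and not the patch under discussion. The
names RateLimit, TRANSIENT, and report_write_error are hypothetical, and
the choice of ENOMEM/EAGAIN as "transient" is an assumption made purely
for illustration; the thread does not settle which swap write errors
are critical.

```python
import errno


class RateLimit:
    """Userspace model of an interval/burst rate limit (hypothetical)."""

    def __init__(self, interval, burst):
        self.interval = interval    # window length in seconds
        self.burst = burst          # messages allowed per window
        self.window_start = None
        self.printed = 0
        self.suppressed = 0         # messages dropped so far

    def allow(self, now):
        # Open a new window once the current one has expired.
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.suppressed += 1
        return False


# Assumption: treating these errno values as transient is illustrative only.
TRANSIENT = {errno.ENOMEM, errno.EAGAIN}


def report_write_error(err, now, limiter, log):
    """Always log critical errors; rate-limit the transient ones."""
    if err not in TRANSIENT:
        log.append(("error", err))
        return
    if limiter.allow(now):
        log.append(("warning", err))
```

With a 5-second window and a burst of 10, a storm of 1000 ENOMEM
failures within one second would produce 10 log entries and 990
suppressed ones, while a single EIO would still be logged at the higher
severity. Whether suppressing the transient messages is acceptable, or
merely hides the underlying problem, is exactly the question raised
above.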
