Message-ID: <52631017.6010001@redhat.com>
Date: Sat, 19 Oct 2013 18:04:55 -0500
From: Eric Sandeen <sandeen@...hat.com>
To: "Theodore Ts'o" <tytso@....edu>
CC: Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH] ext4: add ratelimiting to ext4 messages
On 10/18/13 1:59 PM, Theodore Ts'o wrote:
> On Fri, Oct 18, 2013 at 09:08:40AM -0500, Eric Sandeen wrote:
>> On 10/17/13 8:28 PM, Theodore Ts'o wrote:
>>> In the case of a storage device that suddenly disappears, or in the
>>> case of significant file system corruption, this can result in a huge
>>> flood of messages being sent to the console. This can overflow the
>>> file system containing /var/log/messages, or if a serial console is
>>> configured, this can slow down the system so much that a hardware
>>> watchdog can end up triggering, forcing a system reboot.
>>
>> Just out of curiosity, after the fs shuts down, is there still a flood
>> of messages? Shouldn't that clamp down on the errors?
>
> Not if we are running with errors=continue.
Maybe the ratelimit should depend on that then? I'm just concerned about
the possibility of filtering messages that, rather than being a nuisance,
are vital to figuring out what went wrong.
(granted, it's probably the first error or two that matters)
Or maybe it's only relevant with errors=continue, and errors=remount-ro
will be self-limiting in any case.
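To make the idea concrete, here is a rough userspace sketch of a ratelimit that only applies under errors=continue. It is not the patch under discussion: every name in it (errors_policy, msg_ratelimit, msg_ratelimit_ok, should_emit) is hypothetical, and the interval/burst scheme is only assumed to resemble what the kernel's ratelimiting does. The burst lets the first few messages through, which covers the "first error or two" case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names throughout; an illustration, not ext4 code. */
enum errors_policy { ERRORS_CONTINUE, ERRORS_REMOUNT_RO, ERRORS_PANIC };

struct msg_ratelimit {
    long interval;  /* window length, in seconds (must be > 0) */
    int burst;      /* messages allowed per window */
    long begin;     /* start time of the current window */
    int printed;    /* messages emitted in the current window */
    int missed;     /* messages suppressed in the current window */
    bool started;   /* false until the first call */
};

/* Returns true if a message may be emitted at time `now` (seconds). */
static bool msg_ratelimit_ok(struct msg_ratelimit *rs, long now)
{
    assert(rs->interval > 0);

    /* Open a new window on first use or once the old one has expired. */
    if (!rs->started || now - rs->begin >= rs->interval) {
        if (rs->missed)
            fprintf(stderr, "%d messages suppressed\n", rs->missed);
        rs->begin = now;
        rs->printed = 0;
        rs->missed = 0;
        rs->started = true;
    }

    /* The first `burst` messages in a window always get through. */
    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }
    rs->missed++;
    return false;
}

/*
 * Throttle only under errors=continue. With any other policy the
 * assumption is that the error stream is self-limiting, so nothing
 * is filtered.
 */
static bool should_emit(enum errors_policy policy,
                        struct msg_ratelimit *rs, long now)
{
    if (policy != ERRORS_CONTINUE)
        return true;
    return msg_ratelimit_ok(rs, now);
}
```

A caller would keep one zero-initialised msg_ratelimit per message class, set interval and burst, and guard each print with should_emit(). Whether remount-ro really is self-limiting is the open question above, so that branch is an assumption rather than a conclusion.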
> There are some ugly
> patches in our tree which pipe error notifications to a netlink
> socket, which allows userspace to do something intelligent with
> errors, and because there are some errors where it's safe to continue
> (especially if you are willing to shut down block allocations to the
> block group where you don't trust the allocation bitmap), we tend to
> run with errors=continue.
hm... :)
> I think I mentioned the errors->netlink feature a while back, but
> there wasn't a whole lot of excitement about it, and the patches
> definitely need a lot of cleanup before they would be ready for
> upstream merging. If people are curious, I can look into getting the
> patches sent out, since we just finished rebasing them to 3.11.
>
>> If not, shouldn't it do so? xfs has a lot of short-circuiting if
>> the filesystem is shut down, so it (I think) won't get into paths that
>> will generate more errors.
>
> When xfs "shuts down" the file system, it doesn't allow any read or
> write accesses, right? So it's basically an even stronger version of
> errors=remount-ro. We should perhaps discuss whether it would be
> better to squelch errors if we've remounted the file system read-only,
> or whether we should implement a complete shutdown errors option.
Yeah, xfs has no errors=continue type option; that is probably too
dangerous in general for the majority of users.
I'd guess that w/ default remount-ro, the error flood isn't a risk.
> And of course, even if we did this, we would still need to squelch
> ext4_warning and ext4_msg output. (Although I agree with Lukas that
> it might not be a bad idea to review some of the messages that either
> get emitted via printk, or which are issued via ext4_msg(KERN_CRIT) to
> see if we should perhaps change some of those to ext4_error.)
*nod*
Thanks,
-Eric
> Regards,
>
> - Ted
>