Message-ID: <20140515141515.GA21632@thunk.org>
Date: Thu, 15 May 2014 10:15:15 -0400
From: Theodore Ts'o <tytso@....edu>
To: Jan Kara <jack@...e.cz>
Cc: Mateusz Guzik <mguzik@...hat.com>,
Dave Chinner <david@...morbit.com>,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
Josef Bacik <jbacik@...com>, Al Viro <viro@...IV.linux.org.uk>,
Eric Sandeen <esandeen@...hat.com>
Subject: Re: [PATCH 2/2] fs: print a message when freezing/unfreezing filesystems
On Thu, May 15, 2014 at 03:46:10PM +0200, Jan Kara wrote:
> > Saving it in the superblock would require changing a bunch of file
> > systems. What if we store this information in memory, and print it
> > out under certain conditions (e.g., after a soft lockup detection, or
> > upon a magic sysrq request)?
> By 'superblock' I meant 'struct super_block' ;) So we are in agreement I
> believe.
Ah, yes, we're in agreement. I thought you were talking about the
on-disk superblock.
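A minimal userspace sketch of the in-memory approach, assuming a hypothetical `freeze_info` record that would hang off `struct super_block`: the freeze path stamps the time, the thaw path clears it, and an on-demand dump (the kind of thing a sysrq handler or a mountinfo field could print) reports how long each file system has been frozen. `freeze_info`, `note_freeze`, `note_thaw`, `format_freeze_info` and `dump_freeze_info` are illustrative names, not existing kernel interfaces, and nothing here touches the on-disk superblock.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical in-memory freeze record; under the proposal this state
 * would live in struct super_block, not in the on-disk superblock. */
struct freeze_info {
    const char *dev_name;  /* device the file system is mounted from */
    int frozen;            /* nonzero while the file system is frozen */
    time_t frozen_since;   /* when the current freeze started */
};

/* Would be called from the (hypothetical) freeze path. */
static void note_freeze(struct freeze_info *fi, time_t now)
{
    fi->frozen = 1;
    fi->frozen_since = now;
}

/* Would be called from the (hypothetical) thaw path. */
static void note_thaw(struct freeze_info *fi)
{
    fi->frozen = 0;
    fi->frozen_since = 0;
}

/* Format one status line into buf; returns what snprintf returns. */
static int format_freeze_info(const struct freeze_info *fi, time_t now,
                              char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (!fi->frozen)
        return snprintf(buf, len, "%s: not frozen", fi->dev_name);
    return snprintf(buf, len, "%s: frozen for %lld s", fi->dev_name,
                    (long long)(now - fi->frozen_since));
}

/* On-demand dump of every record, as a sysrq-style handler might do. */
static void dump_freeze_info(const struct freeze_info *fis, size_t n,
                             time_t now)
{
    char line[128];

    for (size_t i = 0; i < n; i++) {
        format_freeze_info(&fis[i], now, line, sizeof line);
        puts(line);
    }
}
```

For example, after `note_freeze(&fs, 1000)` on a record for "sda1", formatting at time 1150 yields "sda1: frozen for 150 s"; after `note_thaw(&fs)` it yields "sda1: not frozen". Keeping only a flag and a timestamp means no file system has to change its on-disk format.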
> > Or we could create a tunable threshold and print a message after a
> > file system has been frozen more than a particular specified duration,
> > with that duration set conservatively to something like 60 or 120
> > seconds by default.
> I was thinking about this as well but all these "warn after X seconds"
> warnings tend to have quite a few false positives in practice, so dumping
> this in the emergency-thaw sysrq handler or exposing the information somewhere
> in proc (e.g. mountinfo) would look like a better option to me.
Well, we already have the soft lockup warning, which occasionally has
false positives, but in practice, if a process is runnable but
doesn't get to run in 2 minutes (the default is 20 seconds, but we've
used 2 minutes to avoid the false positive problem on a super busy
system), something is clearly wrong.
Similarly, if a process is trying to write to a frozen file system
and can't after two minutes, something is almost certainly wrong, or at
least it's something a system administrator should know about. We
can argue over whether the default threshold should be 20 seconds,
120 seconds, or 2 hours, but I think there would be agreement that for
pretty much any configuration, there is some delay after which
printing a message is actually the right thing to do. (Yes, "time
that a process is waiting to write to a frozen file system" != "time the
file system is frozen"; the latter is easier to implement, but if
people feel strongly about it, the former isn't that much more
difficult.)
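The threshold idea can be sketched in the same hypothetical style. This assumes a tunable `freeze_warn_secs` defaulting to 120 seconds (one of the values floated in this thread; 0 disables the check) and shows both variants: warning when the file system itself has been frozen too long (the easier one), and warning only when some writer has been blocked on the frozen file system too long. The struct, the function names and the warn-once-per-freeze behaviour are assumptions for illustration, not an existing kernel interface; a real implementation would also need something (a timer, or the writer's wait loop) to call these checks periodically.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical tunable: warn after this many seconds; 0 disables warnings. */
static unsigned int freeze_warn_secs = 120;

struct frozen_fs {
    bool frozen;
    time_t frozen_since;       /* when the freeze began */
    time_t oldest_writer_wait; /* when the longest-waiting writer blocked, 0 if none */
    bool warned;               /* warn at most once per freeze */
};

/* True if at least freeze_warn_secs have elapsed since start. */
static bool exceeded(time_t start, time_t now)
{
    assert(now >= start);
    return freeze_warn_secs != 0 &&
           (unsigned long long)(now - start) >= freeze_warn_secs;
}

/* Variant 1: time the file system has been frozen. */
static bool should_warn_frozen(struct frozen_fs *fs, time_t now)
{
    if (!fs->frozen || fs->warned || !exceeded(fs->frozen_since, now))
        return false;
    fs->warned = true;
    return true;
}

/* Variant 2: time a writer has been waiting on the frozen file system.
 * A frozen file system that nobody tries to write to never triggers it. */
static bool should_warn_writer(struct frozen_fs *fs, time_t now)
{
    if (!fs->frozen || fs->warned || fs->oldest_writer_wait == 0 ||
        !exceeded(fs->oldest_writer_wait, now))
        return false;
    fs->warned = true;
    return true;
}
```

With the 120-second default, a file system frozen at t=1000 triggers variant 1 at t=1120 but not at t=1100, and only once. Variant 2 stays quiet for an idle frozen file system (for example, one frozen deliberately for a long snapshot), which is one way to cut down the false positives raised earlier in the thread.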
The problem with using a sysrq handler is that the user has to know how to
use it. If the user files a bug saying the system has mysteriously
hung, a hint in the system log as to what might be
going on would be very useful for an enterprise distribution's help
desk. (Yes, this won't help if the root file system is the one
that's been frozen, unless the customer has configured remote syslog.
But in many cases, it might provide a vital clue that could save a
lot of time and support costs.)
- Ted