lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 20 Jun 2011 20:26:08 +0800
From:	stufever@...il.com
To:	linux-ext4@...r.kernel.org
Cc:	Wang Shaoyan <wangshaoyan.pt@...bao.com>, Ted Tso <tytso@....edu>,
	Jan Kara <jack@...e.cz>, Lukas Czerner <lczerner@...hat.com>
Subject: [PATCH v2] ext4: Set file system to read-only by I/O error threshold

From: Wang Shaoyan <wangshaoyan.pt@...bao.com>

changes from v1 -> v2 :
When counter is greater than threshold, don't call ext4_abort(), but check the mount errors_* option.

Some version of Hadoop uses access(2) to check whether the data chunk harddisk is online, if access(2) returns "Read-only file system" error, hadoop marks the disk which it called access(2) to as offline. This method works for Ext3/4 with journal, because when jbd/jbd2 encounters I/O error, the file system will be set as read-only. For Ext4 no-journal mode, there is no jdb2 to set the file system as read-only when I/O error happens, the access(2) from Hadoop is not able to reliably detect hard disk offline condition.

This patch tries to fix the above problem from kernel side. 
1.Mount file system with errors=remount-ro
	mount -t -o errors=remount-ro /dev/sd[?] some_dir 
2.The counter reach the threshold:
	1) inside the sampling interval, I/O errors come more than pre-set threshold happens
	2) I/O errors always happen in continous sampling intervals, the sum of errors exceeds pre-set threshold

Then the application can find the file system is set as read-only, and call its own failure tolerance procedures.

There are 2 interface exported to user space via sysfs:
/sys/fs/ext4/sd[?]/eio_threshold --- I/O error threshold to check mount errors options
/sys/fs/ext4/sd[?]/eio_interval  --- sampling interval in second

If default value of eio_threshold is 0, everything happens as before.

Cc: Ted Tso <tytso@....edu>
Cc: Jan Kara <jack@...e.cz>
Cc: Lukas Czerner <lczerner@...hat.com>
Reviewed-by: Coly Li <bosong.ly@...bao.com>
Reviewed-by: Liu Yuan <tailai.ly@...bao.com>
Signed-off-by: Wang Shaoyan <wangshaoyan.pt@...bao.com>
---
 fs/ext4/ext4.h  |    7 +++++++
 fs/ext4/super.c |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1921392..b08348e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1214,6 +1214,13 @@ struct ext4_sb_info {
 
 	/* Kernel thread for multiple mount protection */
 	struct task_struct *s_mmp_tsk;
+
+	/* IO error count */
+	spinlock_t s_eio_lock;
+	unsigned int s_eio_threshold;
+	unsigned int s_eio_interval;
+	unsigned int s_eio_counter;
+	unsigned long s_eio_last_jiffies;
 };
 
 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index cc5c157..1b3fc81 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -384,6 +384,19 @@ static void save_error_info(struct super_block *sb, const char *func,
 	ext4_commit_super(sb, 1);
 }
 
+static inline void inc_sb_error_count(struct super_block *sb)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+	if (time_after(sbi->s_eio_last_jiffies + sbi->s_eio_interval * HZ, jiffies)) {
+		sbi->s_eio_counter++;
+	} else {
+		sbi->s_eio_counter = 1;
+	}
+
+	sbi->s_eio_last_jiffies = jiffies;
+	ext4_msg(sb, KERN_CRIT, "IO error count total: %d", sbi->s_eio_counter);
+}
 
 /* Deal with the reporting of failure conditions on a filesystem such as
  * inconsistencies detected or read IO failures.
@@ -402,9 +415,21 @@ static void save_error_info(struct super_block *sb, const char *func,
 
 static void ext4_handle_error(struct super_block *sb)
 {
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
 	if (sb->s_flags & MS_RDONLY)
 		return;
 
+	spin_lock(&sbi->s_eio_lock);
+	inc_sb_error_count(sb);
+	spin_unlock(&sbi->s_eio_lock);
+
+	/* When the counter is greater than threshold(default 0), 
+	 * like usual, we check the ERRORS_*. Otherwise, we just return.
+	 */
+	if (sbi->s_eio_counter <= sbi->s_eio_threshold)
+		return;
+
 	if (!test_opt(sb, ERRORS_CONT)) {
 		journal_t *journal = EXT4_SB(sb)->s_journal;
 
@@ -2471,6 +2496,22 @@ static ssize_t inode_readahead_blks_store(struct ext4_attr *a,
 	return count;
 }
 
+static ssize_t eio_interval_store(struct ext4_attr *a,
+					  struct ext4_sb_info *sbi,
+					  const char *buf, size_t count)
+{
+	unsigned long t;
+
+	if (parse_strtoul(buf, 0xffffffff, &t))
+		return -EINVAL;
+
+	if (t <= 0)
+		return -EINVAL;
+
+	sbi->s_eio_interval = t;
+	return count;
+}
+
 static ssize_t sbi_ui_show(struct ext4_attr *a,
 			   struct ext4_sb_info *sbi, char *buf)
 {
@@ -2524,6 +2565,9 @@ EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
 EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
 EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
 EXT4_RW_ATTR_SBI_UI(max_writeback_mb_bump, s_max_writeback_mb_bump);
+EXT4_RW_ATTR_SBI_UI(eio_threshold, s_eio_threshold);
+EXT4_ATTR_OFFSET(eio_interval, 0644, sbi_ui_show,
+		 eio_interval_store, s_eio_interval);
 
 static struct attribute *ext4_attrs[] = {
 	ATTR_LIST(delayed_allocation_blocks),
@@ -2540,6 +2584,8 @@ static struct attribute *ext4_attrs[] = {
 	ATTR_LIST(mb_stream_req),
 	ATTR_LIST(mb_group_prealloc),
 	ATTR_LIST(max_writeback_mb_bump),
+	ATTR_LIST(eio_threshold),
+	ATTR_LIST(eio_interval),
 	NULL,
 };
 
@@ -3464,6 +3510,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	sbi->s_stripe = ext4_get_stripe_size(sbi);
 	sbi->s_max_writeback_mb_bump = 128;
 
+	spin_lock_init(&sbi->s_eio_lock);
+	sbi->s_eio_threshold = 0;
+	sbi->s_eio_interval = 5;
+	sbi->s_eio_counter = 0;
+
 	/*
 	 * set up enough so that it can read an inode
 	 */
-- 
1.7.4.1

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ