linux-kernel - Re: [PATCH -mm] fault-inject: avoid unwanted data race to task->fail

Message-ID: <CACT4Y+bAia6-OqHwGjSwhavpffcY3oXqHQQ7Y=D7sK72iKaU=g@mail.gmail.com>
Date:   Tue, 1 Aug 2017 15:45:08 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Lu Fengqi <lufq.fnst@...fujitsu.com>
Cc:     Akinobu Mita <akinobu.mita@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH -mm] fault-inject: avoid unwanted data race to task->fail_nth

On Tue, Aug 1, 2017 at 3:09 PM, Lu Fengqi <lufq.fnst@...fujitsu.com> wrote:
> On Fri, Jul 14, 2017 at 01:14:52AM +0900, Akinobu Mita wrote:
>>The fault-inject-make-fail-nth-read-write-interface-symmetric.patch in
>>the -mm tree allows users to set task->fail_nth for a non-current task
>>through procfs. On the other hand, the current task's fail_nth is
>>decremented towards zero in the fault-injection path without any locking.
>>
>>So we need to prevent task->fail_nth from taking an unexpected value due
>>to data races (for example, setting task->fail_nth to zero while
>>current->fail_nth is being decremented).  In this fix, we use READ_ONCE()
>>and WRITE_ONCE() to prevent the compiler from creating unsolicited accesses.
>>
>>Cc: Dmitry Vyukov <dvyukov@...gle.com>
>>Reported-by: Dmitry Vyukov <dvyukov@...gle.com>
>>Signed-off-by: Akinobu Mita <akinobu.mita@...il.com>
>>---
>> fs/proc/base.c     | 5 +++--
>> lib/fault-inject.c | 7 +++++--
>> 2 files changed, 8 insertions(+), 4 deletions(-)
>>
>>diff --git a/fs/proc/base.c b/fs/proc/base.c
>>index ecc8a25..719c2e9 100644
>>--- a/fs/proc/base.c
>>+++ b/fs/proc/base.c
>>@@ -1370,7 +1370,7 @@ static ssize_t proc_fail_nth_write(struct file *file, const char __user *buf,
>>       task = get_proc_task(file_inode(file));
>>       if (!task)
>>               return -ESRCH;
>>-      task->fail_nth = n;
>>+      WRITE_ONCE(task->fail_nth, n);
>>       put_task_struct(task);
>>
>>       return count;
>>@@ -1386,7 +1386,8 @@ static ssize_t proc_fail_nth_read(struct file *file, char __user *buf,
>>       task = get_proc_task(file_inode(file));
>>       if (!task)
>>               return -ESRCH;
>>-      len = snprintf(numbuf, sizeof(numbuf), "%u\n", task->fail_nth);
>>+      len = snprintf(numbuf, sizeof(numbuf), "%u\n",
>>+                      READ_ONCE(task->fail_nth));
>>       len = simple_read_from_buffer(buf, count, ppos, numbuf, len);
>>       put_task_struct(task);
>>
>>diff --git a/lib/fault-inject.c b/lib/fault-inject.c
>>index 09ac73c1..7d315fd 100644
>>--- a/lib/fault-inject.c
>>+++ b/lib/fault-inject.c
>>@@ -107,9 +107,12 @@ static inline bool fail_stacktrace(struct fault_attr *attr)
>>
>> bool should_fail(struct fault_attr *attr, ssize_t size)
>> {
>>-      if (in_task() && current->fail_nth) {
>>-              if (--current->fail_nth == 0)
>>+      if (in_task()) {
>>+              unsigned int fail_nth = READ_ONCE(current->fail_nth);
>>+
>>+              if (fail_nth && !WRITE_ONCE(current->fail_nth, fail_nth - 1))
>>                       goto fail;
>>+
>>               return false;
>>       }
>>
>>--
>>2.7.4
>>
>>
>>
> hi
>
> I'm a btrfs developer. I found that fail_make_request didn't produce the
> expected IO ERROR when running xfstests on linux 4.13-rc1.
>
> That test case enables fail_make_request with the following commands:
> # echo 100 > /sys/kernel/debug/fail_make_request/probability
> # echo 2 > /sys/kernel/debug/fail_make_request/times
> # echo 0 > /sys/kernel/debug/fail_make_request/verbose
> # echo 1 > /sys/block/sda/sda1/make-it-fail
> # dd if=/dev/zero of=/dev/sda1 bs=128K count=1 oflag=direct
>
> As I understand it, after applying this patch, I have to write to
> /proc/<dd pid>/fail-nth first so that the dd process can catch the IO ERROR.
> However, the dd process finishes so quickly that I can't write fail-nth in time.
>
> So, could you tell me how to produce IO ERROR under these circumstances?

Hi,

fail-nth is orthogonal to the existing mechanisms, so if you have a
setup that fails all sites with a certain probability, that should
continue to work.

If you are writing a new facility and want to use fail-nth, then the
test process itself needs to cooperate and write fail-nth accordingly.
See the original patch for an example of how to do it:
https://groups.google.com/d/msg/syzkaller/DbB4rjYd82s/3MHDwtcqCAAJ
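
For illustration, here is a minimal userspace sketch of what "cooperating"
could look like. It is not taken from the linked patch. The helper names
fail_nth_set() and fail_nth_get() are hypothetical, and the path
/proc/self/fail-nth is an assumption based on the interface discussed in
this thread (other kernel versions may expose the file elsewhere, and it
only exists when fault injection is configured in).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Assumed location of the per-task knob; adjust for your kernel. */
#define FAIL_NTH_PATH "/proc/self/fail-nth"

/*
 * Hypothetical helper: ask for the n-th failable call made by this task
 * to fail. Returns 0 on success, -1 if the file cannot be opened or
 * written (for example, when fault injection is not available).
 */
static int fail_nth_set(const char *path, unsigned int n)
{
	char buf[16];
	ssize_t ret;
	int fd, len;

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	len = snprintf(buf, sizeof(buf), "%u", n);
	assert(len > 0 && (size_t)len < sizeof(buf));

	ret = write(fd, buf, len);
	close(fd);
	return ret == len ? 0 : -1;
}

/*
 * Hypothetical helper: read the current counter back into *n.
 * Returns 0 on success, -1 on any error.
 */
static int fail_nth_get(const char *path, unsigned int *n)
{
	char buf[16];
	ssize_t ret;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (ret <= 0)
		return -1;

	buf[ret] = '\0';
	return sscanf(buf, "%u", n) == 1 ? 0 : -1;
}
```

A test would typically call fail_nth_set(FAIL_NTH_PATH, n) right before the
system call it wants to exercise, check that the call fails as expected,
and then use fail_nth_get() to see how much of the counter is left. Because
the test arms the counter itself, there is no race against a short-lived
process such as dd.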