linux-ext4 - [Bug 194071] New: data loss using fallocate and mmap

Message-ID: <bug-194071-13602@https.bugzilla.kernel.org/>
Date:   Mon, 06 Feb 2017 10:59:24 +0000
From:   bugzilla-daemon@...zilla.kernel.org
To:     linux-ext4@...r.kernel.org
Subject: [Bug 194071] New: data loss using fallocate and mmap

https://bugzilla.kernel.org/show_bug.cgi?id=194071

            Bug ID: 194071
           Summary: data loss using fallocate and mmap
           Product: File System
           Version: 2.5
    Kernel Version: 4.4.0+
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: ext4
          Assignee: fs_ext4@...nel-bugs.osdl.org
          Reporter: michael@...rm64.com
        Regression: No

Created attachment 254231
  --> https://bugzilla.kernel.org/attachment.cgi?id=254231&action=edit
Example C program

After calling fallocate() on a file that is mapped with a shared mmap() and
writing data into the newly allocated region, some of the data is occasionally
replaced by 0s (first observed after running for ~1 week). The address and size
of the corrupted data also vary from run to run.

The initial failure was debugged and reduced to a C++ program that failed with
both gcc and clang, and later to the attached C program. The amount allocated
every iteration was reduced to 1 byte because that caused faster failures; the
failure was not reproducible with larger power-of-2 sizes.
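
The attachment is not reproduced in this message. As a rough illustration of
the access pattern described above, a minimal sketch might look like the
following. It is an assumption, not the attached program: the function name
grow_and_check, the MAP_SIZE and MARKER constants, and the check-after-every-write
loop are all hypothetical.

```c
#define _GNU_SOURCE          /* needed for the Linux-specific fallocate() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical values chosen for illustration only. */
#define MAP_SIZE ((size_t)1 << 20)   /* address range mapped up front */
#define MARKER   0xA5                /* non-zero byte written to each offset */

/*
 * Grow the file at `path` one byte per iteration with fallocate(), write a
 * marker into each new byte through a shared mapping, and re-check every
 * byte written so far.
 * Returns 0 if all bytes survive, 1 if a byte changed, -1 on a setup error.
 */
int grow_and_check(const char *path, size_t iterations)
{
    assert(iterations <= MAP_SIZE);  /* never write past the mapping */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* The mapping is larger than the (initially empty) file; only pages
     * that the file already covers may be touched without SIGBUS. */
    unsigned char *p = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int rc = 0;
    for (size_t i = 0; i < iterations && rc == 0; i++) {
        /* mode 0: allocate one byte at offset i and extend the file size */
        if (fallocate(fd, 0, (off_t)i, 1) != 0) {
            rc = -1;
            break;
        }
        p[i] = MARKER;

        /* Verify everything written so far, including the new byte. */
        for (size_t j = 0; j <= i; j++) {
            if (p[j] != MARKER) {
                printf("Value has been modified at offset %zu\n", j);
                rc = 1;
                break;
            }
        }
    }

    munmap(p, MAP_SIZE);
    close(fd);
    return rc;
}
```

A caller would invoke grow_and_check() from main() with a path on the
filesystem under test and exit non-zero when it returns 1. A run that returns
0 shows only that no corruption was seen in that run, which is consistent with
the intermittent behaviour reported here.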

Is this a bug or user error?

OS: Ubuntu 16.04.1 LTS
kernel versions: 4.4.0-38-generic, 4.9.7-040907-generic
block device: Observed on both /dev/ram0 and local SSD
ext4 mount options: (rw,relatime,data=ordered)

The problem could not be reproduced when using the "FALLOC_FL_ZERO_RANGE" flag,
nor on a tmpfs RAM disk.

Reproduction steps:
sudo mkdir /mnt/ram0
sudo mkfs.ext4 /dev/ram0
sudo mount /dev/ram0 /mnt/ram0/
gcc -O2 tests_mmap_fallocate.c -o tests_mmap_fallocate_gcc
while sudo rm -f /mnt/ram0/tests_mmap_fallocate && sudo ./tests_mmap_fallocate_gcc; do date && sleep 1; done
...
...
...
Value has been modified
(Also nothing found in /var/log/kern.log)

On a development machine the failure only occurs after several days of running
in a loop, but on a virtualized Linux machine on a server it occurs within
minutes.
