linux-kernel - Re: [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e)

Message-ID: <20190513170354.GB10982@tower.DHCP.thefacebook.com>
Date:   Mon, 13 May 2019 17:03:58 +0000
From:   Roman Gushchin <guro@...com>
To:     "Alex Xu (Hello71)" <alex_y_xu@...oo.ca>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "tj@...nel.org" <tj@...nel.org>,
        "oleg@...hat.com" <oleg@...hat.com>,
        "Kernel Team" <Kernel-team@...com>
Subject: Re: [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer"
 (76f969e)

Hi Alex!

Thank you for the report!
It's super clear, and contains all the details, so it took me 30 seconds
to reproduce the issue. Really appreciate your effort!

On Sun, May 12, 2019 at 09:20:12PM -0400, Alex Xu (Hello71) wrote:
> Hi,
> 
> I was trying to use strace recently and found that it exhibited some 
> strange behavior. I produced this minimal test case:
> 
> #include <unistd.h>
> 
> int main() {
>     write(1, "a", 1);
>     return 0;
> }
> 
> which, when run using "gcc test.c && strace ./a.out" produces this 
> strace output:
> 
> [ pre-main omitted ]
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> [ repeats forever ]
> 
> The correct result is of course:
> 
> [ pre-main omitted ]
> write(1, "a", 1)                        = 1
> exit_group(0)                           = ?
> +++ exited with 0 +++
> 
> Strangely, this only occurs when the output goes to a terminal (tty).
> Running "strace ./a.out" from a native Linux x86 console or a terminal
> emulator causes the abnormal behavior. However, the following commands
> work correctly:
> 
> - strace ./a.out >/dev/null
> - strace ./a.out >/tmp/a # /tmp is a standard tmpfs
> - strace ./a.out >&- # causes -1 EBADF (Bad file descriptor)
> 
> "strace -o /tmp/a ./a.out" hangs and produces the above (infinite) 
> output to /tmp/a.
> 
> I bisected this to 76f969e, "cgroup: cgroup v2 freezer". I reverted the 
> entire patchset (reverting only that one caused a conflict), which 
> resolved the issue. I skimmed the patch and came up with this 
> workaround, which also resolves the issue. I am not at all clear on the 
> technical workings of the patchset, but it seems to me like a process's 
> frozen status is supposed to be "suspended" when a frozen process is 
> ptraced, and "unsuspended" when ptracing ends. Therefore, it seems 
> suspicious to always "enter frozen" whether or not the cgroup is 
> actually frozen. It seems like the code should instead check if the 
> cgroup is actually frozen, and if so, restore the frozen status.
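
For reference, the cgroup v2 interface reports whether a cgroup has actually reached the frozen state through the `frozen` key in its `cgroup.events` file (next to `populated`); `cgroup.freeze` is the file written to request freezing. The C sketch below reads that key from userspace. The helper names and the mount point mentioned in the comment are assumptions for illustration only, not part of any kernel or library API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan a cgroup.events stream ("key value" pairs, one per line) and return
 * 1 if "frozen" is non-zero, 0 if it is zero, or -1 if the key is absent
 * (for example, on a kernel without the cgroup v2 freezer). */
static int cgroup_events_frozen_from(FILE *f)
{
    char key[64];
    int value;

    assert(f != NULL);
    while (fscanf(f, "%63s %d", key, &value) == 2) {
        if (strcmp(key, "frozen") == 0)
            return value != 0;
    }
    return -1;
}

/* Hypothetical convenience wrapper. The path would be something like
 * "/sys/fs/cgroup/<group>/cgroup.events"; the mount point varies. */
static int cgroup_events_frozen(const char *path)
{
    FILE *f = fopen(path, "r");
    int ret;

    if (f == NULL)
        return -1;
    ret = cgroup_events_frozen_from(f);
    fclose(f);
    return ret;
}
```

Returning -1 for a missing key or unreadable file keeps "not frozen" distinct from "cannot tell", which matters on kernels that predate the v2 freezer.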

So, the thing is that when the freezer tries to freeze all tasks
in the cgroup, some tasks may already be sleeping (e.g. stopped by SIGSTOP),
and the freezer can't get them out of this state and put them back correctly
after unfreezing. So instead it leaves such tasks in their original state
and treats them as frozen. This is why we need this unconditional
cgroup_enter_frozen(); it's not the cause of the problem.
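
The accounting described above can be illustrated with a deliberately simplified model. Each task that goes to sleep in a stop state bumps a per-cgroup counter of frozen tasks, and the cgroup as a whole is reported as frozen only once a freeze has been requested and that counter equals the number of tasks. All `toy_*` names are hypothetical; this is a sketch of the idea under those assumptions, not the kernel's implementation (no locking, no hierarchy, no wakeups).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of cgroup v2 freezer accounting.
 * Not kernel code: no locking, no hierarchy, no task wakeups. */
struct toy_cgroup {
    bool freeze_requested; /* roughly: "1" written to cgroup.freeze */
    int nr_tasks;          /* tasks in the cgroup */
    int nr_frozen_tasks;   /* tasks currently accounted as frozen */
    bool frozen;           /* what cgroup.events "frozen" would report */
};

struct toy_task {
    bool accounted_frozen; /* has this task been counted as frozen? */
};

/* The cgroup is frozen only if a freeze was requested AND every task
 * is accounted as frozen (frozen by the freezer or already stopped). */
static void toy_update_frozen(struct toy_cgroup *cg)
{
    cg->frozen = cg->freeze_requested &&
                 cg->nr_frozen_tasks == cg->nr_tasks;
}

static void toy_request_freeze(struct toy_cgroup *cg, bool freeze)
{
    cg->freeze_requested = freeze;
    toy_update_frozen(cg);
}

/* Called when a task goes to sleep in a stop state (e.g. SIGSTOP) or in
 * the freezer's own sleep. It is unconditional: a stopped task is counted
 * even if no freeze is in progress, so it never has to be woken up later. */
static void toy_enter_frozen(struct toy_cgroup *cg, struct toy_task *t)
{
    if (t->accounted_frozen)
        return;
    t->accounted_frozen = true;
    cg->nr_frozen_tasks++;
    toy_update_frozen(cg);
}

/* Called when the task resumes running; undoes the accounting. */
static void toy_leave_frozen(struct toy_cgroup *cg, struct toy_task *t)
{
    if (!t->accounted_frozen)
        return;
    assert(cg->nr_frozen_tasks > 0);
    t->accounted_frozen = false;
    cg->nr_frozen_tasks--;
    toy_update_frozen(cg);
}
```

In this model a task that was already SIGSTOPped before the freeze is counted without being woken, which is why entering the frozen state on stop happens unconditionally; the freeze request only decides when the cgroup-level flag flips. It also shows why every path that calls the enter helper must eventually call the leave helper when the task runs again, or the task stays accounted as frozen.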

Anyway, I'm sure that with great help from Oleg we'll be able
to fix the issue very soon (I already posted a preliminary patch).

Once again, thank you for the report!

Roman