Message-ID: <a6dbc5b3-b12e-36b4-0aef-f319264d6e8f@redhat.com>
Date: Wed, 1 Sep 2021 10:28:00 +0200
From: David Hildenbrand <david@...hat.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Andy Lutomirski <luto@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
David Laight <David.Laight@...lab.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>,
Al Viro <viro@...iv.linux.org.uk>,
Alexey Dobriyan <adobriyan@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
Petr Mladek <pmladek@...e.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Rasmus Villemoes <linux@...musvillemoes.dk>,
Kees Cook <keescook@...omium.org>,
Greg Ungerer <gerg@...ux-m68k.org>,
Geert Uytterhoeven <geert@...ux-m68k.org>,
Mike Rapoport <rppt@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>,
Vincenzo Frascino <vincenzo.frascino@....com>,
Chinwen Chang <chinwen.chang@...iatek.com>,
Michel Lespinasse <walken@...gle.com>,
Catalin Marinas <catalin.marinas@....com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Huang Ying <ying.huang@...el.com>,
Jann Horn <jannh@...gle.com>, Feng Tang <feng.tang@...el.com>,
Kevin Brodsky <Kevin.Brodsky@....com>,
Michael Ellerman <mpe@...erman.id.au>,
Shawn Anastasio <shawn@...stas.io>,
Steven Price <steven.price@....com>,
Nicholas Piggin <npiggin@...il.com>,
Christian Brauner <christian.brauner@...ntu.com>,
Jens Axboe <axboe@...nel.dk>,
Gabriel Krisman Bertazi <krisman@...labora.com>,
Peter Xu <peterx@...hat.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
Marco Elver <elver@...gle.com>,
Daniel Jordan <daniel.m.jordan@...cle.com>,
Nicolas Viennot <Nicolas.Viennot@...sigma.com>,
Thomas Cedeno <thomascedeno@...gle.com>,
Collin Fijalkovich <cfijalkovich@...gle.com>,
Michal Hocko <mhocko@...e.com>,
Miklos Szeredi <miklos@...redi.hu>,
Chengguang Xu <cgxu519@...ernel.net>,
Christian König <ckoenig.leichtzumerken@...il.com>,
"linux-unionfs@...r.kernel.org" <linux-unionfs@...r.kernel.org>,
Linux API <linux-api@...r.kernel.org>,
the arch/x86 maintainers <x86@...nel.org>,
linux-fsdevel@...r.kernel.org, Linux-MM <linux-mm@...ck.org>,
Florian Weimer <fweimer@...hat.com>,
Michael Kerrisk <mtk.manpages@...il.com>
Subject: Re: [PATCH v1 0/7] Remove in-tree usage of MAP_DENYWRITE

On 27.08.21 00:13, Eric W. Biederman wrote:
> David Hildenbrand <david@...hat.com> writes:
>
>> On 26.08.21 19:48, Andy Lutomirski wrote:
>>> On Fri, Aug 13, 2021, at 5:54 PM, Linus Torvalds wrote:
>>>> On Fri, Aug 13, 2021 at 2:49 PM Andy Lutomirski <luto@...nel.org> wrote:
>>>>>
>>>>> I’ll bite. How about we attack this in the opposite direction: remove the deny write mechanism entirely.
>>>>
>>>> I think that would be ok, except I can see somebody relying on it.
>>>>
>>>> It's broken, it's stupid, but we've done that ETXTBSY for a _loong_ time.
>>>
>>> Someone off-list just pointed something out to me, and I think we should push harder to remove ETXTBSY. Specifically, we've all been focused on open() failing with ETXTBSY, and it's easy to make fun of anyone opening a running program for write when they should be unlinking and replacing it.
>>>
>>> Alas, Linux's implementation of deny_write_access() is correct^Wabsurd, and deny_write_access() *also* returns ETXTBSY if the file is open for write. So, in a multithreaded program, one thread does:
>>>
>>> fd = open("some exefile", O_RDWR | O_CREAT | O_CLOEXEC);
>>> write(fd, some stuff);
>>>
>>> <--- problem is here
>>>
>>> close(fd);
>>> execve("some exefile");
>>>
>>> Another thread does:
>>>
>>> fork();
>>> execve("something else");
>>>
>>> In between fork and execve, there's another copy of the open file description, and i_writecount is held, and the execve() fails. Whoops. See, for example:
>>>
>>> https://github.com/golang/go/issues/22315
>>>
>>> I propose we get rid of deny_write_access() completely to solve this.
>>>
>>> Getting rid of i_writecount itself seems a bit harder, since a handful of filesystems use it for clever reasons.
>>>
>>> (OFD locks seem like they might have the same problem. Maybe we should have a clone() flag to unshare the file table and close close-on-exec things?)
>>>
>>
>> It's not like this issue is new (it has been known since 2017) or
>> relevant in practice, so no need to hurry IMHO. One step at a time: it
>> might make perfect sense to remove ETXTBSY, but we have to be careful
>> not to break other user space that actually cares about the current
>> behavior in practice.
>
> It is an old enough issue that I agree there is no need to hurry.
>
> I also ran into this issue not too long ago when I refactored the
> usermode_driver code. My challenge was that, because I was not in
> userspace, the delayed fput was not happening in my kernel thread. That
> meant that writing the file, then closing the file, then execing the
> file consistently reported -ETXTBSY.
>
> The kernel code wound up doing:
> /* Flush delayed fput so exec can open the file read-only */
> flush_delayed_fput();
> task_work_run();
>
> As I read the code the delay for userspace file descriptors is
> always done with task_work_add, so userspace should not hit
> that kind of silliness, and should be able to actually close
> the file descriptor before the exec.
>
>
> On the flip side, I don't know how anything can depend upon getting an
> -ETXTBSY. So I don't think there is any real risk of breaking userspace
> if we remove it.

At least LTP has two test cases that check for exactly that behavior:

  testcases/kernel/syscalls/creat/creat07.c
  testcases/kernel/syscalls/execve/execve04.c
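
For anyone who wants to poke at the execve() side without pulling in LTP, here is a rough sketch of the kind of check I mean. It is not the LTP code. The helper name exec_errno_with_writer(), the /tmp path and the tiny shell script are made up for illustration, and it assumes a writable, exec-capable /tmp and a /bin/sh. It keeps (or does not keep) a write-opened descriptor on a file and reports what execve() on that file does in a forked child.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not taken from LTP).
 *
 * Creates a small executable script, optionally keeps the descriptor
 * that wrote it open, and tries to execve() the script in a child.
 *
 * Returns: 0 if the exec succeeded and the script exited with 0,
 *          the child's errno if execve() failed,
 *          -1 if the setup itself failed.
 */
static int exec_errno_with_writer(int keep_writer_open)
{
    static const char script[] = "#!/bin/sh\nexit 0\n";
    char path[] = "/tmp/etxtbsy-XXXXXX";   /* assumed location */
    int status = 0;
    int fd;
    pid_t pid;

    /* The child reports errno through its 8-bit exit status. */
    assert(ETXTBSY < 256);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;

    if (write(fd, script, sizeof(script) - 1) != (ssize_t)(sizeof(script) - 1) ||
        fchmod(fd, 0700) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    if (!keep_writer_open) {
        /* The well-behaved case: close before anybody execs the file. */
        close(fd);
        fd = -1;
    }

    pid = fork();
    if (pid == 0) {
        char *argv[] = { path, NULL };
        char *envp[] = { NULL };

        execve(path, argv, envp);
        _exit(errno);           /* only reached if execve() failed */
    }

    if (pid > 0 && waitpid(pid, &status, 0) != pid)
        pid = -1;

    /* Drop the writer only after the child made its exec attempt. */
    if (fd >= 0)
        close(fd);
    unlink(path);

    if (pid < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

On the kernels discussed in this thread I would expect exec_errno_with_writer(1) to report ETXTBSY and exec_errno_with_writer(0) to report 0. If deny_write_access() were removed as proposed, the first call would presumably report 0 as well, which is exactly the behavior change a test like execve04 would notice.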
--
Thanks,
David / dhildenb