[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230605204256.GA52412@kernel.org>
Date: Mon, 5 Jun 2023 23:42:56 +0300
From: Mike Rapoport <rppt@...nel.org>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
Cc: "rostedt@...dmis.org" <rostedt@...dmis.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"deller@....de" <deller@....de>,
"mcgrof@...nel.org" <mcgrof@...nel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"nadav.amit@...il.com" <nadav.amit@...il.com>,
"linux@...linux.org.uk" <linux@...linux.org.uk>,
"davem@...emloft.net" <davem@...emloft.net>,
"linux-mips@...r.kernel.org" <linux-mips@...r.kernel.org>,
"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
"hca@...ux.ibm.com" <hca@...ux.ibm.com>,
"catalin.marinas@....com" <catalin.marinas@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"kent.overstreet@...ux.dev" <kent.overstreet@...ux.dev>,
"linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org>,
"palmer@...belt.com" <palmer@...belt.com>,
"chenhuacai@...nel.org" <chenhuacai@...nel.org>,
"tsbogend@...ha.franken.de" <tsbogend@...ha.franken.de>,
"linux-trace-kernel@...r.kernel.org"
<linux-trace-kernel@...r.kernel.org>,
"mpe@...erman.id.au" <mpe@...erman.id.au>,
"linux-parisc@...r.kernel.org" <linux-parisc@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
"christophe.leroy@...roup.eu" <christophe.leroy@...roup.eu>,
"linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>,
"will@...nel.org" <will@...nel.org>,
"dinguyen@...nel.org" <dinguyen@...nel.org>,
"naveen.n.rao@...ux.ibm.com" <naveen.n.rao@...ux.ibm.com>,
"sparclinux@...r.kernel.org" <sparclinux@...r.kernel.org>,
"linux-modules@...r.kernel.org" <linux-modules@...r.kernel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"song@...nel.org" <song@...nel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"loongarch@...ts.linux.dev" <loongarch@...ts.linux.dev>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>
Subject: Re: [PATCH 12/13] x86/jitalloc: prepare to allocate exectuatble
memory as ROX
On Mon, Jun 05, 2023 at 04:10:21PM +0000, Edgecombe, Rick P wrote:
> On Mon, 2023-06-05 at 11:11 +0300, Mike Rapoport wrote:
> > On Sun, Jun 04, 2023 at 10:52:44PM -0400, Steven Rostedt wrote:
> > > On Thu, 1 Jun 2023 16:54:36 -0700
> > > Nadav Amit <nadav.amit@...il.com> wrote:
> > >
> > > > > The way text_poke() is used here, it is creating a new writable
> > > > > alias
> > > > > and flushing it for *each* write to the module (like for each
> > > > > write of
> > > > > an individual relocation, etc). I was just thinking it might
> > > > > warrant
> > > > > some batching or something.
> >
> > > > I am not advocating to do so, but if you want to have many
> > > > efficient
> > > > writes, perhaps you can just disable CR0.WP. Just saying that if
> > > > you
> > > > are about to write all over the memory, text_poke() does not
> > > > provide
> > > > too much security for the poking thread.
> >
> > Heh, this is definitely and easier hack to implement :)
>
> I don't know the details, but previously there was some strong dislike
> of CR0.WP toggling. And now there is also the problem of CET. Setting
> CR0.WP=0 will #GP if CR4.CET is 1 (as it currently is for kernel IBT).
> I guess you might get away with toggling them both in some controlled
> situation, but it might be a lot easier to hack up then to be made
> fully acceptable. It does sound much more efficient though.
I don't think we'd really want that, especially looking at
WARN_ONCE(bits_missing, "CR0 WP bit went missing!?\n");
at native_write_cr0().
> > > Batching does exist, which is what the text_poke_queue() thing
> > > does.
> >
> > For module loading text_poke_queue() will still be much slower than a
> > bunch
> > of memset()s for no good reason because we don't need all the
> > complexity of
> > text_poke_bp_batch() for module initialization because we are sure we
> > are
> > not patching live code.
> >
> > What we'd need here is a new batching mode that will create a
> > writable
> > alias mapping at the beginning of apply_relocate_*() and
> > module_finalize(),
> > then it will use memcpy() to that writable alias and will tear the
> > mapping
> > down in the end.
>
> It's probably only a tiny bit faster than keeping a separate writable
> allocation and text_poking it in at the end.
Right, but it still will be faster than text_poking every relocation.
> > Another option is to teach alternatives to update a writable copy
> > rather
> > than do in place changes like Song suggested. My feeling is that it
> > will be
> > more intrusive change though.
>
> You mean keeping a separate RW allocation and then text_poking() the
> whole thing in when you are done? That is what I was trying to say at
> the beginning of this thread. The other benefit is you don't make the
> intermediate loading states of the module, executable.
>
> I tried this technique previously [0], and I thought it was not too
> bad. In most of the callers it looks similar to what you have in
> do_text_poke(). Sometimes less, sometimes more. It might need
> enlightening of some of the stuff currently using text_poke() during
> module loading, like jump labels. So that bit is more intrusive, yea.
> But it sounds so much cleaner and well controlled. Did you have a
> particular trouble spot in mind?
Nothing in particular, except the intrusive part. Except the changes in
modules.c we'd need to teach alternatives to deal with a writable copy.
> [0]
> https://lore.kernel.org/lkml/20201120202426.18009-5-rick.p.edgecombe@intel.com/
--
Sincerely yours,
Mike.
Powered by blists - more mailing lists