Message-Id: <20200605222257.44882-1-saravanak@google.com>
Date:   Fri,  5 Jun 2020 15:22:57 -0700
From:   Saravana Kannan <saravanak@...gle.com>
To:     Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>
Cc:     Saravana Kannan <saravanak@...gle.com>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        kernel-team@...roid.com, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org
Subject: [PATCH v1] arm64/module: Optimize module load time by optimizing PLT counting

When loading a module, module_frob_arch_sections() tries to figure out
the number of PLTs that'll be needed to handle all the RELAs. While
doing this, it tries to dedupe PLT allocations for multiple
R_AARCH64_CALL26 relocations to the same symbol. It does the same for
R_AARCH64_JUMP26 relocations too.

To make checks for duplicates easier/faster, it sorts the relocation
list by type, symbol and addend. That way, to check for a duplicate
relocation, it just needs to compare with the previous entry.
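
The sort-then-compare idea can be sketched in userspace C as below. This
is only an illustration, not the kernel code: struct reloc,
cmp_reloc() and count_unique_sorted() are hypothetical names, and
qsort() stands in for the kernel's sort().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for Elf64_Rela (not the kernel type). */
struct reloc {
	unsigned int type;
	unsigned int sym;
	long addend;
};

/* Order by type, then symbol, then addend. */
static int cmp_reloc(const void *a, const void *b)
{
	const struct reloc *x = a, *y = b;

	if (x->type != y->type)
		return x->type < y->type ? -1 : 1;
	if (x->sym != y->sym)
		return x->sym < y->sym ? -1 : 1;
	if (x->addend != y->addend)
		return x->addend < y->addend ? -1 : 1;
	return 0;
}

/* Sort the array, then count entries that differ from their predecessor. */
static int count_unique_sorted(struct reloc *r, int n)
{
	int i, unique = 0;

	assert(n >= 0);
	qsort(r, n, sizeof(*r), cmp_reloc);
	for (i = 0; i < n; i++)
		if (i == 0 || cmp_reloc(&r[i], &r[i - 1]) != 0)
			unique++;
	return unique;
}
```

After sorting, equal entries are adjacent, so one comparison per entry
is enough to detect a duplicate.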

However, sorting the entire relocation array is unnecessary and
expensive (O(n log n)) because there are a lot of other relocation types
that don't need deduping or can't be deduped.

So this commit partitions the array into entries that need deduping and
those that don't, and then sorts just the part that needs deduping. When
CONFIG_RANDOMIZE_BASE is disabled, the sorting is skipped entirely
because, as far as I can tell, PLTs are not allocated for
R_AARCH64_CALL26 and R_AARCH64_JUMP26 in that configuration.
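
A minimal userspace sketch of "partition, then sort only the front
part" follows. It is an illustration under stated assumptions, not the
code in this patch: struct rel_entry, the REL_* type codes (placeholder
values, not the real R_AARCH64_* numbers) and all function names are
hypothetical, and qsort() stands in for the kernel's sort().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified relocation record (not Elf64_Rela). */
struct rel_entry {
	unsigned int type;
	unsigned int sym;
	long addend;
};

/* Placeholder type codes for this sketch only. */
enum { REL_OTHER = 0, REL_JUMP = 1, REL_CALL = 2 };

static bool needs_dedup(const struct rel_entry *e)
{
	return e->type == REL_JUMP || e->type == REL_CALL;
}

/* Order by type, then symbol, then addend. */
static int cmp_rel_entry(const void *a, const void *b)
{
	const struct rel_entry *x = a, *y = b;

	if (x->type != y->type)
		return x->type < y->type ? -1 : 1;
	if (x->sym != y->sym)
		return x->sym < y->sym ? -1 : 1;
	if (x->addend != y->addend)
		return x->addend < y->addend ? -1 : 1;
	return 0;
}

/* Move entries that need dedup to the front; return how many there are. */
static int partition_front(struct rel_entry *r, int n)
{
	int i = 0, j = n - 1;

	assert(n >= 0);
	while (i <= j) {
		if (needs_dedup(&r[i])) {
			i++;
		} else {
			/* r[i] belongs in the tail; swap it to the back. */
			struct rel_entry t = r[i];

			r[i] = r[j];
			r[j] = t;
			j--;
		}
	}
	return i;
}

/* Partition, then sort only the front part; returns its length. */
static int sort_dedup_prefix(struct rel_entry *r, int n)
{
	int k = partition_front(r, n);

	if (k > 1)
		qsort(r, k, sizeof(*r), cmp_rel_entry);
	return k;
}
```

The partition pass is linear in the number of entries, so the
O(n log n) sort is paid only for the entries that need deduping. An
empty array is handled without touching any element.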

This significantly reduces module load time for modules with a large
number of relocations, with no measurable impact on modules with a
small number of relocations. In my test setup with CONFIG_RANDOMIZE_BASE
enabled, the load time for one module went down from 268ms to 100ms.
Another module went down from 143ms to 83ms.

Cc: Ard Biesheuvel <ard.biesheuvel@...aro.org>
Signed-off-by: Saravana Kannan <saravanak@...gle.com>
---
 arch/arm64/kernel/module-plts.c | 37 ++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c
index 65b08a74aec6..bf5118b3b828 100644
--- a/arch/arm64/kernel/module-plts.c
+++ b/arch/arm64/kernel/module-plts.c
@@ -253,6 +253,36 @@ static unsigned int count_plts(Elf64_Sym *syms, Elf64_Rela *rela, int num,
 	return ret;
 }
 
+static bool rela_needs_dedup(Elf64_Rela *rela)
+{
+	return ELF64_R_TYPE(rela->r_info) == R_AARCH64_JUMP26
+	       || ELF64_R_TYPE(rela->r_info) == R_AARCH64_CALL26;
+}
+
+/* Group the CALL26/JUMP26 relas toward the beginning of the array. */
+static int partition_dedup_relas(Elf64_Rela *rela, int numrels)
+{
+	int i = 0, j = numrels - 1;
+	Elf64_Rela t;
+
+	while (i < j) {
+		while (rela_needs_dedup(rela + i) && i < j)
+			i++;
+		while (!rela_needs_dedup(rela + j) && i < j)
+			j--;
+		if (i < j) {
+			t = *(rela + j);
+			*(rela + j) = *(rela + i);
+			*(rela + i) = t;
+		}
+	}
+	/* rela[i] is still unclassified here; guard against numrels == 0 */
+	if (i < numrels && rela_needs_dedup(rela + i))
+		i++;
+
+	return i;
+}
+
 int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
 			      char *secstrings, struct module *mod)
 {
@@ -291,6 +321,7 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
 	for (i = 0; i < ehdr->e_shnum; i++) {
 		Elf64_Rela *rels = (void *)ehdr + sechdrs[i].sh_offset;
 		int numrels = sechdrs[i].sh_size / sizeof(Elf64_Rela);
+		int num_dedup_rels = 0;
 		Elf64_Shdr *dstsec = sechdrs + sechdrs[i].sh_info;
 
 		if (sechdrs[i].sh_type != SHT_RELA)
@@ -300,8 +331,12 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
 		if (!(dstsec->sh_flags & SHF_EXECINSTR))
 			continue;
 
+		if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
+			num_dedup_rels = partition_dedup_relas(rels, numrels);
 		/* sort by type, symbol index and addend */
-		sort(rels, numrels, sizeof(Elf64_Rela), cmp_rela, NULL);
+		if (num_dedup_rels)
+			sort(rels, num_dedup_rels, sizeof(Elf64_Rela),
+			     cmp_rela, NULL);
 
 		if (!str_has_prefix(secstrings + dstsec->sh_name, ".init"))
 			core_plts += count_plts(syms, rels, numrels,
-- 
2.27.0.278.ge193c7cf3a9-goog
