Message-Id: <201112140033.58951.ptesarik@suse.cz>
Date:	Wed, 14 Dec 2011 00:33:58 +0100
From:	Petr Tesarik <ptesarik@...e.cz>
To:	linux-mm@...ck.org
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Vivek Goyal <vgoyal@...hat.com>
Subject: Is per_cpu_ptr_to_phys broken?

Hi folks,

while trying to understand a weird kdump failure, I found out that the 
secondary kernel doesn't get the correct NT_PRSTATUS notes from the primary 
kernel. Further research reveals that the notes are correctly generated, 
corresponding elfcorehdr program headers are created by kexec, but the 
physical address is wrong.

The trouble is that the crash_notes per-cpu variable is not page-aligned:

crash_notes = 0xc08e8ed4
PER-CPU OFFSET VALUES:
  CPU 0: 3711f000
  CPU 1: 37129000
  CPU 2: 37133000
  CPU 3: 3713d000

So, the per-cpu addresses are:
  crash_notes on CPU 0: f7a07ed4 => phys 36b57ed4
  crash_notes on CPU 1: f7a11ed4 => phys 36b4ded4
  crash_notes on CPU 2: f7a1bed4 => phys 36b43ed4
  crash_notes on CPU 3: f7a25ed4 => phys 36b39ed4
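The virtual addresses above are just the link-time address of crash_notes plus each CPU's offset. A minimal user-space sketch of that arithmetic, using the values from this machine (the names CRASH_NOTES_BASE, percpu_offset and percpu_vaddr are made up for illustration and are not kernel symbols):

```c
#include <stdint.h>

/* Link-time address of crash_notes, as printed in the dump above. */
#define CRASH_NOTES_BASE 0xc08e8ed4u

/* Per-CPU offsets for CPUs 0..3, as printed in the dump above. */
static const uint32_t percpu_offset[4] = {
	0x3711f000u, 0x37129000u, 0x37133000u, 0x3713d000u
};

/*
 * Virtual address of one CPU's copy of a per-cpu variable:
 * the base address plus that CPU's offset (32-bit arithmetic).
 * The caller must pass cpu in the range 0..3.
 */
static uint32_t percpu_vaddr(uint32_t base, unsigned int cpu)
{
	return base + percpu_offset[cpu];
}
```

For example, percpu_vaddr(CRASH_NOTES_BASE, 0) gives 0xf7a07ed4, which matches the CPU 0 line above.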

However, /sys/devices/system/cpu/cpu*/crash_notes says:
/sys/devices/system/cpu/cpu0/crash_notes: 36b57000
/sys/devices/system/cpu/cpu1/crash_notes: 36b4d000
/sys/devices/system/cpu/cpu2/crash_notes: 36b43000
/sys/devices/system/cpu/cpu3/crash_notes: 36b39000

As you can see, all values are rounded down to a page boundary. Consequently, 
this is where kexec sets up the NOTE segments, and thus where the secondary 
kernel is looking for them. However, when the first kernel crashes, it saves 
the notes to the unaligned addresses, where they are not found.
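To make the mismatch concrete, here is a small user-space sketch that splits a physical address into its page base and in-page offset. It assumes 4 KiB pages, which is consistent with the values above; the SKETCH_* macros and function names are illustrative only.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, matching the rounding seen in sysfs. */
#define SKETCH_PAGE_SIZE 0x1000u
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1u))

/* Page-aligned part of the address: what sysfs reports today. */
static uint32_t sketch_page_base(uint32_t phys)
{
	return phys & SKETCH_PAGE_MASK;
}

/* Offset within the page: the part that gets dropped. */
static uint32_t sketch_page_offset(uint32_t phys)
{
	return phys & ~SKETCH_PAGE_MASK;
}
```

For CPU 0, the notes are written at 0x36b57ed4, but sketch_page_base() of that address is 0x36b57000, so the secondary kernel looks 0xed4 bytes too low.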

The values in the crash_notes sysfs attribute are computed as follows:

        addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpunum));

Note that the per-cpu addresses lie between VMALLOC_START (0xf79fe000 on this 
machine) and VMALLOC_END (0xff1fe000).
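As a quick sanity check of that claim, the range test can be sketched in user space as below. The bounds are the ones quoted for this machine; the half-open comparison mirrors how I understand is_vmalloc_addr() to work, but treat the sketch_* names and the exact form as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bounds quoted above for this particular machine. */
#define SKETCH_VMALLOC_START 0xf79fe000u
#define SKETCH_VMALLOC_END   0xff1fe000u

/* True if addr lies in [VMALLOC_START, VMALLOC_END). */
static bool sketch_is_vmalloc_addr(uint32_t addr)
{
	return addr >= SKETCH_VMALLOC_START && addr < SKETCH_VMALLOC_END;
}
```

The CPU 0 address 0xf7a07ed4 falls inside the range, while the link-time address 0xc08e8ed4 does not.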

Now, the per_cpu_ptr_to_phys() function aligns all vmalloc addresses to a page 
boundary. This was probably right when Vivek Goyal introduced that function 
(commit 3b034b0d084221596bf35c8d893e1d4d5477b9cc), because per-cpu addresses
were only allocated by vmalloc if booted with percpu_alloc=page. That is no 
longer the case: AFAICS, per-cpu variables are now always allocated that way.

So, shouldn't we add the offset within the page inside per_cpu_ptr_to_phys?
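The idea, reduced to plain arithmetic in user space, is roughly the following. This is only a sketch under the assumption of 4 KiB pages; sketch_ptr_to_phys() is a hypothetical stand-in that takes the page's physical address as a parameter instead of looking it up.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 0x1000u
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1u))

/*
 * Combine the physical address of the backing page with the
 * offset of the virtual address within its page.
 */
static uint32_t sketch_ptr_to_phys(uint32_t page_phys, uint32_t vaddr)
{
	return page_phys + (vaddr & ~SKETCH_PAGE_MASK);
}
```

With the CPU 0 values, page 0x36b57000 and virtual address 0xf7a07ed4 combine to 0x36b57ed4, which is where the notes are actually saved.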

Signed-off-by: Petr Tesarik <ptesarik@...e.cz>

diff --git a/mm/percpu.c b/mm/percpu.c
index 3bb810a..4c13334 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -998,6 +998,7 @@ phys_addr_t per_cpu_ptr_to_phys(void *addr)
 	bool in_first_chunk = false;
 	unsigned long first_low, first_high;
 	unsigned int cpu;
+	phys_addr_t page_addr;

 	/*
 	 * The following test on unit_low/high isn't strictly
@@ -1023,9 +1024,10 @@ phys_addr_t per_cpu_ptr_to_phys(void *addr)
 		if (!is_vmalloc_addr(addr))
 			return __pa(addr);
 		else
-			return page_to_phys(vmalloc_to_page(addr));
+			page_addr = page_to_phys(vmalloc_to_page(addr));
 	} else
-		return page_to_phys(pcpu_addr_to_page(addr));
+		page_addr = page_to_phys(pcpu_addr_to_page(addr));
+	return page_addr + ((unsigned long)addr & ~PAGE_MASK);
 }

 /**
