Message-ID: <CAKv+Gu_hPaxrVtsBOoviRraYk4FWnT9zQVCVF=i27xd_nGHryw@mail.gmail.com>
Date: Tue, 28 Aug 2018 18:09:09 +0200
From: Ard Biesheuvel <ard.biesheuvel@...aro.org>
To: Nicholas Piggin <nicholas.piggin@...il.com>
Cc: Andreas Schwab <schwab@...ux-m68k.org>,
"<netdev@...r.kernel.org>" <netdev@...r.kernel.org>,
linuxppc-dev@...abs.org, Jessica Yu <jeyu@...nel.org>,
Michael Ellerman <mpe@...erman.id.au>,
Will Deacon <will.deacon@....com>,
Ingo Molnar <mingo@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-arch <linux-arch@...r.kernel.org>
Subject: Re: Oops running iptables -F OUTPUT
On 28 August 2018 at 15:56, Ard Biesheuvel <ard.biesheuvel@...aro.org> wrote:
> Hello Andreas, Nick,
>
> On 28 August 2018 at 06:06, Nicholas Piggin <nicholas.piggin@...il.com> wrote:
>> On Mon, 27 Aug 2018 19:11:01 +0200
>> Andreas Schwab <schwab@...ux-m68k.org> wrote:
>>
>>> I'm getting this Oops when running iptables -F OUTPUT:
>>>
>>> [ 91.139409] Unable to handle kernel paging request for data at address 0xd0000001fff12f34
>>> [ 91.139414] Faulting instruction address: 0xd0000000016a5718
>>> [ 91.139419] Oops: Kernel access of bad area, sig: 11 [#1]
>>> [ 91.139426] BE SMP NR_CPUS=2 PowerMac
>>> [ 91.139434] Modules linked in: iptable_filter ip_tables x_tables bpfilter nfsd auth_rpcgss lockd grace nfs_acl sunrpc tun af_packet snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus snd_aoa_soundbus snd_pcm_oss snd_pcm snd_seq snd_timer snd_seq_device snd_mixer_oss snd sungem sr_mod firewire_ohci cdrom sungem_phy soundcore firewire_core pata_macio crc_itu_t sg hid_generic usbhid linear md_mod ohci_pci ohci_hcd ehci_pci ehci_hcd usbcore usb_common dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod sata_svw
>>> [ 91.139522] CPU: 1 PID: 3620 Comm: iptables Not tainted 4.19.0-rc1 #1
>>> [ 91.139526] NIP: d0000000016a5718 LR: d0000000016a569c CTR: c0000000006f560c
>>> [ 91.139531] REGS: c0000001fa577670 TRAP: 0300 Not tainted (4.19.0-rc1)
>>> [ 91.139534] MSR: 900000000200b032 <SF,HV,VEC,EE,FP,ME,IR,DR,RI> CR: 84002484 XER: 20000000
>>> [ 91.139553] DAR: d0000001fff12f34 DSISR: 40000000 IRQMASK: 0
>>> GPR00: d0000000016a569c c0000001fa5778f0 d0000000016b0400 0000000000000000
>>> GPR04: 0000000000000002 0000000000000000 80000001fa46418e c0000001fa0d05c8
>>> GPR08: d0000000016b0400 d00037fffff13000 00000001ff3e7000 d0000000016a6fb8
>>> GPR12: c0000000006f560c c00000000ffff780 0000000000000000 0000000000000000
>>> GPR16: 0000000011635010 00003fffa1b7aa68 0000000000000000 0000000000000000
>>> GPR20: 0000000000000003 0000000010013918 00000000116350c0 c000000000b88990
>>> GPR24: c000000000b88ba4 0000000000000000 d0000001fff12f34 0000000000000000
>>> GPR28: d0000000016b8000 c0000001fa20f400 c0000001fa20f440 0000000000000000
>>> [ 91.139627] NIP [d0000000016a5718] .alloc_counters.isra.10+0xbc/0x140 [ip_tables]
>>> [ 91.139634] LR [d0000000016a569c] .alloc_counters.isra.10+0x40/0x140 [ip_tables]
>>> [ 91.139638] Call Trace:
>>> [ 91.139645] [c0000001fa5778f0] [d0000000016a569c] .alloc_counters.isra.10+0x40/0x140 [ip_tables] (unreliable)
>>> [ 91.139655] [c0000001fa5779b0] [d0000000016a5b54] .do_ipt_get_ctl+0x110/0x2ec [ip_tables]
>>> [ 91.139666] [c0000001fa577aa0] [c0000000006233e0] .nf_getsockopt+0x68/0x88
>>> [ 91.139674] [c0000001fa577b40] [c000000000631608] .ip_getsockopt+0xbc/0x128
>>> [ 91.139682] [c0000001fa577bf0] [c00000000065adf4] .raw_getsockopt+0x18/0x5c
>>> [ 91.139690] [c0000001fa577c60] [c0000000005b5f60] .sock_common_getsockopt+0x2c/0x40
>>> [ 91.139697] [c0000001fa577cd0] [c0000000005b3394] .__sys_getsockopt+0xa4/0xd0
>>> [ 91.139704] [c0000001fa577d80] [c0000000005b5ab0] .__se_sys_socketcall+0x238/0x2b4
>>> [ 91.139712] [c0000001fa577e30] [c00000000000a31c] system_call+0x5c/0x70
>>> [ 91.139716] Instruction dump:
>>> [ 91.139721] 39290040 7d3d4a14 7fbe4840 409cff98 81380000 2b890001 419d000c 393e0060
>>> [ 91.139736] 48000010 7d57c82a e93e0060 7d295214 <815a0000> 794807e1 41e20010 7c210b78
>>> [ 91.139752] ---[ end trace f5d1d5431651845d ]---
>>
>> This is due to 7290d58095 ("module: use relative references for
>> __ksymtab entries"). This part of kernel/module.c -
>>
>> 	/* Divert to percpu allocation if a percpu var. */
>> 	if (sym[i].st_shndx == info->index.pcpu)
>> 		secbase = (unsigned long)mod_percpu(mod);
>> 	else
>> 		secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
>> 	sym[i].st_value += secbase;
>>
>> causes the distance to the target to exceed 32 bits on powerpc, so
>> it doesn't fit in a rel32 reloc. Not sure how other archs cope.
>>
>
> Apologies for the breakage. It does indeed appear to affect all
> architectures, and I'm a bit puzzled why you are the first one to spot
> it.
>
> I will try to find a clean way to special case the per-CPU variable
> __ksymtab references in the generic module code, and if that is too
> cumbersome, we can switch to 64-bit relative references (or rather,
> native word size relative references) instead. Or revert the whole
> thing ...

OK, after a bit of digging, and after confirming that the arm64
implementation works as expected (its module loader actually detects
overflows of 32-bit place-relative relocations, so the problem
definitely does not occur there), I think I have found the explanation
for why this occurs on powerpc but not on x86 or arm64.

Could you please check whether this change makes the issue go away?
(whitespace damage courtesy of Gmail)
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 6a501b25dd85..57d09d5ceb1a 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -779,7 +779,6 @@ EXPORT_SYMBOL(__per_cpu_offset);
 
 void __init setup_per_cpu_areas(void)
 {
-	const size_t dyn_size = PERCPU_MODULE_RESERVE + PERCPU_DYNAMIC_RESERVE;
 	size_t atom_size;
 	unsigned long delta;
 	unsigned int cpu;
@@ -795,7 +794,9 @@ void __init setup_per_cpu_areas(void)
 	else
 		atom_size = 1 << 20;
 
-	rc = pcpu_embed_first_chunk(0, dyn_size, atom_size, pcpu_cpu_distance,
+	rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
+				    PERCPU_DYNAMIC_RESERVE,
+				    atom_size, pcpu_cpu_distance,
 				    pcpu_fc_alloc, pcpu_fc_free);
 	if (rc < 0)
 		panic("cannot initialize percpu area (err=%d)", rc);

The git log does not explain why powerpc deviates from x86 and arm64 in
the way it initializes the percpu areas, i.e., why it passes a reserved
size of 0 and folds PERCPU_MODULE_RESERVE into the dynamic size.