Message-ID: <CAFBinCA=0XSSVmzfTgb4eSiVFr=XRHqLOVFGyK0++XRty6VjnQ@mail.gmail.com>
Date: Mon, 25 Mar 2019 19:31:26 +0100
From: Martin Blumenstingl <martin.blumenstingl@...glemail.com>
To: Liang Yang <liang.yang@...ogic.com>
Cc: Matthew Wilcox <willy@...radead.org>, mhocko@...e.com,
linux@...linux.org.uk, linux-kernel@...r.kernel.org,
rppt@...ux.ibm.com, linux-mm@...ck.org,
linux-mtd@...ts.infradead.org, linux-amlogic@...ts.infradead.org,
akpm@...ux-foundation.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()
Hi Liang,
On Mon, Mar 25, 2019 at 11:03 AM Liang Yang <liang.yang@...ogic.com> wrote:
>
> Hi Martin,
>
> On 2019/3/23 5:07, Martin Blumenstingl wrote:
> > Hi Matthew,
> >
> > On Thu, Mar 21, 2019 at 10:44 PM Matthew Wilcox <willy@...radead.org> wrote:
> >>
> >> On Thu, Mar 21, 2019 at 09:17:34PM +0100, Martin Blumenstingl wrote:
> >>> Hello,
> >>>
> >>> I am experiencing the following crash:
> >>> ------------[ cut here ]------------
> >>> kernel BUG at mm/slub.c:3950!
> >>
> >> if (unlikely(!PageSlab(page))) {
> >> BUG_ON(!PageCompound(page));
> >>
> >> You called kfree() on the address of a page which wasn't allocated by slab.
> >>
> >>> I have traced this crash to the kfree() in meson_nfc_read_buf().
> >>> my observation is as follows:
> >>> - meson_nfc_read_buf() is called 7 times without any crash, the
> >>> kzalloc() call returns 0xe9e6c600 (virtual address) / 0x29e6c600
> >>> (physical address)
> >>> - the eighth time meson_nfc_read_buf() is called, the kzalloc() call returns
> >>> 0xee39a38b (virtual address) / 0x2e39a38b (physical address) and the
> >>> final kfree() crashes
> >>> - changing the size in the kzalloc() call from PER_INFO_BYTE (= 8) to
> >>> PAGE_SIZE works around that crash
> >>
> >> I suspect you're doing something which corrupts memory. Overrunning
> >> the end of your allocation or something similar. Have you tried KASAN
> >> or even the various slab debugging (eg redzones)?
> > KASAN is not available on 32-bit ARM. there was some progress last
> > year [0] but it didn't make it into mainline. I tried to make the
> > patches apply again and got it to compile (and my kernel is still
> > booting) but I have no idea if it's still working. for anyone
> > interested, my patches are here: [1] (I consider this a HACK because I
> > don't know anything about the code which is being touched in the
> > patches, I only made it compile)
> >
> > SLAB debugging (redzones) were a great hint, thank you very much for
> > that Matthew! I enabled:
> > CONFIG_SLUB_DEBUG=y
> > CONFIG_SLUB_DEBUG_ON=y
> > and with that I now get "BUG kmalloc-64 (Not tainted): Redzone
> > overwritten" (a larger kernel log extract is attached).
> >
> > I'm starting to wonder if the NAND controller (hardware) writes more
> > than 8 bytes.
> > some context: the "info" buffer allocated in meson_nfc_read_buf is
> > then passed to the NAND controller IP (after using dma_map_single).
> >
> > Liang, how does the NAND controller know that it only has to send
> > PER_INFO_BYTE (= 8) bytes when called from meson_nfc_read_buf? all
> > other callers of meson_nfc_dma_buffer_setup (which passes the info
> > buffer to the hardware) are using (nand->ecc.steps * PER_INFO_BYTE)
> > bytes?
> >
> NFC_CMD_N2M and CMDRWGEN are different commands. CMDRWGEN needs to set
> the ecc page size (1KB or 512B) and Pages(2, 4, 8, ...), so
> PER_INFO_BYTE(= 8) bytes for each ecc page.
> I have never used NFC_CMD_N2M to transfer data before, because it is
> very inefficient. I did an experiment with the attachment and found
> no overwrite on my meson axg platform.
>
> Martin, I would appreciate it very much if you would try the attachment
> on your meson m8b platform.
Thank you for your debug patch! On my board, 2 * PER_INFO_BYTE is not enough.
I took the idea from your patch and adapted it so that I could print a
256-byte buffer (which seems to be "big enough" for my board); see the
attached, modified patch.
In the output I see that the first 32 bytes are sometimes not touched
by the controller, but everything beyond the first 32 bytes of the
info buffer is modified.
I also tried increasing the buffer size to 512 bytes, but that didn't
make a difference (I never saw any info buffer modification beyond 256
bytes).
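
The measurement described above (oversize the buffer, then see how far the
controller wrote into it) can be sketched in plain userspace C. This is only
an illustration of the technique, not the contents of the attached patch:
POISON_BYTE and the helper names poison_fill(), first_modified() and
modified_extent() are hypothetical and do not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Arbitrary fill value; an assumption of this sketch, not a driver constant. */
#define POISON_BYTE 0x5a

/* Fill the whole buffer with the poison value before the device writes to it. */
static void poison_fill(unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	memset(buf, POISON_BYTE, len);
}

/* Offset of the first byte that no longer holds the poison, or len if none. */
static size_t first_modified(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < len; i++)
		if (buf[i] != POISON_BYTE)
			return i;
	return len;
}

/*
 * Number of bytes from the start of the buffer up to and including the
 * last byte that no longer holds the poison, or 0 if nothing changed.
 */
static size_t modified_extent(const unsigned char *buf, size_t len)
{
	size_t i, extent = 0;

	assert(buf != NULL);
	for (i = 0; i < len; i++)
		if (buf[i] != POISON_BYTE)
			extent = i + 1;
	return extent;
}
```

One limitation of this approach: a byte that the device overwrites with the
poison value itself goes undetected, so the reported extent is a lower bound.
Repeating the run with a second fill value would narrow that gap.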
Also, I just noticed that I haven't given you many details about my NAND chip yet.
This is from the Amlogic vendor u-boot on Meson8m2 (all my Meson8b boards
have eMMC flash, but I believe the NAND controller on Meson8 to GXBB is
identical):
m8m2_n200_v1#amlnf chipinfo
flash info
name:B revision 20nm NAND 8GiB H27UCG8T2B, id:ad de 94 eb 74 44 0 0
pagesize:0x4000, blocksize:0x400000, oobsize:0x500, chipsize:0x2000,
option:0x8, T_REA:16, T_RHOH:15
hw controller info
chip_num:1, onfi_mode:0, page_shift:14, block_shift:22, option:0xc2
ecc_unit:1024, ecc_bytes:70, ecc_steps:16, ecc_max:40
bch_mode:5, user_mode:2, oobavail:32, oobtail:64384
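
As a cross-check of the values printed above, the geometry can be recomputed
with a few lines of C. PER_INFO_BYTE (= 8) and the "ecc steps * PER_INFO_BYTE"
sizing are taken from this thread; the helper names ecc_steps() and
info_bytes() are hypothetical. This is only arithmetic on the printed values
and says nothing about how many bytes the controller actually writes for
NFC_CMD_N2M.

```c
#include <assert.h>
#include <stddef.h>

/* Info bytes per ECC step, as quoted earlier in this thread. */
#define PER_INFO_BYTE 8

/* Number of ECC steps needed to cover one page. */
static size_t ecc_steps(size_t page_size, size_t ecc_unit)
{
	assert(ecc_unit != 0);
	return page_size / ecc_unit;
}

/* Info buffer size when sized as ecc steps * PER_INFO_BYTE. */
static size_t info_bytes(size_t steps)
{
	return steps * PER_INFO_BYTE;
}
```

With pagesize 0x4000 and ecc_unit 1024, ecc_steps() gives 16, which matches
the ecc_steps:16 line, and info_bytes(16) gives 128. The shifts are consistent
as well: 1 << 14 is 0x4000 (page_shift:14) and 1 << 22 is 0x400000
(block_shift:22).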
Regards
Martin
View attachment "debug-256-buffer-output.txt" of type "text/plain" (8077 bytes)
Download attachment "nand_debug_martin.patch" of type "application/x-patch" (986 bytes)