Date:   Tue, 31 Dec 2019 13:11:27 -0800
From:   Dan Williams <dan.j.williams@...el.com>
To:     Dave Young <dyoung@...hat.com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Taku Izumi <izumi.taku@...fujitsu.com>,
        Michael Weiser <michael@...ser.dinsnail.net>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-efi <linux-efi@...r.kernel.org>, kexec@...ts.infradead.org
Subject: Re: [PATCH] efi: Fix handling of multiple contiguous efi_fake_mem= entries

On Mon, Dec 30, 2019 at 5:46 PM Dave Young <dyoung@...hat.com> wrote:
>
> Hi Dan,
> On 12/30/19 at 11:58am, Dan Williams wrote:
> > A recent test of efi_fake_mem=4G@9G:0x40000,4G@13G:0x40000 crashed with
> > a signature of:
> >
> >     BUG: unable to handle page fault for address: ffffffffff281000
> >     [..]
> >     RIP: 0010:efi_memmap_insert+0x11d/0x191
> >     [..]
> >     Call Trace:
> >      ? bgrt_init+0xbe/0xbe
> >      ? efi_arch_mem_reserve+0x1cb/0x228
> >      ? acpi_parse_bgrt+0xa/0xd
> >      ? acpi_table_parse+0x86/0xb8
> >      ? acpi_boot_init+0x494/0x4e3
> >      ? acpi_parse_x2apic+0x87/0x87
> >      ? setup_acpi_sci+0xa2/0xa2
> >      ? setup_arch+0x8db/0x9e1
> >      ? start_kernel+0x6a/0x547
> >      ? secondary_startup_64+0xb6/0xc0
> >
> > efi_memmap_insert() is attempting to insert entries past the end of the
> > new map. This condition is set up by efi_fake_memmap() leaking empty
> > entries to the end of the memory map, and then efi_arch_mem_reserve()
> > trips over the bad entry when attempting an additional
> > efi_memmap_insert(). The empty entry causes efi_memmap_insert() to
> > attempt more memmap splits / copies than efi_memmap_split_count()
> > accounted for when sizing the new map.
> >
> > Update efi_fake_memmap() to clean up trailing empty entries.
> >
> > This change is related to commit af1648984828 "x86/efi: Update e820 with
> > reserved EFI boot services data to fix kexec breakage" since that
> > introduces more occurrences where efi_memmap_insert() is invoked after
> > an efi_fake_mem= configuration has been parsed. Previously the side
> > effects of vestigial empty entries were benign, but with commit
> > af1648984828 that follow-on efi_memmap_insert() invocation triggers the
> > above crash signature.
> >
> > Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
> > Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")
> > Cc: Taku Izumi <izumi.taku@...fujitsu.com>
> > Cc: Michael Weiser <michael@...ser.dinsnail.net>
> > Cc: Dave Young <dyoung@...hat.com>
> > Cc: Ard Biesheuvel <ard.biesheuvel@...aro.org>
> > Cc: Thomas Gleixner <tglx@...utronix.de>
> > Cc: Ingo Molnar <mingo@...nel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@...el.com>
> > ---
> >  drivers/firmware/efi/fake_mem.c |   22 +++++++++++++++++++++-
> >  1 file changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
> > index bb9fc70d0cfa..6df51ba93ae8 100644
> > --- a/drivers/firmware/efi/fake_mem.c
> > +++ b/drivers/firmware/efi/fake_mem.c
> > @@ -67,13 +67,33 @@ void __init efi_fake_memmap(void)
> >               return;
> >       }
> >
> > +     memset(new_memmap, 0, efi.memmap.desc_size * new_nr_map);
> >       for (i = 0; i < nr_fake_mem; i++)
> >               efi_memmap_insert(&efi.memmap, new_memmap, &efi_fake_mems[i]);
> >
> > +     /*
> > +      * efi_memmap_split_count() may over count the number of
> > +      * required splits in the case when contiguous fake entries are
> > +      * specified. Check that all new_nr_map entries were consumed.
> > +      */
> > +     for (i = new_nr_map; i > 0; i--) {
> > +             efi_memory_desc_t *md;
> > +             u64 start, end;
> > +
> > +             md = new_memmap + efi.memmap.desc_size * (new_nr_map - i - 1);
> > +             end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1;
> > +             start = md->phys_addr;
> > +
> > +             if (start == 0 && end + 1 == 0)
> > +                     continue;
> > +             break;
> > +     }
> > +
> >       /* swap into new EFI memmap */
> >       early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
> >
> > -     efi_memmap_install(new_memmap_phy, new_nr_map);
> > +     /* install only the valid entries */
> > +     efi_memmap_install(new_memmap_phy, i);
> >
> >       /* print new EFI memmap */
> >       efi_print_memmap();
> >
>
> Although the kernel boots with this patch, it still does not fix the
> issue I noticed, as you can see:
> [root@...alhost ~]# cat /proc/cmdline
> BOOT_IMAGE=/bzImage root=/dev/vda3 ro audit=0 selinux=0 crashkernel=160M efi=debug console=ttyS0 console=tty0 3 efi_fake_mem=200M@5G:0x40000,300M@...0M:0x40000 earlyprintk=serial
> [root@...alhost ~]# dmesg|grep fake_mem
> [    0.000000] Command line: BOOT_IMAGE=/bzImage root=/dev/vda3 ro audit=0 selinux=0 crashkernel=160M efi=debug console=ttyS0 console=tty0 3 efi_fake_mem=200M@5G:0x40000,300M@...0M:0x40000 earlyprintk=serial
> [    0.000000] efi_fake_mem: add attr=0x0000000000040000 to [mem 0x0000000140000000-0x000000014c7fffff]
> [    0.000000] efi_fake_mem: add attr=0x0000000000040000 to [mem 0x000000015e000000-0x0000000170bfffff]
> [root@...alhost ~]# dmesg|grep SP
> [    0.085762] efi: mem48: [Conventional Memory|   |  |SP|  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000015e000000-0x0000000170bfffff] (300MB)
>
>
> With this patch applied, there is still only one md with the "SP" attr
> set.  That means only the last insert worked.
>
> void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf,
>                               struct efi_mem_range *mem)
>
> The above function uses old_memmap as the base for each insert.
> Here old_memmap == &efi.memmap, so when you do the following:
>         for (i = 0; i < nr_fake_mem; i++)
>                 efi_memmap_insert(&efi.memmap, new_memmap, &efi_fake_mems[i]);
>
> Only the last insert will take effect.  The debug patch below worked for
> me, but I thought you had found the same bug so I did not add it in the
> reply. Here it is, for debugging purposes only, not graceful:

Good find! I missed this because my test case was checking /proc/iomem
after booting, and efi_fake_memmap_early() updates the e820 table, so
that view looked correct even though the EFI memmap was not.
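
To make the failure mode concrete, here is a minimal userspace sketch of
the two call patterns. It is not the kernel code: struct desc,
model_insert(), apply_from_original() and apply_chained() are
hypothetical names, and the model only assumes what was described above,
namely that each insert rebuilds the output buffer from the map it is
handed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an EFI memory descriptor. */
struct desc {
	uint64_t start;	/* first byte of the region */
	uint64_t end;	/* last byte of the region, inclusive */
	uint64_t attr;	/* attribute bits */
};

/* Append one descriptor, refusing to write past the buffer capacity. */
static size_t emit(struct desc *buf, size_t out, size_t cap, struct desc d)
{
	assert(out < cap);
	buf[out] = d;
	return out + 1;
}

/*
 * Model of one insert: rebuild buf from old[0..n), splitting any
 * descriptor that overlaps r and OR-ing r->attr into the overlap.
 * Whatever buf held before is overwritten. old and buf must not alias.
 */
static size_t model_insert(const struct desc *old, size_t n,
			   struct desc *buf, size_t cap, const struct desc *r)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		struct desc d = old[i];

		if (r->end < d.start || r->start > d.end) {
			out = emit(buf, out, cap, d);	/* no overlap */
			continue;
		}
		uint64_t lo = r->start > d.start ? r->start : d.start;
		uint64_t hi = r->end < d.end ? r->end : d.end;

		if (d.start < lo)
			out = emit(buf, out, cap,
				   (struct desc){ d.start, lo - 1, d.attr });
		out = emit(buf, out, cap,
			   (struct desc){ lo, hi, d.attr | r->attr });
		if (hi < d.end)
			out = emit(buf, out, cap,
				   (struct desc){ hi + 1, d.end, d.attr });
	}
	return out;
}

/* Pattern under discussion: every insert rereads the original map. */
static size_t apply_from_original(const struct desc *orig, size_t n,
				  const struct desc *fake, size_t nr_fake,
				  struct desc *buf, size_t cap)
{
	size_t out = n;

	assert(n <= cap);
	memcpy(buf, orig, n * sizeof(*orig));
	for (size_t i = 0; i < nr_fake; i++)
		out = model_insert(orig, n, buf, cap, &fake[i]);
	return out;
}

/* Alternative: each insert starts from the result of the previous one. */
static size_t apply_chained(const struct desc *orig, size_t n,
			    const struct desc *fake, size_t nr_fake,
			    struct desc *buf, struct desc *tmp, size_t cap)
{
	assert(n <= cap);
	memcpy(buf, orig, n * sizeof(*orig));
	for (size_t i = 0; i < nr_fake; i++) {
		n = model_insert(buf, n, tmp, cap, &fake[i]);
		memcpy(buf, tmp, n * sizeof(*tmp));
	}
	return n;
}

/* Count descriptors that carry every bit in attr. */
static size_t count_attr(const struct desc *map, size_t n, uint64_t attr)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if ((map[i].attr & attr) == attr)
			hits++;
	return hits;
}
```

In this model, two disjoint fake ranges applied with
apply_from_original() leave only the last range tagged, which matches the
single "SP" line in the dmesg output above, while apply_chained() tags
both.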

>
> diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
> index bb9fc70d0cfa..097eaf7deb6a 100644
> --- a/drivers/firmware/efi/fake_mem.c
> +++ b/drivers/firmware/efi/fake_mem.c
> @@ -36,44 +36,48 @@ static int __init cmp_fake_mem(const void *x1, const void *x2)
>
>  void __init efi_fake_memmap(void)
>  {
> -       int new_nr_map = efi.memmap.nr_map;
> -       efi_memory_desc_t *md;
> -       phys_addr_t new_memmap_phy;
> -       void *new_memmap;
>         int i;
>
> +       pr_info("nr fake mem %d\n", nr_fake_mem);
>         if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
>                 return;
>
>         /* count up the number of EFI memory descriptor */
>         for (i = 0; i < nr_fake_mem; i++) {
> -               for_each_efi_memory_desc(md) {
> -                       struct range *r = &efi_fake_mems[i].range;
> -
> -                       new_nr_map += efi_memmap_split_count(md, r);
> +               int new_nr_map = efi.memmap.nr_map;
> +               efi_memory_desc_t md0;
> +               efi_memory_desc_t *md = &md0;
> +               phys_addr_t new_memmap_phy;
> +               void *new_memmap;
> +
> +               if (efi_mem_desc_lookup(efi_fake_mems[i].range.start, md)) {
> +                       pr_err("Failed to lookup EFI memory descriptor for %pa\n", &efi_fake_mems[i].range.start);
> +                       return;
> +               }
> +
> +               new_nr_map += efi_memmap_split_count(md, &efi_fake_mems[i].range);
> +
> +               pr_info("new nr %d\n", new_nr_map);
> +               /* allocate memory for new EFI memmap */
> +               new_memmap_phy = efi_memmap_alloc(new_nr_map);
> +               if (!new_memmap_phy){
> +                       pr_info("alloc new map failed\n");
> +                       return;}
> +
> +               /* create new EFI memmap */
> +               new_memmap = early_memremap(new_memmap_phy,
> +                                   efi.memmap.desc_size * new_nr_map);
> +               if (!new_memmap) {
> +                       pr_info("map new map failed\n");
> +                       memblock_free(new_memmap_phy, efi.memmap.desc_size * new_nr_map);
> +                       return;
>                 }
> -       }
> -
> -       /* allocate memory for new EFI memmap */
> -       new_memmap_phy = efi_memmap_alloc(new_nr_map);
> -       if (!new_memmap_phy)
> -               return;
> -
> -       /* create new EFI memmap */
> -       new_memmap = early_memremap(new_memmap_phy,
> -                                   efi.memmap.desc_size * new_nr_map);
> -       if (!new_memmap) {
> -               memblock_free(new_memmap_phy, efi.memmap.desc_size * new_nr_map);
> -               return;
> -       }
> -
> -       for (i = 0; i < nr_fake_mem; i++)
>                 efi_memmap_insert(&efi.memmap, new_memmap, &efi_fake_mems[i]);
> -
> -       /* swap into new EFI memmap */
> -       early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
> -
> -       efi_memmap_install(new_memmap_phy, new_nr_map);
> +               /* swap into new EFI memmap */
> +               early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
> +               efi_memmap_install(new_memmap_phy, new_nr_map);
> +               pr_info("inserted new map\n");
> +       }

Perhaps a prettier way to do this is to push the handling of each
efi_fake_mem entry into a subroutine. However, I notice that when a
memmap allocated by efi_memmap_alloc() is replaced by another
dynamically allocated memmap, the previous one isn't released. I have a
series that fixes that up as well.
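
As a rough illustration of the shape I mean, here is a userspace sketch
only, not the series itself. Every name in it (struct memmap,
map_alloc(), map_install(), apply_one(), apply_all(), live_allocs) is
hypothetical, malloc-style allocation stands in for efi_memmap_alloc(),
and descriptor splitting is deliberately left out so the ownership
handling stays visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an EFI memory descriptor. */
struct desc {
	uint64_t start, end;	/* inclusive byte range */
	uint64_t attr;		/* attribute bits */
};

/* Hypothetical map handle that remembers how its storage was obtained. */
struct memmap {
	struct desc *descs;
	size_t nr;
	bool dynamic;	/* true if descs came from map_alloc() */
};

static int live_allocs;	/* outstanding dynamic maps, for leak checking */

static struct desc *map_alloc(size_t nr)
{
	struct desc *p = calloc(nr ? nr : 1, sizeof(*p));

	if (p)
		live_allocs++;
	return p;
}

/* Install a new map and release the previous one if it was dynamic. */
static void map_install(struct memmap *m, struct desc *descs, size_t nr)
{
	if (m->dynamic) {
		free(m->descs);
		live_allocs--;
	}
	m->descs = descs;
	m->nr = nr;
	m->dynamic = true;
}

/*
 * Handle a single fake-mem entry: build a fresh map from the current
 * one, tag every descriptor fully inside r, then install the result.
 * No splitting is modelled here.
 */
static int apply_one(struct memmap *m, const struct desc *r)
{
	struct desc *next = map_alloc(m->nr);

	if (!next)
		return -1;
	memcpy(next, m->descs, m->nr * sizeof(*next));
	for (size_t i = 0; i < m->nr; i++)
		if (next[i].start >= r->start && next[i].end <= r->end)
			next[i].attr |= r->attr;
	map_install(m, next, m->nr);
	return 0;
}

/* Each entry sees the map produced by the entry before it. */
static int apply_all(struct memmap *m, const struct desc *fake, size_t nr_fake)
{
	for (size_t i = 0; i < nr_fake; i++)
		if (apply_one(m, &fake[i]))
			return -1;
	return 0;
}
```

Since each call works on the currently installed map, later entries
build on earlier ones, and at most one dynamically allocated map is
outstanding at any time in this model.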
