Date:   Fri, 12 Feb 2021 12:57:24 -0800
From:   Saravana Kannan <saravanak@...gle.com>
To:     Geert Uytterhoeven <geert@...ux-m68k.org>
Cc:     Jonathan Corbet <corbet@....net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Kevin Hilman <khilman@...nel.org>,
        Ulf Hansson <ulf.hansson@...aro.org>,
        Len Brown <len.brown@...el.com>, Len Brown <lenb@...nel.org>,
        Pavel Machek <pavel@....cz>,
        Michael Turquette <mturquette@...libre.com>,
        Stephen Boyd <sboyd@...nel.org>,
        Rob Herring <robh+dt@...nel.org>,
        Frank Rowand <frowand.list@...il.com>,
        Marc Zyngier <maz@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux PM list <linux-pm@...r.kernel.org>,
        linux-clk <linux-clk@...r.kernel.org>,
        "open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS" 
        <devicetree@...r.kernel.org>,
        ACPI Devel Maling List <linux-acpi@...r.kernel.org>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Android Kernel Team <kernel-team@...roid.com>,
        Linux-Renesas <linux-renesas-soc@...r.kernel.org>
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

On Fri, Feb 12, 2021 at 12:15 AM Geert Uytterhoeven
<geert@...ux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <saravanak@...gle.com> wrote:
> > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <geert@...ux-m68k.org> wrote:
> > >   1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu).
> > >
> > >       - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device
> > >         node OF_POPULATED after init") is no longer needed (but already
> > >         queued for v5.12 anyway)
> >
> > Rob doesn't like the proliferation of OF_POPULATED and we don't need
> > it anymore, so maybe work it out with him? It's a balance between some
> > wasted memory (struct device(s)) vs not proliferating OF_POPULATED.
>
> Rob: should it be reverted?  For v5.13?
> I guess other similar "fixes" went in in the meantime.
>
> > >       - Some devices are reprobed, despite their drivers returning
> > >         a real error code, and not -EPROBE_DEFER:
> >
> > Sorry, it's not obvious from the logs below where "reprobing" is
> > happening. Can you give more pointers please?
>
> My log was indeed not a full log, but just the reprobes happening.
> I'll send you a full log by private email.
>
> > Also, thinking more about this, the only way I could see this happen is:
> > 1. Device fails with error that's not -EPROBE_DEFER
> > 2. It somehow gets added to a device link (with AUTOPROBE_CONSUMER
> > flag) where it's a consumer.
> > 3. The supplier probes and the device gets added to the deferred probe
> > list again.
> >
> > But I can't see how this sequence can happen. Device links are created
> > only when a device is added. And if the supplier isn't added yet, the
> > consumer wouldn't have probed in the first place.
>
> The full log doesn't show any evidence of the device being added
> to a list in between the two probes.
>
> > Other than "annoying waste of time" is this causing any other problems?
>
> Probably not.  But see below.
>
> > >       - The PCI reprobing leads to a memory leak, for which I've sent a fix
> > >         "[PATCH] PCI: Fix memory leak in pci_register_io_range()"
> > >         https://lore.kernel.org/linux-pci/20210202100332.829047-1-geert+renesas@glider.be/
> >
> > Wrt PCI reprobing,
> > 1. Is this PCI never expected to probe, but it's being reattempted
> > despite the error not being -EPROBE_DEFER? Or
>
> There is no PCIe card present, so the failure is expected.
> Later it is reprobed, which of course fails again.
>
> > 2. The PCI probe was deferred when it should have succeeded, and then
> > when it's finally reattempted and could succeed, we are hitting this
> > mem leak issue?
>
> I think the leak has always been there, but it was just exposed by
> this unneeded reprobe.  I don't think a reprobe after that specific
> error path had ever happened before.
>
> > I'm basically trying to distinguish between "this stuff should never
> > be retried" vs "this/its suppliers got probe deferred with
> > fw_devlink=on but didn't get probe deferred with
> > fw_devlink=permissive, and that's causing issues"
>
> There should not be a probe deferral, as no -EPROBE_DEFER was
> returned.
>
> > >       - I2C on R-Car Gen3 does not seem to use DMA, according to
> > >         /sys/kernel/debug/dmaengine/summary:
> > >
> > >             -dma4chan0    | e66d8000.i2c:tx
> > >             -dma4chan1    | e66d8000.i2c:rx
> > >             -dma5chan0    | e6510000.i2c:tx
> >
> > I think I need more context on the problem before I can try to fix it.
> > I'm also very unfamiliar with that file. With fw_devlink=permissive,
> > I2C was using DMA? If so, the next step is to see if the I2C relative
> > probe order with DMA is getting changed and if so, why.
>
> Yes, I plan to dig deeper to see what really happens...

Try fw_devlink.strict (you'll need IOMMU enabled too). If that fixes
it and you also don't see this issue with fw_devlink=permissive, then
it means there's probably some unnecessary probe deferral that we
should try to avoid. At least, that's my hunch right now.
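
When comparing boots, it helps to confirm which fw_devlink settings each run actually used. The following is only a minimal sketch, not part of the patch series: the helper name fw_devlink_settings is hypothetical, and it assumes the parameters are passed on the kernel command line as fw_devlink=<mode> and fw_devlink.strict=<bool>. It reports None for a parameter that is absent, because the built-in default depends on the kernel version.

```python
def fw_devlink_settings(cmdline: str) -> dict:
    """Return the fw_devlink-related parameters found on a kernel command line.

    A value of None means the parameter was not given, so the kernel's
    built-in default (which varies between versions) applies.
    """
    settings = {"fw_devlink": None, "fw_devlink.strict": None}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key in settings:
            # A bare flag without "=value" is recorded as an empty string.
            settings[key] = value if sep else ""
    return settings
```

On a running system, pass it the contents of /proc/cmdline and record the result next to each boot log, so the fw_devlink=on, fw_devlink=permissive and fw_devlink.strict runs can be told apart later.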

Thanks,
Saravana

>
> > >       - On R-Mobile A1, I get a BUG and a memory leak:
> > >
> > >             BUG: spinlock bad magic on CPU#0, swapper/1
>
> >
> > Hmm... I looked at this in bits and pieces throughout the day. At
> > least spent an hour looking at this. This doesn't make a lot of sense
> > to me. I don't even touch anything in this code path AFAICT.  Are
> > modules/kernel mixed up somehow? I need more info before I can help.
> > Does reverting my pm domain change make any difference (assuming it
> > boots this far without it)?
>
> I plan to dig deeper to see what really happens...
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@...ux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
