[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1c6e9939-63b0-d663-1d48-8655e04f0716@alu.unizg.hr>
Date: Tue, 18 Oct 2022 19:41:25 +0200
From: Mirsad Goran Todorovac <mirsad.todorovac@....unizg.hr>
To: Phillip Lougher <phillip@...ashfs.org.uk>,
Jintao Yin <nicememory@...il.com>
Cc: bagasdotme@...il.com, linux-kernel@...r.kernel.org,
marcmiltenberger@...il.com, regressions@...mhuis.info,
regressions@...ts.linux.dev, srw@...dewatkins.net,
Hsin-Yi Wang <hsinyi@...omium.org>
Subject: Re: BISECT result: 6.0.0-RC kernels trigger Firefox snap bug with
6.0.0-rc3 through 6.0.0-rc7
On 18. 10. 2022. 19:15, Phillip Lougher wrote:
> On 18/10/2022 03:15, Jintao Yin wrote:
>> On Sat, Oct 15, 2022 at 09:59:36PM +0100, Phillip Lougher wrote:
>>> Thorsten Leemhuis <regressions@...mhuis.info> wrote:
>>>>
>>>> Topposting, to make this easier to access for everyone.
>>>>
>>>> @Mirsad, thx for bisecting.
>>>>
>>>> @Phillip: if you want to see a problem description and the whole
>>>> backstory of the problem that apparently is caused by b09a7a036d20
>>>> ("squashfs: support reading fragments in readahead call"), see this
>>>> thread (Mirsad sadly started a new one with the quoted mail below):
>>>> https://lore.kernel.org/all/b0c258c3-6dcf-aade-efc4-d62a8b3a1ce2@alu.unizg.hr/
>>>>
>>>>
>>>
>>> The above backstory tends to suggest data corruption which is happening
>>> after a couple of hours especially on heavy loads, e.g. the comment
>>>
>>>> On 10/3/22 at 4:18 AM, Mirsad Goran Todorovac wrote:
>>>> The bug usually isn't showing immediately, but after a couple of hours
>>>> of running (especially with multimedia running inside Firefox).
>>>
>>> Which is typically caused by double freed buffers or race conditions in
>>> freeing and reusing.
>>>
>>> Thanks Mirsad for the following
>>>
>>> On Sat, 15 Oct 2022 16:59:44 +0200, Mirsad Goran Todorovac wrote:
>>>>
>>>> Here are the results of the requested bisect on the bug involving the
>>>> Firefox snap build 104.x, 105.0.x, squashfs and which was
>>>> manifested on
>>>> both Ubuntu snap and with snapd in AlmaLinux 8.6 (CentOS fork):
>>>>
>>>> mtodorov@...ac:~/linux/kernel/linux_stable$ git bisect log
>>>> git bisect start
>>>> # bad: [568035b01cfb107af8d2e4bd2fb9aea22cf5b868] Linux 6.0-rc1
>>>> git bisect bad 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
>>>> # good: [51dd976781da8c0b47e106ed59a96d7c28972ce4] Linux 5.19.15
>>>> git bisect good 51dd976781da8c0b47e106ed59a96d7c28972ce4
>>>> # good: [3d7cb6b04c3f3115719235cc6866b10326de34cd] Linux 5.19
>>>> git bisect good 3d7cb6b04c3f3115719235cc6866b10326de34cd
>>>> # good: [b44f2fd87919b5ae6e1756d4c7ba2cbba22238e1] Merge tag
>>>> 'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm
>>>> git bisect good b44f2fd87919b5ae6e1756d4c7ba2cbba22238e1
>>>> # good: [6614a3c3164a5df2b54abb0b3559f51041cf705b] Merge tag
>>>> 'mm-stable-2022-08-03' of
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>>>> git bisect good 6614a3c3164a5df2b54abb0b3559f51041cf705b
>>>> # bad: [eb5699ba31558bdb2cee6ebde3d0a68091e47dce] Merge tag
>>>> 'mm-nonmm-stable-2022-08-06-2' of
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>>>> git bisect bad eb5699ba31558bdb2cee6ebde3d0a68091e47dce
>>>> # good: [24df5428ef9d1ca1edd54eca7eb667110f2dfae3] ALSA: hda/realtek:
>>>> Add quirk for HP Spectre x360 15-eb0xxx
>>>> git bisect good 24df5428ef9d1ca1edd54eca7eb667110f2dfae3
>>>> # good: [c993e07be023acdeec8e84e2e0743c52adb5fc94] Merge tag
>>>> 'dma-mapping-5.20-2022-08-06' of
>>>> git://git.infradead.org/users/hch/dma-mapping
>>>> git bisect good c993e07be023acdeec8e84e2e0743c52adb5fc94
>>>> # good: [4cfa6ff24a9744ba484521c38bea613134fbfcb3] powerpc/64e: Fix
>>>> kexec build error
>>>> git bisect good 4cfa6ff24a9744ba484521c38bea613134fbfcb3
>>>> # good: [24cb958695724ffb4488ef4f65892c0767bcd2f2] Merge tag
>>>> 's390-5.20-1' of
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
>>>> git bisect good 24cb958695724ffb4488ef4f65892c0767bcd2f2
>>>> # good: [db98b43086275350294f5c6f797249b714d6316d] squashfs: always
>>>> build "file direct" version of page actor
>>>> git bisect good db98b43086275350294f5c6f797249b714d6316d
>>>> # good: [6ba592fa014f21f35a8ee8da4ca7b95a018f13e8] video: fbdev: s3fb:
>>>> Check the size of screen before memset_io()
>>>> git bisect good 6ba592fa014f21f35a8ee8da4ca7b95a018f13e8
>>>> # good: [b5a8466d37d30cfcc8015789f4a3f0c44b6c7bc6] Merge tag
>>>> 'for-5.20/fbdev-1' of
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
>>>> git bisect good b5a8466d37d30cfcc8015789f4a3f0c44b6c7bc6
>>>> # bad: [97d3b2676fc6bc4865eb825037f4492f0fb804eb] ocfs2: remove some
>>>> useless functions
>>>> git bisect bad 97d3b2676fc6bc4865eb825037f4492f0fb804eb
>>>> # bad: [591c32bddbe20ba0e172d9def3c7f22b9c926ad9] kernel/hung_task:
>>>> fix
>>>> address space of proc_dohung_task_timeout_secs
>>>> git bisect bad 591c32bddbe20ba0e172d9def3c7f22b9c926ad9
>>>> # bad: [b09a7a036d2035b14636cd4c4c69518d73770f65] squashfs: support
>>>> reading fragments in readahead call
>>>> git bisect bad b09a7a036d2035b14636cd4c4c69518d73770f65
>>>> mtodorov@...ac:~/linux/kernel/linux_stable$
>>>>
>>>> The git bisect stopped at the squashfs commit
>>>> b09a7a036d2035b14636cd4c4c69518d73770f65, so I included Phillip in
>>>> Cc:,
>>>> according to the Code of Conduct.
>>>
>>> Which identified the "squashfs: support reading fragments in
>>> readahead call"
>>> patch.
>>>
>>> There is a race-condition introduced in that patch, which involves
>>> cache
>>> releasing and reuse.
>>>
>>> The following diff will fix that race-condition. It would be great if
>>> someone could test and verify before sending it out as a patch.
>>>
>>> Thanks
>>>
>>> Phillip
>> Hi Phillip,
>> There is a logical bug in commit
>> 8fc78b6fe24c36b151ac98d7546591ed92083d4f
>> which is parent commit of commit
>> b09a7a036d2035b14636cd4c4c69518d73770f65.
>> In function squashfs_readahead(...),
>> file_end is initialized with i_size_read(inode) >> msblk->block_log,
>> which means the last block index of the file.
>> But later in the logic to check if the page is last one or not the
>> code is
>> if (pages[nr_pages - 1]->index == file_end && bytes) {
>> ...
>> }
>> , use file_end as the last page index of file but actually is the
>> last
>> block index, so for the common setup of page and block size, the
>> first
>> comparison is true only when pages[nr_pages - 1]->index is 0.
>> Otherwise, the trailing bytes of last page won't be zeroed.
>>
>> Maybe it's the real cause of the snap bug in some way.
>>
>
> That code segment is indeed the cause of the bug. But the logic to
> check if it is the last page is completely wrong and more broken than
> described.
>
> I will send a diff. This has aleady been shown to fix the issue with my
> reproducer.
>
> Thanks
Then it is no surprise that v6.0.2 build with Phillip's and Hsin-Yi's
patches also
failed, giving the same "Gah, tab crashed" and Verneed errors as before
given
the same Firefox windows and tabs test:
Oct 18 19:34:00 marvin-IdeaPad-3-15ITL6 firefox_firefox.desktop[7400]:
/snap/firefox/1943/gnome-platform/usr/lib/x86_64-linux-gnu/libXcomposite.so.1:
unsupported version 0 of Verneed record
Oct 18 19:34:00 marvin-IdeaPad-3-15ITL6 firefox_firefox.desktop[7400]:
Couldn't load XPCOM.
Oct 18 19:34:03 marvin-IdeaPad-3-15ITL6 firefox_firefox.desktop[7418]:
XPCOMGlueLoad error for file
/snap/firefox/1943/usr/lib/firefox/libmozgtk.so:
Oct 18 19:34:03 marvin-IdeaPad-3-15ITL6 firefox_firefox.desktop[7418]:
/snap/firefox/1943/gnome-platform/usr/lib/x86_64-linux-gnu/libXcomposite.so.1:
unsupported version 0 of Verneed record
Oct 18 19:34:03 marvin-IdeaPad-3-15ITL6 firefox_firefox.desktop[7418]:
Couldn't load XPCOM.
Oct 18 19:34:04 marvin-IdeaPad-3-15ITL6 firefox_firefox.desktop[5416]:
Missing chrome or resource URL:
resource://gre/modules/UpdateListener.sys.mjs
Oct 18 19:34:05 marvin-IdeaPad-3-15ITL6 systemd[1629]:
snap.firefox.firefox.945b7e54-eb2c-40d3-836d-96ac26e19293.scope:
Consumed 3min 19.877s CPU time.
Oct 18 19:34:19 marvin-IdeaPad-3-15ITL6 systemd[1629]: Started
Application launched by gnome-shell.
Oct 18 19:34:19 marvin-IdeaPad-3-15ITL6 systemd[1629]: Started
snap.firefox.firefox.b586e811-9aaa-4b30-afa8-54ae3b119798.scope.
Oct 18 19:34:20 marvin-IdeaPad-3-15ITL6 firefox_firefox.desktop[7483]:
cut: cut: no version information available (required by cut)
Oct 18 19:34:20 marvin-IdeaPad-3-15ITL6 firefox_firefox.desktop[7483]:
message repeated 5 times: [ cut: cut: no version information available
(required by cut)]
Oct 18 19:34:20 marvin-IdeaPad-3-15ITL6 firefox_firefox.desktop[7483]:
cut: symbol lookup error: cut: undefined symbol:
Oct 18 19:34:21 marvin-IdeaPad-3-15ITL6 firefox_firefox.desktop[7441]:
XPCOMGlueLoad error for file
/snap/firefox/1943/usr/lib/firefox/libmozgtk.so:
Oct 18 19:34:21 marvin-IdeaPad-3-15ITL6 firefox_firefox.desktop[7441]:
/snap/firefox/1943/gnome-platform/usr/lib/x86_64-linux-gnu/libXcomposite.so.1:
unsupported version 0 of Verneed record
Oct 18 19:34:21 marvin-IdeaPad-3-15ITL6 firefox_firefox.desktop[7441]:
Couldn't load XPCOM.
Oct 18 19:34:21 marvin-IdeaPad-3-15ITL6 systemd[1629]:
snap.firefox.firefox.b586e811-9aaa-4b30-afa8-54ae3b119798.scope:
Consumed 1.382s CPU time.
Oct 18 19:34:39 marvin-IdeaPad-3-15ITL6 systemd[1]: Starting Download
data for packages that failed at package install time...
Oct 18 19:34:39 marvin-IdeaPad-3-15ITL6 anacron[956]: Job `cron.daily'
started
Oct 18 19:34:39 marvin-IdeaPad-3-15ITL6 systemd[1]:
update-notifier-download.service: Deactivated successfully.
Oct 18 19:34:39 marvin-IdeaPad-3-15ITL6 systemd[1]: Finished Download
data for packages that failed at package install time.
Oct 18 19:34:39 marvin-IdeaPad-3-15ITL6 anacron[7529]: Updated timestamp
for job `cron.daily' to 2022-10-18
Oct 18 19:34:39 marvin-IdeaPad-3-15ITL6 cracklib: no dictionary update
necessary.
Oct 18 19:34:39 marvin-IdeaPad-3-15ITL6 anacron[956]: Job `cron.daily'
terminated
Oct 18 19:34:42 marvin-IdeaPad-3-15ITL6 dbus-daemon[782]: [system]
Activating via systemd: service name='org.freedesktop.timedate1'
unit='dbus-org.freedesktop.timedate1.service' requested by ':1.45'
(uid=0 pid=810 comm="/usr/lib/snapd/snapd ")
Oct 18 19:34:42 marvin-IdeaPad-3-15ITL6 systemd[1]: Starting Time & Date
Service...
Oct 18 19:34:42 marvin-IdeaPad-3-15ITL6 dbus-daemon[782]: [system]
Successfully activated service 'org.freedesktop.timedate1'
Oct 18 19:34:42 marvin-IdeaPad-3-15ITL6 systemd[1]: Started Time & Date
Service.
Oct 18 19:34:43 marvin-IdeaPad-3-15ITL6 snapd[810]: storehelpers.go:748:
cannot refresh: snap has no updates available: "bare",
"canonical-livepatch", "core", "core18", "core20", "gnome-3-34-1804",
"gnome-3-38-2004", "gtk-common-themes", "slack", "snap-store", "snapd",
"zoom-client"
I wish there is something I could do to help other than bug reports, but the
squashfs source is way too complex for me ...
Good luck then guys.
Thanks
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
--
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union
Powered by blists - more mailing lists