@hauke @dangowrt
under some circumstances, the rootfs_data partition is ignored and a loop on rootfs is used instead for the overlay.
yet fixing this bug is dangerous:
- by definition, affected devices have rootfs_data partitions which are not being used as overlay. so in order to work correctly, their current sysupgrade code must also ignore these partitions and instead store the preserved settings in the trailing part of rootfs. it follows that sysupgrade will break on all affected devices as soon as this bug is fixed.
- more concerning is that users might be using these large partitions to store ad hoc data (they actually are!), for which they might not have a current backup, and a sysupgrade following a fix will silently wipe their data.
- even if they are not using these partitions, their contents might be needed to successfully return to stock.
the number of affected devices is unknown to me, but i can at least point to one...
example device
QNAP QHora-301w has both a NOR flash and a GPT-formatted eMMC. kernel, rootfs, and overlay partitions exist in the emmc. the first two are used, which i suppose is directed by the uboot env. but rootfs_data, which should be handled by fstools, is ignored.
MTD partitions:
spi-nor spi0.0: w25q64dw (8192 Kbytes)
dev: size erasesize name
mtd0: 00050000 00010000 "0:sbl1"
mtd1: 00010000 00010000 "0:mibib"
mtd2: 00180000 00010000 "0:qsee"
mtd3: 00010000 00010000 "0:devcfg"
mtd4: 00010000 00010000 "0:apdp"
mtd5: 00040000 00010000 "0:rpm"
mtd6: 00010000 00010000 "0:cdt"
mtd7: 00020000 00010000 "0:appsblenv"
mtd8: 00100000 00010000 "0:appsbl"
mtd9: 00040000 00010000 "0:art"
mtd10: 00080000 00010000 "0:ethphyfw1"
mtd11: 00080000 00010000 "0:ethphyfw2"
mtd12: 00350000 00010000 "reserved"
GPT partitions (showing rootfs_data):
Disk /dev/mmcblk0: 7634944 sectors, 3.6 GiB
Number Start (sector) End (sector) Size Code Name
1 34 32801 16.0 MiB FFFF 0:HLOS
2 32802 65569 16.0 MiB FFFF 0:HLOS_1
3 65570 98337 16.0 MiB FFFF 0:HLOS_2
4 98338 1146913 512.0 MiB FFFF rootfs
5 1146914 2195489 512.0 MiB FFFF rootfs_1
6 2195490 3244065 512.0 MiB FFFF rootfs_2
7 3244066 3252257 4.0 MiB FFFF 0:WIFIFW
8 3252258 3285025 16.0 MiB 8301 reserved
9 3285026 7591969 2.1 GiB FFFF rootfs_data <<======
mounted devices (showing the loop device):
/dev/root on /rom type squashfs (ro,relatime,errors=continue)
proc on /proc type proc (rw,nosuid,nodev,noexec,noatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,noatime)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noatime)
/dev/loop0 on /overlay type ext4 (rw,noatime) <<======
overlayfs:/overlay on / type overlay (rw,noatime,lowerdir=/,upperdir=/overlay/upper,workdir=/overlay/work,xino=off)
tmpfs on /dev type tmpfs (rw,nosuid,noexec,noatime,size=512k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,noatime,mode=600,ptmxmode=000)
debugfs on /sys/kernel/debug type debugfs (rw,noatime)
bpffs on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,noatime,mode=700)
pstore on /sys/fs/pstore type pstore (rw,noatime)
the sysupgrade code for this device ignores rootfs_data and will thus break if this bug is fixed.
finally, here you can see a user using this rootfs_data partition for their own purposes.
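to check whether a device is affected, one can look at which device backs /overlay in the mount listing. a minimal sketch in python, assuming the `mount` output has the same shape as the listing above; the function names are hypothetical and not part of fstools or sysupgrade:

```python
from typing import Optional


def overlay_source(mount_output: str) -> Optional[str]:
    """Return the device mounted on /overlay, or None if there is none."""
    for line in mount_output.splitlines():
        parts = line.split()
        # expected shape: <source> on <mountpoint> type <fstype> (<options>)
        if len(parts) >= 5 and parts[1] == "on" and parts[2] == "/overlay":
            return parts[0]
    return None


def overlay_is_loop(mount_output: str) -> bool:
    """True when /overlay is backed by a loop device instead of a partition."""
    source = overlay_source(mount_output)
    return source is not None and source.startswith("/dev/loop")


# lines copied from the mount listing above
sample = """\
/dev/root on /rom type squashfs (ro,relatime,errors=continue)
/dev/loop0 on /overlay type ext4 (rw,noatime)
overlayfs:/overlay on / type overlay (rw,noatime,lowerdir=/,upperdir=/overlay/upper,workdir=/overlay/work,xino=off)
"""
print(overlay_source(sample), overlay_is_loop(sample))
```

on a device that does use its rootfs_data partition, the source would be a block device such as a /dev/mmcblk0 partition and the check would return false.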
root cause
i have no idea. in reporting this issue i am more concerned with the fallout that a fix could cause and how to avert it. that said, @pwned-pixel advanced a hypothesis for the cause of this bug here.
the overlay size problem
until not long ago, the unwritten openwrt policy was to prefer an overlay partition that was as large as possible. this clearly holds for devices with 256 MB of flash or less, and arguably for devices with 512 MB too.
however, i think this policy no longer holds for the newer eMMC devices we are currently seeing, with 4 or 8 GB of flash starting to be common.
clearly a 4 GB overlay partition is completely overblown, and it makes no sense to back up such a large partition in RAM during sysupgrades. packages will never use anywhere near that amount of storage, and the rest will have to be mostly empty or it will not fit in RAM. or if the device had a wild amount of RAM, then sysupgrades would take forever. and on top of that, you would not be able to use that space for anything mildly important, as an interrupted sysupgrade would mean that you lose all your data.
the good news is that openwrt ports for devices with eMMCs that i have seen so far do not go for an extremely large overlay; instead they prefer a reasonably sized overlay, and leave the large extra space unused. should they need it, advanced users can then use this large space as an extra storage partition that will survive sysupgrades (just like they would use USB storage).
the QNAP 301w cited above currently uses 512 MB (minus rootfs) for overlay, which i think is just perfect, and leaves the extra space (the rootfs_data partition in this case) to be used by advanced users.
the only problem with this device is... the stock name of the rootfs_data partition just happens to be a magic name for openwrt! the whole setup -which is just fine- will break when this bug is fixed. and there is nothing the port author can do to avert that.
the problem with magic partition names
fstools recognizes some magic partition names (or at least intends to), such as ubi and rootfs_data. this was a perfectly fine decision in the MTD era. after all, the port author is free to define the MTD partitions and their names in the DTS. on an MTD device, the simple act of flashing a custom kernel is equivalent to repartitioning the flash; there are no extra steps involved. similarly, to return to stock and its partition scheme, one simply has to reflash the stock kernel.
so in the MTD era there was no incentive to make these magic partition names configurable, as the actual partition names were already configurable in the DTS. however this changed with eMMC devices: GPT partitions and their names preexist and are no longer defined by the port author. so we have a situation in which neither the actual partition names nor the magic names in the code can be configured, and IMHO this needs to change.
some may argue that the solution is repartitioning during install, but i argue it is not:
- basic users should not need to go through the trouble of repartitioning their devices to install openwrt.
- repartitioning also makes it much harder to return to stock (see next point).
- repartitioning typically requires backing up large flash areas, which is a process almost no port author will document (and no one will go through anyway), if the option to return to stock is desired.
- purists would also want to back up the GPT itself.
- eMMC devices typically support multiple firmware slots, and port authors usually want to support having the stock firmware in one of the slots (if only during development) which may preclude repartitioning.
even if you disagree with the above points, there is one more reason which IMHO is beyond argument:
- there are official openwrt devices that use an eMMC and on which secure boot is enabled (eg: spectrum sax1v1k). my experience with secure boot-enabled qualcomm android devices is that during the last decade qualcomm has signed the GPTs and their bootloaders have verified them, bricking any device on which the GPT was altered. i do not see a reason why they would take the trouble to avoid doing that in their routers, so i have to assume routers are in the same situation. so for secure boot-enabled devices, repartitioning is simply not an option.
the logical solution to all this is to make some magic partition names configurable.
other reasons to make magic names configurable
some eMMC devices support multiple firmware slots and also include matching multiple overlay partitions. uboot boot scripts could be made to support the various firmware slots: for the desired slot, the script can load the right kernel and set the kernel command line to point to the right rootfs. unfortunately this breaks down because there is no way to inform fstools which overlay partition to use.
i wrote dual boot scripts for the spectrum sax1v1k (secure boot enabled) to support booting an alternative recovery initramfs OS in case of emergency. before this, this device would terminally brick if a sysupgrade failed. i also wanted to support dual slot boot, but i ran into the overlay magic name problem. so i created a pull request to fix this and allow a kernel parameter to override which overlay partition to use, if desired.
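to illustrate the idea of a kernel parameter override, here is a minimal sketch in python (fstools itself is written in C). the parameter name `overlay_partname` and the partition name `rootfs_data_1` are hypothetical and are not taken from the PR. the sketch also ignores quoted values on the command line:

```python
DEFAULT_OVERLAY_NAME = "rootfs_data"   # the magic name discussed above
PARAM = "overlay_partname"             # hypothetical parameter name


def overlay_name_from_cmdline(cmdline: str,
                              default: str = DEFAULT_OVERLAY_NAME) -> str:
    """Return the overlay partition name requested on the kernel command line.

    Falls back to the default when the parameter is absent. If the
    parameter appears more than once, the last occurrence wins.
    """
    name = default
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == PARAM and sep:
            name = value
    return name


# a boot script for a second slot could append the parameter
print(overlay_name_from_cmdline("root=/dev/mmcblk0p5 overlay_partname=rootfs_data_1"))
# without the parameter, the current behavior is kept
print(overlay_name_from_cmdline("root=/dev/mmcblk0p4"))
```

in a real cmdline the contents would come from /proc/cmdline. the point is only that each boot slot can name its own overlay partition without touching the GPT.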
which magic names should be configurable?
i am new to openwrt, so i do not know the answer. i know openwrt recognizes ubi and rootfs_data. but ubi should only be recognized on MTD devices, where partition names can be controlled at whim, so only rootfs_data is problematic AFAICT. are there more magic partition names being searched for on block devices?
how to configure the magic names
there should be more than one way:
- via a kernel parameter: this is absolutely needed to support multiple firmware slots. i provided a solution for this in the PR cited above.
- via the DTS or in some other way controlled by the port author:
  - justification: this bug i am reporting here! before fixing it, we want to modify the port of qnap 301w to make it ignore the existing rootfs_data, and thus avoid breaking these devices. we also want to avoid users losing their extra data, or losing stock partition contents that were not backed up. in short, we want a way to keep using the existing 500 MB loop device for overlay, which is just the right size.
  - from this we see that a "magic name disabled" special value would be desirable, which could maybe just be the empty string?
  - one way of implementing this configuration in the DTS is by setting the kernel parameter proposed in the previous point in the DTS. but we would like the uboot scripts to be able to override this default! what happens if both the DTS and the actual command line provide the same parameter? any other implementation ideas?
- via a uboot variable: this is optional but nice. if a user chooses to repartition their device, it would be nice for them to have an easy-to-use method to override the selection of overlay partition. of course they can just modify the kernel command line, but this is more complex and error-prone, and errors in the uboot can be hard to fix. this configuration should have the least precedence IMO. i can provide a PR for this if desired.
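the precedence between these three sources could be sketched as follows. this is only an illustration in python under stated assumptions: the kernel command line overrides the DTS default, the uboot variable comes last, an unset source is passed as None, and the empty string means "magic name disabled". the function name is hypothetical, and how each value would actually reach fstools is left open:

```python
from typing import Optional

BUILTIN_DEFAULT = "rootfs_data"   # current hardcoded magic name


def resolve_overlay_name(cmdline: Optional[str],
                         dts: Optional[str],
                         uboot_var: Optional[str]) -> Optional[str]:
    """Pick the overlay partition name from the configured sources.

    Each argument is None when that source does not set a value.
    Returns the partition name to search for, or None when the first
    source that is set contains the empty string (lookup disabled).
    """
    # assumed precedence: kernel command line, then DTS, then uboot variable
    for value in (cmdline, dts, uboot_var):
        if value is not None:
            return value or None   # "" disables the magic name
    return BUILTIN_DEFAULT


# nothing configured: behaves as today
print(resolve_overlay_name(None, None, None))
# port author disables the magic name in the DTS (the qnap 301w case)
print(resolve_overlay_name(None, "", None))
# a boot script overrides the DTS default for a given slot
print(resolve_overlay_name("rootfs_data_1", "", None))
```

when the result is None, the overlay would stay on the loop device on rootfs, which is what the qnap 301w does today.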
sorry that this text got so long, and thanks for reading it.
EDIT: i want to add that i do not have access to a qnap 301w device. (issues popped up while i was reviewing the 301w sysupgrade code, and @sppmasterspp helped to debug them.)