JCI (the company Mazda hired to design the CMU) created a flawed design. If your CMU is broken, there is no repair a user can perform. Do NOT contact me asking how to fix your CMU.
The controller unit of the Mazda Connect (aka MZD Connect) was designed by JCI (Johnson Controls Inc.).
It boots from an 8 MByte SPI NOR flash. The lion's share of that memory holds a Linux-based "failsafe" image used to upgrade the firmware. However, despite the name, it is anything BUT failsafe in operation.
The memory is segmented as follows:
|official partition name||offset prior to v31||offset after v31|
The bootstrap image corresponds to the Boot Devices Program image documented by Freescale for the i.MX6. Based on the state of the boot-select partition, it either executes code from ibc1 (normal operation) or ibc2 (upgrade operation).
From ibc1, execution passes to the main application in the NAND-flash image (~5 GBytes of storage). From ibc2, execution passes to the 7 MB "fail-safe" image.
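As a rough mental model, the bootstrap's dispatch boils down to one branch on the boot-select state. The following Python sketch is illustrative only: the `BootSelect` values and the `next_stage` function are hypothetical, and it makes no claim about how JCI actually encodes the boot-select partition.

```python
from enum import Enum


class BootSelect(Enum):
    # Hypothetical encoding of the boot-select partition's state.
    NORMAL = "normal"    # run the main application
    UPGRADE = "upgrade"  # run the fail-safe updater


def next_stage(boot_select: BootSelect) -> list[str]:
    """Return the chain of images the bootstrap hands control to."""
    if boot_select is BootSelect.NORMAL:
        # ibc1 continues into the main application on the NAND flash (~5 GB).
        return ["bootstrap", "ibc1", "NAND-flash application"]
    # ibc2 continues into the ~7 MB fail-safe image stored in the SPI NOR flash.
    return ["bootstrap", "ibc2", "fail-safe image"]


if __name__ == "__main__":
    for state in BootSelect:
        print(state.value, "->", " -> ".join(next_stage(state)))
```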
The "fail-safe" image is what drives the LCD display to provide the progress bar during the lengthy upgrade process. It reads an upgrade image from a USB drive and reprograms the NAND-flash as well as the firmware of on-board peripherals.
From the user's perspective, a firmware upgrade has two parts: the "failsafe" portion (bootstrap, ibc2, and fail-safe) and the application portion (ibc1, NAND flash, and peripheral firmware).
JCI's reasoning was clearly rooted in the (misguided) belief that this made the process failsafe, since the unit boots only into ibc2 and the fail-safe image until the firmware is completely updated. However, as always, the devil is in the details.
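The intended "failsafe" property can be sketched as a loop: boot-select keeps pointing at ibc2 until an update run finishes, so each interrupted run simply lands back in the fail-safe image. This is a hypothetical Python model (`boot_log` is an invented name), and it bakes in the assumption that every fail-safe run can pick the upgrade back up cleanly, which is precisely the premise this page disputes.

```python
def boot_log(interrupted_runs: int) -> list[str]:
    """Images entered on each power-up when the updater is cut off
    `interrupted_runs` times before it finally completes.

    Assumption: each fail-safe run can recover cleanly from the
    previous interruption.
    """
    upgrade_pending = True  # boot-select already points at ibc2
    log: list[str] = []
    runs = 0
    while upgrade_pending:
        log.append("ibc2 -> fail-safe")
        runs += 1
        if runs > interrupted_runs:
            # Only a run that rewrites everything flips boot-select back.
            upgrade_pending = False
    log.append("ibc1 -> application")
    return log


if __name__ == "__main__":
    # Two power cuts, then a successful third run.
    print(boot_log(2))
```

Under those assumptions, `boot_log(2)` yields three fail-safe boots followed by one normal boot into the application.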
The most vulnerable part of the firmware update is any write to the bootstrap. If the user's "failsafe" image contains a different bootstrap image, the code gleefully rewrites it. A power failure at this point effectively "bricks" the unit with no hope of recovery.
I take issue with such a design. In my opinion, bootstrap code like this should NEVER be rewritten in the field. It is a small enough amount of code (under 3 kBytes) that it should be heavily tested and audited in development, programmed in the factory (preferably in a write-protected portion of memory), and never touched again. However, this is not what JCI has done.
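Why a bootstrap rewrite is so dangerous can be shown with a small simulation. NOR flash must be erased (all bits set to 1, so bytes read back as 0xFF) before it is reprogrammed, so from the erase until the last byte lands, the region holds neither the old nor the new image. This Python sketch assumes a simple erase-then-program-sequentially update, a single bootstrap copy with no fallback, and that only a complete image counts as bootable; the function names and the stand-in byte strings are hypothetical.

```python
ERASED_BYTE = 0xFF  # NOR flash reads back as all ones after an erase


def after_power_cut(old: bytes, new: bytes, programmed: int) -> bytes:
    """Bootstrap region contents if power fails mid-rewrite.

    Assumed sequence: erase the region, then program `new` from the start.
    `programmed` is how many bytes were written before the cut.
    """
    if len(old) != len(new) or not 0 <= programmed <= len(new):
        raise ValueError("images must match in size and 0 <= programmed <= len")
    return new[:programmed] + bytes([ERASED_BYTE]) * (len(new) - programmed)


def is_bootable(contents: bytes, old: bytes, new: bytes) -> bool:
    # Simplification: only a complete old or new image counts as bootable.
    return contents in (old, new)


def bricking_cut_points(old: bytes, new: bytes) -> int:
    """Count cut points (after the erase) that leave no valid bootstrap."""
    return sum(
        not is_bootable(after_power_cut(old, new, k), old, new)
        for k in range(len(new) + 1)
    )


if __name__ == "__main__":
    old_image = bytes(range(0, 200))   # stand-in for the installed bootstrap
    new_image = bytes(range(10, 210))  # stand-in for the one on the USB drive
    print(bricking_cut_points(old_image, new_image), "of", len(new_image) + 1)
```

With these stand-in images, every cut point except the very last one leaves a torn bootstrap, which is why keeping the bootstrap factory-programmed and write-protected removes this failure window entirely.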
Another questionable design is the "fail-safe" image itself. As an experiment, I deliberately disrupted power during the fail-safe execution.
The result was a "bricked" unit. It rebooted endlessly, briefly flashing the display backlight each time. The exact console output, repeated on every reboot, is provided here.
The failure mode was as follows: rather than adopt the safest practice of restarting the upgrade process from the very beginning every time the bootstrap code runs, the fail-safe tries to detect existing NAND-flash partitions and resume from where it left off. No doubt, they thought their super-fancy NAND-flash routines (advertised as "risk-free mobile data") would protect them. However, their hubris was to assume that the entire file system AND the hardware would be in exactly the same state as before the disruption. As a result, the "fail-safe" code just FAILS... which is exactly what it should NOT do.
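The difference between the two recovery policies can be sketched as a toy Python model. Nothing here reflects JCI's actual code: `FlashState`, its fields, `first_step`, and `upgrade_succeeds` are hypothetical, and numbered "steps" stand in for whatever stages the updater really has. The one idea it captures is that resuming skips work based on on-flash bookkeeping, so it only succeeds if that bookkeeping still matches what actually survived the power cut, whereas restarting trusts nothing.

```python
from dataclasses import dataclass


@dataclass
class FlashState:
    partitions_found: bool  # does the updater detect existing NAND partitions?
    recorded_steps: int     # how many steps the updater *believes* finished
    intact_steps: int       # how many steps' output actually survived the cut


def first_step(state: FlashState, resume: bool) -> int:
    """Index of the first step the updater will (re)run."""
    if resume and state.partitions_found:
        return state.recorded_steps  # trust the on-flash bookkeeping
    return 0                         # start over and trust nothing


def upgrade_succeeds(state: FlashState, resume: bool) -> bool:
    # Every skipped step must really be intact, or the result is inconsistent.
    return first_step(state, resume) <= state.intact_steps


if __name__ == "__main__":
    torn = FlashState(partitions_found=True, recorded_steps=3, intact_steps=1)
    wiped = FlashState(partitions_found=False, recorded_steps=3, intact_steps=0)
    print("resume, torn state: ", upgrade_succeeds(torn, resume=True))   # False
    print("restart, torn state:", upgrade_succeeds(torn, resume=False))  # True
    print("resume, wiped NAND: ", upgrade_succeeds(wiped, resume=True))  # True
```

The last case mirrors the workaround: once the partitions can no longer be detected, even the resume logic falls back to step zero.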
By deliberately corrupting the NAND-flash, I was able to force the "fail-safe" to start from the very beginning (instead of the partial-resume feature it wants), and that allowed me to "unbrick" the unit. However, for a normal user or mechanic, this bad behavior would require an expensive replacement of the unit.