Debugging a Guru Meditation Error – ESP32 & FastLED
OK, so I am going to use this “article” page to debug an error that I have while developing Lesk.
I have already gone pretty far into designing Lesk. In short, I have a static class with around 60 effects for FastLED (all taken from YouTube videos, of course), an AsyncWebServer, a Controller class for all the strips I have, and a static Effect class that controls the settings of each and every effect (such as the red value, blue value, time value, etc.).
However, when I open the serial monitor, I get the infamous GURU MEDITATION ERROR: Core 1 panic’ed (Interrupt wdt timeout on CPU1).
From my first observations, this problem arises on two occasions:
- When I update the webpage that displays the effects
- When I change the settings/effects too fast
Here is the message:
Guru Meditation Error: Core 1 panic'ed (Interrupt wdt timeout on CPU1).
Core 1 register dump:
PC : 0x4008507a PS : 0x00050035 A0 : 0x400814e7 A1 : 0x3ffc4f7c
A2 : 0x00800000 A3 : 0x00818044 A4 : 0x28c08800 A5 : 0x3ffc4f5c
A6 : 0x00000008 A7 : 0x00000008 A8 : 0x00000001 A9 : 0x4008dc8a
A10 : 0x3ffcf408 A11 : 0x00000000 A12 : 0x3ffc9b40 A13 : 0x3ffc9b24
A14 : 0x00000000 A15 : 0x00000015 SAR : 0x00000020 EXCCAUSE: 0x00000006
EXCVADDR: 0x00000000 LBEG : 0x40084a5d LEND : 0x40084a65 LCOUNT : 0x00000027
Core 1 was running in ISR context:
EPC1 : 0x400f7aaf EPC2 : 0x00000000 EPC3 : 0x400814e7 EPC4 : 0x00000000
Backtrace: 0x40085077:0x3ffc4f7c |<-CORRUPTED
Core 0 register dump:
PC : 0x40193a6a PS : 0x00060535 A0 : 0x800f6a04 A1 : 0x3ffbcab0
A2 : 0x00000000 A3 … [output truncated]
Let’s investigate what a Guru Meditation error is, using this link: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/fatal-errors.html
Definition I found: it is a fatal error that prevents the program from continuing to execute correctly, and it should be handled. In our case, as we have a watchdog timeout, we can check the PC (program counter) and the SP (stack pointer) to locate where the problem happened. Here, going by the message (“Interrupt wdt timeout”), we should have an Interrupt Watchdog Timeout problem rather than an RTC Watchdog one.
This led us to another link: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/wdts.html
Before reading the article, I can infer that the problem comes from the fact that the interrupt that sends data to the LEDs takes too much time (I have already read another article about it), but let’s see if this hypothesis is right. “When the IWDT times out, the default action is to invoke the panic handler and display the panic reason as Interrupt wdt timeout on CPU0 or Interrupt wdt timeout on CPU1 (as applicable)”
Then we learn that it is possible to configure it with CONFIG_ESP_INT_WDT or CONFIG_ESP_INT_WDT_TIMEOUT_MS (I need to check where to set these). But the advice is to change the code instead. Hence, it should be better to either use a segment of 4 LEDs or increase the timeout. I will try with a segment of 4 LEDs now. –> So I did the test and… it fails with only 4 LEDs. To be clear, the problem is that below 6 LEDs the LEDs seem to be managed in a different way; with more than 6 LEDs it works. I don’t know what to do about this for now, and I have decided not to dig further, as I would need to investigate both the FastLED and WiFi libraries. Let’s move on.
ChatGPT advised me to use “xtensa-esp32-elf-addr2line”, a tool that converts memory addresses into file names and line numbers. This can help me understand where the error is. However, I have several times been unable to remember how to use it, so here is a quick guide for myself:
- Check that it’s installed by going to [your file structure]\.platformio\packages\toolchain-xtensa-esp32\bin and seeing if you can find xtensa-esp32-elf-addr2line.exe (on older versions of the espressif32 platform, the folder is named toolchain-xtensa32 instead)
- Then, run the command [your file structure]\.platformio\packages\toolchain-xtensa-esp32\bin\xtensa-esp32-elf-addr2line -pfiaC -e [your file structure]\.pio\build\lolin32_lite\firmware.elf [Addresses here]
From there, I can see where the problem is located in the program.
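The addresses to pass are the PC halves of the PC:SP pairs on the "Backtrace:" line. As a small host-side sketch (the function name backtrace_pcs is my own, not part of any toolchain), this is one way to pull them out of a pasted line before handing them to addr2line:

```cpp
#include <string>
#include <sstream>
#include <vector>

// Hypothetical helper: extract the PC half of every "PC:SP" pair
// found on an ESP32 "Backtrace:" line.
std::vector<std::string> backtrace_pcs(const std::string& line) {
    std::vector<std::string> pcs;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        // Only tokens shaped like 0x........:0x........ are address pairs.
        // This skips "Backtrace:" and markers such as "|<-CORRUPTED".
        const std::string::size_type colon = token.find(':');
        if (token.rfind("0x", 0) == 0 &&
            colon != std::string::npos &&
            token.compare(colon + 1, 2, "0x") == 0) {
            pcs.push_back(token.substr(0, colon));
        }
    }
    return pcs;
}
```

For the backtrace in the message above, this keeps only 0x40085077, which is the address that goes in place of [Addresses here]. A backtrace marked CORRUPTED may not resolve to anything useful, so the PC and EPC registers from the dump are worth trying as well.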
For instance, addr2line pointed me to the file clockless_rmt_esp32.cpp from the FastLED library, for something that is an interrupt problem. This is where my research has stopped.
I have found this interesting link that I need to investigate more: https://github.com/FastLED/FastLED/wiki/Interrupt-problems
From what I understood, depending on the type of strip I’m using (here WS2812), it takes some time to write out the data to the LEDs: 30 microseconds per pixel. So that is 300 microseconds, as I have 10 LEDs now (this should be adaptable). However, I have decided to change the Task Watchdog timeout period in the platformio.ini and I still have the same problem…
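To make the arithmetic adaptable to other strip lengths, here is a minimal sketch of the calculation. It assumes the usual WS2812 figures of 24 bits per pixel at 1.25 µs per bit (which gives the 30 µs per pixel quoted above) and leaves out the reset/latch gap at the end of a frame; the names are my own.

```cpp
#include <cstdint>

// Assumed WS2812 figures: 24 bits per pixel, 1.25 us (1250 ns) per bit.
constexpr std::uint32_t kBitsPerPixel = 24;
constexpr std::uint32_t kNsPerBit = 1250;

// Time needed to clock out the data for one frame, in microseconds.
// The reset/latch gap after the last pixel is NOT included.
// Fine for realistic strip lengths; would overflow above ~140,000 LEDs.
constexpr std::uint32_t frame_time_us(std::uint32_t num_leds) {
    return num_leds * kBitsPerPixel * kNsPerBit / 1000;
}

// Checked at compile time against the numbers in the text.
static_assert(frame_time_us(1) == 30, "30 us per pixel");
static_assert(frame_time_us(10) == 300, "300 us for 10 LEDs");
```

With this, frame_time_us(60) gives 1800 µs for a 60-LED strip, so the transfer time grows linearly with the number of LEDs.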
Another problem is that I don’t really understand the behavior of the error, which makes it quite hard to track down.
17/10 : So one thing that I tried is to deactivate the watchdog on core 1 while the interrupt is running. To be specific, I tried to use disableCore1WDT() at line 426 and enableCore1WDT() at line 460 in the method ESP32RMTController::interruptHandler from this file: .pio\libdeps\lolin32_lite\FastLED\src\platforms\esp\clockless_rmt_esp32.cpp . Let’s try it… and nope!
Even if the behavior that I am simulating would barely be that of a real user (so the bug would probably occur less often in real use), I still need to be sure about the bug. Right now, I will move the disableCore1WDT() into the loop and check the new behavior… and nope! Even worse, when I put the enable/disable of the watchdog back, I got more problems… This is driving me crazy, to be completely honest. It feels as if there is no logical reason for the error to appear. Sure, it is supposed to be a question of an interrupt taking too much time, but sometimes it triggers when I change a setting too fast –> makes sense; sometimes it happens even if I don’t change a setting rapidly –> doesn’t make much sense; sometimes it happens when I reload a web page –> doesn’t make much sense either.
ChatGPT suggests:
- Setting the max refresh rate to 60 (tried, difficult to see a real improvement tbh)
- Using core pinning (not tried)
- Increasing the watchdog timeout (tried, not satisfied, need to retry –> gave some good results but not 100% sure)
- Using an async web server (already here)
- Checking memory allocation (not tried)
- Debugging with watchdog info
To be continued…
21st of October
Another thing that ChatGPT advised was to 1) #include “esp_task_wdt.h”, 2) call esp_task_wdt_delete(NULL) at the beginning of the section where the problem could arise and 3) call esp_task_wdt_add(NULL) at the end of that section. So far, this seems to crash more often, but it also recovers from the errors. I am for keeping this configuration right now and seeing what we can work on. Update: it turned out that this configuration didn’t work and led to a lot of problems, but…
So I removed the disableCore1WDT() and enableCore1WDT() calls in clockless_rmt_esp32.cpp, and right now I get really good results. I will post the problem on Reddit to check if it could have another origin, but I think that increasing the WDT timeout was indeed the best choice. Let’s post on Reddit.
So far, the Chat GPT suggestions have been:
- Debouncing or throttling: good resource – https://www.youtube.com/watch?v=cjIswDCKgu0 – but this should come after understanding the whole problem.
- Check the LED update frequency
- Async handling of web requests
- Guard against multiple requests
- Optimize the ISR workload
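To make the throttling and "guard against multiple requests" ideas concrete for later, here is a rough sketch and not what Lesk does today. EffectSettings and SettingsThrottle are hypothetical names, and the fields are placeholders. The idea is that the web handler only stores the newest settings, and loop() applies them at most once per interval, passing in millis() as now_ms. The std::mutex is there because, as far as I understand, AsyncWebServer callbacks run outside loop().

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical settings payload; the fields are placeholders.
struct EffectSettings {
    std::uint8_t red;
    std::uint8_t blue;
    std::uint16_t speed_ms;
};

// Keeps only the most recent request and releases it at most once
// per min_interval_ms, so rapid changes collapse into one update.
class SettingsThrottle {
public:
    explicit SettingsThrottle(std::uint32_t min_interval_ms)
        : min_interval_ms_(min_interval_ms) {}

    // Web handler side: remember the newest value, do no LED work here.
    void submit(const EffectSettings& s) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_ = s;
        has_pending_ = true;
    }

    // loop() side: returns true and fills `out` when it is time to apply.
    bool poll(std::uint32_t now_ms, EffectSettings& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!has_pending_) return false;
        // Unsigned subtraction stays correct when the ms counter wraps.
        if (applied_once_ && now_ms - last_apply_ms_ < min_interval_ms_) {
            return false;
        }
        out = pending_;
        has_pending_ = false;
        applied_once_ = true;
        last_apply_ms_ = now_ms;
        return true;
    }

private:
    std::mutex mutex_;
    EffectSettings pending_{};
    bool has_pending_ = false;
    bool applied_once_ = false;
    std::uint32_t last_apply_ms_ = 0;
    std::uint32_t min_interval_ms_;
};
```

In a sketch, the request handler would call submit() and loop() would call poll(millis(), settings) before updating the effect, so that the LED work never happens inside the web callback. Whether this actually removes the watchdog panic is still to be tested.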
Other behavior to fix
The system needs to reboot when an error occurs
