I had a bit of free time over the weekend, so I did some testing and experimenting. I can confirm that after commenting out the define for TFT_CS in the setup file of the TFT_eSPI library, it did not default back to GPIO 5. I initially ran only the example sketch Bouncing Cube with the screen's CS set to GPIO 15. The program reported 95 fps on the serial monitor. I then set up a sketch to run both Bouncing Cube and Boing Ball at the same time with output to two screens. The GPIOs used were 15 and 21. The program reported around 53 fps, which I believe is to be expected. So I thought I'd try running on dual cores, expecting to get the same fps or maybe slightly higher, but was disappointed to find that the fps dropped to 23! Not what I expected! Can anybody shed some light on why the program runs so much slower on dual cores than on just one core? Thanks in advance.
I've pinned this comment as the work you have done is excellent. The dual-core problem might be because (this is a total guess) each core is halting while it negotiates the use of the SPI bus, as they can't just plough ahead or data would become corrupted. In single core it knows it's got full control, so there is no overhead for negotiating the bus. This is just a guess, mind.
@XTronical Came here to ask about the dual core and to reply to this, so my work here is done! But yes, I 100% agree it will be contention for the bus, regardless of whether you are writing to one or two screens. One thing to consider if using this for a dual-core display / game is that you could have one core doing the calculations for the screens and the other core in a loop just updating those displays. For a Game & Watch like you suggest, this may work well. I'm going to try this approach with a Master System emulator and have one core running the emulator and the other just shifting the changed pixels over the bus.
@MikeDX2 If you have more CPU speed than memory or SPI bandwidth (as can be the case on the ESP32), I found that graphics hashes can be useful for some emulator workloads. Divide each line into 16-pixel chunks, for example, and calculate a display line of pixel data through the emulation. Then, instead of just shifting those pixels to the SPI bus, check whether the hash for each block of pixels has changed since the last display write. If it has changed, perform the SPI write to update that block of pixels; if not, discard the update. Under some conditions that gives a speed-up. You can also add a flag to bypass the hash test code if the emulator is aware that a complete screen redraw is due (because of a pan or scroll, for example) .... just a thought :-)
I understand that if you comment out the line that defines TFT_CS in the setup file, then the TFT_eSPI library will not control any pin for the chip select. That way you don't tie up a pin that you then can't use for any other purpose.
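To make the hashing idea above concrete, here is a rough sketch in plain C++ (no ESP32 calls, so it can be tried on a PC first). It is not taken from any emulator: the 128-pixel line width, the `LineHashes` struct and the function names are made up for illustration, and FNV-1a is just one cheap hash that would do the job. Note that in this version the `forceAll` flag skips only the comparison; it still computes the hashes so the cache stays in step with the display.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sizes only: 16-pixel chunks as suggested above, and an
// assumed 128-pixel-wide display line.
constexpr std::size_t kLineWidth = 128;
constexpr std::size_t kChunkPixels = 16;
constexpr std::size_t kChunksPerLine = kLineWidth / kChunkPixels;
static_assert(kLineWidth % kChunkPixels == 0, "line must split evenly");

// 32-bit FNV-1a over the bytes of one chunk of RGB565 pixels.
std::uint32_t hashChunk(const std::uint16_t* pixels, std::size_t count) {
    std::uint32_t h = 2166136261u;
    for (std::size_t i = 0; i < count; ++i) {
        h = (h ^ (pixels[i] & 0xFFu)) * 16777619u;  // low byte
        h = (h ^ (pixels[i] >> 8)) * 16777619u;     // high byte
    }
    return h;
}

// Hashes of the chunks that were last sent to the display for one line.
struct LineHashes {
    std::array<std::uint32_t, kChunksPerLine> last{};
    bool valid = false;  // false until the line has been sent once
};

// Returns the indices of the chunks that need an SPI write.
// forceAll = true marks every chunk dirty (full redraw after a pan or
// scroll) while still refreshing the cached hashes.
std::vector<std::size_t> dirtyChunks(const std::uint16_t* line,
                                     LineHashes& cache,
                                     bool forceAll = false) {
    assert(line != nullptr);
    std::vector<std::size_t> dirty;
    for (std::size_t c = 0; c < kChunksPerLine; ++c) {
        const std::uint32_t h =
            hashChunk(line + c * kChunkPixels, kChunkPixels);
        if (forceAll || !cache.valid || h != cache.last[c]) {
            dirty.push_back(c);
        }
        cache.last[c] = h;
    }
    cache.valid = true;
    return dirty;
}
```

The caller would keep one `LineHashes` per display line and only push the chunks whose indices come back. Whether this is a net win depends on how often the picture changes, since every line still pays for the hashing.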
Give it a go. I thought I'd tried that and it still defaulted to 5 but maybe misremembered.
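For anyone wanting to try it, the setup-file change being discussed would look roughly like the fragment below. Treat it as a sketch: the macro names are the ones TFT_eSPI uses in User_Setup.h, but the driver choice and the pin numbers here are placeholders, not anyone's confirmed wiring.

```cpp
// User_Setup.h fragment for TFT_eSPI (illustrative; pin numbers are placeholders)
#define ST7735_DRIVER        // both screens have to use the same driver

#define TFT_MOSI 23          // shared by both screens
#define TFT_SCLK 18          // shared by both screens
// #define TFT_CS 15         // commented out so the library leaves chip select alone
#define TFT_DC    2          // shared by both screens
#define TFT_RST   4          // shared by both screens
```

With TFT_CS left undefined, the sketch itself has to pull the right CS pin low before drawing to a screen and set it high again afterwards (GPIO 15 and 21 in the test reported above).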
If you want to drive two displays without losing speed, you can write a display driver that generates two clock plus MOSI signals in parallel (I've done this on an RPi before). The snag is memory, plus everything and its dog would need recoding, but with DMA-driven I/O the speed would be exactly the same as for one display (at least for the I/O part of the code).
Sounds excellent
Would you still not have more data on the SPI bus? So twice the data, for example, would be twice as slow? The SPI bus has a maximum data throughput, so how could you send more than the hardware / wire bandwidth can manage?
@XTronical I've been digging through the ESP32 datasheets. Looks like I was wrong(ish): the ESP32 has a DMA option for some peripherals but not for GPIO (as best I can tell). The ESP32's I2S peripheral can clock multiple GPIO lines under DMA control, so it would be possible to generate multiple SPI output streams using DMA-driven output. See ESP32 VGA output projects for examples of driving multiple bits in parallel via DMA. Some ARM CPUs have DMA-driven I/O: a table is built containing the states to output, and that table is then clocked out as output states. When the table gets to a certain depth an IRQ is raised so that a new table can be built before the first one is complete (double buffering). Looks like the ESP has a similar system, not directly for the GPIO lines themselves but only for the higher-level peripherals. DMA-driven output typically does not require more bandwidth than a single output bit, as the DMA engine takes a word from RAM plus a stored mask and presents that to the memory-mapped GPIO output register, i.e. as many bits of GPIO as the native word length per clock. The RPi boards can do this (I used it to drive LED displays and generate multiple DMX512 streams), and I think the Pico can too (not entirely sure, but the Pico's state machines can do the same job). SPI as output only is just two bits, clock + data, so it would be a good candidate for this approach.
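As a rough illustration of the "table of output states" idea, the sketch below builds such a table for two data streams in plain C++. Nothing here touches real hardware: the bit positions and the function name are invented for the example, and it assumes a single shared clock line with one MOSI line per display (SPI mode 0, MSB first) rather than two separate clocks. On a real ESP32 the table would have to be fed to something like the I2S peripheral in parallel mode, which is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit positions inside one parallel output word.
constexpr std::uint32_t kClkBit   = 1u << 0;  // shared clock
constexpr std::uint32_t kMosiABit = 1u << 1;  // data line for display A
constexpr std::uint32_t kMosiBBit = 1u << 2;  // data line for display B

// Builds the table of output states that clocks streamA and streamB out
// side by side. Data is set up while the clock is low and held across the
// rising edge, where the displays sample it. Every data bit costs two
// table entries no matter how many data lines are driven.
std::vector<std::uint32_t> buildParallelTable(
        const std::vector<std::uint8_t>& streamA,
        const std::vector<std::uint8_t>& streamB) {
    assert(streamA.size() == streamB.size());
    std::vector<std::uint32_t> table;
    table.reserve(streamA.size() * 16);
    for (std::size_t i = 0; i < streamA.size(); ++i) {
        for (int bit = 7; bit >= 0; --bit) {
            std::uint32_t word = 0;
            if ((streamA[i] >> bit) & 1u) word |= kMosiABit;
            if ((streamB[i] >> bit) & 1u) word |= kMosiBBit;
            table.push_back(word);            // clock low, data valid
            table.push_back(word | kClkBit);  // clock high, data held
        }
    }
    return table;
}
```

The point to notice is the table length: one byte for each display gives 16 entries, exactly what one byte for a single display would need. The cost moves to RAM for the table and CPU time to build it, which is the "snag is memory" part mentioned earlier.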
@XTronical "So twice data for example would be twice slow?" - To help think about it, ask "twice as slow as what?" The DMA controller has its own clock, the memory has a bandwidth, the CPU (core) has its own bandwidth, and the output peripheral has its own bandwidth. A CPU instruction typically reads a single op-code + data combination and may affect anything from one bit up to "word length" bits. The idea of DMA-driven I/O is that the DMA controller reads from RAM and writes to the device without needing the CPU... DMA may contend with the CPU for RAM, but typically RAM is faster than the CPU, so more bandwidth and a lot more efficiency can be gained using DMA. The DMA transaction requires the CPU to set up a table of values to kick it off but does not require CPU intervention while that table is clocked to the output. The result is more throughput; how much depends on many real-world things ....
Wow, some very detailed replies. Yes, I've tried pushing the SPI bus before and you reach a point where it will not work reliably, whether that's because of physical wire length or the device itself causing the issue. I was pushing an ST7735 display hard once and reached a limit. So, as I allude to in the video, if you run two displays on that same bus you reduce the update rate (or perhaps I should have said potentially, given the other factors involved). Using two SPI buses can obviously get around this, but I'm using one bus.
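A quick back-of-envelope check of the "two displays on one bus" point, written as a small C++ helper. The function name is made up, and the figures in the note below (27 MHz clock, 128x128 pixels, 16 bits per pixel) are assumed for illustration rather than taken from anyone's setup. It only gives a ceiling for full-screen refreshes; command overhead, CS toggling and rendering time all push real numbers lower.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on full-screen refreshes per second over one SPI bus,
// ignoring command overhead, CS toggling, and rendering time.
double maxFullFrameFps(double spiClockHz,
                       std::uint32_t width,
                       std::uint32_t height,
                       std::uint32_t bitsPerPixel,
                       std::uint32_t displaysOnBus) {
    assert(width > 0 && height > 0 && bitsPerPixel > 0 && displaysOnBus > 0);
    // One SPI clock moves one bit, so a frame costs width*height*bpp clocks.
    const double bitsPerFrame =
        static_cast<double>(width) * height * bitsPerPixel;
    return spiClockHz / (bitsPerFrame * displaysOnBus);
}
```

With the assumed figures, `maxFullFrameFps(27e6, 128, 128, 16, 1)` comes out at about 103 and the same call with two displays at about 51.5, so on a single serial bus the ceiling halves when the pixel data doubles. Sketches that only redraw part of the screen can of course beat or miss these figures.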
Hello friend! Help is needed! I can't flash the ESP32 sketch Animated Eyes 2. Could you make a video for dummies?
I have two 1.44 inch displays with the ST7735 driver.
Sorry, it would be some time before I could look at this
@XTronical 😞👌
I'm subscribed to you!!! Looking forward to new videos 🤝
Interesting. As you mentioned in another video, the TFT_eSPI library is really not architected in a way that can support multiple displays, so the limitation here is that you can only use displays that have the same driver, and effectively all the same pins except of course for CS. This is clever of course, but I don't think this would work with two different displays. It's certainly a neat trick nonetheless, and in fact if you do have two displays it's probably more likely than not that they are similar. Thanks for sharing!
Would it be possible to play a video but stretched to fit across two of these displays?
It would be hard to play a video due to bandwidth issues. Your video would need to be uncompressed, as the processor is not powerful enough to decode it in real time, and your frame rate would be quite awful once you've pulled the data from the SD card to the screen. And that's just with one screen. There are extra overheads with two screens. You'd have to go the SD card route, as there is not enough internal memory for more than about a second of video.
This solution might be a 'kludge' but if it works then that's all that matters - thank you for the comprehensive content!
How can I define new pins for my TFT? I want to connect an ILI9341 and an RFID module at the same time 🤔
Using the TFT_eSPI library, you set the pins in a header file. This is shown in the video for setting up one screen, which I link to in the video description.
Now that is phenomenal.
Thank you for showing these great projects, you are truly on the next level.
Edit: I think an RTOS would also be good for this type of function.
Amazing work, thanks!
Very helpful thank you!
Handy tip if you have a lot of info to display.
Could you run the screens on separate cores of the ESP32? Great video, keep up the good work.
Yes, you could, but you wouldn't necessarily gain as much speed as you might imagine (if any; it depends on what you're doing). As the SPI bus would be a shared resource, only one core could send data at a time, so there is a bottleneck there. However, if most of your graphics work was, say, intensive calculations (i.e. 3D stuff), then yes, you might see some increase in fps.
@XTronical But there are 2 SPI buses in the ESP32, right?
Yes, this solution uses one. Using two would be a good solution for using two cores. Good suggestion.
I've just thought, though: with this library it would require more recoding to make it work. Another library might be more suitable. This library, as alluded to in the video, was really designed to work with just one screen and one SPI bus. I wonder if any other libraries would fare better.
Cool but not too useful...
A few years ago I was testing 4 LCDs with STM32, it worked without any issues. My video: v=tBNnuXBP5rg