Visualization of Data Preparation for a Neural Net (RNN/LSTM/GRU)

  • Published Aug 7, 2020
  • Data is collected from a gyroscope/accelerometer/magnetometer board, although only the gyro readings (gx, gy, gz) are used in this implementation.
    I prepared this video when figuring out how I wanted to handle the data.
    (The music is "A New Beginning" by Esther Abrami, found in the TH-cam Music Library)

Comments • 15

  • @Ruhgtfo
    @Ruhgtfo 3 years ago +4

    Whoa, great job making this visualization~ salute

  • @saminchowdhury7995
    @saminchowdhury7995 3 years ago +3

    My god this was beautiful
    More power to you my friend

    • @jaggztech
      @jaggztech  3 years ago +1

      Wow. I'm glad you liked it. :)

  • @blankblank2162
    @blankblank2162 2 years ago

    Hey man, amazing video. Great representation. Quick question, what is the song's name?

    • @jaggztech
      @jaggztech  2 years ago

      "A New Beginning" by Esther Abrami. Enjoy! (I added it to the video description now too).

  • @patite3103
    @patite3103 3 years ago

    Great video! I missed commentary, though. It would be much more valuable with commentary!

  • @bhavulgauri7832
    @bhavulgauri7832 3 years ago

    Hey, may I know what software you used for this?

    • @jaggztech
      @jaggztech  3 years ago +3

      Blender. (Gnuplot for the sample data -- the text graphs at the bottom, closest to us -- but I'm pretty sure you're not asking about those.)

  • @yali_shanda
    @yali_shanda 8 months ago

    How do you overlap them in terms of tensor data and shape?

    • @jaggztech
      @jaggztech  7 months ago

      We take 6 readings as a single timestep (this exposes the LSTM to the time-based data in a "shallow" way, reducing its need to use internal capacity for relating things through time). This makes a 3x6 (ie. (gx,gy,gz).shape * 6).
      We also provide sequences to the LSTM, and we have to choose that length. I showed the use of 7 in the visualization (and in the project) -- it is not to be confused with the unfortunate choice I made to show a batch count of 7 as well! (The batch count can vary depending on your resources and its impact on training; I won't go into detail about that here.)
      Anyway, ultimately I'm left with a full array of shape 3x6x7 as one batch item (X). (My output was a final mouse-cursor velocity (Vx,Vy), but nevermind that.)
      Finally: The overlap is a part of the choice of what to use for each of the sequence items (each "timestep") -- it does not directly impact the shape of that 3x6x7, but one does have to balance some considerations. Read on if you're interested:
      In this project, I chose 6 samples at a time (a 6-length window of sensor data through time) -- that's the 3x6. We have to slide through the data, though, and that's the overlap choice you see. **It's the analog of strides in convnets (CNNs)**. For instance, assume a 1-D set of samples: instead of (gx,gy,gz), suppose we have just one value 'x'. We would then have x1,x2,x3,x4,x5,x6. If we slide by only 1 sample, the next timestep item would be x2,x3,...,x7, and the next x3,...,x8. This might be too slow for processing, so I chose to slide by 50% (3 samples).
      So a single batch item is a sequence of these 6-length windows. Back to our 3x6 data, we'd have 7 rows, the first two shown here:
      array([
      [[gx1, gy1, gz1], [gx2, gy2, gz2], ..., [gx6, gy6, gz6]],
      [[gx4, gy4, gz4], [gx5, gy5, gz5], ..., [gx9, gy9, gz9]],
      ...
      ]) where the shape would be 3x6x7 (ie. shape == (7,6,3))
      So, as you can see, the overlap choice (the amount we slide) does not impact the shape. Instead, its considerations are on how it will work within our project's needs, and how the net will be processing it.
      Considerations:
      * Sliding 1 at a time might be too slow for live processing of data, especially since this was potentially destined to run as a small network on a microcontroller.
      * We could just feed in 1 sample at a time (gx,gy,gz), but this would place the burden of relating the time-based relationships on the LSTM. By doing 6 at a time we are basically saying, "here, Mr/Mrs. LSTM, this is some nice data where, in a shallow way, you can see the time-based relationship very easily. Figure it out from there."
      * Anti-consideration: Skipping 3 samples does mean the network has to learn the relationship through time, with that sliding amount, but it shouldn't be too big of an issue with the way our data changes.
      Additional Data/Training Considerations:
      * By providing the larger step, we are giving the LSTM a greater view through time, while still providing the 6-sequential samples in each timestep. This is a sort of "best of three worlds" aim, where it can handle the high-frequency changes (6-sequential), the relation between windows with overlap, and an expanded overall view (from the full sequence of timesteps).
      * With the overlap we're also making a bit more use of our available labeled training data. We effectively double our count of X values, without additional augmentation methods being involved.
      * Augmentation is a separate issue. Some random noise, dropout, etc. can be used to make the network more capable (more 'robust').
      * The LSTM is going to see each 6-length timestep at a time, and then a bunch of those one after the other (7 of them in this example). For this project, the next batch item is unrelated to that sequence, so the LSTM state is cleared in between.
      * It is not completely out of the question to leave the state, and the network then must learn to transition from that prior unrelated-in-time sensor data, through the change; this *could* make the network more robust, or it might also just waste network capacity, while also being unreflective of the final real-world constant progression of sensor data through time during inference. I went ahead and just let it reset and consider each sequence separately.
      * By using that overlap with the expanded full sequence window, we're also presenting more of the same data, but slightly offset. This, as mentioned, makes more use of our training data, but it also can assist with the network learning to be more resilient to noise.
      In this project, the sensor data can be noisy from different sources: Inherent sensor noise, friction in the device during use causing fluctuations, or even patient/user unique characteristics (maybe one of them is shaky?) These can all occur at different timescales/frequencies, so I was attempting to get the whole thing very capable with minimum resources, handling high and low frequency noise, while also providing the ability to make use of crucial high and low frequency information like the way the user moves, their gestures, and even if they intended on moving a mouse cursor faster or slower.
      I hope I more than answered your question. While you may have just wanted to know the 3x6x7, I figured I would go all-out and provide extra detail in case it's useful to you or someone else.
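      The slicing described above can be sketched in NumPy (a hypothetical helper, `make_windows`, written for illustration -- it is not code from the original project; the window length 6, slide 3, and sequence length 7 match the reply):

```python
import numpy as np

def make_windows(samples, win=6, step=3, seq_len=7):
    """Slice a (T, 3) stream of (gx, gy, gz) readings into one batch item:
    seq_len overlapping windows of win samples each, sliding by step."""
    needed = step * (seq_len - 1) + win  # samples one batch item consumes
    assert samples.shape[0] >= needed, "not enough samples for one batch item"
    # Each window starts step samples after the previous one, so with
    # win=6 and step=3 consecutive windows overlap by 50%.
    return np.stack([samples[i * step : i * step + win] for i in range(seq_len)])

# 24 fake gyro readings, shape (24, 3) -- exactly enough for one batch item
stream = np.arange(24 * 3, dtype=np.float32).reshape(24, 3)
X = make_windows(stream)
print(X.shape)  # (7, 6, 3): 7 timesteps, 6-sample window, 3 gyro channels
```

Note that changing `step` alters only how many raw samples a batch item spans (and how much windows overlap), not the output shape, which is what the reply means by the overlap choice not affecting the 3x6x7.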

  • @tamimiemran9705
    @tamimiemran9705 2 years ago +3

    I somehow understand it less now

  • @varunahlawat9013
    @varunahlawat9013 1 year ago +2

    Awful man

    • @jaggztech
      @jaggztech  2 months ago

      Uhh. Thanks? You too! :)