"So there's this great paper by Johnson et al" was comedy GOLD
For the Gram matrix (at 57:00) I don't see how the efficient reshaping method for finding the Gram matrix is the same as the initially presented way. The reshaped matrix version suggests that G_i,j = dot product between an H x W slice for filter i and an H x W slice for filter j. However, the initially presented way suggests that G_i,j will contain, on its diagonal, a product between two different elements in the same H x W slice, which is impossible in the efficient version.
Is there a nuance to the reshaping that isn't provided in the lecture or did I miss something?
Also note when I say H x W slice, I mean the activation map for one filter.
Looking at the slides, I think it's the initial presentation that more clearly illustrates what is happening. Please let me know if that's not true.
The only efficiency gain is that the computation is done via matrix multiplication; otherwise the two methods are the same.
@mrfli24 Yeah, I get that, but the matrix multiplication on the slide looks like a simplification. Let F be the C x HW matrix. Every element of F*(F^T) is a sum of H*W terms, whereas if we did it pairwise, every element would be a sum of ~(HW)^2 terms. Maybe it's an interesting trick, like how variance can be calculated from the sum of all squared pairwise differences; I'm not bothered to derive that tho lol.
@vitalispyskinas5595 okay, let me explain why these computations are the same. The first method computes the sum, over the H x W feature vectors v, of the outer products vv^T. That sum equals VV^T, where V is the matrix whose columns are the vectors v (not a trick, just a linear algebra fact). The matrix V, of size (C, HxW), is the flattened matrix F on the slide.
@mrfli24 okay, but even though you can view a matrix multiplication as a sum of outer products, v_1 is never outer-producted with v_2. There are only ever outer products of the form v_k*v_k^T, and never v_k*v_j^T, for k != j.
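For anyone following this thread, here is a minimal NumPy sketch of the point being argued, not the lecture's own code. The sizes C, H, W and the random activations are made up for illustration. It computes the Gram matrix once as the sum over spatial positions of outer products vv^T (the reading of the first method given in the reply above) and once as F*(F^T) on the flattened C x HW matrix, then checks that the two agree. Under that reading, each position's feature vector is only ever paired with itself in both versions, so there are no v_k*v_j^T terms in either one.

```python
import numpy as np

# Hypothetical sizes and random activations, chosen only for illustration.
rng = np.random.default_rng(0)
C, H, W = 3, 4, 5
feats = rng.standard_normal((C, H, W))

# Method 1: sum of outer products, one C-dimensional vector per spatial position.
G_outer = np.zeros((C, C))
for h in range(H):
    for w in range(W):
        v = feats[:, h, w]          # feature vector at position (h, w), shape (C,)
        G_outer += np.outer(v, v)   # v is paired only with itself

# Method 2: flatten to F with shape (C, H*W) and multiply by its transpose.
F = feats.reshape(C, H * W)
G_matmul = F @ F.T

# Entry-wise view: G[i, j] is the dot product of the activation maps
# of filter i and filter j, i.e. a sum of H*W terms.
i, j = 0, 2
G_ij = np.sum(feats[i] * feats[j])

print(np.allclose(G_outer, G_matmul))
print(np.isclose(G_matmul[i, j], G_ij))
```

Each diagonal entry G[i, i] is then the sum of squared activations of filter i, which is why no product of two different positions within one slice shows up.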
1:01:00 "so now some really brilliant person had an idea" XD
I said WOW out loud like three times during this lecture... WOL
I also cannot wait to share these beautiful images with my friends
I'm having a hard time understanding how 'fast neural style transfer' differs from the paper it improves upon, the one that optimizes the loss on both content and style (but is much slower).
Like you said, just stick a feed-forward network in there to do it in a single pass, but I'm not sure what you mean by that.
WOW