Happy new year!
Happy New Year! ❤
Happy new years
Thanks for the good sample!
Be aware of the limitations, though! The split_text_into_chunks() function has a naive implementation. What if the 4000th position falls in the middle of a word, or even a sentence? What if that word or sentence was pivotal to the meaning of the text?! The split should be smart enough to:
1. Jump to the 4000th position,
2. Check whether it is the end of a paragraph (or sentence, or word),
3. If it is not, shrink the chunk from the end until it reaches the end of a paragraph,
4. Treat that as the end of the chunk.
This easy change should make the implementation more trustworthy.
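The steps above could be sketched roughly like this. It is only a minimal sketch, not the script's actual code: the name split_text_at_boundaries, the max_chars parameter, and the idea that the 4000 limit counts characters are all my assumptions. The sentence check is deliberately crude (it only looks for ". "), and if no boundary is found at all it falls back to a hard cut.

```python
def split_text_at_boundaries(text, max_chars=4000):
    """Split text into chunks of at most max_chars characters,
    preferring to cut at a paragraph, then a sentence, then a word.

    Hypothetical helper: name and signature are illustrative only.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Boundaries in order of preference: paragraph, sentence, word.
    separators = ("\n\n", ". ", " ")
    chunks = []
    start = 0
    while start < len(text):
        # Step 1: jump to the size limit (or the end of the text).
        end = min(start + max_chars, len(text))
        if end < len(text):
            window = text[start:end]
            # Steps 2-3: back up to the last preferred boundary in the window.
            for sep in separators:
                cut = window.rfind(sep)
                if cut > 0:
                    # Keep the separator with the chunk that precedes it.
                    end = start + cut + len(sep)
                    break
            # If no separator was found, end stays at the hard limit.
        # Step 4: treat this position as the end of the chunk.
        chunks.append(text[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    sample = "First paragraph here.\n\nSecond paragraph is longer. It has two sentences."
    for chunk in split_text_at_boundaries(sample, max_chars=30):
        print(repr(chunk))
```

Since every separator stays attached to the chunk before it, joining the chunks gives back the original text unchanged, which makes the function easy to sanity-check.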
Another limitation of this approach is worth disclosing: what if the PDF contains pictures that the text refers to? We should probably switch to a multimodal generative AI, like Bard, or something.
Yep, in some cases the script we shared here wouldn't be enough, so adding some extra functionality to handle these kinds of situations wouldn't be a bad idea. Thanks for sharing! 😊
Thanks 🙏
Hey dude, are you gonna remake your C++ tutorial from 12 years ago?
Bucky Roberts from New Boston, is that you?
No, I don’t know this FRAUDDDDD