This video is gold hidden amongst rocks.
Great presentation and explanation
Excellent introduction! Entertaining and eloquent speaker.
Amazing unfolding of the beautiful tech! The best material on ES I have viewed by far! Thank you Sir!
Great talk, but always repeat the questions. We're unaware what question is being answered.
starts at 11:54
If you were having a hard time understanding TF-IDF check out: moz.com/blog/inverse-document-frequency-and-the-importance-of-uniqueness
"Document frequency measures commonness, and we prefer to measure rareness. The classic way that this is done is with a formula that looks like this: IDFj = log(n/TFj)"
For each term we are looking at, we take the total number of documents in the document set (n) and divide it by the number of documents containing our term (TFj). This gives us more of a measure of rareness. However, we don't want the resulting calculation to say that the word "mobilegeddon" is 1,000 times more important in distinguishing a document than the word "boat," as that is too big of a scaling factor.
This is the reason we take the Log Base 10 of the result, to dampen that calculation. For those of you who are not mathematicians, you can loosely think of the Log Base 10 of a number as being a count of the number of zeros - i.e., the Log Base 10 of 1,000,000 is 6, and the log base 10 of 1,000 is 3. So instead of saying that the word "mobilegeddon" is 1,000 times more important, this type of calculation suggests it's three times more important, which is more in line with what makes sense from a search engine perspective."
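To make the quoted formula concrete, here is a minimal Python sketch. It assumes, as the quote's own wording does, that "TFj" means the number of documents containing the term. The function name `idf` and the document counts are made up for illustration; they are not from the talk or the Moz article.

```python
import math


def idf(total_docs, docs_with_term):
    """IDF as in the quoted formula: log10(n / documents containing the term)."""
    if docs_with_term <= 0:
        raise ValueError("the term must appear in at least one document")
    return math.log10(total_docs / docs_with_term)


# Hypothetical corpus: 1,000,000 documents.
n = 1_000_000
boat = idf(n, 100_000)        # common term, appears in 1 of every 10 documents
mobilegeddon = idf(n, 100)    # rare term, 1,000 times rarer than "boat"

print(f"boat: {boat:.1f}, mobilegeddon: {mobilegeddon:.1f}")
print(f"raw rarity ratio: {100_000 / 100:.0f}x, IDF gap: {mobilegeddon - boat:.1f}")
```

The raw counts differ by a factor of 1,000, but after the logarithm the two scores are only 3 apart, which is the dampening the quote describes.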
Actual lecture starts at 11:45.
Fun talk, but if you want to get to the core subject, just start at 13:00.
thx
best showcase and explanation - thx John
Amazing, loved the way you started with SQL, built a use case, and then went into the details of ES.
Do you have anything similar for SQL vs NoSQL DBs?
Just what I was looking for, thanks!
Damn! This video is "under-viewed"!
Excellent Explanation.
Cat farming is a good example
I wonder if the presenter is unaware of full-text search capabilities in RDBMSes, or just ignoring it for effect?
The initial example (10:18) of building an SQL query with con-/disjunctions is cute, but also irrelevant because IF you've decided to use MySQL/InnoDB for search -- not saying it's a good idea -- you would surely use the most powerful tool for it, which is to index text as FULLTEXT and use MATCH AGAINST queries, not LIKE-style queries.
This would give you stemming and relevance scoring, basically fixing the problems of the example as given.
That said, you should never use MySQL for anything.
I think he was just trying to give a quick example on why not to use your database for full text search purposes.
I think you have to decide what works best for your specific situation. I think the initial example was a pretty good one. While MySQL might work well in certain situations, it might not work well when you need to perform full-text search on a couple of hundred thousand or even millions of documents. Not to mention joined tables.
An RDBMS is great at storing structured data; Elasticsearch is great for search. I have even used both. I used PostgreSQL for storing my structured data and I indexed specific data (the columns which I needed for full-text search) in Elasticsearch. Elasticsearch returned results that included the IDs (primary keys) from PostgreSQL, which I could use to select that specific record, including the joined data, from the database.
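A rough sketch of that two-step pattern (search returns primary keys, the database returns the full records). To keep it self-contained and runnable, `sqlite3` stands in for PostgreSQL and a toy in-memory inverted index stands in for Elasticsearch. The table, column, and function names are all hypothetical, and no real Elasticsearch API is used here.

```python
import sqlite3
from collections import defaultdict

# Stand-in for the relational store (the comment used PostgreSQL).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT, author TEXT)"
)
rows = [
    (1, "Cat farming basics", "How to start cat farming", "Ann"),
    (2, "Boat repair", "Fixing a boat hull", "Bob"),
    (3, "Cat toys", "Toys for your cat", "Cy"),
]
db.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", rows)

# Stand-in for the search engine: only the searchable columns are indexed,
# and each token maps to the primary keys of the rows that contain it.
index = defaultdict(set)
for doc_id, title, body, _author in rows:
    for token in f"{title} {body}".lower().split():
        index[token].add(doc_id)


def search_ids(query):
    """Return the primary keys of rows containing every query token."""
    tokens = query.lower().split()
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


def fetch_records(ids):
    """Load the full records for the given primary keys from the database."""
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    cur = db.execute(
        f"SELECT id, title, author FROM articles WHERE id IN ({placeholders}) ORDER BY id",
        ids,
    )
    return cur.fetchall()


print(fetch_records(search_ids("cat farming")))
```

In a real setup the second query is where joins would go; the search side only needs to hand back IDs.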
Why should we not use MySQL for anything?
Because it is his/her opinion.
For some data sets and query loads, MySQL/MariaDB/InnoDB would be completely inadequate, where Elasticsearch would shine, thanks to its ability to widely distribute and replicate data and query load. As long as Elasticsearch (and similar products) exist, and where the use case calls for these features, MySQL/MariaDB is simply irrelevant.
Hello John, I have been working on an app that is connected to Firebase. I want to implement Elasticsearch to retrieve matching values in my database. Could I take a moment of your time to get some help?
Bro keep rocking
Hi all, great video! Can someone tell me how to use Elasticsearch for a search API (DRF + Elasticsearch)? PS: I can't use any high-level API like elasticsearch-dsl. Many thanks :)
So we just trade the horrible part in the WHERE clause for a horrible part in everything else, with lots of brackets and more {}.
Python 2 ://