What's in a Name? Fast Fuzzy String Matching - Seth Verrinder & Kyle Putnam - Midwest.io 2015

Midwest.io

Views: 23,867

Published on 14 Nov 2024

Comments • 17

@mtbmaryland404 5 years ago ⁺⁶
Did they publish their implementation (on GitHub, etc.)? Very interested in seeing the actual implementation.
@ogamedia1 3 years ago ⁺¹
Thanks for this. The exact same problem I’m trying to resolve.
@junaid621 6 years ago ⁺⁶
Any Python implementation?
@topsiterings 4 years ago ⁺¹
Nice work!
@shashwatkaundinya127 9 years ago ⁺³
I work with huge consumer data sets (200-300 million records) and am using JWT for name matching. It takes us a lot of time to process the data, even after string normalization.
How do you convert a String into bits in Java WITHOUT iterating over each character in the string and mapping it against the alphabet A-Z?
If you iterate over each character of both Strings (both names) to map them, wouldn't that be time-consuming?
Please tell me how you managed it.
@sethverrinder3547 9 years ago ⁺²
+Shashwat Kaundinya we did iterate over each character to create the bitmaps. There are a couple of reasons why this worked for us:
1. The bitmaps get used many times, so the cost of calculating them gets spread out over multiple searches.
2. They're pretty quick to create. Our 64-bit implementation creates about 2.3 million bitmaps per second, so it could encode your full set in a couple of minutes. You could speed this up with a lookup table, and if you restrict the character range to A-Z then you could use the character values directly, something like:
bitmap = bitmap | (1L
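A minimal sketch of the per-character approach described in this reply, assuming one bit per letter A-Z packed into a 64-bit `long`. The class and method names (`NameBitmap`, `encode`, `differingLetters`) are hypothetical, and the speakers' actual encoding may differ:

```java
final class NameBitmap {
    // Hypothetical helper: sets one bit for each distinct letter A-Z in the name.
    // Characters outside A-Z (spaces, digits, punctuation) are ignored.
    static long encode(String name) {
        long bitmap = 0L;
        for (int i = 0; i < name.length(); i++) {
            char c = Character.toUpperCase(name.charAt(i));
            if (c >= 'A' && c <= 'Z') {
                bitmap |= 1L << (c - 'A'); // the character value picks the bit directly
            }
        }
        return bitmap;
    }

    // Counts letters that occur in exactly one of the two names.
    // A large count means the names cannot be close, so a costlier
    // string comparison can be skipped for that pair.
    static int differingLetters(long a, long b) {
        return Long.bitCount(a ^ b);
    }
}
```

Because each bitmap is computed once per record and then reused, the comparison per candidate pair reduces to one XOR and one bit count.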
@shashwatkaundinya127 9 years ago
+Kyle Putnam Thanks for the reply.
I work with data from Banks and Insurance companies and they provide us the data in text files, so pre-processing it and remodeling it should be the way to go.
I'll try and compute those Bitmaps on each record and store them in a new file with previous data.
In my case, I have to work with not just names but addresses and IDs and phone# and emails and twitter accounts too.
I will sit with a team soon to decide how to best use your technique for our case.
Thanks man
@shashwatkaundinya127 9 years ago
+Seth Verrinder Thanks for the reply.
@Basistech-page 6 years ago ⁺²
Rosette name matching (www.rosette.com/capability/name-matching/#tech-specs) solves phonetic similarity, transliteration, nicknames, missing spaces or hyphens, titles and honorifics, truncated name components, missing name components, out-of-order name components, initials, names split inconsistently across database fields, same name in multiple languages, semantically similar names, and semantically similar names across languages.
@mohdkhairi4552 8 years ago
Hi,
Congratulations. This is terrific.
I have a question: how do you deal with spaces and symbol characters?
@TheBjjninja 5 years ago
Simple preprocessing. Usually step 1
@rbettsx 7 years ago ⁺⁴
As I watched this, I couldn't help thinking the problem was being solved in the wrong domain! Who cares about the lengths and vagaries of English spelling, especially transliterations of foreign alphabets? To improve quality (reduce false positives and negatives,) I was thinking, don't we want to match on the sounds of these names? That got me looking up phonetic algorithms.
I wonder whether KP & SV considered preprocessing the dataset, indexing it with something like one of the Metaphone algorithms? They then could have performed their bitwise matching on the reduced dataset generated by exact matches of the short, standard-length keys generated by the phonetics. Maybe they weren't allowed to..
@quas0rx 7 years ago
Robin Betts, exactly my thoughts too, and Lucene/Solr provides support for foreign languages as well.
5 years ago ⁺¹
@First Last At a high level, I think what Karthy is referencing is called Soundex (en.wikipedia.org/wiki/Soundex). It does not convert words to a sound wave; instead it matches on the consonants (skipping vowels unless the word begins with one, and replacing the consonants with digits, as the wiki explains). Comparing waveforms wouldn't be efficient, given how much more data the waveform of a word holds than its string representation.
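For readers who want to see the consonant-to-digit idea concretely, here is a minimal sketch of American Soundex as described on the linked Wikipedia page. The class name `SoundexSketch` is hypothetical, and this is not something the talk itself is confirmed to use:

```java
final class SoundexSketch {
    // Digit code for each letter A..Z; '0' marks letters that are not coded
    // (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    // Returns a 4-character key: first letter plus up to three digits,
    // padded with '0'. Returns "" if the input contains no A-Z letter.
    static String soundex(String name) {
        StringBuilder out = new StringBuilder();
        char prev = '0';
        for (int i = 0; i < name.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(name.charAt(i));
            if (c < 'A' || c > 'Z') continue;      // skip spaces, digits, symbols
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);                      // the first letter is kept as-is
                prev = code;
            } else {
                // Emit a digit only for coded letters that differ from the previous code.
                if (code != '0' && code != prev) out.append(code);
                // H and W do not separate equal codes; vowels do.
                if (c != 'H' && c != 'W') prev = code;
            }
        }
        if (out.length() == 0) return "";
        while (out.length() < 4) out.append('0');
        return out.toString();
    }
}
```

Names that sound alike, such as "Robert" and "Rupert", map to the same key, so the key can serve as an exact-match index before any finer comparison.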
@Kenlauderdale123 6 years ago
How did you build the bit masks for each string?
@ashwin58336 6 years ago
What if the names had Mr., Mrs., Sgt., Dr. or any other salutations? How will this be tackled?
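One way to read the "simple preprocessing" answer given earlier in the thread, applied to spacing, symbols and salutations: normalize each name before encoding it. The sketch below is an assumption, not the speakers' code. The class name `NameNormalizer` and the short title list are hypothetical, and a real data set would need a longer list and a policy for non-ASCII letters (which this version treats as separators):

```java
import java.util.Locale;
import java.util.Set;

final class NameNormalizer {
    // Hypothetical, deliberately short list of salutations to drop.
    private static final Set<String> TITLES = Set.of("MR", "MRS", "MS", "DR", "SGT");

    static String normalize(String raw) {
        // Upper-case, then replace every run of non A-Z characters with one space.
        // Note: this also splits names like O'Neil into "O NEIL".
        String cleaned = raw.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]+", " ").trim();
        StringBuilder out = new StringBuilder();
        for (String token : cleaned.split(" ")) {
            if (token.isEmpty() || TITLES.contains(token)) continue; // drop titles
            if (out.length() > 0) out.append(' ');
            out.append(token);
        }
        return out.toString();
    }
}
```

For example, `NameNormalizer.normalize("Dr.  Jane   Doe")` yields `JANE DOE`, which can then be passed to whatever matching step follows.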
