Best way to do Named Entity Recognition in 2024 with GliNER and spaCy - Zero Shot NER

  • Published on May 3, 2024
  • GLiNER: github.com/urchade/GLiNER
    Gliner spaCy: github.com/theirstory/gliner-...
    The GLiNER repository hosts a generalist model for Named Entity Recognition (NER), designed to extract a wide range of entity types from text.
    The gliner-spacy repository provides a spaCy wrapper for GLiNER, so that GLiNER's NER capabilities can be used inside a spaCy pipeline. The wrapper supports customizable settings for processing text, such as chunk size, specific entity labels, and output style for entity recognition results.
    In this tutorial, I dive into the basics of the gliner-spacy repository, showing you how to integrate GLiNER's NER capabilities with spaCy's NLP environment. Whether you're new to natural language processing or looking to add zero-shot entity recognition to your projects, this video is your go-to guide. Plus, get a clear understanding of zero-shot learning and its application in zero-shot NER. Don't forget to like, share, and subscribe for more tutorials on NLP and AI technologies!
    Join this channel to get access to perks:
    / @python-programming
    If you enjoy this video, please subscribe.
    ✅Be my Patron: / wjbmattingly
    ✅PayPal: www.paypal.com/cgi-bin/webscr...
    If there's a specific video you would like to see or a tutorial series, let me know in the comments and I will try and make it.
    If you liked this video, check out www.PythonHumanities.com, where I have Coding Exercises, Lessons, on-site Python shells where you can experiment with code, and a text version of the material discussed here.
    You can follow me at:
    / wjb_mattingly
  • Science & Technology
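
The gliner-spacy settings mentioned in the description (chunk size, entity labels, output style) might be wired up roughly as follows. This is a minimal sketch, not the project's documented usage: the config key names (`gliner_model`, `chunk_size`, `labels`, `style`), the `gliner_spacy` factory name, and the `urchade/gliner_base` checkpoint are assumptions to verify against the repository, and `build_config` and `build_pipeline` are hypothetical helper names.

```python
from typing import Any


def build_config(labels: list[str], chunk_size: int = 250, style: str = "ent") -> dict[str, Any]:
    """Assemble the settings dict handed to the gliner-spacy component.

    Key names are assumed, not confirmed; check the gliner-spacy README.
    """
    if not labels:
        raise ValueError("at least one entity label is required")
    return {
        "gliner_model": "urchade/gliner_base",  # assumed checkpoint name
        "chunk_size": chunk_size,               # size of each text chunk sent to GLiNER
        "labels": list(labels),                 # zero-shot labels, written in plain English
        "style": style,                         # assumed values: "ent" or "span"
    }


def build_pipeline(config: dict[str, Any]):
    """Create a blank English spaCy pipeline with the GLiNER component added."""
    # Imported lazily so build_config can be used without spaCy installed.
    import spacy
    import gliner_spacy.pipeline  # noqa: F401  (assumed to register the factory)

    nlp = spacy.blank("en")
    nlp.add_pipe("gliner_spacy", config=config)  # factory name is an assumption
    return nlp
```

With both packages installed, `build_pipeline(build_config(["person", "organization"]))` should return a pipeline whose entities can be read from `doc.ents` when `style` is `"ent"`.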

Comments • 33

  • @lfmtube
    @lfmtube 9 days ago

    Best video as always. Thanks!

  • @LexPodgorny
    @LexPodgorny 11 days ago

    Hi, great video!
    You've mentioned "... if you don't have training data". I am assuming you mean that annotated data is not required, and that the model instead relies on an unsupervised approach.
    If this is correct, then for specialized texts it must rely on embedding training?
    Thanks!

  • @julkul99
    @julkul99 3 days ago

    This is very cool! What's the benefit of using gliner-spacy over just using GLiNER by itself?

  • @urchadezaratiana6781
    @urchadezaratiana6781 months ago +2

    gliner-spacy is awesome 💯

    • @python-programming
      @python-programming months ago +2

      Thanks so much! The same can be said of GliNER!! 😀

    • @VenkatesanVenkat-fd4hg
      @VenkatesanVenkat-fd4hg months ago

      @@python-programming Can't NER be achieved with prompt engineering plus an LLM? Could you explain this?

  • @dorellazara2010
    @dorellazara2010 months ago +1

    It’s amazing

    • @python-programming
      @python-programming months ago

      Thanks! Glad you like it! Curious how well it works for you.

  • @adilgun2775
    @adilgun2775 months ago +3

    Thanks for the video. One question: is it possible to do few-shot in addition to zero-shot with GLiNER (without fine-tuning)?

    • @python-programming
      @python-programming months ago +1

      No problem! Great question. I have not seen an example of this. Sorry! There are already a few examples of few-shot spaCy libraries. Concise Concepts is one.

    • @adilgun2775
      @adilgun2775 months ago

      Thanks @@python-programming

  • @daviddeisadze7037
    @daviddeisadze7037 months ago +2

    Great video! What would you do to extract hard skills and soft skills from a resume and job description?
    I am thinking of using entity rulers from spaCy and matching against them, but I was wondering what you were thinking. Thanks!

    • @python-programming
      @python-programming months ago

      Thanks! If you have a controlled vocab for these things then maybe a rules pipeline would work, but an ML model would likely be better since it would find things that are not in your list. You could also have a combination of both.
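
The combination suggested in this reply, a controlled vocabulary plus a model, could look something like the sketch below. It uses plain Python rather than spaCy's own rule components; `Span`, `rule_spans`, `merge_spans`, and the sample vocabulary are hypothetical, and the `model` list stands in for whatever a model such as GLiNER would return.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    start: int   # character offset where the span begins
    end: int     # character offset just past the span
    label: str
    source: str  # "rule" or "model"


def rule_spans(text: str, vocab: dict[str, str]) -> list[Span]:
    """Find case-insensitive whole-term matches for each vocabulary entry."""
    spans = []
    for term, label in vocab.items():
        # Lookarounds instead of \b so terms ending in symbols (e.g. "C++") still match.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append(Span(m.start(), m.end(), label, "rule"))
    return spans


def merge_spans(rules: list[Span], model: list[Span]) -> list[Span]:
    """Keep every rule span; add model spans that do not overlap any rule span."""
    kept = list(rules)
    for cand in model:
        if all(cand.end <= r.start or cand.start >= r.end for r in rules):
            kept.append(cand)
    return sorted(kept)  # ordered by start offset


text = "Experienced in Python and SQL, with strong communication."
vocab = {"python": "HARD_SKILL", "sql": "HARD_SKILL"}  # illustrative vocabulary
start = text.index("communication")
model = [
    Span(start, start + len("communication"), "SOFT_SKILL", "model"),
    Span(15, 21, "SKILL", "model"),  # overlaps the rule match for "Python", so it is dropped
]
for span in merge_spans(rule_spans(text, vocab), model):
    print(text[span.start:span.end], span.label, span.source)
```

Preferring the rule span on overlap is one design choice among several; keeping the higher-scoring or longer span would be equally defensible.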

  • @JoseSanchez-xz5wt
    @JoseSanchez-xz5wt months ago +1

    Really cool! Can you make a video on how to further train the LatinCy model? I have a ton of additions to the lemma fixer custom component and I've noticed a few recurring patterns I want to fix generally

    • @python-programming
      @python-programming months ago +1

      I can but it may be better to retrain from scratch. In these instances you can experience catastrophic forgetting. If you want to train from scratch, you could modify the original training data or add to it with your own. That said, if you simply need to adjust a component, that is entirely different. Would you mind explaining a bit more about what you want to do?

    • @JoseSanchez-xz5wt
      @JoseSanchez-xz5wt months ago +1

      Let me start by saying I'm new to this! I put together a Latin corpus of texts and I'm counting lemma frequencies. But I noticed that some verb forms are consistently off, like almost all pluperfect forms (for example, giving uiderat as the lemma instead of uideo). Instead of having to add a correction to the component for each verb, I wanted to see if there was a way to train the model to make it better at recognizing the lemma of verbs in pluperfect forms. Thanks for responding! @@python-programming

    • @python-programming
      @python-programming months ago +1

      @@JoseSanchez-xz5wt ahh I see! In that case, I would reach out to Patrick directly: twitter.com/diyclassics?lang=en --- he's on Twitter as diyclassics and if you look him up on Google you can find his email as well. I don't want to put it here and have him get spam messages.

  • @ifrasaifi1124
    @ifrasaifi1124 months ago +1

    Great explanation! Can we use GLiNER to extract medicinal plants' scientific names and their medicinal effects?

    • @python-programming
      @python-programming months ago +2

      Thanks so much! Like most ML things, the best thing to do is try it out. Change the labels to those exact label names and run it over a text. If you want to extract label names with spaCy, though, I created bio spaCy, which does precisely this.

    • @ifrasaifi1124
      @ifrasaifi1124 months ago +1

      @@python-programming Thank you so much, can you please share the link to bio spaCy?
      Additionally, thank you so much for such amazing videos, they are really helpful!

    • @python-programming
      @python-programming months ago +1

      @@ifrasaifi1124 Sure! Here it is: github.com/wjbmattingly/biospacy --- you are very welcome! I'm glad to hear you are finding my content helpful!

    • @ifrasaifi1124
      @ifrasaifi1124 months ago

      @@python-programming Thank you, can I also use it for relation extraction, for example a plant name linked to its medicinal effect?

    • @python-programming
      @python-programming months ago +1

      @@ifrasaifi1124 That would be a separate component that does not yet exist. You want to look into entity linking and connecting the plant to a wiki_id and then connect that to a database of medicinal effects.
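
The two-step lookup described in this reply (entity to identifier, then identifier to effects) can be sketched with plain dictionaries. Everything here is hypothetical: the plant name, identifiers, and effect strings are placeholders rather than real wiki IDs or pharmacological data, and a real pipeline would need an entity-linking component and a curated database.

```python
from typing import Optional

# Placeholder tables; a real system would query a knowledge base instead.
NAME_TO_ID: dict[str, str] = {
    "plantus exemplum": "ID_0001",  # made-up name and identifier
}
ID_TO_EFFECTS: dict[str, list[str]] = {
    "ID_0001": ["placeholder effect A", "placeholder effect B"],
}


def link_entity(mention: str) -> Optional[str]:
    """Map an extracted mention to an identifier, ignoring case and extra spaces."""
    key = " ".join(mention.lower().split())
    return NAME_TO_ID.get(key)


def effects_for(mention: str) -> list[str]:
    """Return the effects recorded for a mention, or an empty list if it cannot be linked."""
    entity_id = link_entity(mention)
    if entity_id is None:
        return []
    return ID_TO_EFFECTS.get(entity_id, [])
```

Keeping the identifier as an intermediate step means spelling variants of the same plant can all resolve to one entry in the effects table.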

  • @WalkAloneLive
    @WalkAloneLive 17 days ago

    I had better results with GLiNER than with OpenAI 3.5 on zero-shot. There are a lot of false positives, but at least we have something to filter later, and the good part is that it runs very fast with low CPU needs. Still waiting for a few-shot learning example; I'm sure it will help a lot. Has anyone tested a domain-knowledge way of doing stuff?
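
Filtering false positives after the fact, as this comment suggests, is often done with a score cutoff. A minimal sketch, assuming each prediction is a dict with `text`, `label`, and `score` keys (the shape I understand GLiNER's `predict_entities` to return; verify against the library); `filter_predictions`, the per-label cutoffs, and the sample scores are hypothetical.

```python
from typing import Any, Optional


def filter_predictions(
    predictions: list[dict[str, Any]],
    default_threshold: float = 0.5,
    per_label: Optional[dict[str, float]] = None,
) -> list[dict[str, Any]]:
    """Keep predictions whose score meets the cutoff for their label."""
    per_label = per_label or {}
    kept = []
    for pred in predictions:
        # A label-specific cutoff overrides the default when one is given.
        cutoff = per_label.get(pred["label"], default_threshold)
        if pred["score"] >= cutoff:
            kept.append(pred)
    return kept


# Made-up predictions for illustration only; scores are not real model output.
sample = [
    {"text": "Ada Lovelace", "label": "person", "score": 0.92},
    {"text": "engine", "label": "person", "score": 0.31},
    {"text": "London", "label": "location", "score": 0.58},
]
strict = filter_predictions(sample, per_label={"location": 0.7})
```

Raising the cutoff for a noisy label trades recall for precision, so the thresholds are worth tuning on a small hand-checked sample.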

  • @aaroldaaroldson708
    @aaroldaaroldson708 27 days ago

    is there anything like this, but for text classification? e.g.: I have a list of labels (topics) and a list of texts. And it has to tell me what topics are mentioned in which text

  • @VenkatesanVenkat-fd4hg
    @VenkatesanVenkat-fd4hg months ago

    Can't NER be achieved with prompt engineering plus an LLM? Could you explain this?