Summarize Papers with Python and ChatGPT

แชร์
ฝัง
  • เผยแพร่เมื่อ 5 ม.ค. 2025

ความคิดเห็น • 49

  • @Nightsbringer1
    @Nightsbringer1 ปีที่แล้ว +7

    For those wanting to work off this code, you may wish to change the line "open(pdf_summary_file, "w+",)" to "with open(pdf_summary_file, "w+", encoding='utf-8')". I have found that you get better results if you are making it work with PDF files with characters you wouldn't normally find on your keyboard (in my case, there's lots of Greek letters as I am making it summarise mathematical text).
    Also, don't forget to add openai.api_key = "your openai API Key" to make it work if you havent set your key as an environment variable!!

  • @automatalearninglab
    @automatalearninglab  ปีที่แล้ว +6

    Sup guys! Should have done this sooner but here is a notebook with the code that summarizes a paper from an url: github.com/EnkrateiaLucca/automating_work_research/blob/main/paper_search_with_chatgpt.ipynb
    Thanks for watching! :)

    • @mocanada304
      @mocanada304 ปีที่แล้ว +1

      Will it cost extra if we are using our chatgpt API? or is it included in the ChatGPT Plus subscription?
      Thank you!

    • @automatalearninglab
      @automatalearninglab  ปีที่แล้ว +1

      @@mocanada304 Sup Mo, I think it might cost extra, I don't think the subscription covers these API calls.... :(

  • @beatrizbelbut4862
    @beatrizbelbut4862 ปีที่แล้ว +1

    Best channel ever 🥰

  • @sant0sch
    @sant0sch 10 หลายเดือนก่อน +1

    Hey, thank you for that tutorial.
    I am quite fresh into that topic and wonder - wouldnt it make more sense, to first summarize the content without using AI and only after that feeding it to AI to refine it?
    Asking, since it can become quite expencive and maybe there is a way, to save tokens?
    At the moment I am experimenting around.
    But thank you for the tutorial! :)

    • @automatalearninglab
      @automatalearninglab  10 หลายเดือนก่อน

      I think the point here would be to have a tool to quickly get ths gist out of some content. What you're describing seems more like a good study methodology which although applicable, perhaps slightly outside the scope of the tool itself. But thanks for the feedback and for watching! B) cheers!

  • @_salvax_
    @_salvax_ ปีที่แล้ว +1

    great video tutorial. It works smoothly ! Thank you so much.
    I am a Python newbie... I can't find the right way to modify the first line when the pdf file is saved locally. Any suggestion ? Thanks a lot

    • @automatalearninglab
      @automatalearninglab  ปีที่แล้ว +1

      Sup Salvino! Thanks! I think you might have the wrong path, just make sure that the path points to your file, in my example I had it on a folder called pdfs

  • @jamescai6998
    @jamescai6998 ปีที่แล้ว +1

    Thanks for this great video! I am a complete newbie to Python. Can you kindly share what tool did you use to run this?

    • @automatalearninglab
      @automatalearninglab  ปีที่แล้ว

      Sure, I used vscde, jupyter and notebooks, running python 3.9 :)

  • @plasius2398
    @plasius2398 ปีที่แล้ว +1

    Legend.

  • @louisvandame3827
    @louisvandame3827 ปีที่แล้ว +1

    With the annoncement of GPT4 turbo and 125k tokens do you think it would be possible to pass it just as one single message ?
    To do this would this code work ?
    " for page_num in range(len(pdf_reader.pages)):

    page_text = page_text + pdf_reader.pages[page_num].extract_text().lower() "

  • @travisbickle9745
    @travisbickle9745 ปีที่แล้ว +1

    hey thanks for the code, it works like a charm when also incorporating Nightsbringer1's suggestions. However, one thing I wonder is to what extend the results are influenced by breaking up the text into pages and then letting ChatGDP write summaries for each page. For example, what happens when a crucial sentence starts on one page and then ends on the next one? This will effectively split the sentence into two parts and make it potentially intelligble. Depending on whether the information provided in this sentence entails a key aspect, this might lead to faulty summaries, or am I mistaken here? thanks!

    • @automatalearninglab
      @automatalearninglab  ปีที่แล้ว

      No You are not! You make a great point! It’s indeed one of the weakness of this approach. In an upcoming post I will share about using LangChain and embeddings in order to get better representations of the text. For summaries I think ideally we should have a better representation of the context that does not get lost in pages like you said! Thanks for watching! Cheers!

    • @travisbickle9745
      @travisbickle9745 ปีที่แล้ว +1

      @@automatalearninglab thanks and great! looking forward to that post! btw, in your original code it would also be nice to print a page counter. I just like to be reaffirmed as the code iterates through the pages :)

    • @automatalearninglab
      @automatalearninglab  ปีที่แล้ว +1

      @@travisbickle9745 nice thanks!

    • @travisbickle9745
      @travisbickle9745 ปีที่แล้ว +1

      @@automatalearninglab maybe one other question: there is currently no possibility to have chatgdp summarize the entire text with only one summary, right?

    • @automatalearninglab
      @automatalearninglab  ปีที่แล้ว

      @@travisbickle9745 There is a limitation of tokens with respect to big summaries like that. However I already recorded a new video on how to use LangChain to get around some of those limitations. Essentially the key thing is using embeddings and vectorization.

  • @rayyansyed3703
    @rayyansyed3703 ปีที่แล้ว +1

    Sir i am assigned this project by my teamleader , i am a new python biggener , can you please just text out the prequesites for this project , likewise i only know python i dont know jupyter and notebook , what does this does ? And if i want to do similar project what should i learn after python and how should i proceed ?

    • @automatalearninglab
      @automatalearninglab  ปีที่แล้ว

      HI! Just check out the github repo in the link in the description, it should have all you need to get started! :)

  • @ayo4757
    @ayo4757 ปีที่แล้ว +1

    cost money ask to 3.5 gpt turbo and use API? or all this is free?

    • @automatalearninglab
      @automatalearninglab  ปีที่แล้ว +1

      With the CHatGPT API, there is the cost of the API per 1000 tokens, yes.

    • @ayo4757
      @ayo4757 ปีที่แล้ว +1

      @@automatalearninglab i have GPT plus, do you know if i can Get the Api key of my 3.5 turbo ? o i need to pay extra?

    • @automatalearninglab
      @automatalearninglab  ปีที่แล้ว +1

      @@ayo4757 I think they are separate yes, the API is one thing, the subscription another.

  • @muskanrath7125
    @muskanrath7125 ปีที่แล้ว +1

    Can you please post videos on how to summarise papers using hugging face transformer or the hints regarding the same? Creating a language model , evaluating the model, fine tuning the model, Summarising excluding the abstract part and comparing the summary generated with Abstract of the paper. I am stuck. Please give me appropriate directions which would help me proceed.

    • @automatalearninglab
      @automatalearninglab  ปีที่แล้ว

      Sup Muskan! YEs sure! I will soon post another on on summarization and I will try to include your requests. I think in terms of evaluation though, you already have metrics used in benchmark papers that you can check out like ROUGE (Recall-Oriented Understudy for Gisting Evaluation), BLEU (Bilingual Evaluation Understudy), METEOR (Metric for Evaluation of Translation with Explicit ORdering), and CIDEr (Consensus-based Image Description Evaluation). Cheers! :)

  • @richardkelly9740
    @richardkelly9740 ปีที่แล้ว +1

    This is incredible! \
    Is there a way of changing the summarization threshold?
    And when I ran this code, it summarized it into a .txt file, and not into a .pdf, is that correct? And is there a way of converting the .txt into a .pdf.

    • @automatalearninglab
      @automatalearninglab  ปีที่แล้ว +1

      You can convert the .txt to a .pdf by using a library called: fpdf . The thing with the threshold I am not sure what you mean, you can certainly tweak the temperature parameter to suit your needs for a more creative (higher number close to 1) or more precise (smaller number close to 0) for the LLM API call.

    • @richardkelly9740
      @richardkelly9740 ปีที่แล้ว +1

      @@automatalearninglab What I mean by threshold is the length of summarization is quite long. Would I need to defien that in the 'role'/'content' part of the code?

    • @automatalearninglab
      @automatalearninglab  ปีที่แล้ว +1

      @@richardkelly9740 set the max number of tokens on the API call

    • @richardkelly9740
      @richardkelly9740 ปีที่แล้ว +1

      @@automatalearninglab Thank you!!!

    • @automatalearninglab
      @automatalearninglab  ปีที่แล้ว

      You’re very welcome! :)@@richardkelly9740