Oracle APEX: How to extract text from inside a PDF or Word Doc

  • Published on 8 Jan 2025

Comments • 20

  • @jeeves251 5 months ago +2

    This was super helpful. I'm passing the text from the PDF to the Cohere API and asking questions about what's in the PDF and it works like a dream. Thank you!

    • @chipbaber 4 months ago +1

      If you are on a new 23ai database, there is a new PL/SQL API that is also very useful here. Check out dbms_vector_chain.utl_to_text: docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html
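
For anyone who wants to try the 23ai route from a script, here is a minimal sketch in Python. It assumes a DB-API style cursor (for example one from the python-oracledb driver) and a hypothetical table `my_docs` with a numeric `id` and a BLOB column `doc`. Neither name comes from the video. Only `dbms_vector_chain.utl_to_text` is taken from the comment above.

```python
# Hypothetical table and column names (my_docs, id, doc); adjust to your schema.
EXTRACT_SQL = """
    select dbms_vector_chain.utl_to_text(d.doc)
      from my_docs d
     where d.id = :doc_id
"""


def extract_text(cursor, doc_id):
    """Return the plain text of one stored document, or None if no row matches.

    `cursor` is any DB-API style cursor connected to a 23ai database.
    """
    cursor.execute(EXTRACT_SQL, {"doc_id": doc_id})
    row = cursor.fetchone()
    if row is None:
        return None
    value = row[0]
    # Some drivers hand back a LOB object instead of a str; read it if so.
    if hasattr(value, "read"):
        value = value.read()
    return value
```

The function takes the cursor as a parameter so that connection handling stays with the caller.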

  • @organismisimbiotici 10 months ago +1

    Wonderful! I've been trying to convert a PDF/BLOB to a CLOB in APEX for days! Thank you!

    • @chipbaber 9 months ago

      Glad it could help.

  • @kauecastelani4417 1 year ago

    Here on Oracle 11g it does not work. It just shows a ' - ' after the process succeeds. Do you know why?

    • @chipbaber 1 year ago

      What, if anything, do you see when you query your filtered_doc table? Or does the error occur slightly before that? This demo was done on a 21c database.

  • @LiandiObermeyer 4 months ago

    Is there a way to do all this without inserting the resumes into the table ahead of time, by uploading them from a page in the app that saves them into the table instead? If so, how do I do this?

    • @chipbaber 4 months ago +1

      This video shows how to build a web application that uploads a file and saves it to the table: th-cam.com/video/f8hYQtAJ-WY/w-d-xo.html

    • @LiandiObermeyer 4 months ago

      @chipbaber Thank you

  • @uselvan 1 year ago

    Hi, in the same way, can we extract the images from a Word/DOCX file?

    • @chipbaber 1 year ago

      I don't believe there are any native libraries in the database for this today. It can be done in Java, though; I found this example: gist.github.com/aspose-com-gists/7af5b641d0ab658dbddce3292649c227 So one path could be to use Oracle functions and Java to consume the doc and output the images, then save the images inside the database or in object storage.

  • @asifiqbal5877 2 years ago

    Hello, can we upload records from Oracle APEX to an MDB-format file?

    • @chipbaber 2 years ago

      I am probably not the expert when it comes to MS Access, but I found this in the Oracle forums: asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:9523935800346847870

  • @luisf.rodriguezgarcia2888 1 year ago

    Hi, I have done all the steps and it works really well, but when I try to convert a large PDF (54 pages) it only gives me a string like SKM_C3320i23022111200 in the filtered_docs table. Is there a size limit with this index?

    • @chipbaber 1 year ago +1

      The filtered_docs table stores the result of the index in a CLOB, and the maximum size of a CLOB should easily handle a 54-page document: docs.oracle.com/en/database/oracle/oracle-database/19/refrn/datatype-limits.html . A couple of small checks you have probably tried already, but just in case: after the upload, make sure to rebuild the index (example at the bottom of the markdown page): resumeAdmin.Batch_Create_Filtered_Docs(); If that doesn't work and the PDF is something you can share, send me a link and I can take a look at it.

    • @luisf.rodriguezgarcia2888 1 year ago

      @chipbaber Thanks!

    • @luisf.rodriguezgarcia2888 1 year ago

      @chipbaber Hi Chip, I sent you an email.

    • @jeeves251 5 months ago

      This happened to me too. In my case it was because the PDF wasn't actually text but a scanned image, so there was nothing to read.
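
One way to catch the scanned-image case early is to sanity-check the extracted text before using it. The sketch below is a rough heuristic in Python, not anything from Oracle Text: it counts runs of three or more letters and compares that with the page count. The function name, the `pages` argument, and the threshold of 20 words per page are all assumptions made for illustration.

```python
import re

# A "word" here is a run of at least three letters; scanner-generated names
# such as SKM_C3320i23022111200 contain almost none.
_WORD = re.compile(r"[^\W\d_]{3,}")


def looks_like_scanned(extracted_text, pages, min_words_per_page=20):
    """Guess whether a filtered document came from an image-only PDF.

    The threshold is an arbitrary assumption, not a value from Oracle Text.
    """
    if pages <= 0:
        raise ValueError("pages must be positive")
    words = _WORD.findall(extracted_text or "")
    return len(words) / pages < min_words_per_page
```

A document flagged this way would probably need OCR before there is any text for the index to extract.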

  • @deepakdakhore 2 years ago

    Hi, I am getting an error with the CTX_DOC package. I am using APEX version 21.2 and Database 12c:
    ORA-20000: Oracle Text error:
    DRG-50857: oracle error in ctx_doc.filter
    ORA-20000: Oracle Text error:
    DRG-11207: user filter command exited with status 127

    • @chipbaber 2 years ago

      Check whether the user running ctx_doc has the CREATE TABLE and CREATE TRIGGER privileges. A Text index internally creates some tables like DR$$R, DR$$I, DR$$K. These privileges are required to create those tables.
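
The privilege check suggested above can be scripted. This is a minimal sketch in Python, assuming a DB-API style cursor that is connected as the user who runs ctx_doc. It reads the `session_privs` view, which lists the privileges available in the current session. The function name and the constant are made up for illustration.

```python
# The two privileges named in the reply above.
REQUIRED_PRIVS = ("CREATE TABLE", "CREATE TRIGGER")


def missing_privileges(cursor, required=REQUIRED_PRIVS):
    """Return the required privileges the connected user does not currently hold."""
    cursor.execute("select privilege from session_privs")
    held = {row[0] for row in cursor.fetchall()}
    return [p for p in required if p not in held]
```

An empty list means both privileges are present in the session, in which case the cause of the error is likely elsewhere.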