Types of PDF - Computerphile

  • Published on 17 Jun 2021
  • "Just send me a PDF!" - but what kind of PDF? As Professor Brailsford explains, PDF is simply a wrapper which can contain a variety of joys!
    / computerphile
    / computer_phile
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottscomputer
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments • 396

  • @isaac10231
    @isaac10231 3 years ago +798

    Life goal - finding something to be as passionate in life as this man is about crispy text.

    • @skuzzbunny
      @skuzzbunny 3 years ago +15

      crispy text is the best!!!!!D

    • @unlokia
      @unlokia 3 years ago +21

      CRISP, *_not_* "crispy". This is a silly error that seems to be propagating net-wide +as usual we can blame the yanks!!+
      A brand of creme donuts' products are named "crispy", images and text are *CRISP!!*

    • @CJT3X
      @CJT3X 3 years ago +8

      @unlokia no need to be so crispy ‘bout it

    • @DryPaperHammerBro
      @DryPaperHammerBro 3 years ago +1

      @skuzzbunny {o{obi,l. K.l k I 98xd

    • @kokoinmars
      @kokoinmars 3 years ago

      Crispy text is nothing to scoff about.

  • @martinbean
    @martinbean 3 years ago +455

    Imagine saying something as innocuous as “I’ll send you a PDF” to this guy and then getting a 2-hour lecture in response…

    • @FriedEgg101
      @FriedEgg101 3 years ago +20

      Maybe you could cut the lecture short by following up with "it'll be PDF Normal".

    • @erwinmulder1338
      @erwinmulder1338 3 years ago +17

      Professor Brailsford can lecture me all day.

    • @michaeldamolsen
      @michaeldamolsen 3 years ago +7

      That would be the best day of the month for sure!

    • @swiftfox3461
      @swiftfox3461 3 years ago +4

      I'd listen closely and turn off my phone to make sure I didn't miss anything.

    • @amicaaranearum
      @amicaaranearum 3 years ago +6

      Professor Brailsford definitely made this video in response to receiving a low-quality PDF scanned from a photocopy.

  • @StraightOuttaJarhois
    @StraightOuttaJarhois 3 years ago +662

    What PDF says to me isn't quality, but uniformity, as in it'll look the same no matter what device or software you're using to view it, even if it's a sheet of paper instead of a screen. (I know this isn't actually the case, but as I understand it, it's how it _should_ work.) So when I get a PDF, I trust that each line and character is exactly where it's supposed to be, and not shifted due to text reflow or different fonts or whatever. From that perspective it doesn't matter if it's using razor sharp vectors or blocky bitmaps.

    • @max15half
      @max15half 3 years ago +54

      Well, you could be reasonably sure that a bitmap will not misplace your lines and characters.

    • @StraightOuttaJarhois
      @StraightOuttaJarhois 3 years ago +18

      @max15half Sure, but there are other qualities of bitmaps that make them less than ideal for text. PDF has the same advantages as other document formats while feeling more trustworthy than, say, a .doc or a .html, even if they're not always used to the fullest.

    • @Platoqp
      @Platoqp 3 years ago +6

      I think that is how it started too. That said, if a professor asks for a PDF, it is a decent implication for some layout

    • @hirmuolio
      @hirmuolio 3 years ago +11

      @max15half But how are those bitmaps viewed by the receiver?
      The images are numbered in order, but the reader's image viewer tries to open them in alphabetical order, size order or age order (whatever its default is).
      The image sizes vary, and the image viewer scales them in stupid ways.
      PDF is still a good system even if the content is just bitmaps. It keeps them all in the correct scale and order.

    • @ccreutzig
      @ccreutzig 3 years ago +8

      @hammerhals These days, not everything in PDF is "statically linked." Many PDF viewers, including Acrobat, have a JavaScript engine, and for the modern type of PDF forms, where you may be able to add table rows etc., you kind of need that.
      That in turn means some people embed code in their PDF to, say, render animations etc.

  • @sedawk
    @sedawk 3 years ago +280

    “I asked someone to send me a PDF and all I got was this lousy bit map” - would make a great t-shirt.

    • @SomethingUnreal
      @SomethingUnreal 3 years ago +30

      Complete with blocky JPEG artifacts all around the text, of course!

    • @frankharr9466
      @frankharr9466 3 years ago +5

      Don't tempt me.

    • @naughtiusmaximus789
      @naughtiusmaximus789 2 years ago

      Grand Theft Auto : Vice City 100% completion reward

  • @StevenSeiller
    @StevenSeiller 3 years ago +86

    🤓me before video: "Finally time to learn the differences between PDF/X, PDF/E, and PDF/A!"
    🤷‍♂️me after video: "Where is PDF(FTG), PDF(I), or PDF(I+HT) in my Adobe Save As...???"

  • @greatquux
    @greatquux 3 years ago +182

    Brailsford’s eyesight is better than mine, he can use xterm at the default font size!

  • @thuokagiri5550
    @thuokagiri5550 3 years ago +89

    How much we missed prof Brailsford

  • @mastertacosmith
    @mastertacosmith 3 years ago +85

    This man needs a 40” ultrawide so he can truly enjoy a good typeface at scale

  • @IIARROWS
    @IIARROWS 3 years ago +245

    I got worse: an Excel sheet with a picture pasted inside it.
    And not a picture of a table, a screenshot of the application I was working on.

    • @olik136
      @olik136 3 years ago +16

      my architectural software has a library folder with a drawing file that contains a screenshot of that library folder telling you that certain files are hidden and can only be found with windows explorer...

    • @recklessroges
      @recklessroges 3 years ago +2

      I'll send you a screen-shot of that in an HTML email ;-) /s

    • @david.mcmahan
      @david.mcmahan 3 years ago +12

      I once had a client take a screenshot of their full desktop (with an opened PDF among many windows), paste it into a Word doc., crop it down to just a signature graphic, and then scale it back up because the signature was too small. This was their method of "extracting" the signature image from a PDF.
      Fair enough, but it was because they wanted the version of the signature we had already cleaned up to look better in print.

    • @JNCressey
      @JNCressey 3 years ago +2

      @david.mcmahan, can whoever they give the Word document to tell Word to show the full image to see everything they had open in the screenshot?

    • @david.mcmahan
      @david.mcmahan 3 years ago +5

      @JNCressey Yes, I could see everything they had opened on the screen. There was nothing bad, but it could have been a security incident.

  • @ToSMaster12345
    @ToSMaster12345 3 years ago +48

    I was smiling in total bliss throughout the video! Finally I feel understood!
    This is the reason why I write all my documents in LaTeX and use vector images for figures that have embedded text! So that even the scalebar and axis labels in my plots can be selected or searched via text!
    Reject Bitmap! Embrace PDF-FTG! :D

    • @carlosmspk
      @carlosmspk 1 year ago +2

      I mean, anyone with an academic background would understand you

  • @mikefochtman7164
    @mikefochtman7164 3 years ago +15

    Reminded of a similar issue we had with old mechanical, piping, and electrical drawings, the kind that were literally 'blueprints'. They had been photographed onto microfiche and the originals worn out/lost. We took the microfiche cards and had them scanned (causing even more loss of quality).
    Then a team of graphics artists would import the scanned image as 'background' into a modern drafting tool and literally 'trace' over each marking on the original. This basically re-drew the drawings using the scanned background image as the template. The final step was to 'hide' the background and voila! A modern, vector drawing that was searchable and could be manipulated with modern tools. If anyone suspected a mistake in the redrawing, we would 'unhide' the background to look at the scanned image, or even go back to the microfiche (we kept a 30-year-old viewer on hand).
    I forget how much that cost, but it was about 3 graphics artists working over a year to do several hundred drawings. :(

  • @1337Unlucky
    @1337Unlucky 3 years ago +64

    He clearly has strong views on PDFs. It's funny because it reminds me of myself explaining formats for photography and how to preserve quality. God, I hate it when people send photos via social media without using .zip or .rar and all the photos get ultra compressed.
    It's not only about photos and not only about PDFs, I understand the man, it's about PRESERVATION. The world needs to understand better formats and ways to preserve content. I just love this man.

    • @ZaneDaMagicPufferDragon
      @ZaneDaMagicPufferDragon 3 years ago +3

      💯 Preservation!!! I’m a Preservationist At Heart ❤️😉

    • @LordMegatherium
      @LordMegatherium 3 years ago +6

      If it's about preservation then rar should be out of the picture because it's a closed format. It's unlikely that we won't be able to open them in 50+ years especially since we have a libre decompression implementation but the point still stands.

    • @Entertainment-
      @Entertainment- 1 year ago

      That's why I love Telegram: it does the compression too, but it also allows you to send pictures, or any file for that matter, in its original size

  • @nikolayrayanov2895
    @nikolayrayanov2895 3 years ago +9

    This is gold. I've tried to explain to people at work about different types of PDFs for years.

  • @jlivewell
    @jlivewell 3 years ago +17

    Every time I watch a video by Dr. Brailsford, PhD, I add a new life regret: that I didn’t meet him when I was 17 and learn everything from him.

    • @jackkraken3888
      @jackkraken3888 2 years ago

      With someone like him you can never learn everything.

  • @noferblatz
    @noferblatz 3 years ago +4

    This professor is positively the best you feature. His enthusiasm and his ability to explain complex technical concepts in a simple way are unmatched.

  • @drskelebone
    @drskelebone 3 years ago +8

    I'm in a completely different field, and when the Professor states "if you want a straight line, you just say Line()" he is 100% talking to my soul and speaking the truth I have wanted to shout into so many faces.
    ty!

  • @Sam-th4jl
    @Sam-th4jl 3 years ago +1

    i think i could listen to him talk about literally anything and find it interesting just because of his delivery

  • @TheAstronomyDude
    @TheAstronomyDude 3 years ago +31

    How does post office OCR work? Sorting centers read the address off an envelope in a fraction of a second and they've been doing it for decades; long before Adobe.

    • @666Tomato666
      @666Tomato666 3 years ago +32

      fundamentally the same technology, but they have the benefit that the address is highly redundant; can't read the full postcode? check the city and street name

    • @bluedeath996
      @bluedeath996 3 years ago +15

      Combined with a very standardised way to format addresses. There is also a "lost letter" centre where a person decodes things the OCR can't read, but newer tech is better at the job.

    • @the_lenny1
      @the_lenny1 3 years ago +2

      @666Tomato666 yeah, and on top of that the most important information is the postcode, which is only numbers.

  • @m47h4r
    @m47h4r 2 years ago +1

    This was a joy to watch! I respect people like him very much. Being genuinely interested in something and actually putting the time in to learn about its ins and outs. Never mind the fact that he uses Linux with a bunch of open terminals, that's just the cherry on top!

  • @deansundquist9601
    @deansundquist9601 3 years ago

    The strive for excellence in typesetting is very noble. As always, thanks for the wonderful content Prof. Brailsford.

  • @balmar3
    @balmar3 3 years ago +10

    Yesss! Professor is using Alpine, one of the best emailers out there. You should make some videos on the awesome power of terminal-based utilities.

  • @kasamikona
    @kasamikona 2 years ago +3

    Prof Brailsford you're a very brave man pronouncing PNG as "ping" around these parts...

  • @YingwuUsagiri
    @YingwuUsagiri 3 years ago +16

    As someone in an administrative job when someone says send me a PDF they mean "any quality yet not easily edited". Invoices for example are never allowed to be easily editable like Word or Excel (and yes that happens often enough). If they want infinitely scalable they'll ask for a Vector and if they want something that's super sharp made in InDesign etc. they'll ask for an INDD. In my almost decade of working in administrations PDF just means can't be edited (easily, because I am very well aware that you still can somehow).

    • @Starguy256
      @Starguy256 3 years ago +1

      I edit PDFs every day in my work. Sometimes our software prints the wrong thing and instead of going in and trying to fix it, just edit it on the PDF before you send it. As long as it's FTG (as anything not produced by a photocopier should be) you just hit "Edit PDF" in Acrobat.

    • @lawrencedoliveiro9104
      @lawrencedoliveiro9104 2 years ago

      The irony is that using vector graphics and actual text objects makes it easier to edit the PDF file. The hardest type to edit is the one where every page is a bitmap.

  • @RhinoBlindado
    @RhinoBlindado 3 years ago +3

    Prof B looking quite dapper today. Loved the video!

  • @PhilReynoldsLondonGeek
    @PhilReynoldsLondonGeek 3 years ago +55

    The only real *problem* with PDF is that many organisations provide you with their forms as images. If they could be done as proper forms it would be far easier to actually use them.

    • @turpialito
      @turpialito 3 years ago +14

      But isn't it that it's not actually a PDF problem, but rather people not using the proper PDF generator; in this case Adobe Forms (which AFAIR is bundled with Acrobat)?

    • @ophello
      @ophello 2 years ago +2

      This isn’t a problem with PDF. It’s a problem with organizations.

  • @JNCressey
    @JNCressey 3 years ago +19

    Some interesting weird things I've encountered with PDFs:
    1. I remember some time last year I copied a JPEG out of a PDF container and found it had a slightly different format than regular JPEGs. I think normal JPEGs have the word "JFIF" at the beginning of the file, but I think this had something else, maybe "ADOBE", though I don't exactly remember; it could have been a different word.
    2. Just today I found out there are two options to save a PDF from Microsoft Edge, "Save as PDF" vs "Microsoft Print to PDF", and "Microsoft Print to PDF" produced a file that was significantly larger and slower to load when viewing.
    3. Some PDFs I've seen allow you to search and select text, but don't let you copy or print. I think it's called "secured PDF". I'm not sure why PDF viewers from companies other than Adobe would respect those restrictions. Is there something in the file that fundamentally makes these actions impossible, or does it just ask the program to disallow them?
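
Point 1 above can be checked by reading the application segments at the start of a JPEG. The sketch below is only an illustration: the function name `jpeg_app_signatures` is made up here, and it assumes a well-formed file in which every segment before the scan data carries a length field. It reports the identifier string of each APPn segment, so a typical standalone JPEG would show `APP0:JFIF`, while one written by Adobe software may show `APP14:Adobe` instead.

```python
import struct


def jpeg_app_signatures(data: bytes) -> list[str]:
    """List the identifier of every APPn segment found before the scan data.

    Hypothetical helper for illustration; it does not handle padding bytes
    or length-less markers that a robust parser would need to cover.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # lost sync with the marker stream; stop rather than guess
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if 0xE0 <= marker <= 0xEF:  # APP0 .. APP15
            ident = payload.split(b"\x00", 1)[0][:16]
            found.append(f"APP{marker - 0xE0}:{ident.decode('latin-1')}")
        pos += 2 + length
    return found
```

Calling it on the bytes of an image pulled out of a PDF and on an ordinary camera JPEG should make the difference described in point 1 visible, if the difference really is in these segments.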

    • @neumdeneuer1890
      @neumdeneuer1890 3 years ago +12

      Response to point 3:
      Yes, the PDF just asks nicely to not allow copying. There are no technical restrictions, and there are more than enough programs which ignore such requests.

    • @hanelyp1
      @hanelyp1 2 years ago +1

      And a fair selection of the software you could use to read the open format PDF is open source. If such software did pay attention to a "no copy" flag it would be possible to alter the software to ignore it.
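
To make the "asks nicely" point concrete: in the standard PDF security handler, as far as I recall from the PDF 1.7 specification, the restrictions are stored as bit flags in an integer `/P` entry of the encryption dictionary, and it is up to the viewer to honour them. The sketch below decodes such a value. The bit positions are my reading of the spec and should be checked against it; the names in `PERMISSION_BITS` and the function `decode_permissions` are invented for this example.

```python
# Bit positions are 1-based from the low-order bit, as the PDF spec numbers them.
# Treat this table as an assumption to verify against the specification.
PERMISSION_BITS = {
    3: "print",
    4: "modify",
    5: "copy_or_extract",
    6: "annotate",
    9: "fill_forms",
    10: "extract_for_accessibility",
    11: "assemble",
    12: "print_high_quality",
}


def decode_permissions(p: int) -> dict[str, bool]:
    """Decode a /P permissions integer into named allow/deny flags."""
    value = p & 0xFFFFFFFF  # /P is written as a signed 32-bit integer
    return {
        name: bool(value & (1 << (bit - 1)))
        for bit, name in PERMISSION_BITS.items()
    }


if __name__ == "__main__":
    for name, allowed in decode_permissions(-44).items():
        print(f"{name}: {'allowed' if allowed else 'denied'}")
```

Nothing in this decoding stops a program from reading the text once the file is decrypted, which is why a viewer that ignores the flags can copy or print anyway.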

  • @harshjinger
    @harshjinger 3 years ago +7

    Thanks... I rely on open source information to learn about computer based things that occurred even before I was born.
    Recently, I was looking into this exact question for a project of my own, And this is a perfect resource.
    I have never used Adobe's official software; being a novice undergrad student, besides being broke, this serves as a great reference.
    Thanks a lot again...

  • @magacacciari3565
    @magacacciari3565 3 years ago

    Huge fan of Professor B and his computer lores.

  • @geirtwo
    @geirtwo 11 months ago

    I wish this channel had more satisfying visuals.

  • @squishmastah4682
    @squishmastah4682 3 years ago +12

    "[PDF] covers a multitude of sins."
    Yes. Especially at Hustler Magazine.

  • @tjarko72
    @tjarko72 3 years ago +14

    I always thought that PDF(ftg) was closely related to PostScript; I would have expected a mention of PostScript. More modern, also PDF/A.

    • @ZedaZ80
      @ZedaZ80 3 years ago +1

      PostScript is lovely

    • @nezZario
      @nezZario 3 years ago

      It is.

  • @ajayrangishetti5515
    @ajayrangishetti5515 3 years ago +7

    Please do a video explaining the Pentium processor architecture, and how multi-core processors perform out-of-order execution.

    • @DrSteveBagley
      @DrSteveBagley 3 years ago +2

      We’ve done out of order execution

    • @ajayrangishetti5515
      @ajayrangishetti5515 3 years ago

      @DrSteveBagley thank you, I got it!!👍

  • @Richardincancale
    @Richardincancale 3 years ago +12

    Do you remember desk-top search engines? I used to test them by hiding the word ‘marmalade’ in a PowerPoint in a zip file to test their ability to find and index text :-)

    • @ShankarSivarajan
      @ShankarSivarajan 3 years ago

      Did that work?

    • @CJT3X
      @CJT3X 3 years ago +1

      You mean like an early version of Spotlight/Alfred?

    • @Richardincancale
      @Richardincancale 3 years ago

      @CJT3X I recall that both Altavista and Google had desktop indexing tools. Yes it worked and found my hidden marmalade!

    • @Richardincancale
      @Richardincancale 3 years ago

      @ShankarSivarajan Yup

  • @johnno4127
    @johnno4127 3 years ago

    The searchable nature of image plus hidden text (or an image with the text replaced by an actual font) is fantastic!
    The vast quantity of extra spaces and line returns can get frustrating when trying to use that OCR text, though. It's also a pain when Adobe puts a random space in the middle of a word or between EACH LETTER and now you can't find what you're looking for.

  • @mickjames73
    @mickjames73 3 years ago +3

    PDF variability is very frustrating for blind or low-vision people. You would often receive a document or instruction manual which was rendered as an image only, and we used to have to print, rescan and OCR them (often quite tricky with complex page layouts). Luckily there is now a fairly accurate built-in OCR engine in things like Acrobat Reader. The other issue with PDF variation is that many PDFs don't conform to accessibility standards and thus become unusable, or difficult to use, when viewed with accessibility features turned on.

    • @Jebusankel
      @Jebusankel 3 years ago

      I was frustrated recently that my auto insurance documents are all in bad bitmap PDF format. But if I complain to them and claim to be blind, I think they'll have some follow up questions. 😜

  • @jorisschellekens4630
    @jorisschellekens4630 3 years ago

    The way most PDF libraries or programs handle OCR is by something the spec calls "optional content groups".
    Optional content groups allow you to mark any content in the pdf content stream with a particular tag (typically the layer name).
    Programs like Adobe will then show you a listing of all the layers. So you could imagine being able to toggle OCR on and off.

  • @Baxtexx
    @Baxtexx 3 years ago +1

    Urg, this reminds me of software I was working on that consumed PDFs and rebranded them. There were so many edge cases all the time!

  • @DaimlerSleeveValve
    @DaimlerSleeveValve 3 years ago +4

    It surprised me that for the last couple of years, Google has been running OCR on the contents of PDFs which contain only images. I've located names mentioned only on signs visible in the backgrounds of pictures of something else.

  • @lablnet
    @lablnet 3 years ago +1

    Nice, love to see more videos like these

  • @Yupppi
    @Yupppi 3 years ago +6

    I see new computerphile with prof. Brailsford's face and my week is immediately better. I even got to walk inside his home a little bit this time!
    After seeing bad photocopies of 80's device manuals, I too can get behind their obsession about PDF quality. Even the manufacturer's archives have that poor photocopy, and the original print could've been subpar.

  • @okusa7750
    @okusa7750 2 years ago +2

    Feel like David Attenborough just lectured me about the types of PDF. Amazing passionate storyteller

  • @SteveMacSticky
    @SteveMacSticky 2 years ago

    Very well explained

  • @MrBoubource
    @MrBoubource 3 years ago +13

    My internship topic is to find the paragraphs containing some keywords in PDFs that come in 4 different formats depending on the provider.
    I am beginning to hate it.

    • @DT-dc4br
      @DT-dc4br 3 years ago +4

      Might be a job for a Linux shell script with awk / grep & sed

    • @MrBoubource
      @MrBoubource 3 years ago +3

      @DT-dc4br I went with Python (and regexes) because I'm most familiar with it... But holy, what a mess it is to convert PDF to HTML and plain text..

    • @etziowingeler3173
      @etziowingeler3173 3 years ago

      Hahaha I can imagine
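
For the keyword-paragraph task in this thread, the search itself is the easy half once the text is out of the PDF. The sketch below assumes the extraction has already happened (by whatever tool) and that paragraphs are separated by blank lines, which real PDF text often violates. The function name `paragraphs_with_keywords` is hypothetical.

```python
import re


def paragraphs_with_keywords(text: str, keywords: list[str]) -> list[str]:
    """Return the paragraphs of `text` that contain any keyword as a whole word."""
    if not keywords:
        return []
    # Assumption: a blank line (possibly containing spaces) ends a paragraph.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # re.escape keeps keywords such as "PDF/A" from being read as regex syntax.
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return [p for p in paragraphs if pattern.search(p)]


if __name__ == "__main__":
    sample = "Invoice total is 40.\nPay soon.\n\nNothing here.\n\nThe TOTAL was wrong."
    for paragraph in paragraphs_with_keywords(sample, ["total"]):
        print(repr(paragraph))
```

The hard half, as the thread says, is that each provider's PDF yields differently ordered and differently broken text, so the paragraph-splitting rule would probably need to vary per format.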

  • @TheFakeVIP
    @TheFakeVIP 3 years ago +3

    I feel it also bears pointing out that correctly typeset text in PDF files that is reproduced from a font, not a bitmap, significantly increases the accessibility of such documents for people who use assistive technologies such as screen readers. PDF files are often ripped to shreds by the blind community for this exact reason. Even correctly produced PDFs that are, for instance, produced from a word processor, often cause problems for screen readers depending on how the text is drawn, and the competency of the software to add accessibility hints where appropriate. A common example of this is text in columns: quite often assistive technologies don't expect this, and so read it linearly (i.e. they read both columns at once). Properly tagging important landmarks such as headings can also be a great help, as screen reader users frequently navigate (or even summarise) a document simply by jumping between headings.

    • @williamchamberlain2263
      @williamchamberlain2263 3 years ago

      Yes

    • @lawrencedoliveiro9104
      @lawrencedoliveiro9104 2 years ago

      DJVU format deals with this by storing searchable text objects which are not rendered, separate from the actual page rendering.
      I think PDF allows this also.

  • @zombiegeorge749
    @zombiegeorge749 3 years ago +5

    2:42 what's up with the edges of the screen?

    • @Computerphile
      @Computerphile 3 years ago +4

      if you read the small text on the "newspaper" it helps explain it a little :) -Sean (basically I rotated it a little to fix my wonky camerawork and missed zooming it in)

  • @Gnsdtc
    @Gnsdtc 2 years ago +1

    This is beautiful. The OCR version is PDF I+HT!

  • @trollhunter200
    @trollhunter200 3 years ago

    You are just awesome Professor.
    👍👍👍

  • @ZaneDaMagicPufferDragon
    @ZaneDaMagicPufferDragon 3 years ago

    PDF FTG FTW 🙌🏻 I LOVE ❤️ PDF AND ITS PROGRESS IS AMAZING 🤩 GREAT VIDEO PROFESSOR 👨🏻‍🏫 BRAILSFORD!!!

  • @Graham_Rule
    @Graham_Rule 3 years ago

    The photocopier/scanner at work can scan to PDF/A which generates searchable text by doing OCR. Being internet enabled it can then send a copy by email (possibly bcc'd to Xerox or other third parties without our knowledge).

  • @delhatton
    @delhatton 3 years ago +1

    OCR for pure text. Maybe OK. It will still require editing. OCR for numerical data, like some Excel sheets, by the time you've verified all the numbers, you might as well have retyped it.

  • @soccerox817
    @soccerox817 3 years ago +32

    Exactly why I can't stand it when people just ask for a PDF or send a poorly rendered PDF. Gotta write documents in LaTeX and export a quality PDF

    • @peterwhitey4992
      @peterwhitey4992 3 years ago +2

      LaTeX is overrated.

    • @miran248
      @miran248 3 years ago +14

      @peterwhitey4992 Wouldn't say overrated, but maybe overkill in most cases. Something like markdown should be more than enough for simple stuff (w/o math equations, ..)

    • @peterwhitey4992
      @peterwhitey4992 3 years ago

      @miran248 - I know it's practical to write in, but it's the result that I find overrated. You can always tell when a paper/book is written in LaTeX. They all look the same. Especially textbooks written in LaTeX are generally not very good.

    • @Platoqp
      @Platoqp 3 years ago +1

      @peterwhitey4992 It is excellent for writings that include mathematics and other scientific formulas

    • @michaelb2047
      @michaelb2047 3 years ago +4

      @peterwhitey4992 I would say most natural science textbooks are written in LaTeX. You can change everything so you won’t notice that it was actually written with LaTeX. You notice it only if they use the default template / font. Also they are often much cleaner / more consistent than „Word“ books for example.

  • @SeanBZA
    @SeanBZA 3 years ago

    Also, different types of PDF creator give different file size outputs. Firefox PDF is massive, often bigger than the original, as it is a PDF of the page as it would be sent to the printer, but the PDF output from Debian is a lot smaller, just a file with the fonts and text, as the original document had.

  • @MartinOmander
    @MartinOmander 3 years ago

    Excellent video! I have a request for future videos: please consider keeping the camera still if the subject is stationary. The shakycam effect unfortunately made me seasick and distracted from the professor's excellent performance.

  • @TimothyWhiteheadzm
    @TimothyWhiteheadzm 3 years ago +16

    Expecting a certain quality of content from the pdf format is as ridiculous as expecting quality content on a web page. A container is just that. It can contain flowers, or manure. As for the OCR feature, that is great, but one wonders if that is part of 'pdf' or part of the tool that creates the pdf?

    • @harshjinger
      @harshjinger 3 years ago

      Idk... About this... I would love to know more... Commenting for any followups

    • @majorgnu
      @majorgnu 3 years ago +1

      It's a feature of the software that produced the PDF, obviously.
      Even if the format was extended at some point with features that facilitate this kind of use, the file itself still only contains the *result* of the OCR process, which was performed by whatever applications were used to produce it.

    • @drawapretzel6003
      @drawapretzel6003 3 years ago +1

      Well, it's not in the free version of Adobe Reader, that's for sure.
      There's lots of free OCR software that can OCR a PDF for you, but yes, it's included in the tools of actual PDF creation software too.

    • @HetareKing
      @HetareKing 3 years ago

      The actual OCRing happens in the creation tool, but this whole notion of having a bitmap overlay invisible text has to be encoded into the file and so the format has to support it. And since this functionality only really makes sense in the context of the OCR feature, I think it's fair to say it's part of "PDF".

    • @JNCressey
      @JNCressey 3 years ago

      I suppose if the creator of the pdf has a bitmap with text that is obviously unOCRable (maybe stylised text) they would manually add the hidden text, getting the same effect but without OCR.
      Styles that come to mind that OCR wouldn't work well on could be extra objects between the letters (google doodles), people posing in letter shapes (it's fun to stay at the YMCA), drawing just the negative space, bubble text or drawing just the shadows of the text, leaving out lines (E as 3 horizontal lines, A without the horizontal part), or using characters of other alphabets that look similar (like in r/grssk).
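
The "bitmap with invisible text" idea discussed in this thread maps onto a small piece of PDF page syntax: text rendering mode 3 (`3 Tr`) draws text with neither fill nor stroke, so it stays selectable and searchable without being visible. The sketch below only builds the content-stream operators for one run of hidden text. The helper names, the font resource name `F1`, and the restriction to plain ASCII text are all assumptions for illustration; a real file also needs the image, the font resource and the rest of the PDF object structure.

```python
def pdf_string(text: str) -> str:
    """Wrap text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def hidden_text_ops(text: str, x: float, y: float, size: float, font: str = "F1") -> str:
    """Build content-stream operators that place `text` invisibly at (x, y).

    `font` is a hypothetical resource name that the page's /Resources
    dictionary would have to define.
    """
    return "\n".join([
        "BT",                      # begin text object
        f"/{font} {size:g} Tf",    # select font and size
        "3 Tr",                    # rendering mode 3: invisible text
        f"{x:g} {y:g} Td",         # move to the position of the scanned word
        f"{pdf_string(text)} Tj",  # show (here: hide) the text
        "ET",                      # end text object
    ])


if __name__ == "__main__":
    print(hidden_text_ops("Types (of) PDF", 72, 700, 12))
```

Whether the string comes from an OCR engine or is typed in by hand, as suggested above for stylised lettering, makes no difference to the file.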

  • @adrianalexandrov7730
    @adrianalexandrov7730 1 year ago

    That's kinda how DjVu worked: saving text as a highly detailed foreground and compressing the background. It was a miracle how a scanned book of hundreds of pages could fit into just a few MB

  • @saranchance5650
    @saranchance5650 3 years ago +1

    Pdf has additional accessibility features that the variants you described make possible

  • @superfluidity
    @superfluidity 3 years ago +3

    If you can, don't just aim for the highest quality that your audience demands - aim for quality far beyond that. That will give you more freedom to rework the document later if you want to.

  • @henke37
    @henke37 2 years ago +2

    Fun fact: the pdf format is so complex that it literally includes functionality for executing arbitrary shell commands. As a feature.

  • @anarchist
    @anarchist 3 years ago +3

    8:40 4:3 monitor because nothing can throttle Brailsford's brain power.
    Not PDF but something that tickled when working with TIFFs was a joke it stands for "Thousands of Incompatible File Formats"

  • @iabervon
    @iabervon 3 years ago

    Midway through the video, I was distracted by recognizing that Professor Brailsford uses the same program for email that I do.
    I often solve crossword puzzles that I get as PDFs, and it's interesting to see whether the program that made the PDF put the text of the clues in the logical order that you'd read them, or if it went top to bottom, left to right, ignoring columns.

  • @b391i
    @b391i 3 years ago

    Awesome as usual 😇

  • @jashaswimalyaacharjee9585
    @jashaswimalyaacharjee9585 3 years ago +1

    I am totally convinced that Prof. Brailsford uses this machine 9:58 as his occasional-use Computer. What Peeping Toms like me can observe, there's Alpine 2.21 (fairly latest software compared to the system)

  • @Rubrickety
    @Rubrickety 3 years ago

    Fascinating video with perhaps the least clickbaity title in history.

  • @AleksyGrabovski
    @AleksyGrabovski 3 years ago +2

    Can you also do a video on DJVU format?

  • @unlokia
    @unlokia 3 years ago

    Prof Brailsford: The font of all PDF knowledge.

  • @UncleKennysPlace
    @UncleKennysPlace 3 years ago +2

    My day job is assembling documents in PDF format for aviation certification. It's shocking how many engineers send everything as PDF, even bitmaps, when I know they had to convert them, despite instructions saying we can work with any format that their native applications produce.

    • @bhargavk1515
      @bhargavk1515 11 months ago

      Sir, how do I learn PDF format encoding? Any guide?

  • @pierreabbat6157
    @pierreabbat6157 3 years ago +1

    Many of my programs output PostScript, which can be converted to PDF. I've seen many PS files get bigger when converted to PDF; I just checked one which is 4.5 times as big in PDF as in PS. I also once wrote a PS file using the random number generator and converted it to PDF. The converted file lost the randomness.
    I'm a surveyor and download maps in PDF from register of deeds sites. The old ones are scanned, of course. But the ones drawn with CAD are, I think, also scanned. They should be taken from the PDF output of the CAD program, except that the signature is written on paper (or clear plastic sheet), which poses a problem. Digitizing the numbers from a printed copy of the plat can result in illegible numbers (is that a 6, an 8, or a 9?).

  • @HugoOneYT
    @HugoOneYT 3 years ago +2

    To me PDF is about compatibility, there's a reason why all invoices are PDF, everything can open it

  • @danielmnet
    @danielmnet 3 ปีที่แล้ว

    If Prof. Brailsford is explaining, I am interested; the subject doesn't matter.

  • @No0utlet
    @No0utlet 3 ปีที่แล้ว

    At 2:30, it appears that the video of Prof. Brailsford is overlaying a video of the paper on his table and is rotated a very slight amount. Are there any video editors out there that could explain how that might happen by accident?

  • @marsgal42
    @marsgal42 3 ปีที่แล้ว

    In a past life I did a lot of work with PostScript and one product we developed was a PostScript sanitizer that would take any deranged PostScript you threw at it and output well-behaved well-structured PostScript suitable for further processing. We got the idea from generating PDF then printing it to a file with Adobe's PostScript printer driver.

  • @lawrencedoliveiro9104
    @lawrencedoliveiro9104 2 ปีที่แล้ว

    12:03 It looks like a scan that has been quantized into a bilevel (black and white only, no greys) bitmap. Those little hairy extensions on the edges are characteristic of that.
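
    A minimal sketch of that effect, assuming a plain fixed threshold (real scanners may use adaptive thresholding or dithering instead). The function name `to_bilevel` and the sample pixel values are made up for illustration: a soft grey edge with slight noise comes out as a jagged black/white boundary once every pixel is forced to 0 or 1.

```python
def to_bilevel(gray_rows, threshold=128):
    """Quantize 8-bit grayscale rows (0 = black, 255 = white) to 1-bit rows.

    Returns 0 for black and 1 for white. A fixed threshold is an assumption;
    it is the simplest possible bilevel quantizer.
    """
    return [[1 if px >= threshold else 0 for px in row] for row in gray_rows]


# Hypothetical scan of a vertical edge: black fading to white, with a little
# scanner noise in the middle (grey) column.
scan = [
    [0, 40, 120, 200, 255],
    [0, 35, 135, 210, 255],
    [0, 45, 125, 205, 255],
]

bilevel = to_bilevel(scan)
for row in bilevel:
    print("".join("#" if px == 0 else "." for px in row))
# ###..
# ##...
# ###..
```

    The grey column differs by only a few levels between rows, but after thresholding the edge shifts by a whole pixel, which is where the ragged outline comes from.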

  • @Smogshaik
    @Smogshaik 3 ปีที่แล้ว

    I would love a video about the PDF/A format!

  • @PhilipStorry
    @PhilipStorry 3 ปีที่แล้ว +2

    How do I subscribe to Vague Magazine? If it has high quality reminiscing from Professor Brailsford, then I need a subscription! 😉

  • @bartas9693
    @bartas9693 3 ปีที่แล้ว +6

    It's ok I'll send you a PDF.

    • @SimGunther
      @SimGunther 3 ปีที่แล้ว

      Yeah, but what? Image, full, text?

  • @davidgillies620
    @davidgillies620 3 ปีที่แล้ว

    I primarily generate PDFs with pdflatex, using EPS or PNG for embedded graphics, so I get searchable, arbitrary-resolution output. It looks very nice.

  • @oposkainaxei
    @oposkainaxei 3 ปีที่แล้ว +3

    4:30 OCR Systems

  • @Ice_Karma
    @Ice_Karma 2 ปีที่แล้ว +1

    Prof. Brailsford, do you still use PINE, or Alpine? =D
    (PINE user since 3.87...)

  • @Chobungus
    @Chobungus 3 ปีที่แล้ว +1

    Can someone clarify for me: when he is going over the "hideously complex mathematical equations" at 9:19, he says that you do not want to have to type that out character by character. Yet he then demonstrates that he is able to zoom in greatly while preserving quality. So how did he translate the bitmap image into that high-quality typeset text?

    • @Computerphile
      @Computerphile  3 ปีที่แล้ว +3

      In this case that's exactly what the Prof is working on: recreating this important document page by page, using software similar to what Dennis would have had available. Professor Brailsford talks about it in a recent video, and it has been an almost full-time job for him for a while now! -Sean P.S. If you look at the two pictures early in this video, you'll see that the version of the thesis Dennis held was damaged, but one his friend had reviewed is OK. The damaged one has amendments, so this is a difficult task!

    • @Chobungus
      @Chobungus 3 ปีที่แล้ว

      @@Computerphile Thanks for the reply! Great video!

  • @ieperlingetje
    @ieperlingetje 3 ปีที่แล้ว

    4:24 Sean often gets camera settings wrong and things come out blurry, so here's an animation to hide that.

  • @jorisschellekens4630
    @jorisschellekens4630 3 ปีที่แล้ว

    This is such a wonderful video. I'm the author of a PDF library (pText) and you have no idea how often people will complain about something like "it doesn't seem to extract the text".
    Thus forcing me to explain "Yeah, but this is an image, not a PDF."
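
    A rough sketch of why extraction returns nothing for such files, not how pText or any real library does it. In a page's content stream, text is drawn with the operators `Tj` / `TJ`, while a scanned page is typically just one image painted with `/Name Do`. The function `describe_page` and the sample streams are hypothetical, and the sketch assumes the stream has already been decompressed (real streams are usually Flate-compressed) and that no string literal happens to contain the operator names.

```python
import re

# Heuristic patterns over an already-decoded content stream (an assumption).
TEXT_SHOW = re.compile(rb"\bT[jJ]\b")        # Tj / TJ show text
PAINT_XOBJECT = re.compile(rb"/\w+\s+Do\b")  # /Name Do paints an XObject,
                                             # which is usually an image


def describe_page(content: bytes) -> str:
    """Classify a page by which drawing operators its content stream uses."""
    has_text = bool(TEXT_SHOW.search(content))
    has_image = bool(PAINT_XOBJECT.search(content))
    if has_text and has_image:
        return "text and image"
    if has_text:
        return "text"
    if has_image:
        return "image only"
    return "no text or image"


# Hypothetical page streams:
scanned = b"q 612 0 0 792 0 0 cm /Im0 Do Q"            # full-page bitmap
typeset = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"      # real text
# OCR output: the bitmap plus text in rendering mode 3 (invisible)
searchable = scanned + b" BT 3 Tr /F1 12 Tf 72 720 Td (Hello) Tj ET"

print(describe_page(scanned))     # image only
print(describe_page(typeset))     # text
print(describe_page(searchable))  # text and image
```

    An "image only" page has no text to extract until an OCR step adds a text layer on top of the bitmap.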

  • @jeromethiel4323
    @jeromethiel4323 3 ปีที่แล้ว +1

    I worked for a company, and we had electrical prints that were paper only. We paid a company to generate CAD files of the prints. What they did is insert scans of the paper copy into the CAD software, which isn't what we wanted. They basically screwed us over big time.
    The whole point of having them in CAD format was so that we could edit the bloody things!

  • @UnOrigionalOne
    @UnOrigionalOne 3 ปีที่แล้ว +1

    One could argue similar points for video.

  • @PswACC
    @PswACC 3 ปีที่แล้ว

    What software on Linux are you using to enable OCR search?

  • @tubbdoose
    @tubbdoose 3 ปีที่แล้ว

    He has so much passion about PDFs XD

  • @turpialito
    @turpialito 3 ปีที่แล้ว

    Brailsfordphile, Brady. I think it's high time ;)

  • @bhargavk1515
    @bhargavk1515 11 หลายเดือนก่อน

    Can you make a tutorial (or is there a tutorial) on how Prof. Brailsford converted the bitmap PDF into a properly encoded text PDF?

  • @LoesserOf2Evils
    @LoesserOf2Evils 3 ปีที่แล้ว

    If you can decompose the PDF into the text and the graphics and then recreate them in a word processing document, that can help. Then drop the document into Adobe InDesign for better and tighter layout. I admit that's a lot of effort, but sometimes it's worth it; and if the PDF standard changes in the future and it's important to produce a version in the new standard, it'll be far easier.

  • @xelaxander
    @xelaxander 2 ปีที่แล้ว

    What’s the software Prof. Brailsford is using? I’d really love to be able to search through some older mathematical books.

  • @samuelworsnop9983
    @samuelworsnop9983 3 ปีที่แล้ว +3

    I really want to know what Professor Brailsford's favourite font is!

    • @DrSteveBagley
      @DrSteveBagley 3 ปีที่แล้ว +1

      Optima I suspect.

    • @lakompee
      @lakompee 3 ปีที่แล้ว +1

      Comic sans

    • @johnno4127
      @johnno4127 3 ปีที่แล้ว

      @@lakompee papyrus

  • @PemboCycling
    @PemboCycling 3 ปีที่แล้ว

    Didn't Techmoan do a video on the company that Adobe purchased for its OCR? They made the text-to-speech program that the Techmoan video was covering.

  • @johnholland7497
    @johnholland7497 3 ปีที่แล้ว +1

    I'd love to know which software you used to convert the PDF with just bitmaps into one with searchable text. Is it open source?

    • @igorthelight
      @igorthelight 3 ปีที่แล้ว

      I know about "ABBYY FineReader PDF", which is neither open source nor free.
      Maybe there are others.

    • @beakmann
      @beakmann 3 ปีที่แล้ว

      There is tesseract

  • @kakka4462
    @kakka4462 3 ปีที่แล้ว

    2:31 The whole clip is tilted, showing a background clip of the table rug?

  • @Amonimus
    @Amonimus 3 ปีที่แล้ว +1

    To me a PDF is like an archive of multiple images or documents that you can page through.

  • @volodyadykun6490
    @volodyadykun6490 3 ปีที่แล้ว +4

    4:18 great newspaper

    • @miran248
      @miran248 3 ปีที่แล้ว

      .5btc - that's one expensive newspaper :)

    • @klaxoncow
      @klaxoncow 3 ปีที่แล้ว

      @@miran248 Or maybe not. Depends how well Bitcoin's doing at the time.
      Virtual currency, yes. Anchored currency, no.

  • @Fre1maurer
    @Fre1maurer 3 ปีที่แล้ว

    My first PDF was the manual of the flight simulator game TFX back in 1994; it was the budget re-release without a printed manual. There was Adobe Acrobat Reader for MS-DOS on the game CD, and holy crap was the quality of the document bad (and the clumsy Reader itself was not much better). They obviously just scanned a real printed manual and saved it as images with something like 4-bit grayscale, and the text sections looked like plain 1-bit black-or-white without any anti-aliasing. I never thought this text for the poor called PDF could be a thing in the future.

  • @rudiklein
    @rudiklein 3 ปีที่แล้ว

    A great talk, scrolling printer paper and a flashy shirt. What else does a video need?

  • @ahmetardaedogan6697
    @ahmetardaedogan6697 3 ปีที่แล้ว

    Could you explain Harris corner detection?

  • @gedavids84
    @gedavids84 3 ปีที่แล้ว

    I just want to say that I'm really glad Professor Brailsford survived covid.

  • @John_Fx
    @John_Fx 3 ปีที่แล้ว +4

    He barely scratched the surface of the complexity of PDF formats. Didn't even cover PDF/A or why you should never redact a PDF and send out that original file.

    • @Jebusankel
      @Jebusankel 3 ปีที่แล้ว

      There is a true Redact function in Adobe Acrobat. You just have to use that instead of drawing a box on top.
      Ditto on PDF/A though.
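
      A toy illustration of the "box on top" problem, using a hand-written, uncompressed content stream rather than a real file. The helper `shown_strings` and both sample streams are hypothetical, and the regex only handles simple literal strings without escapes. The point is that painting a black rectangle (`re` followed by `f`) after the text leaves the text operand in the stream, so anything that reads the stream can still recover it.

```python
import re


def shown_strings(content: bytes) -> list:
    """Return simple literal strings passed to the Tj (show text) operator.

    Only handles strings without nested parentheses or backslash escapes.
    """
    found = re.findall(rb"\(([^()\\]*)\)\s*Tj", content)
    return [s.decode("latin-1") for s in found]


# Text drawn first, then a filled black rectangle drawn over the same area.
covered = (
    b"BT /F1 12 Tf 72 720 Td (SECRET 1234) Tj ET "
    b"0 0 0 rg 70 715 100 16 re f"
)
# What a real redaction aims for: the text operators are gone entirely.
removed = b"0 0 0 rg 70 715 100 16 re f"

print(shown_strings(covered))  # ['SECRET 1234']
print(shown_strings(removed))  # []
```

      On screen both pages look the same, but only the second one no longer carries the hidden text.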