Knowledge clip: Keeping research data organized

  • Published on 23 Nov 2024

Comments • 70

  • @G_Whiz
    @G_Whiz a year ago +167

    I use random password generators for file naming. Every time I open a file, it's like a tiny surprise party or similar to the rush I get when hitting a good roll at the casino if the file is actually the one I'm looking for. I like to live under constant stimulation and stress, it helps me feel alive.

  • @jenswurm
    @jenswurm a year ago +26

    One often overlooked aspect is a deprecation strategy. At some point documents may no longer be relevant, and one wants them out of sight of the normal document archive, while still being able to find them with relative ease.

  • @lazygardens
    @lazygardens a year ago +5

    This is excellent!
    If you don't have revision control software, always have a folder for your source material, and copy that material out to work on it.
    So many people accidentally edit the source image.

  • @l00k4tstuff
    @l00k4tstuff a year ago +16

    The best way to organize files is to study taxonomy and apply those principles to the structure.

  • @jimgrant1776
    @jimgrant1776 a year ago +14

    Ghent University Data Stewards - Excellent video. Maybe the best I’ve seen on folder / file organization.
    However, I disagree with you on some aspects of this.
    1 - Folder Names and Structure - Don’t design your folder names around file attributes. That is what tags are for. Folder names should be “categories” of topics/objects/nouns. The structure should be hierarchical. - - - I do agree that the “categories” should not overlap. Some professional consulting firms have developed a concept / methodology for organizing information called MECE. It stands for mutually exclusive (ME - no overlap) and collectively exhaustive (CE - nothing overlooked). - - - If it appears that a file can logically fit into more than one folder, that’s a signal that either the hierarchical file structure needs to be changed or tags should be used.
    2 - File Names - Don’t put dates in file names. If you do, files with common topics probably can’t be sorted and listed together. Windows and Mac operating systems create and maintain file “Create Dates” and “Last Modified Dates”. If some other date (like “Due Date”) is important, add it as a tag (attribute) of the file. - - - Unquestionably, the best and most understandable / relatable file naming scheme is to make the names specific cases of a topic/object/noun that sit deeper in the hierarchy than the folder names.

    • @Junnaris
      @Junnaris a year ago

      Thank you for sharing! I find that information very useful.

    • @l00k4tstuff
      @l00k4tstuff a year ago +5

      I disagree with the kibosh on dates in file names. Sure, you can set the file manager to show details, but then you have too many columns. Similarly, when the files are a catalog of a collection, it is important to include the date of the capture (and the time, plus a unique identifier if there is more than one capture in a second). This is much better than using a hash to create distinct file names, because the date-and-time stamp has an orderly meaning.
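
A minimal sketch of the naming idea described above: a sortable date-and-time stamp plus a counter for captures that fall in the same second. The function name `capture_name`, the `bird` prefix, and the `_001` counter format are all hypothetical choices made for illustration.

```python
from datetime import datetime
from itertools import count


def capture_name(prefix: str, when: datetime, taken: set[str], ext: str = "jpg") -> str:
    """Build a sortable name such as 'bird_20230704T183005_001.jpg'."""
    stamp = when.strftime("%Y%m%dT%H%M%S")
    # Try _001, _002, ... until the name is not already in use.
    for n in count(1):
        name = f"{prefix}_{stamp}_{n:03d}.{ext}"
        if name not in taken:
            taken.add(name)
            return name


taken: set[str] = set()
moment = datetime(2023, 7, 4, 18, 30, 5)
first = capture_name("bird", moment, taken)
second = capture_name("bird", moment, taken)  # same second, next counter value
```

Because the stamp runs from the largest unit to the smallest, a plain alphabetical sort of these names is also a chronological sort.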

    • @cowboybob7093
      @cowboybob7093 a year ago +4

      @l00k4tstuff Agreed. Create and modify dates don't necessarily reflect the reason a date is used in the name. Those dates can be naïvely manipulated by the OS or by programs. Moving files between systems has its headaches too. Finally, on details view versus name view: a "screenful" of files in name view is a handy number of names, while details view fits far fewer.

  • @pieterkops
    @pieterkops a year ago +56

    Don't store all the information in the filename! Use metadata instead. This makes sorting, filtering and grouping much easier. Think of it as dynamic folders.

    • @joaopedrorocha5693
      @joaopedrorocha5693 a year ago +6

      Details should be in the metadata, but some info in the name can be very useful sometimes ...
      I once had an annoying problem from leaving almost all the info in the metadata of my files. I had to store a lot of data in the cloud and was using a binary data format with embedded metadata, so reading the metadata meant running a program. To do that, the files had to be moved over the network to a compute node, where the program extracted the metadata so that I could then select what I needed, which took a lot of time.
      If I'd had a better naming convention for my use case, I would have been able to pre-filter the data just by pattern matching on the filename.
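
A minimal sketch of pre-filtering by filename, assuming a made-up `project_sample_date.ext` naming convention. The object keys and the helper `prefilter` are hypothetical; the point is only that selection happens on the names, without opening or transferring any file.

```python
from fnmatch import fnmatchcase

# Hypothetical object keys following an assumed 'project_sample_date.ext' convention.
keys = [
    "projA_sample01_20230110.h5",
    "projA_sample02_20230111.h5",
    "projB_sample01_20230110.h5",
]


def prefilter(names, pattern):
    """Return the names matching a shell-style pattern, without opening any file."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Everything from project A captured in January 2023.
selected = prefilter(keys, "projA_*_202301*.h5")
```

Only the files in `selected` would then need to be moved to a compute node for a full metadata read.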

    • @giac0416
      @giac0416 a year ago +2

      Can you give us an example? Thanks

    • @TheDiveO
      @TheDiveO a year ago +1

      well, file and folder names are actually metadata. but then, this is an ex-cathedra video. some people are busy organizing their life and telling others what to do, erm TODO. others are living their life.

    • @jenswurm
      @jenswurm a year ago +3

      I wish normal file systems would support labeling. That would make things a lot easier. Some documents might, despite the best efforts at coming up with an organization scheme, be relevant in multiple places. For example, the balance statement of an investment account may belong in the folder that is related to that bank, but it's also relevant for doing one's taxes.

    • @lazygardens
      @lazygardens a year ago +4

      @jenswurm You can use symlinks.
      The bank account folder has the real file.
      The tax prep folder has a symlink. It looks like a file, and when you click it, it opens the file in the bank account folder.
      en.wikipedia.org/wiki/Symbolic_link
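
A minimal sketch of the symlink arrangement described above, run in a throwaway temporary directory. The folder and file names (`bank`, `taxes_2023`, `balance_statement.txt`) are hypothetical, and on Windows creating symlinks may require extra privileges.

```python
import tempfile
from pathlib import Path

# Hypothetical folder and file names, created in a throwaway directory.
root = Path(tempfile.mkdtemp())
bank = root / "bank"
taxes = root / "taxes_2023"
bank.mkdir()
taxes.mkdir()

# The real file lives in the bank folder.
real_file = bank / "balance_statement.txt"
real_file.write_text("closing balance: 100\n")

# The tax folder only holds a link. A relative target keeps the link
# valid if the whole tree is moved together.
link = taxes / "balance_statement.txt"
link.symlink_to(Path("..") / "bank" / "balance_statement.txt")

content_via_link = link.read_text()  # reads the real file through the link
```

Note that the link breaks if the real file is renamed or moved on its own, so the scheme works best when the "home" folder of each document is stable.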

  • @marcogalloro
    @marcogalloro a year ago

    Extremely valuable!🗂️🙌🏻

  • @yohanesliong4818
    @yohanesliong4818 4 months ago

    Very informative. Thank you

  • @CrazyDriverSwed
    @CrazyDriverSwed a year ago

    Most filesystems also provide ways to add tags to files, tags that can be used when searching for a specific file or a group of files.

  • @vaughngaminghd
    @vaughngaminghd a year ago +18

    I make a point of never using "final" in the file name: best way to jinx the project…😆

    • @cowboybob7093
      @cowboybob7093 a year ago +3

      When I do a light development task like a near-throwaway script, I'll start with _name99_ and work down. By default the most recent rev sorts to the top.

    • @Blast-Forward
      @Blast-Forward a year ago +2

      That's why I love Linux, you can have
      final
      Final
      fInal
      fiNal
      finAl
      finaL
      FInal
      FiNal
      FinAl
      FinaL
      FINal
      FInAl
      FInaL
      FINAl
      FINaL
      and most importantly
      FINAL
      Takes you a long time to run out of finals.

    • @atlasstone6896
      @atlasstone6896 11 months ago

      This is what I am saying hahahahah
      If you say final it will never finish!

    • @EverCraft_File_History
      @EverCraft_File_History 5 months ago +1

      @Blast-Forward hahahahhh, I couldn't help but laugh when I saw this.

  • @anitat9727
    @anitat9727 29 days ago

    Thank you for this

  • @BrandspankingFilm
    @BrandspankingFilm 4 days ago

    I am currently doing research on this subject, and I'd be very interested to talk to people and find out how they approach this within their organisation. Feel free to reach out.

  • @phpn99
    @phpn99 a year ago +9

    These were "best practices" in 1989. The world needs to move from data categorization based on "where" to one based on "what". This means that categorization is only incidentally related to folder trees; it should be done primarily via metadata, or via emergent metadata extracted by modern search engines. The old way of doing things is based on the concept of taxonomy; information is best organized not in a hierarchy but in a network.

    • @timbehrens9678
      @timbehrens9678 a year ago +8

      Learning and implementing "metadata extraction by modern search engines" shouldn't be a prerequisite for a bachelor's thesis in biology or sociology. In many cases the best practices of 1989 are still the best.

    • @lazygardens
      @lazygardens a year ago +3

      OK ... where is your video showing WHAT metadata to put in a file to make it extractable by "modern search engines". Explain your schema.
      And what does the poor PhD candidate do when the "modern search engine" is unavailable because the frigging network is down?

    • @darked89
      @darked89 a year ago

      Good luck extracting metadata from a foobar.fastq.gz created by a sequencing machine.
      One has to have unique file names and either encode project + sample etc. in the file name or have a database where each file has an entry describing it.

  • @darked89
    @darked89 a year ago

    I would also add a strong recommendation to avoid binary formats if possible. It is almost trivial to track or spot differences between, say, two TSV files without opening them (git and diff).
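
A minimal sketch of why text formats are easy to compare, using Python's standard `difflib` in place of the `git` and `diff` tools mentioned above. The two TSV contents and the names `old.tsv` and `new.tsv` are made up for illustration.

```python
import difflib

# Two hypothetical versions of a small TSV table; only one value differs.
old_tsv = "sample\tvalue\nA\t1.0\nB\t2.0\n"
new_tsv = "sample\tvalue\nA\t1.0\nB\t2.5\n"

diff = list(
    difflib.unified_diff(
        old_tsv.splitlines(),
        new_tsv.splitlines(),
        fromfile="old.tsv",
        tofile="new.tsv",
        lineterm="",
    )
)

# Keep only the added and removed rows, skipping the '---' / '+++' file headers.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
```

The same comparison on a binary spreadsheet or image would only report that the files differ, not which row changed.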

  • @edwingonzalez6399
    @edwingonzalez6399 a year ago

    In any folder that will hold several revisions, I always add a subfolder called "_Superados" ("superseded"). I move all the obsolete versions there so that only the latest one remains, but as a precaution I don't get rid of the earlier ones. That way, a year or more later, when I go looking for the latest report, I don't have to deal with a list of versions.
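
A minimal sketch of the "_Superados" habit described above, assuming versions are distinguished by a zero-padded number in the file name so that alphabetical order matches version order. The function `archive_old_versions` and the `report_v01.txt` style names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def archive_old_versions(folder: Path, pattern: str, archive_name: str = "_Superados") -> Path:
    """Move all but the last-sorting match of pattern into an archive subfolder."""
    versions = sorted(p for p in folder.glob(pattern) if p.is_file())
    if not versions:
        raise FileNotFoundError(f"no files match {pattern!r} in {folder}")
    archive = folder / archive_name
    archive.mkdir(exist_ok=True)
    for old in versions[:-1]:
        shutil.move(str(old), str(archive / old.name))
    return versions[-1]


# Demo in a throwaway directory with made-up report names.
root = Path(tempfile.mkdtemp())
for name in ("report_v01.txt", "report_v02.txt", "report_v03.txt"):
    (root / name).write_text(name)

latest = archive_old_versions(root, "report_v*.txt")
remaining = sorted(p.name for p in root.iterdir())
archived = sorted(p.name for p in (root / "_Superados").iterdir())
```

Nothing is deleted: the older versions stay available in the archive subfolder, while the working folder shows only the latest file.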

  • @EverCraft_File_History
    @EverCraft_File_History 5 months ago

    What software do people typically use for version control of research data?

  • @Pedritox0953
    @Pedritox0953 a year ago

    Great video!

  • @quochuynh184
    @quochuynh184 a year ago +1

    The underscore disables Windows search capability for file names.

    • @cowboybob7093
      @cowboybob7093 a year ago +1

      Not quite my experience, but I know what you mean. The way I look at underscore vs. hyphen: an underscore is just another letter, but a hyphen includes invisible white space. If you're referring to the folder-tree search in Windows Explorer and you notice different behavior, I'll defer to your experience. That feature was so bad for so long that I've only started using it recently. I know what I'm about to write is out of the Stone Age, but it's not unusual for me to open a command prompt, run dir /s /b > .\allfiles.txt, and then use `find` to search the new file. The disk access is heavy one time; after that the text file is opened in memory and searched in a flash. The method has its obvious drawbacks, but they all do.
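
A rough cross-platform sketch of the same "list once, search many times" approach, written in Python rather than the Windows command prompt. The helpers `build_index` and `search_index` and the demo file names are hypothetical; the index is written outside the scanned tree so it does not list itself.

```python
import os
import tempfile
from pathlib import Path


def build_index(root, index_file):
    """Write every file path under root to index_file, one per line."""
    with open(index_file, "w", encoding="utf-8") as out:
        for folder, _dirs, files in os.walk(root):
            for name in files:
                out.write(os.path.join(folder, name) + "\n")


def search_index(index_file, needle):
    """Return indexed paths containing needle (case-insensitive)."""
    with open(index_file, encoding="utf-8") as src:
        return [line.rstrip("\n") for line in src if needle.lower() in line.lower()]


# Demo on a throwaway tree with made-up file names.
root = Path(tempfile.mkdtemp())
(root / "data").mkdir()
(root / "data" / "survey_raw.csv").write_text("x\n")
(root / "notes.txt").write_text("y\n")

index = Path(tempfile.mkdtemp()) / "allfiles.txt"
build_index(root, index)
hits = search_index(index, "SURVEY_RAW")
```

As with the original method, the index goes stale as soon as files are added, moved, or renamed, so it needs to be rebuilt before it can be trusted again.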

  • @sebastianpozo8305
    @sebastianpozo8305 2 years ago +2

    JUST AMAZING! CAN'T BELIEVE THAT YOU ONLY HAVE 2065 VIEWS.

  • @danielcraft7342
    @danielcraft7342 a year ago

    Any tips for when I have a lot of audio or video files? In my case I have about 30 audio recordings of the same event. Any ideas?

    • @jon9103
      @jon9103 a year ago

      Like any other set of files, it depends on which differences between the files you care about and what makes each file unique. Surely there must be something, otherwise why bother having 30 of them? With only your vague description, it's impossible to know what's important to you.

    • @lazygardens
      @lazygardens a year ago +1

      Like multiple videos of a wedding ...
      Event and date would be important. After that, name the file by whatever is happening that you want to recognize quickly.
      2020_BearAttack_screams.mp4
      BearAttack_2020_screams.mp4
      Are they sequential captures?
      Simultaneous captures by different equipment?
      What is important is that you can look at the directory and pick out the file you want easily. (and write down your scheme so you can have an assistant find it).

    • @danielcraft7342
      @danielcraft7342 a year ago

      @lazygardens Thank you, it actually helped me a lot.

  • @Justopensourceandme
    @Justopensourceandme a year ago

    You need Git or a related tool to track the state of your project files.

  • @reyesmedicen
    @reyesmedicen a year ago

    Amazing

  • @sheffin007
    @sheffin007 2 years ago

    Thank you

  • @ElCidPhysics90
    @ElCidPhysics90 a year ago +2

    If using the date as the file name, I would start with the year, then the month, then the day, e.g. 2023 07 04
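
A small sketch of why year-month-day ordering helps: an alphabetical sort of the names is then also a chronological sort, which is not true for day-month-year. The dates and the `_notes.txt` suffix are made up for illustration.

```python
from datetime import date

days = [date(2023, 7, 4), date(2022, 12, 31), date(2023, 1, 15)]

# Year first: alphabetical order equals chronological order.
iso_names = sorted(d.strftime("%Y-%m-%d") + "_notes.txt" for d in days)

# Day first: the same sort scrambles the timeline.
dmy_names = sorted(d.strftime("%d-%m-%Y") + "_notes.txt" for d in days)
```

Here `iso_names` runs from December 2022 to July 2023, while `dmy_names` starts with the 4 July 2023 file simply because "04" sorts first.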

  • @valerio4044
    @valerio4044 a year ago

    We would have to see what the paper says, but I think the AI has the film memorized and what it predicts is the minute the mouse is watching, based on a reading of its brain waves.

  • @SphereofTime
    @SphereofTime a year ago

    1:18

  • @atlasstone6896
    @atlasstone6896 11 months ago

    Don't use acronyms!
    You will forget them!
    Your teammates will ask what they mean!
    This is no longer the old era of PCs, when every bit counted and screens were small. Just write a long name, it is OK!
    As for versions, use the date and the time of day in 24-hour format,
    like 2023/12/24-1830.
    By the way, pro tip: save a version for every hour of editing!
    This system will also keep you going: once you look at the files and notice that you have not been working, your brain will say, "Oh, start working, look, there are dates when we didn't work!"

  • @dannytan8080
    @dannytan8080 a year ago +2

    a bit obnoxious to call your own preferred folder organization style "Best Practices"

  • @madwhitehatter
    @madwhitehatter 2 years ago +3

    The 90s called.

  • @kathyglass2922
    @kathyglass2922 2 years ago +2

    Seriously? You have to tell me the advantages of having an organized file system? Geesh, there was a reason I clicked on this to begin with. Wonder what it was.

    • @gr8dvd
      @gr8dvd a year ago

      😂😂😂

    •  a year ago

      Obvious as it may seem, I can understand why a university feels the need to produce this sort of video.

    • @gr8dvd
      @gr8dvd a year ago

      @ OP is not questioning why the video is needed; she is questioning the need to explain why being organized is a good thing.

  • @petersalt2342
    @petersalt2342 2 months ago

    A file system from a university? Definitely a No Thanks !

  • @ahmadkavie4178
    @ahmadkavie4178 a month ago

    Thank you very much