My theory isn't disproven either, and "it's clearly not a simple substitution" kinda leans into it. My theory is that we're missing something fundamental and basic. Like looking at it upside down or only reading the symbols that correspond to prime numbers or something.
@@MrThedrachen Do we actually have proof of its providence? I would think looking at the techniques and knowledge and ideas and intentions of the time it was made in would be a good starting point. I think they made conlangs then too, so you could look at those, compare, and try to see the tendencies there.
@@skyworm8006 No. It's provenance is uncertain, though the most common theories tend to assert it is Italian in origin. Carbon dating suggests the document is ~500 years old
@@Arado159 Ok, I guess it's a mystery for now. Which is appropriate because the Voynich Manuscript is a mystery and has been a mystery for nearly either a century or more.
Maybe they were intentionally avoiding certain letters as a writing exercise, like how that French art collective wrote a whole novel without the letter E.
If it was a simple substitution cypher, we would have figured it out by now. My personal hypothesis is that it is an early example of a conlang. Medieval monks were often philosophers and linguists, and a conlang experimenting with a minimal sound inventory would make it incredibly hard to figure out without related texts or something akin to the Rosetta stone.
Hildegard von Bingen made one 200 years before... maybe this could be nuns from her abbey trying to refine what she made... the lack of sounds actually makes sense with "speaking in tongues"-type mystical speech production, which generally have a simpler phonology.
I know it's been many years since this has remained a mystery, but in the details is a link that seems to include a translation and seems to be quite aware of provenance... It would appear to be solved, and isn't a cypher but a language. The fact the video is showing letter variants and positioning seemed to indicate a major breakthrough happened, so I checked the details.
This is the most intelligent discussion of the Voynich manuscript I've seen. Most videos that talk about it, only mention a few obviously wrong ideas about it, then throw their hands up saying it's too hard....or something like that. I hadn't thought about positional forms, though, as someone who's done calligraphy, I should be more aware of it. This might not give us the answer to what this thing is, but it sure tells us more useful things about it. I subbed. Hope your other videos are just as good.
Thanks! Most of my other videos were meant with long-time Voynich enthusiasts in mind, but I'm planning to put out more videos like this one in the future.
One of the current theories that cannot be ruled out is that it is something like that: filler text. But then not "to be replaced", because it took a lot of effort to pen this stuff in and there seems to be quite some thought behind the system. The world was different before copy-paste :)
@@voynichtalk it's singing music for women. Chanting for healing and herbs. Or something like that. There's a very good video on TH-cam. Voynich Manuscript Explained Audrius Plioplys
@@FrancoisTremblay it's not me, it's : Audrius V. Plioplys is a Canadian artist, neurologist, neuroscientist and public figure of Lithuanian descent. He makes some very valid points on repeating words and stuff. There's no way it is a language and a normal text. It's too samey. Go watch the video and then make your criticism.
13 distinct letters doesn't mean, that it can only repræsent 13 phonemes. There could be digraphs. Some letters could influence the pronunciation of other letter, like how final e often changes the prævious vowel in English. Multiple sounds could be repræsented by one glyph.
That is true, but all languages I‘m aware of that have extebsive use of this tend to already have „not the smallest alphabet“ but we could get surprised. We once thought Mayan couldn‘t be an alphabet because it had way to many symbols (like in the hundreds) but it was. Just that they had like a dozen symbols for each phonem and then while writing tried to avoid letter repetitions like „can‘t use that specific e, already used it a word ago“… but all these thinga get complicated when we get to only 13 letters. It’s just extremely few…
A good example of this kind of ordeal would be youger futhark, a runic system with only 16 runes, most of them having to play double or triple duty (shoutout to úr for representing 4 or 5 sounds (3 vowels and then either u as a vowel or w as a semivowel) (also shoutout to íss and ár for both being able to represent e, together with their other roles)).
What if the script doesn't represent all the sounds of the language? The younger futhark has less symbols than sounds that probably existed in the era. Arabic script before introduction of punctuation had much less symbols than its phonemes. Even in case of Latin: its alphabet was borrowed from Etruscans and wasn't differenciating between voiced and voiceless consonants. Which was not a problem in the etruscan, but was insufficient for Latin.
The principle of not representing all the sounds of a language is certainly often encountered (like the two "th" sounds in English). What we tend to see in Voynich solutions though, is that this ambiguity needs to be taken too far, to the extent where solvers are basically filling in what they want to read. They devise a system that's flexible enough, so that when they think a plant image is a carrot, they can certainly turn some word on the page into something that reads "carrot" in some language. Another issue is Voynichese's rigid structure. You might be able to read it as Futhark, but if you go the other way around, you're in trouble. Encoding a text to Voynichese would result in all kinds of invalid words, because languages (even with small alphabets) need much more flexibility than what Voynichese has to offer. I go into this in my previous video: th-cam.com/video/XSTM8Gixai4/w-d-xo.html
@@voynichtalkthey could have a letter that represent something below a phoneme. A single voice marker alone can get rid of the need of several distinct voiced character.
The grapheme analysis here is excellent, but I don't think this means Voynich can't be an alphabet. It reminds me remarkably of old Mongolian script. Mongolian has about as many graphemes, along with positional variations and a restricted set of characters that can occur at the end of a word. In Mongolian, many phonemes are represented by a combination of graphemes, graphically indistinguishable from a sequence of letters. Old Mongolian had 7 vowels, but only 3 vowel letters; word initial /ö/ for example was written like 'aui'. Also, many graphemes ambiguously represent various phonemes. Mongolian script made no distinction of t/d and usually k/g, despite these being quite distinct in the language. In old manuscripts, a single 'tooth' mark could be an /e/, and /n/, one of two 'teeth' constituting an /a/ or just as well a /q/ or /g/-I'm not kidding. Sequences of frequent graphemes could infact represent one letter, spaces may demarcate syllables rather than words, as in the Phags-pa alphabet. I'm not saying Voynich is Mongolian, just that it's plausible as an alphabet on similar principles.
Those are good points, and in fact I agree that it is most like an alphabet (I spent some time in the video explaining why I think other writing systems are a worse match). Like I said in other comments though, the implied alphabet size must be considered in tandem with the entropy problem. Voynichese "word" structure is incredibly rigid and predictable, even after certain operations have been done to improve the situation. My prediction would be that if you were to take Mongolian and convert it to Voynichese, you would get a whole range of "invalid" formations in Voynichese because it does not offer the flexibility required for a natural language. I'm not familiar with Mongolian though, so I could be wrong.
Interesting. Also Arabic used to be written without the dots above or below some letters, so the same letter would represent two or three sounds, which would have reduced the number of distinct letters from 28 to 18 (not counting positional variation). Also English has many more phonemes than it has characters to represent them. It uses digraphs, such as sh ch, th and ng, the th itself can be either voiced or unvoiced, we have no single way of writing a schwa, despite it being a common vowel, and we have no single letter for the s in pleasure or measure. Voynichese could also use the same character to represent multiple sounds or digraphs to represent sounds they don't have a single letter for.
@@voynichtalk > if you were to take Mongolian and convert it to Voynichese, you would get a whole range of "invalid" formations in Voynichese because it does not offer the flexibility required for a natural language ?? I'm going to assume you're not proposing Mongolian isn't natural. So, of course Voynichese wouldn't map perfectly to Mongolian - mostly because it probably wasn't designed for Mongolian, so it would need the occasional determiner or radical, like when Akkadian or Elamite were written in Sumerian cuneiform...granted, even Sumerian didn't write parts of their words (such as the endings)
@@Pining_for_the_fjords Arabs, however, use EVEN MORE symbols than European languages when writing using Latin script, substituting numbers like 3 or 7 for sounds English doesn't have. Ever seen their chats? Looks like 13375p34k. If anything it's East Asian languages that need less letters for all sounds, if you replace hieroglyphics with letters. I.e. if you write Japanese with Romaji, you have stuff like V/B or R/L being the same sound. It still won't go down to 13 but would be easier to condense than Arabic.
@@KasumiRINA Your japanese example is reversed - Japanese only has one sound that is "in between" R and L, so it gets used for both when trying to map English into Japanese using katakana. Same for V/B - hence バイキング (baikingu) for all-you-can-eat buffets ("viking" restaurants - that is, Swedish smorgasbords).
tbh idk much about voynich, but it really reminds me of the "conlang" i made when i was like 14 it hadn't had much characters either, and not really much unique words, which made it very unprecise and forced lengthy sentences i've basically taken random few absolutelly simple words gave them 50+ meanings depending on context and sentence construction, and the words around it, bolted on top of it somewhatish czech grammar (im czech), while it had it's own logographic alphabet, i quickly dropped it in favour of tiny latin (a c č e ë(later e and ë i made just positional variants of sound) i l k n ň o r ř s t (& x = ks) + lenght marks but those were 99% predictable), and even then those were very rigid, like "in" or "co" being almost everywhere, vowels i often removed from middle of words, because they were pointless it couldnt even begin to represent any concrete ideas, without borrowing words, only very abstract things, and in the end i've ended up turning it just into shortening thing but it was fun trying to boil meaning down to nothing i'est {netkv}, ans s'est inin cotexka; s'est sin ans i'cotinintanin loř i'costans' which could be boiled down to: i' st' {netkv}, ns' s' st' in n' o' ks' ka; s' st sin ns' i' o' in n' t' n' loř i cos t' s' (' means word was shortened, no matter where, and even then it's pointless since shortening means removing vowel, so vowelless word is already 100% shortened) or rewritten as: i est {netkv}, ans sin est in in cot ex ka; sin est sin ans i cot in in tan in loř i cos tan sin (i be {netkv}, and this be my(in in) not-alive-speech; this be bad and i not-have(inintanin [someone's-someone's-work/verb-someone's]) idea i pefective-work it) (yes calculator was my inspiration, i was bored in class) idk anything about conlanging, i dont know any and that was result of me being bored and trying not fall asleep during math, but why couldnt some similary minded mediaveal monk do something similar out of boredom, if i extraced all the nonsense from my exercise books i written with this, it could end up manuscript, which even i wouldnt be able to decipher, because i forgot the context, like i could get only very abstract ideas out of it hmm yes this is where alive person talked about whether life is bad or good with not alive person, or this is where maybe human said time is bad edit: also the drawings, my old excersise books are also filled with weird sketches, funnily enough often also weird plants
technically it's binary with timing, ternary, or from a computer perspective it's an encoding on binary composed of sequences of 10, 1110 and various runs of 00s :D
Fun video! One little note- when talking about Hawaiian, you showed a book in Hawaiian Pidgin, which is an English-based pidgin language and not the same as Hawaiian!
To be fair to both Japan and China, yes, the Japanese approved kanji list is something like 2,045 characters (they occasionally add a few), but they keep a few hundred more 'in reserve' - for proper names, and historical and religious documents. They are not part of the school curriculum, but they are there for scholars. The 2,045 JouYou kanji are expected to be known by all schoolkids by the time they leave school. But there is no way Chinese schoolkids are expected to have memorised 50,000 hanzi! I don't know what the exact figure is, but I'd imagine it is a lot closer to the Japanese number than the full total of all hanzi. (Actually, I just looked it up. Chinese schoolkids are supposed to have memorised about 4,500 hanzi by the time they graduate - not 50,000!).
The Jouyou kanji (which currently consist of 2,136 kanji) is just a list of kanji that are considered common in everyday language. Japanese government documents use only these kanji, and will write a word with hiragana/katakana if it does not use one of these characters. News media also often limits itself to these characters. But they're by no means the only commonly used kanji in the language. A college educated Japanese adult probably knows double that. Smaller Japanese kanji dictionaries will contain 5,000-6,000 unique characters, and larger ones up to 20,000. I imagine this 50,000 character is something more like that- just a near exhaustive list of every character ever recorded (of which both languages have many that are never used).
@@link99912 Arigatou! So it is 2,136 now? It was 1,945 when I was learning Japanese 20 years ago! But Chinese was so much easier - each hanzi has ONE SOUND/READING! Because each kanji has many readings (some in on'yomi and many in kun'yomi), they are actually harder to learn than hanzi!
@@link99912 China has a similar list of common characters which numbers 8,105. So you can expect that's the norm, if not higher. It has to be higher than Japanese because that's the only script it uses. Japanese gets a lot of utility out of the characters it does use by having multiple readings, and distinguishing by context or in combination with other characters or kana, and it can easily minimise them by choosing to write in kana, especially informally. It works very differently in Chinese. Also, I don't think it's exhaustive since it's very easy to create new characters and this was freely done in the past. Total recorded characters would definitely be way higher than 50,000, as that would cover thousands of years of texts. And it really depends what you're counting and what period. The way to write the same word changes to different characters over time when conflicts arise. I mean it's different words and languages in play too. There are many variants and characters no longer in use. Characters have been standardised, conflated, or removed from use for the sake of ease of learning. Some of the simplified characters are also just existing easier-to-write variants standardised. Japanese has the same. However, Chinese characters can be reduced to a smaller number of characters as the reason there's so many is that they are easily constructed from other characters, using their sound and/or meaning in some way. And this is not the same as radicals for indexing as that includes contrived non-character components. To get a simpler idea, look at Korea's Hangul. It is essentially an alphabet you could write linearly as-is, but it follows the square arrangement of Chinese characters to combine them into one unit as a syllable. Like a line within a line, which is merely following the norm of the writing they knew (Chinese characters) and greatly regularising it as an alphabet, but it also clearly delineates syllables which might be useful. It's not too different in Chinese characters but a lot more complicated, because the parts are not a simple, regular alphabet. Still, it makes distinct characters and works.
My theory is that the manuscript was made by Edward Kelley to grift Emperor Rudolf II in some way, with techniques inspired by the Enochian scrying that he did with John Dee for many years prior. He probably spent a few days analyzing a page or so from a random book to determine some basic character order and frequency patterns that exist in actual language, concocted some kind of simple algorithm based on his observations that could generate plausible mysterious-looking text from dice rolls, made many pages of this gibberish text on paper, transcribed it onto some cheap old parchment he bought off a monastery, and finally drew a bunch of bizarre fake alchemy-inspired diagrams all over it.
The two arguments against that that I see are 1) to come up with such a complicated ruse all at once seems hard. Surely there'd be other earlier stepping stones on the road? Though plausibly we wouldn't have found them yet. 2) to come up with this technique and use it only once, again, surprising.
@@lqr824 Like I said, it could have been inspired by the complex Enochian scrying that Kelley did for several years beforehand, that would have been the stepping stone; also, Kelley died shortly after getting patronage from Rudolf II, so he wouldn't have gotten a second chance to try it, and besides this is such a complex scam that I don't think Kelley would have bothered in the first place if the mark wasn't a literal Holy Roman Emperor.
I concur, it was an elaborate scam -- if you want to be rewarded with a fortune and are trying to scam a king with his own secret agents and cyphers, it better be elaborate.
In Standard German, the J is functionally a different form of I. The only difference is that the glyph I is always used before a consonant or at the end of a word, and the glyph J before a vowel. Somehow it still became established as a separate letter of the German Alphabet. (Which of course is now very useful to write French and English words.)
no, there's a qualitative difference in the sounds they make compare, dunno, adjazent, adiabatisch; or Madjaren (no morpheme boundary like in adjazent), adiabatisch
Not really. J represent the glide /j/ like the word "ja" and words spelled with I at the begining will have a mandatory glottal stop like the word "IKEA" or "Idee". If "ja" was spelled like "ia" it'd be pronounced like /ʔia/.
Though in capitals, I and J have not been distinguished until recently and it's still a convention (albeit a somewhat dated one) to write Il as Jl for better legibility (e.g. write Jllustration instead of Illustration).
What if the author decomposed it even more than into phonemes? Let's take English for example. The only difference between b an p is that b is voiced and p isn't. Similarly,t is a "devoiced" d, k is a "devoided" g, s is a "devoiced" z, and in a sense, f can be seen as "devoiced" v. Moreover, h might be considered a devoiced a. So if there's a devoicing character that modifies the following letter to its devoiced version, you'd save at least 6 letters at the cost of one additional letter.
Well the phoneme model didn't really exist then, though it is obviously based on intuition and the alphabet, I hardly think it's some underlying structure. Following it closely can be too limiting if you want to make a reduced writing system, but you have the right idea. You can just do whatever as long as words remain distinct enough. There are a lot of options as there's no need to adhere to a set system, you can use any spellings or patterns as long as the words are distinct enough to be functional (it's also fine if context is sufficient to distinguish words). It could be as elaborate as you like. And indeed to make such a reduced inventory of letters, it's likely not such a simple system with simple rules. Another way to distinguish words is how you use spaces. If you force/interpret certain words to be written as one, then it is disambiguated by what it is put with or what it isn't put with. Just as a single letter used an abbreviation is made clear with spaces where it might otherwise cause conflicts. I have made some reduced letter pseudo-phonemic writing systems for my dialect of English before. Last one I made had letters that cover a group of vowels and diacritics added to those letters for specifying as needed. But even then I conflated two vowels to reduce the number of diacritics since strictly phonemic means you need dedicated letters that have barely any utility in distinguishing words. I'm still inclined to believe Voynich doesn't encode a natural language. At best a half-baked conlang, more likely just nothing.
I still think it's likely a manufactured language for an alchemical text, which is why it is its only example. It likely only needed to be read by a very small handful of people. With a limited numbet of sounds, it would sacrifice brevity, but it could still communicate information. Or it could just be gibberish constructed with a set of rules produced for some wealrhy collector of unusual manuscripts 🤷
What about a situation like with pre-modern kana in Japanese during the Edō-period, where the same sound can not only be written with completely different kana derived from different kanji with a shared reading, they also each come in multiple variants of how exactly they are realized. And oftentimes, a single given text wouldn't just stick to one of the possible kana variants/expressions, not even inside a single sentence. It is common to find multiple kana chosen seemingly at random or based on aesthetic preference or a better flow of uninterrupted writing. This is true for many different kinds of texts, hand-written as well as woodblock printed mass-produced texts - like cheap proto-manga based on folk tales or historic heroes.
yeah but in order to have proper ascii text you would need to make the values have fixed spacing. so if A is 1, you would have to make it something like 00000001 and b 00000010, which might as well be different symbols, because how would you know that 11111111 is actually that and not 1,1,1,1,1,1 or 11,1,111,11
@@eduardopupuconIT guy here, you may want to see into Shannon encoding :D It's what powers most of our compression these days Also, you have Morse code, which has technically two symbols, but in reality three: dot, dash and space.
ooh, and what if voynich text is actually a word list randomized from a small set, encoding meanings kabalah style, with the meaning itself derived from sum of numerical values? a very bad but simple example, using a numeric value of 1 for all letters, would be: a lun d' y sole il 13 11 42 polybius square for "car" encoded on what looks like a part of the sentence
Only one thing is certain: Voynichese is *pretending* to be a simple alphabet, and anyone who reads it as one falls for the trick. Any letter-for-letter deciphering just yields babble.
The written text isn't my area, but I've always wished someone would investigate the ways that medieval western missionaries were taught to write the languages of any regions they were going to, especially between the 1290s and 1350. Obviously no-one had time to learn to write Chinese the way an educated Chinese did (etc.), but is there any record of what languages were taught and of how a Latin missionary or trader might write foreign languages? Not necessarily to write or pronounce perfectly - they might ignore tones for example - but to pronounce words at the level of, say, a trader's pidgin or something of that kind. *Sigh* It's the sort of research that needs deep-digging, would probably mean spending weeks on end in France or the Vatican archives, and even then could turn up a blank, so I'll just have to hope someone else - someone with skills I lack - develops the same bee in their bonnet one day and finds time to look into it.
Hmmm, anyone consider a possible double cypher yet? Like maybe the glyphs represent nondecimal numbers which then combine in two for each letter? Add in the odd unique word start/final forms and merged glyphs (eg uj > y) and you could get something Voynichy.
Are you suggesting that despite the limited letters it may actually have just as many phonemes? Somewhat like how computers can save full english text using only binary? I suppose that can work in theory but it seems impractical as it requires more text for the same amount of information. The advantage of something like chinese writing, where there are thousands of symbols, is that you can potentially write a lot of information with very few symbols, the disadvantage of course is that it is difficult to memorize all those symbols. I suppose the impracticality can be explained as intentional obfuscation but I am not sure why we should assume the writer had any intent to obfuscate the information when there are such clear (even if not high quality) illustrations that seem detrimental to obfuscation of information if we assume those images directly relate to whatever the book may have recorded.
I remember listening to a podcast a few years ago where they implied that the statistical variance prove that there was something there, because the knowledge of how stats worked wasn't available at the time so it must be based on something. But damn yeah, if it were pairs and you thus suddenly have hundreds of words or the like to pick from...
I have a photo copy of the voynich manuscript. I have a lot of respect for you and others who still keep working to crack what the heck this book is. Thank you for still searching, and thank you for making this video. 🌞
13 letters is in no way "much too small to function". Take out all vowels, write "TS" instead of "C" or "Z" and "KS" instead of "X" etc., combine B + P and D + T and Q + K + G into one symbol, and then count. I got 12 distinct sounds from our alphabet after this operation. That remaining symbol could be used to "harden" B to P and D to T and G to K., just as an example
N ths csmpl m ttmpt t rprsnt rglr nglsh cmmncshtn wth thrtn lttrs. Gvn ths mnt f dt fr tcst ds mc cmmncshtn dffclt bt ths cn b lrnd prtt cvvcl. It is absolutely possible.
That's a good point. In my original draft of this video, I also mentioned that systems like Morse and binary code work with two symbols. Some people, especially in the olden days, learned to understand Morse code by ear. Now I originally made this video with the typical "I solved the Voynich!" people in mind, and they never think about it the way you do - they just kind of start reading it like a regular alphabet, which doesn't work. Additionally, even in a system like the one you describe, Voynichese would still have a huge entropy problem (too high predictability) - but I will have to revisit this in another video.
@@voynichtalk Thanks for the reply! I was never really interested in the Voynich Manuscript, to be honest,, but I grew up with two alphabets (Latin and Cyrillic), learned two more later in life (Arabic and Georgian), plus I am from a culture with a very strong oral tradition (I'm Serbian), so naturally the claim that 13 letters would be "way too small to function" just sounds very wrong to me :D I would argue that vowels are overestimated, it's consonants that make the difference, and we basically have the above mentioned 12 consonants as a base line, if you will, with each of these consonants being pronounceable in a variety of ways Maybe the symbolic meaning of 12 and 13 (which probably derives from the fact there are 12 moons in a solar years, with 13 moons every 33 years) also feeds into this. "The number twelve carries religious, mythological and magical symbolism, generally representing perfection, entirety, or cosmic order in traditions since antiquity." (wiki) I wish you many more viral hits and a lot of success in general! All the best!
@@voynichtalk I just realized that it might be confusing to talk about 12 consonants, without defining them, so here they are: B + P D + T and combinations like TS and DZ F + V G + K + Q H - with huge variety that would include glottal stop and Ayin J L M N R S + Z (like in "sun" and "zodiac") but also sounds like the English TH Š + Ž and combinations like TŠ and DŽ Maybe I'm wrong, but to me it seems that those are the basic sounds. The specific pronounciation of each has a lot of variety, not only if it is voiceless or not.
Looks more like a musical annotation than a language. For example the characters in the first line could be settings for the strings like E, A, D, G, B, E or D,A,D,G,B,E tunings for a guitar - or like a modern clef. You have characters in music notation that tell you to repeat a certain section, which could be the same as the characters you mentioned at the end of the lines. The characters with long tails could simply be longer played version of the same note or the same note on a different string ....... etc etc. And if you look at the illustrations they look more like the sort of thing you would find in a book of tunes. They could be illustrative of the musical style or the lyrics to be sung with the melody - which is why they appear random ( different illustrations for each tune ) - and sometimes mystical ( a song of an imaginary place etc.).
I don't expect to crack the code by any means, but I wanted to mention something on the off chance that it helps liguists explore a new area that hasn't been thought of yet. There is a pseudo whistling language in the Canary Islands. It isn't a true language because all it is is just whistling in a pattern reminiscent of an actual spoken language (think humming lyrics rather than singing them and others being able to understand what the song is). There are 12 tones to a scale in western music. Outside of western music there can be many more than that. Of interest is that arabic music had a 17 tone scale during the 13th century. Perhaps as to keep the information hidden, the writer of the Voynich manuscript could have simply utilized an 18 tone scale, assigned letters to them, and wrote them out to help them memorize what the actual words were. Song is one of the oldest ways to memorize information after all. Loved the video and loved how you broke down a lot of the nuance in the problem of figuring out what language it was written in.
That may not be as crazy as you think - I know several scholars who are considering some kind of constructed language as the most likely scenario. This would certainly explain why we haven't cracked it yet.
mi kama tawa sitelen ni la, mi pilin e ni: "sitelen luka luka tu wan? toki pona li jo e sitelen luka luka tu tu la, ona li pona." mi wile pana e sona ni, taso mi lukin e toki sina :)
It's insane that the voynich script is so clearly structured in a way that would be hard (though hardly impossible) to fake if you weren't trying to convey actual information, but also struggles to plausibly respresent language. I genuinely can't even begin to guess what set of constraints would lead someone to invent something like that.
I feel the same way about it. That's why I have my doubts when people think it was just made for a quick buck or any theory assuming low effort. It's a well-developed system. Admittedly, it evolves throughout the text, but the overall principles remain the same.
@@voynichtalk so more like a personal cypher between in-group or minimalistic language used for specific technical purpose? Notice how a lot of pictures show herbs, so it might be doctors handwriting, which, to my knowledge, nobody was able to decipher to this day.
@@KasumiRINA we really don't know yet at this point :/ until we can read it, both options of some low-entropy but meaningful system and nonsense filler remain open. I do hope people don't get too hung up on the 13 letters now though - this was just my attempt to really drive home the fact that you can't just go simple substitution on it without specifically targeting statistical oddities.
I wonder if that final S form comes from the Greek S. Something you seem to be missing in your understanding of writing is that they don't always try to accurately represent speech and it's completely arbitrary what speakers of a language consider different sounds and variants of the same sounds. For example Younger Futhark reduced the 24 Elder Futhark symbols to 16. Or in Arabic many characters are built on other characters with diacritics, or in Hebrew there are characters that don't have corresponding sounds anymore, they just help with distinguishing homophones in writing. There's also the script they use for Tigrinya, where there are major motives that depict consonants and minor variations of those depict vowels. Context does a lot of work. English uses 26 letters for about 45 sounds, Danish uses 29 letters for maybe 56 sounds, yet both are perfectly legible. You're rarely ever in doubt how you should read words like "read" or "lead" when they are in context.
You are right, I did not go too much into cases where letters and sounds don't match one to one. This was a conscious decision; there was already so much to explain, and these videos really work better if they're not 40 minutes long. More importantly though, the issue you describe was much less prominent in the Middle Ages. For English, you may have heard about the Great Vowel Shift: en.wikipedia.org/wiki/Great_Vowel_Shift . Basically, The language underwent significant change during/after the process of spelling standardization, which is why *modern* English is notorious for not sounding like its spelling. Also notice that digraphs are usually employed because the language needs to represent sounds *in addition* to the phoneme/sound combinations inherited from the Latin alphabet. They are generally not used to reduce the size of the Latin alphabet. The point I was trying to make is that especially in the Middle Ages, the general rule of alphabets was a 1 to 1 correspondence between sound and shape. Any additions, like diacritics and digraphs can be seen as expanding upon this system for the needs of the language in question.
@@voynichtalk It also depends on the sounds of the language it's supposed to represent. Something you missed in your alphabet shooting (great animation btw.) is that in the Latin alphabet G is not the third letter because when they first made the Latin alphabet, they didn't distinguish a voiced velar plosive yet, therefore C (which stood for a voiceless velar plosive) was close enough and G is a later addition (basically the letter C with a little Γ added to it). It could be possible to analize voiced/voiceless (or unaspirated/aspirated) pairs as variants of the same sound, that one letter is enough to mark. You could also think of R and L as variants of a single fluid sound. This way you could use for example A, B, D, E, F, G, H, I, L, M, N, O, S, V, vig vvd admidedli lvk vild (which would admittedly look weird). For example Old Norse dropped one third of their alphabet and still managed. You also brought up Arabic, where many letters are modifications of the same symbol, for example ح ج خ or س ش etc.
Fascinating video, I think you're onto something. Have you considered how the first circular diagram in the book (page f57v, or page 114 of the pdf file) has a repeating 17-character sequence in its second circle? This suggests to me that we are dealing with a 17-letter alphabet rather than an 18-letter one here.
Such sequences are certainly fascinating and might indeed offer a clue. But It won't be straightforward. Several of the glyphs in the sequence you refer to are extremely rare or even unique, and conversely other common glyphs of the MS are missing :/
In Poland we have sounds that are writed by two or three leters like: zi si ci dz dź dż dzi drz rz ch cz sz But also "normal" 1 leter / 1 sound like z i s c d z ź ż r h So by 10 leters i got 22 sounds! (Some of them no konger have diffrent sound but had them in the past) Also we have variation of leters like i show before z - ź - ż Other leters like this are ó, ś, ć that looks like o, s, c but have diffrent sound than them. Diffrent language have letters like ö ô õ œ ø ō ò ó just on leter o So ther are posibilities
@@Axacqk that's an excellent question. Strangely, most theorists think exactly the other way around, having glyphs represent syllables rather than having digraphs represent phonemes. There are a few things to unpack about digraphs. One is that they kick the can down the road for the entropy question. Where first you had one glyph frequently following another glyph (the suspected digraph), you now have a digraph that *still* occurs in a predictable context. Second problem is that words would get incredibly short. So then you'd need to wonder if spaces are real. Which would mean words aren't words. But then what are they and where do we even begin... Either way, these kinds of questions are much more useful than "could it be Judeo-French?" :) I played around with this a couple of years ago, in case you haven't seen this yet: herculeaf.wordpress.com/2020/10/19/entropy-hunting-bigger-and-better/
@@voynichtalk Words could be short if the language is isolating. Also, I may be wrong on this, but my intuition is that in languages that use digraphs, sounds represented by digraphs are often variations of sounds represented by single glyphs and differ from their single-glyph counterparts by one phonological feature, so it's not surprising that they are similarly contextually restricted.
Why digraphs though? Usually digraphs arise when the writing system doesn't have enough symbols. But as Voynich is not borrowing a script, it's counter intuitive to use digraphs.
@@voynichtalk Spaces could just as likely be separating syllables; look at a language like Tibetan, where the space-equivalent 'tseg' demarcates syllables, with no way to demarcate words themselves.
@@BryanLu0 It's intuitive if your native language already does it, e.g. you know of "sh", "ch", "zh" in English or "sz", "cz" in Polish or "mp" in Greek (for b in positions where beta would be pronounced as v).
There are so many possibilities of what this document could be. It doesn't necessarily have to be language in the traditional sense. It could be notes or a code like hexadecimal numbers. Or it could be that the actual information is hidden in a big pile of random characters and images to distract people, and only those who have the key or understand it can read it, similar to a book cipher. Different sentence endings could mean that different keys are used, or that only one type of line needs to be considered, etc.
to me this almost seems like a case of medieval conlanging to make something like a universal language, something with so few sounds as to be pronounceable by just about any medieval language, but written in this script as perhaps a cypher to keep it from being read for whatever reason.
A while ago I was at a "conference" of Ido and Esperanto speakers. We had a couple of students from Korea and one from China there, too. After a bit of healthy debate and using French and English as Lingua Francas we talked a lot about which sounds are easy to understand in all the languages and established a basic set of sounds and phonemes that are nice to do, easy to understand and reproduce and we came up with like 16 or so "letters" to build an alphabet from and plenty of exclusionary rules. Of course nothing came out of it, but the idea to be able to teach something to Xhosa speakers and Swedes, to Mandarin speakers and Koreans was fun.
Well, a system such as the Younger Futhark had only 16 letters, and 4 of those were vowels. If the Voynich script is an abjad, it should work for some languages with small consonant inventories, right? Maybe it's even written in a language that was not native to the author, in which case he might have failed to distinguish some of the phonemes (such as the uvular and pharyngeal fricatives in Semitic languages).
Fascinating video, this is my first foray into the Voynich manuscript but this is pretty interesting stuff. I myself am a Linguistics major with an interest in phonology, morphology, sociolinguistics, and historical linguistics, and while my interest in writing systems is purely just a hobby I couldn't help but create a theory while watching this video. I by no means think these ideas haven't been thought up by someone else so I'm curious what the counter arguments are. First of all I think the problem with the data on the Abjads is that semitic languages tend to have quite a few consonants anyways, for example including vowels in the ISO romanization of Arabic I believe puts the count at 38 letters, which is on the higher but not highest at all end for alphabets. A problem for this of course is that we can ask if Abjads are so popular amongst Semitic languages because of their phonologies, their significant consonants inventories and their vowel inventories that don't from my knowledge get higher than 6, or is their popularity just because that's where they were used. And either way the Perso Arabic abjad in particular has been used for many non semitic languages with *very* different phonologies (like my own Punjabi). Additionally in Arabic long vowels are written with letters otherwise used for consonants, so that can bring down the number without reducing as much information as no vowels at all. Let's play around with modern standard Italian for this. If I write it with the version of the Perso Arabic script using the version used to write Punjabi known as Shahmukhi (not for any historical reason, just because it's a functional abjad used for a language with a phonology closer to Italian than Arabic's and I know it). Basing it off of the Italian alphabet when it would lessen the number, not it's phonology (so /tʃ/ and /k/ are both written with and /ɡ/ and /dʒ/ are both written with ) I ended up with 16 letters, with and being and , , and being . Now the first thought I had actually was "what if they've just merged voiceless and voiced consonants for some reason". And I don't actually mean that the spoken language it represents did, but just that the writing system doesn't differentiate between /p/ and /b/ for some reason. Now this would be weird and I'm not sure the motivation would be but in most spoken languages while this is a big merger it's not one that makes communication impossible. In fact many spoken languages have undergone this sound change and been fine. Two examples I can think of right now are the Tocharian languages which are Indo European and the Oceanic branch of Austronesian. Once again I don't mean that the spoken language this represented had this change, just that this orthographic merger wouldn't render the text unreadable. Doing this merger with the previous Shahmukhized Modern Italian we bring it down to exactly 13 (I genuinely didn't expect it to be 13 when I started writing this). Lastly if we go back to version 1 of this but instead use what I know of the Perso Arabic script for Arabic itself we do actually still get some opportunities for some mergers, for example Arabic doesn't have any letters for /p/, /ɡ/, and /tʃ/ so that's more mergers we can do, though this only actually brings it down by 1, with /p/ merging either with /b/ or/f/. If we also merge the sounds of the letter (/ts/ and /dz/) with the other affricates that now brings us down to 14. I don't actually think that it's modern Italian written in a cipher of Italian written in modern Arabic, but I just wanted to show that an Abjad writing a *European* language can get a lot closer to that 13 letter goal.
Hi! My background is also in language (MA in Historical Linguistics) but it's been a while since I graduated :) The thing with abjads is that we sometimes see Voynich solutions along the lines of "what if it's vowelless Latin?". These ignore the fact that you can't just omit the vowels from any language. Well, you can, but it wouldn't be a good idea. My naive understanding is that vowelless systems work exceptionally well for Semitic languages, because those have a recognizable consonant "frame" for words, and the required vowel can be filled in easily from context. How would you say Punjabi functions as an abjad without too much ambiguity? Are vowels represented in some way? (I know nothing about this). To be honest, when I was making this video, I had a limited audience in mind of people who were already a bit aware of the Voynich issues. Most of my other videos got like a 100 views, from enthusiasts. Somehow the algorithm picked this one up though. If I had known that this would happen, I would have included more context. There are more issues than what I mention here. See Rene Zandbergen's site on the entropy issue for example: www.voynich.nu/extra/sol_ent.html
@@voynichtalk thanks for the response. My family is from Cáṛdā Punjab (the India side) where Gurmukhi, an abugida is used instead so just warning that while I did take lessons for learning Shahmukhi, the adapted form of Perso Arabic, I didn't grow up using it so for me it naturally feels unintuitive and like it creates a lot of ambiguity, especially when Gurmukhi is kinda hyper adapted for Punjabi (to the point that it's actually pretty bad at writing Sanskrit, unlike Devanagari). But of course the truth is that Láíndā Punjab (the Pakistan side) has a population of 127 million, and while not everyone is literate, a lot are, so more people use Shahmukhi than Gurmukhi, so it definitely works, and it's definitely grown on me. Punjabi has a 10 vowel system of iː uː ɪ ʊ eː ə oː ɛː ɔː äː In Classical Arabic there's a six vowel system of [i u a] that can be long or short and it's actually only short vowels that aren't written, long vowels are written, but using the letters for the consonants /j/ for [iː], /w/ for [uː], and [ʔ] for [aː]. Also short vowels technically can be written with diacritics but they're very rarely used, usually only in education or for calligraphy. In Punjab instead all the long front vowels ([iː eː ɛː]) are written with the letter for /j/, back vowels with the letter for /ʋ/ (there is no /w/), and [äː] with the letter for the glottal stop which neither Punjabi nor Italian have and I totally forgot that so my count should be 1 higher. The system definitely works even if I find it a bit clunky, and if you'll notice Punjabi's long vowels are pretty much identical to modern Standard Italian's vowel inventory. So my thinking is it seems that it *could* be possible that the manuscript is a replacement cipher but for writing a medieval European language in an Abjad such as the Arabic script *or* the Hebrew script which I can't really talk on because I don't know it at all. Medieval European languages definitely were written in the Hebrew script a lot and while moddern Yiddish has spelling reforms that turns Hebrew into an alphabet that's modern Yiddish and it doesn't have to be Yiddish. Alternatively just from the very quick skim of the wikipedia page that I did it seems that the manuscript is believed to be from Italy (which is half of why I chose Modern Standard Italian for the example, the other half being the similarity of its vowels to Punjabi) and there was a Muslim presence in Southern Italy in the earlier medieval era and Malta still speaks a Semitic language to this day, and trade between Italy and North Africa still continued, so it's not unreasonable for an Italian to know the Arabic script. Either way I'm definitely gonna check that site out as well as the rest of your channel because all this seems very interesting and I don't expect that I just solved this problem in 5 seconds, I just think abjads can't be so quickly counted out.
in danish we have 3 vowels that are special to Danish and Norwegian (and to some degree Swedish ) they are Æ, Ø and Å or æ, ø and å - Now this is the interesting bit : å used to be written aa - so an alphabet can have more letters than it has in the "font" . Actually the reason that Danish is so very hard to learn is, that it has so many vowels, it has many more than it has letters for them, an a can be pronounced in many ways short, long and so on. You could use a language with fewer letters than it has sounds ! Younger Runes - the futhharken had 16 letters but 2 where R's and 2 where a's - but the rune Úr represents the letters u, o, ø and w and the letter or sound of æ can be represented by the runes Óss, Ís and Ár. I know the value of this younger rune alphabet, was that it didn't matter if what dialect of Norse you spoke, all could read the text (if you could read) . But it does imply, that it is possible to get along with a smaller alphabet, than the letters you need, if you either can combine letters to ad letters to the alphabet, or if you have a conventions that lets you use one letter for many sounds (I as I og j) and so on.
I don't have to mention (OK2, I am mentioning) that sound (basic=phoneme) and letters of the alphabet used to represent the phonemes (the script) are two different things though very much related. We must also consider that, ancient languages could have used one symbol (letter) to represent more than one sound/phoneme. Example: Arabic language. Though the alphabet has 28 letters, this is relatively new because it is easier to have 28 unique letters to represent 28 unique phonemes. In the classical Arabic orthography, no letters are used to represent short vowels (or vowelless i.e. sukun). In addition, there are no dots used. Hence, a س (seen) could be a seen (س) or a sheen (ش). In modern script س is always seen because ش = sheen. But in ancient Arabic script a س could be a seen or a sheen (not dots in ancient script)! How did the ancients know which one is the right phoneme? Context! There is also a letter that has 3 phonemes! This is also where the positioning variants could help to identify which phoneme is meant! So, if you count the ancient Arabic script, discarding those that have dots you end up with 15 or 16 or 17 (not 28) depending on how you count them... OK2 - not the unlucky 13!
As someone who's interested in this subject, I've got a few questions forthcoming, number one, why is everyone convinced voynichese is a written text, what if it's simply a very long poem? Which leads me to question number two, why are researchers so distracted by the drawings? They could be there to throw us off from the real meaning behind the context right? Or maybe the illustrations are also a part of the meaning of the scripture? question number three, why don't these researchers try to figure out how these symbols came about? I mean, if - a 21 Century (wo)man like myself- were to create a coding or a language, he/she would be influenced by a writing system that he/she would already know right? And the influence would arguably depend on the psychological attributes of exampled person. What I'm trying to say is, wouldn't we accomplish more by trying to find where most characters come from? An example being that, we identify a certain amount of similarities between Voynichese and Alphabet X, say alphabet X was only used at a certain geographical location, we could hypothesize that whoever wrote or created these symbols was in contact with the certain geographical location where alphabet X was used. Like honestly, some of the symbols are giving "Pahlavi alphabet" vibes, an alphabet used by the Iranians during the Sassanid rule. I'm just saying maybe, whoever devised these symbols was at one point exposed to the Pahlavi alphabet. Now we can look for the people at that time who might have been exposed to such an alphabet, who were also exposed by medieval European art. Or we can choose another alphabet which looks like voynichese's, and try to figure links between people and places. Could this idea possibly work? I'm just trying to ask questions here and I'm not proposing any solution.
@@Chevalier.D.Artagnan question 1: poetry is often mentioned, but I haven't seen anyone take it further than broad speculation. This text is huge, so a poem of this size is probably narrative, in which case we would still expect text-like behavior on the glyph level. Question 2. It's the other way around: if you want to get published academically in Voynich research, you have to research the text. Top level imagery research is lacking. I am convinced that the images deserve just as much attention as the text, but they are currently not getting it. Question 3. This has been researched, but I have not yet delved into it myself. You are right that this is an important question. Maybe a topic for another video.
@ Thanks for taking the time and actually responding! I’d love to see a new video on Voynich alphabet and its possible origin. I always thought that since researchers tend to categorize the manuscript based on the illustrations, the pictures got more attention. I didn’t realize actual academical research is always done on the scripture. Another question which I forgot to ask, it’s highly unlikely, but have you ever considered the voynich manuscript being a translation of sorts? A very old book/context translated and hidden into whatever the pages are hiding from us? I’m so enthralled to have found such a wonderful TH-cam channel.
I have only taken a few casual glances at some of the manuscript's pages, but it seems to me that certain "words" or groupings of symbols appear many times in the same sentence, creating a nonsensical effect akin to what is seen in James Joyce's Finnigan's Wake - "The sun basket basket goes basket the basket basket basket nadir basket to." and such. Has an inventory of "words" been made and then, giving these simple labels like "A, B, C" (etc), sentence shapes studied? I bet that if one did this, there will be pages and pages of sentences like "AABAAACBDAACAAABBEDAA", i.e., gibberish or, more charitably, some form of ritual incantation, even song.
I like how the voynich manuscript inadvertently illustrates how connected to the language itself the writing system can be. Makes me think of Yuri Knorozov, having to manually put together the glyphs, words and semantic meaning from accompanying pictures while deciphering maya
8:05 I remember watching a video explaining that such pairs as B/P were seen as the same in some language through which the Latin alphabet had gone, but the distinction was restored except C, which originally was voiced K, remained a variant of /k/ and G was invented to distinguish
Here is my first attempt at a constructed 13 letter alphabet. There are 9 consonants which generally have a hard and a soft sound., There are 3 vowels which are ronounced either at the front of the mouth or the back. The remaining symbol is a modifier which alters both consonants and vowels. Here are the consonants b / b* (b/v) p / p* (p/f) d / d* (d/z) t / t* (t/s) g / g* (g/l) k / k* (k/r) s / s* (sh/zh) h / h* (th/dh) n (n/m/ng) The vowels are o / o* (bog/bag) e / e* (bug/beg) y / y* (book/big) The n is special. It is treated as an n by default. Context clues can include; a following consonant, a silent preceding consonant or simply doubling the letter to "nn". "gnome" could be variously "noenn", "noebn" or "dnoebn". The h sound is missing. You could let hard h serve as both dh and th letting soft h* stand for the h sound. Ch is a compound "t-sh", here spelled ts. J as in "jump" is "d-zh", spelled "ds*enp" (or simply "dsenp"). I am not sure how vague a system of vowels can be. Enlish spelling probably a terrible model.
If there are some signs which we know are combinations of two other signs, when perhaps some of the alphabetic signs are themselves combinations? such as, perhaps, the \ and \\ in the bottom row at 10:45 (which may or may not correspond to part of the bottom of the angled 2s on the right-hand-side of the top row at that screen capture of 10:45. ...which may or may not suggest one or both of the 2s is a variant or typo - since they're so rare - of the \) of the bottom row. i hope that made sense
Could it be an early shorthand? Pitman and Gregg shorthands have about 30 characters each. But in one of them it appears to be only 15, as each symbol is written either bolded or regular. In the other to the untrained eye it would appear there are only 10 characters, as each of those has three variants that are shown by their length.
Maybe. The low entropy makes this unlikely though. Voynichese glyphs are very predictable (if I give you one glyph, you have a good chance of predicting its neighbors). This is the kind of stuff that _asks_ for a shorthand or abbreviation treatment, rather than looking like the result of it.
Rotokas has 12 letters (and only 11 sounds, so they could have 11, but they write an allophone of one sound differently) /p/ /t~t͡s~s/ /k/ /b~β/ /d~ɾ~l/ /g~ɣ/ /i/ /u/ /e/ /o/ /a/ Hawaiian uses either 13 or 18 if you count the long vowels writen with a mark above the vowel as seperate (which they are not, but even if counted it’s still small) /p/ /k~t/ /m/ /n/ /h/ /l~ɾ~ɹ/ /w~v/ /ʔ/ /i/ /iː/ /u/ /uː/ /e/ /eː/ /o/ /oː/ /a/ /aː/
@@the_linguist_llPart of my argument in the video is that these languages are all found in the same part of the world, and their small alphabet is the result of the unique combination of having a very limited phoneme inventory *and* the alphabetical system of Western missionaries. But yes, in theory you are right. Note that my video was mostly prompted by the "I solved it!" crowd, who never take any of these things into account.
The theory that I always thought was the coolest (and I have absolutely no expertise in this area so it’s almost certainly not correct) is that it’s the work of a monk trying to copy a script he has absolutely no knowledge of like Chinese.
My favorite is that it’s just someone’s fun project, I heard of that theory in some TH-cam video, and the person who brought it up mentioned some elaborate but nonsensical project of their own. That it’s equivalent to finding fanfiction on some obscure book or experimental music.
I don't have any personal opinion about the Voynich script in particular, but seeing how tiny book pahlavi script of the zoroastrian priests and kufic arabic (without i'jam) was, I don't see how it couldn't be possible to have such a small number of symbols. The way that these letters can only appear in very oddly specific positions in the word though, is a significantly more valid point.
@koengheuens Has anypne tried the Hangul (Korean) / phonetic angle? Theoretically that could require a very simple alphabet: "voiced labial fricative" (Eng. V) is only three categories with naturally around two to three values each, contrasting with "unvoiced velar plosive" (Eng. K) or "voiced dental nasal plosive" (Eng. N) for example. And you can get stupidly specific, e.g. a "unvoiced labio-palatal nasal fricative" in context which would take several paragraphs to explain to a lay person.
Has anyone considered fatigue for the gallows that appear in the first lines only? Maybe they got tired of distinguishing whatever features they differ from the others on? Or maybe these are foreign sounds and only occur in discussions of what the thing is called in other languages?
i think its possible that this was shorthand meant intentionally to obfuscate the meaning a little, but leave enough that people familiar w the text and the types of thing the material was likely to say could read it. in that case, we can accept a bit of loss of precision. taking the latin alphabet, discarding the new or redundant letters (something like q, w, x, j, v, c), leaves us with 20. then if you begin merging similar letters, you can fairly quickly get to something workable. (p t k s - b d g z) and you get 16. (e o - i u) and you get 14. that leaves you writing english like this: "that leaues yuu (uu)reten(k) enklesh leke thes". a language which has less use for the new/weird letters fares even better. "eso es lo que le voy a decir" "eso es lo (ke, koe, even k) le uoy a teser". it seems possible that this was some kind of cypher of the latin alphabet but it may also have been derived from some other system. depending on the language, they also may have been able to get away with not writing vowels, as is the case in some languages, and seems more likely if it's intentionally a bit hard to read. "'s 's th cs 'n sm lngwgs, 'nd sms mr lkly f ts 'ntntnlly ' bt hrd t rd"
The other day i tried transcribing a section of the voynich manuscript, and yeah ngl it definitely feels like the author would occasionally invent new letters and then forget to use them and then throw some in after remembering they hadnt used them in a while.
I figured out the Voynich, but your language understanding is off, Spanish can be minimalized to 7 cons and 2 vowels. Lots of languages have rules and short cuts. then the writer can also have a artistic design to the text writing. If the rules of the language are done right you can have a language that has less then 1000 words and can create words that have meaning, like Ancient Greek, not new Greek. but the Voynich language has some odd today ideas of language, which i figured out because my made-up language followed the same rules.
I've only ever known of the Voynich Manuscript as an historically peripheral oddity. It's existence took up residence in that small back corner of my brain set aside for such things, eg. bananas with seeds, piezoelectricity, and, well, Voynich's manuscript. This discussion was in equal parts illuminating and mind-expanding - bedankt! I do have a question, though, more personal that professional, perhaps: why 'zee' and not 'zed'? (ta.)
Thanks! I'm planning to make more videos like this, so make sure to follow the channel if you're interested. On the pronunciation of "Z", my way of speaking English leans closer to North American - zed is more of a British thing. In Dutch we do say "zet" :)
Why not duble letters being capable of mean different sounds like with ma means a and am meaning b, there would be thousands of possibilities of words but has anyone tried to categorize the language standard? Like what letters are more plausible to come after other letters?
Thanks for putting this together, the lack of characters really does cause problems and the observation that one of them appears mostly as an 'end of line' character is very odd. This isn't a file, it's a written document where 'end of line' is implicit unless it's doing something like a dash to indicate a word is broken over multiple lines. I'm not a linguist, but one observation of a very limited character set language is Arabic rusum or even English written without the vowels. This could bring the 18 down to 14 by dropping A, E, I and O? Edit: I see you addressed this in your first video. Subscribed.
It's possible there are more glyphs than you realize: perhaps height and width have a differentiation that isn't reflected in the common character tallies.
True, and some people have suggested ideas along these lines. However, I feel madness creeping in when I actually try something like this for myself. We're looking at an artefact where someone 600 years ago took an animal part (feather) to write on another animal part (skin). Variation is everywhere. But determining whether it's meaningful is not straightforward.
If there's so few letters, then maybe it's a tonal language? I can't think of any that would have used an alphabet at the time though, I know Vietnamese does it, but that came from French rule. I wonder what language contact would have allowed it... Regarding the positional variation, I can see why it would be the case: Perhaps the top-of-page characters were used to denote new paragraphs/points, and final letters were used to help distinguish where writing begins and ends. Hebrew script does this a looooot, for example.
@@PlaguevonKarma if a tonal language used a small alphabet, wouldn't you expect a lot of diacritics though? Maybe a tonal language as understood by non-native speakers, so with a lot of information loss. I don't know though, the whole system is _so_ structured and rigid. I really feel like the clue will be to first understand how the system works and only then move on to identify the peculiarities of any linguistic content.
@@voynichtalk Not necessarily, Burmese is tonal but it is not indicated in informal Burmese orthography. That being said though, the chance of the 'language' in which the Voynich manuscript was written being tonal is pretty low considering where it's located
@@voynichtalk mmm, I don't know. There are languages (as the other commenter mentioned) that eschew diacritics; it's not unheard of at all. Dzongkha and Tibetan don't imply tones in their syllabic writing, for example - if my idea is correct then I'd say it's most similar to Dzhongkha (literally, not linguistically, goodness no!) in how it is written, as Dzongkha uses Tibetan script to write syllables without denoting either of its two tones. Moving outside of tones, though, not all stress-timed languages note said stress; Spanish does, English doesn't, Yiddish doesn't, so on, so forth. Speaking as someone who speaks Mandarin, I can read Hanzì just fine, even though, on their own, they mean very little - the phonetic components are outdated and the radicals are very abstract. Despite that, I can still remember what a text is meant to say. The same applies with English - it's not very representative, it doesn't note the word stress at all, but I can still say it. I don't think the messaging in Voynichese would be lost - it seems like it would give the same amount of memory-jogging as English, if anything. Given what you've said, with structures this constrained, with such a small set of ways to say...anything...I think tones are worth considering, if it makes it make sense. The only question, then, would be...why was it written in Europe? As the other commenter said, we don't usually have tonal languages, and I don't think we'd have travellers going around writing things if there wasn't a "home fort" with more samples. Anyway, I hope this means something to you, your work is amazing! ❤
the ones that only (is it only? you say mostly and im interested in what the exceptions are) occur at the end of the line look very similar, and a bit like an ampersand, which makes me think they're actually variants of each other, and function like a hyphen, for marking that the word continues over the linebreak.
First video of yours I've seen. Some guy posted a video series that made a pretty convincing case that the Voynich manuscript is written in A sister language to Romani (the language of the "Gypsies" who called themselves Romani). This video series was unfortunately taken down because he wasn't very happy with the quality of it, and is now focusing on making a bunch of videos on topics that are related to his methods before redoing the series, I barely has any videos still, which is A real shame. But he built his work off the back of Steven Bax. The methodology use was to look through the astrological sections of the book, and identify potential candidates for constellations that are labeled. Since these constellations are likely to have the same name in multiple languages, you can make predictions as to which sounds each letter likely represents by seeing what sounds that name has in different languages. This is the type of method that was used to decoder linear B, where linear B was decoded by identifying sets of words that were only found on certain islands, too much then the assumption was these words must represent the names of cities and towns, since those are the most likely words to only have relevance in specific geographical locations. I bring this up to you to see what your thoughts on this would be.
Must have been Derek Vogt (Volder Z). I am not too familiar with his work, so I can't talk specifics. What I do know is that there was this period in Voynich research (around 2016 or so?) when Stephen Bax' theory attracted a lot of attention and other researchers, and Derek was one of those closely associated to him. However, nowadays most if not all researchers I talked to have abandoned belief in the usefulness of Bax' theories. The reason why Romani is attractive is that it has low entropy (relatively predictable). It also offers a (somewhat vague) cultural reason for the manuscripts' existence. Traveling outsider culture, no own written tradition... So I understand the incentive to look into this. However, what I expect is that it still won't be enough, and that many issues will remain unaddressed. It will be just another theory cherry picking words that happen to work without offering a structural solution. Moreover, since the target language is not well attested, there will be a lot of wiggle room for the solver. I'm saying this without having seen his theory though. When any new video comes out, we'll discuss specifics.
Ive used voynich script to write salishan languages before. Its not too small, just the end text deviates substantially from voynich. Unless I separate syllables via spaces. Then it looks like voynich text. Anyway I absolutely used positional variation when adapting the script. It looks great! It's just there is another class of writing systems that separates phonological features into discrete symbols, its not featural but not an alphabet. "Kalis" is a great example of that type of script. I wouldn't be surprised if the original voynich used a similar strategy.
The problem we have with Voynich, at least as I remember from a few years ago, is that it obeys statistical characteristics of written languages (that is, the most-frequent glyph appears so many times relative to the next-most-frequent, etc.) that the authors of the manuscript couldn't have known about. So it has very strong arguments that it _is_ some form of written language, just not a substitution cipher for an alphabet. So if it's not a substitution cipher, what might it be?
The Younger Futhark has 16 letters, and in old Norse we pretty much only see ᛦ (usually in English and in Gothic written as z, in Scandinavia usually transcribed as Ʀ) at the very end of words, and by roughly 1000 AD it started to fall out of favour, and we see R replace it entirely in some runic writings, giving some inscriptions only 15 letters to play with. I could see Thors (th) and Tyr (t/d) being simplified into one, and As(A) Ár(Á/o) I am not saying this because I have a theory. I am just a bit surprised that this rather small(but also highly unstable) alphabet is glossed over
@@voynichtalk yeah the point still stands there, very interesting video (i'd heard of the voynich manuscript like one time but this video and the part 1 are very interesting)
I assume it has, but Has the repetition of words in combination/relation been looked at? Kinda like how mailbox doesn't have it's own word, but instead is a combination of words.
@@AlexandHuman that doesn't happen very much at all. One of the reasons is precisely that glyphs tend to occur in the same position within the word. Simplifying a bit too much, one could say that words have a beginning, a middle and an end, an each is picked from a fixed category (or can be left open). The reality is more complex, but this should give you a feel for how it is. There are certain character groups that can be repeated, but it's all pretty unnatural and insufficient. If you're interested in this, I'd recommend Marco Ponzi's post here: medium.com/viridisgreen/two-voynich-word-models-c10a89e8ea01
what if many characters are ambiguous to (eg) voicing or other features? my first thought was the Younger Fuþark alphabet which i've just confirmed has 16 characters after having eliminated the more typical alphabetic distinctions of Elder Fuþark. i disclose that this is my first encounter with your channel and i have only nominal familiarity with the Voynich Manuscript, so my suggestion is likely unfounded. interesting topic though, great analysis. as a linguistics nerd i have subscribed :)
That's cool, I didn't know there was a runic script with only 16 characters. The main problem with Voynichese is its low conditional character entropy; this means that when I give you one character, you have a pretty good chance to predict the next one. Now of course if we collapse the alphabet like I did in the video by assuming positional variation, this situation might improve. It will make certain glyphs equivalent to one another, allowing them to appear in a wider context. The problem is that nobody has come up with an optimal way yet to do this. What I did in the video was just collapse the empty spaces in the table, without thinking about which glyphs are most likely variants of each other. So to test your idea, we would first need to find the best way to rewrite Voynichese, then obtain a long text in Futhark, and then compare their entropy values. It sounds interesting, but also a lot of work :)
I made a similar comment above about Mongolian, just to suggest that this is still plausible as an alphabet. I'm curious to learn more about the low positional entropy. It could be that frequent combinations are essentially a single letter-or collocates in the mind of a doodling artist.
I'm wondering whether it's the nature of the text itself that is responsible for these problems. What if it's a bunch of prayers, chants, recitations, etc., with words being repeated over and over within sentences that repeat over and over in structure? Certain words might then only be used when introducing a new topic. Certain words might indeed only appear at the end of lines, assuming those are the ends of sentences. It might be an alphabet with the few glyphs capable of appearing in multiple positions functioning as modifiers, producing very different changes depending which glyph they modify. The writing system of the Voynich manuscript may not even be a complete writing system, capable of writing any text. It may only be capable of writing... the Voynich Manuscript. Has anyone tried turning all the different chunks of text (what appear to be words) into unique symbols, so that 'xhun' becomes ^ and 'xgun' becomes & (I hope that makes sense), and then using a computer to check the frequency of those against the frequency of words within texts like the Hail Mary, Lord's Prayer, Prayer of St. Francis, etc.? A kind of mass lexical comparison, if you will.
@@MrNyathi1 regarding bigraphs, I did experiment a bit with that some years ago: herculeaf.wordpress.com/2020/10/19/entropy-hunting-bigger-and-better/ As for the idea that the text is poor rather than the writing system... Well there are many different words, a normal amount even. And they are distributed similarly to a normal text, with new vocabulary being introduced at a normal rate. So I guess that would argue against such a proposal, at least without further modifications.
Can the Latin alphabet be constructed from the Voynich characters? Is it possible a non-latin-reading person got their hands on some letter construction templates broken down into individual strokes or elements, thought they were full letters, then combined them in a way that they thought looked like latin writing?
Did someone already do a similar analysis, but based on words and sentences instead of letters? In my naive understanding, words would have way less options for different meanings than characters. It could then be used to further narrow the options for alphabets (if it resembles an actual language) or further disprove character substitution.
My theory is it’s a Nestorian translation for the Bible written in a script for a Tibetan or Mongolic language. It just so happened to have ended up in Europe.
How many "words" are there? I wonder whether the whole word is used like a logographic scrypt? A distribution graph of the word frequency might be able to tell that. Or perhaps the words are letters if you get my meaning, like each collection fo characters might map to iust a single phoneme?
If I recall correctly, the number of different word types is somewhat on the high side but within a normal range. Certainly way too much to represent any form of alphabet. I guess they could be like logographic script, where for example each word is like one Chinese character. To be honest I don't know enough about this. Wouldn't it essentially equal the use of a code book?
If some letters appears only at the end, are there any possibility that letter next to it (penultimate) does NOT have same value? Same goes for the letter at the begining of the word (and second one). Are there any doubled leTTers at aLL?
@@cocobill doubled letters are notoriously rare. Although this depends on what we see as a letter! You often get a bunch of i-shapes at the end of words, but it is unclear whether those are single glyphs or not. Most letters generally don't double, and it is possible to read the glyphs in such a way that there is hardly any doubling. About the penultimate letter, that's usually pretty easy to predict if you know the last letter. There are always only a few common options. It's all much too rigid. Check Voynichese.com if you want to have a look.
Have you looked at the Thai writing system? It’s highly regulated, but there are basically no exceptions to those rules, like a handful. And a *lot* of letters depends their sound on their position in the word and proximity to other letters and tone signs. Might be something to consider. But yes, probably one of the languages with most individual consonant and vowel glyphs.
It's certainly a good idea to think in this direction. I looked a bit into digraphs here: herculeaf.wordpress.com/2020/10/19/entropy-hunting-bigger-and-better/ In my opinion, at least two problems remain. One is that entropy is still really on the low side even if you go wild with n-grams. Another is that if you assume enough n-grams to approach a semblance of normalcy, your words get very small. So that means abandoning spaces, which opens up a whole new can of worms. There's also the question to what extent something like this is practically feasible.
Greek has 24 letters. Four η ω ψ ξ lack a sound of their own. Two θ δ are highly specific to greek. Six σ ζ χ γ φ β are fricatives, and it's rare for a language to have all of those. A greek speaker could use very few letters to write a foreign language. They would write basque with 13-14 letters (of course it wouldn't be great but that's how they would do it.) Hebrew has 22 letters. Four ע ח כ ת are rarely used for non hebrew words. Four ז ס צ ש are sibilants. A language with only one sibilant, like finnish, would be written by an hebrew speaker with less than 13 letters. In fact one could do with only 12. Arabic has 28 letters. Of these, nine ق ح ع ص ث ض ذ ظ ط are very rare in non arabic words. That leaves 19 letters, including ه ف س ش ج ز غ خ eight fricative letters, which again is a lot. A language with few fricatives, like finnish again, would be written by an arabic speaker using around 13 letters. Basque would need the same amount. Pretty much any language without /z/, /x/ or /f/ could be reduced that way.
Interesting. When I started studying the Voynich years ago, something like this was my first idea: one language "reduced" through the lens of another language. It explains some of the poverty of the system and provides a real-world scenario where it could have emerged. Since then though, I have learned more about the statistics behind Voynichese, and I don't think that would work anymore. Even when positional rigidity is improved like I did by assuming positional variants, the system is still much too rigid. If we were to rewrite Voynichese in the way I discuss in the video and then compare it to "Finnish as written by an Arabic speaker", we'd see that the letters in the Finnish would still have a much greater freedom of movement.
I think this reduction of the Latin alphabet would still be functional: A 1 E 2 F 3 I 4 K=G 5 L 6 M 7 N 8 O 9 P=B 10 R 11 S 12 T=D 13 V 14 One more reduction, either A=O or M=N would reduce it to 13. Use both, and you could afford U =/=V.
Thoughts on some of these acting like recitation marks or something? Kinda like what’s seen in the Torah or Quran where there are special symbols every now and then which are there to limit the recitation to a more exact way ? Such as pitch, small changes to words etc which wouldn’t happen normally in speech
Something like that is possible, especially since certain glyphs (the one-legged "gallows") are so frequent top line and in the first position, kind of like paragraph markers. So it is possible to come up with explanations for strange distributions. But the point I'm trying to make is that all of this takes away from your regular alphabet, leading to a problematically poor glyph set for the actual "work".
Hildegard von Bingen's language was often mixed in with normal Latin, right? So perhaps a lot of the text is meant to be gibberish [for whatever reason] and only a few words are meant to be decipherable. So, for example, and it doesn't have to be like this but perhaps the characters on 8:44 are actually the signal for a "real word". Apparently this is one of the few surviving passages of her language: "O *orzchis* Ecclesia, armis divinis praecincta, et hyacinto ornata, tu es *caldemia* stigmatum *loifolum* et urbs scienciarum." The marked words are in the unknown language, the rest is readable Latin. What if something like that is at work?
People mention this option sometimes, but I'm not aware of any serious research into it. Now if you're creative enough, anything can be converted into music, so the challenge would be to find a plausible conversion method that produces period-appropriate music.
I haven't read up on the Manuscript in a decade or two. My recollection is that the glyph set is small but the word assortment matches a typical European language in frequency, and further has some words only in some subject pages? If that recollection is right, then it must be a system whereby each set of glyphs maps to a word without glyphs having necessarily any phonic meaning. I'm no linguist but I don't know a writing system like that. But nothing else would fit.
Those two symbols that are always at the right margin might be something like a hyphen, splitting a word into syllables, with the word starting on one line and being finished on the next.
My Voynich theory isn’t wrong. I haven’t got a theory.
That's a good start!
My theory isn't disproven either, and "it's clearly not a simple substitution" kinda leans into it. My theory is that we're missing something fundamental and basic. Like looking at it upside down or only reading the symbols that correspond to prime numbers or something.
@@MrThedrachen Do we actually have proof of its providence? I would think looking at the techniques and knowledge and ideas and intentions of the time it was made in would be a good starting point. I think they made conlangs then too, so you could look at those, compare, and try to see the tendencies there.
@@skyworm8006 No. It's provenance is uncertain, though the most common theories tend to assert it is Italian in origin.
Carbon dating suggests the document is ~500 years old
@@beardedemperorCloser to 600 according to carbon dating, between 1404 and 1438
When a monk's joke goes too far 😭
I pity the 30th century archeologist attempting to make sense of the media in that printed movie prop store that Adam Savage visited last year
👍
It is clearly a fake document made on a lark.
What celibacy does to a mf
Probably, but even in assuming so, the methodology is still puzzling. Imagine going through months or years of effort to not include a single cipher
Man I love going from never having heard of a thing to having strong opinions in less then 20 minutes
i love that phenomenon too
How are you watching this vid if you’ve never heard of the VM ?? Genuinely asking it’s very strange
@@Arado159 Hunh, I wonder why the TH-cam algorithm thought old videos from 'The Onion' would have anything to do with Thu Voynich Manuscript?
@@Arado159 Ok, I guess it's a mystery for now. Which is appropriate because the Voynich Manuscript is a mystery and has been a mystery for nearly either a century or more.
@@jeremias-serus I got here from Conlang videos
Maybe they were intentionally avoiding certain letters as a writing exercise, like how that French art collective wrote a whole novel without the letter E.
You mean "la disparition" by Georges Perec ?
Maybe they wrote without vowels like ancient Hebrew. That could also explain why it's so hard to crack the code
@@MaxOakland You might wanna watch the video, he goes over that in detail.
@@smuecke I watched part of it but had to save it for later. Thanks for letting me know he goes over that
@@MaxOaklandyou mean like modern day arabic?
If it was a simple substitution cypher, we would have figured it out by now. My personal hypothesis is that it is an early example of a conlang. Medieval monks were often philosophers and linguists, and a conlang experimenting with a minimal sound inventory would make it incredibly hard to figure out without related texts or something akin to the Rosetta stone.
Hildegard von Bingen made one 200 years before... maybe this could be nuns from her abbey trying to refine what she made... the lack of sounds actually makes sense with "speaking in tongues"-type mystical speech production, which generally have a simpler phonology.
This has always been my belief. It’s just a monk’s art project, but the key to his conlang has been lost so nobody can read it anymore
I know it's been many years since this has remained a mystery, but in the details is a link that seems to include a translation and seems to be quite aware of provenance...
It would appear to be solved, and isn't a cypher but a language.
The fact the video is showing letter variants and positioning seemed to indicate a major breakthrough happened, so I checked the details.
Muh medieval monk konlang
This is the most intelligent discussion of the Voynich manuscript I've seen. Most videos that talk about it, only mention a few obviously wrong ideas about it, then throw their hands up saying it's too hard....or something like that. I hadn't thought about positional forms, though, as someone who's done calligraphy, I should be more aware of it. This might not give us the answer to what this thing is, but it sure tells us more useful things about it. I subbed. Hope your other videos are just as good.
Thanks! Most of my other videos were meant with long-time Voynich enthusiasts in mind, but I'm planning to put out more videos like this one in the future.
perhaps it's a "lorem ipsum" type of thing (i.e. just a placeholder) to be replaced by real text later by the developers
One of the current theories that cannot be ruled out is that it is something like that: filler text. But then not "to be replaced", because it took a lot of effort to pen this stuff in and there seems to be quite some thought behind the system. The world was different before copy-paste :)
Asemic writing
@@voynichtalk it's singing music for women. Chanting for healing and herbs.
Or something like that. There's a very good video on TH-cam.
Voynich Manuscript Explained
Audrius Plioplys
@@Albtraum_TDDCYea right. You found THE solution. Uh-uh.
@@FrancoisTremblay it's not me, it's :
Audrius V. Plioplys is a Canadian artist, neurologist, neuroscientist and public figure of Lithuanian descent.
He makes some very valid points on repeating words and stuff. There's no way it is a language and a normal text. It's too samey.
Go watch the video and then make your criticism.
13 distinct letters doesn't mean, that it can only repræsent 13 phonemes.
There could be digraphs.
Some letters could influence the pronunciation of other letter, like how final e often changes the prævious vowel in English.
Multiple sounds could be repræsented by one glyph.
Ahh .. it's Kennyspeak 😆
That is true, but all languages I‘m aware of that have extebsive use of this tend to already have „not the smallest alphabet“ but we could get surprised. We once thought Mayan couldn‘t be an alphabet because it had way to many symbols (like in the hundreds) but it was. Just that they had like a dozen symbols for each phonem and then while writing tried to avoid letter repetitions like „can‘t use that specific e, already used it a word ago“… but all these thinga get complicated when we get to only 13 letters. It’s just extremely few…
A good example of this kind of ordeal would be youger futhark, a runic system with only 16 runes, most of them having to play double or triple duty (shoutout to úr for representing 4 or 5 sounds (3 vowels and then either u as a vowel or w as a semivowel) (also shoutout to íss and ár for both being able to represent e, together with their other roles)).
@@PhantomKING113…no. Those are not digraphs.
æ
What if the script doesn't represent all the sounds of the language? The younger futhark has less symbols than sounds that probably existed in the era. Arabic script before introduction of punctuation had much less symbols than its phonemes. Even in case of Latin: its alphabet was borrowed from Etruscans and wasn't differenciating between voiced and voiceless consonants. Which was not a problem in the etruscan, but was insufficient for Latin.
The principle of not representing all the sounds of a language is certainly often encountered (like the two "th" sounds in English). What we tend to see in Voynich solutions though, is that this ambiguity needs to be taken too far, to the extent where solvers are basically filling in what they want to read. They devise a system that's flexible enough, so that when they think a plant image is a carrot, they can certainly turn some word on the page into something that reads "carrot" in some language.
Another issue is Voynichese's rigid structure. You might be able to read it as Futhark, but if you go the other way around, you're in trouble. Encoding a text to Voynichese would result in all kinds of invalid words, because languages (even with small alphabets) need much more flexibility than what Voynichese has to offer. I go into this in my previous video: th-cam.com/video/XSTM8Gixai4/w-d-xo.html
I wonder if a vowel-less script, like many early semitic scripts, may explain the gap, but I doubt it.
@@josepheridu3322 he already covered this in the video
@@gavinrolls1054 true, my attention was split hehe
@@voynichtalkthey could have a letter that represent something below a phoneme. A single voice marker alone can get rid of the need of several distinct voiced character.
The grapheme analysis here is excellent, but I don't think this means Voynich can't be an alphabet. It reminds me remarkably of old Mongolian script. Mongolian has about as many graphemes, along with positional variations and a restricted set of characters that can occur at the end of a word. In Mongolian, many phonemes are represented by a combination of graphemes, graphically indistinguishable from a sequence of letters. Old Mongolian had 7 vowels, but only 3 vowel letters; word initial /ö/ for example was written like 'aui'. Also, many graphemes ambiguously represent various phonemes. Mongolian script made no distinction of t/d and usually k/g, despite these being quite distinct in the language. In old manuscripts, a single 'tooth' mark could be an /e/, and /n/, one of two 'teeth' constituting an /a/ or just as well a /q/ or /g/-I'm not kidding. Sequences of frequent graphemes could infact represent one letter, spaces may demarcate syllables rather than words, as in the Phags-pa alphabet. I'm not saying Voynich is Mongolian, just that it's plausible as an alphabet on similar principles.
Those are good points, and in fact I agree that it is most like an alphabet (I spent some time in the video explaining why I think other writing systems are a worse match). Like I said in other comments though, the implied alphabet size must be considered in tandem with the entropy problem. Voynichese "word" structure is incredibly rigid and predictable, even after certain operations have been done to improve the situation. My prediction would be that if you were to take Mongolian and convert it to Voynichese, you would get a whole range of "invalid" formations in Voynichese because it does not offer the flexibility required for a natural language. I'm not familiar with Mongolian though, so I could be wrong.
Interesting. Also Arabic used to be written without the dots above or below some letters, so the same letter would represent two or three sounds, which would have reduced the number of distinct letters from 28 to 18 (not counting positional variation).
Also English has many more phonemes than it has characters to represent them. It uses digraphs, such as sh ch, th and ng, the th itself can be either voiced or unvoiced, we have no single way of writing a schwa, despite it being a common vowel, and we have no single letter for the s in pleasure or measure. Voynichese could also use the same character to represent multiple sounds or digraphs to represent sounds they don't have a single letter for.
@@voynichtalk
> if you were to take Mongolian and convert it to Voynichese, you would get a whole range of "invalid" formations in Voynichese because it does not offer the flexibility required for a natural language
??
I'm going to assume you're not proposing Mongolian isn't natural. So, of course Voynichese wouldn't map perfectly to Mongolian - mostly because it probably wasn't designed for Mongolian, so it would need the occasional determiner or radical, like when Akkadian or Elamite were written in Sumerian cuneiform...granted, even Sumerian didn't write parts of their words (such as the endings)
@@Pining_for_the_fjords Arabs, however, use EVEN MORE symbols than European languages when writing using Latin script, substituting numbers like 3 or 7 for sounds English doesn't have. Ever seen their chats? Looks like 13375p34k. If anything it's East Asian languages that need less letters for all sounds, if you replace hieroglyphics with letters. I.e. if you write Japanese with Romaji, you have stuff like V/B or R/L being the same sound. It still won't go down to 13 but would be easier to condense than Arabic.
@@KasumiRINA Your japanese example is reversed - Japanese only has one sound that is "in between" R and L, so it gets used for both when trying to map English into Japanese using katakana. Same for V/B - hence バイキング (baikingu) for all-you-can-eat buffets ("viking" restaurants - that is, Swedish smorgasbords).
tbh idk much about voynich, but it really reminds me of the "conlang" i made when i was like 14
it hadn't had much characters either, and not really much unique words, which made it very unprecise and forced lengthy sentences
i've basically taken random few absolutelly simple words gave them 50+ meanings depending on context and sentence construction, and the words around it, bolted on top of it somewhatish czech grammar (im czech), while it had it's own logographic alphabet, i quickly dropped it in favour of tiny latin (a c č e ë(later e and ë i made just positional variants of sound) i l k n ň o r ř s t (& x = ks) + lenght marks but those were 99% predictable), and even then those were very rigid, like "in" or "co" being almost everywhere, vowels i often removed from middle of words, because they were pointless
it couldnt even begin to represent any concrete ideas, without borrowing words, only very abstract things, and in the end i've ended up turning it just into shortening thing
but it was fun trying to boil meaning down to nothing
i'est {netkv}, ans s'est inin cotexka; s'est sin ans i'cotinintanin loř i'costans'
which could be boiled down to: i' st' {netkv}, ns' s' st' in n' o' ks' ka; s' st sin ns' i' o' in n' t' n' loř i cos t' s' (' means word was shortened, no matter where, and even then it's pointless since shortening means removing vowel, so vowelless word is already 100% shortened)
or rewritten as: i est {netkv}, ans sin est in in cot ex ka; sin est sin ans i cot in in tan in loř i cos tan sin
(i be {netkv}, and this be my(in in) not-alive-speech; this be bad and i not-have(inintanin [someone's-someone's-work/verb-someone's]) idea i pefective-work it)
(yes calculator was my inspiration, i was bored in class)
idk anything about conlanging, i dont know any and that was result of me being bored and trying not fall asleep during math, but why couldnt some similary minded mediaveal monk do something similar out of boredom, if i extraced all the nonsense from my exercise books i written with this, it could end up manuscript, which even i wouldnt be able to decipher, because i forgot the context, like i could get only very abstract ideas out of it
hmm yes this is where alive person talked about whether life is bad or good with not alive person, or this is where maybe human said time is bad
edit: also the drawings, my old excersise books are also filled with weird sketches, funnily enough often also weird plants
So! It was YOU that wrote the manuscript.
@@betweenprojects damn ig must have been my past life or something lol
a few Malay wound up in europe and tried to replicate their plants from memory and make a new script
I have no idea how long I've been giggling at the sound that's a carriage return sound of a old typewriter.
Morse code is an even smaller alphabet (it’s literally binary) yet it can still represent all letters and numbers in the Latin system.
technically it's binary with timing, ternary, or from a computer perspective it's an encoding on binary composed of sequences of 10, 1110 and various runs of 00s :D
@@pawelabrams yes this is a good correction/corollary
Fun video! One little note- when talking about Hawaiian, you showed a book in Hawaiian Pidgin, which is an English-based pidgin language and not the same as Hawaiian!
@@colbymcarthur7871 I know! I noticed too late, feels a bit dumb. Now on the other hand, I do love its title :)
To be fair to both Japan and China, yes, the Japanese approved kanji list is something like 2,045 characters (they occasionally add a few), but they keep a few hundred more 'in reserve' - for proper names, and historical and religious documents. They are not part of the school curriculum, but they are there for scholars.
The 2,045 JouYou kanji are expected to be known by all schoolkids by the time they leave school.
But there is no way Chinese schoolkids are expected to have memorised 50,000 hanzi! I don't know what the exact figure is, but I'd imagine it is a lot closer to the Japanese number than the full total of all hanzi.
(Actually, I just looked it up. Chinese schoolkids are supposed to have memorised about 4,500 hanzi by the time they graduate - not 50,000!).
漢字强力也。
The Jouyou kanji (which currently consist of 2,136 kanji) is just a list of kanji that are considered common in everyday language. Japanese government documents use only these kanji, and will write a word with hiragana/katakana if it does not use one of these characters. News media also often limits itself to these characters. But they're by no means the only commonly used kanji in the language. A college educated Japanese adult probably knows double that.
Smaller Japanese kanji dictionaries will contain 5,000-6,000 unique characters, and larger ones up to 20,000. I imagine this 50,000 character is something more like that- just a near exhaustive list of every character ever recorded (of which both languages have many that are never used).
@@link99912
Arigatou!
So it is 2,136 now? It was 1,945 when I was learning Japanese 20 years ago!
But Chinese was so much easier - each hanzi has ONE SOUND/READING! Because each kanji has many readings (some in on'yomi and many in kun'yomi), they are actually harder to learn than hanzi!
People who have graduated university know about 10 000 characters
@@link99912
China has a similar list of common characters which numbers 8,105. So you can expect that's the norm, if not higher. It has to be higher than Japanese because that's the only script it uses. Japanese gets a lot of utility out of the characters it does use by having multiple readings, and distinguishing by context or in combination with other characters or kana, and it can easily minimise them by choosing to write in kana, especially informally. It works very differently in Chinese.
Also, I don't think it's exhaustive since it's very easy to create new characters and this was freely done in the past. Total recorded characters would definitely be way higher than 50,000, as that would cover thousands of years of texts. And it really depends what you're counting and what period. The way to write the same word changes to different characters over time when conflicts arise. I mean it's different words and languages in play too. There are many variants and characters no longer in use. Characters have been standardised, conflated, or removed from use for the sake of ease of learning. Some of the simplified characters are also just existing easier-to-write variants standardised. Japanese has the same.
However, Chinese characters can be reduced to a smaller number of characters as the reason there's so many is that they are easily constructed from other characters, using their sound and/or meaning in some way. And this is not the same as radicals for indexing as that includes contrived non-character components.
To get a simpler idea, look at Korea's Hangul. It is essentially an alphabet you could write linearly as-is, but it follows the square arrangement of Chinese characters to combine them into one unit as a syllable. Like a line within a line, which is merely following the norm of the writing they knew (Chinese characters) and greatly regularising it as an alphabet, but it also clearly delineates syllables which might be useful. It's not too different in Chinese characters but a lot more complicated, because the parts are not a simple, regular alphabet. Still, it makes distinct characters and works.
My theory is that the manuscript was made by Edward Kelley to grift Emperor Rudolf II in some way, with techniques inspired by the Enochian scrying that he did with John Dee for many years prior. He probably spent a few days analyzing a page or so from a random book to determine some basic character order and frequency patterns that exist in actual language, concocted some kind of simple algorithm based on his observations that could generate plausible mysterious-looking text from dice rolls, made many pages of this gibberish text on paper, transcribed it onto some cheap old parchment he bought off a monastery, and finally drew a bunch of bizarre fake alchemy-inspired diagrams all over it.
This sounds like something Neal Stephenson would come up with, very informed by recent techniques but plausibly backported into history.
The two arguments against that that I see are 1) to come up with such a complicated ruse all at once seems hard. Surely there'd be other earlier stepping stones on the road? Though plausibly we wouldn't have found them yet. 2) to come up with this technique and use it only once, again, surprising.
@@lqr824 Like I said, it could have been inspired by the complex Enochian scrying that Kelley did for several years beforehand, that would have been the stepping stone; also, Kelley died shortly after getting patronage from Rudolf II, so he wouldn't have gotten a second chance to try it, and besides this is such a complex scam that I don't think Kelley would have bothered in the first place if the mark wasn't a literal Holy Roman Emperor.
@@lqr824 It's not really complicated. People were very interested in deciphering texts already so there's a market.
I concur, it was an elaborate scam -- if you want to be rewarded with a fortune and are trying to scam a king with his own secret agents and cyphers, it better be elaborate.
In Standard German, the J is functionally a different form of I. The only difference is that the glyph I is always used before a consonant or at the end of a word, and the glyph J before a vowel.
Somehow it still became established as a separate letter of the German Alphabet. (Which of course is now very useful to write French and English words.)
no, there's a qualitative difference in the sounds they make
compare, dunno, adjazent, adiabatisch;
or Madjaren (no morpheme boundary like in adjazent), adiabatisch
@@ahG7na4 in your examples the i/j are on different positions in the syllables. I and J are imo really close in pronounciation.
Not really. J represent the glide /j/ like the word "ja" and words spelled with I at the begining will have a mandatory glottal stop like the word "IKEA" or "Idee". If "ja" was spelled like "ia" it'd be pronounced like /ʔia/.
Though in capitals, I and J have not been distinguished until recently and it's still a convention (albeit a somewhat dated one) to write Il as Jl for better legibility (e.g. write Jllustration instead of Illustration).
The dude with "demonic languages" among other screenshots was hilarious
What if the author decomposed it even more than into phonemes? Let's take English for example. The only difference between b an p is that b is voiced and p isn't. Similarly,t is a "devoiced" d, k is a "devoided" g, s is a "devoiced" z, and in a sense, f can be seen as "devoiced" v. Moreover, h might be considered a devoiced a. So if there's a devoicing character that modifies the following letter to its devoiced version, you'd save at least 6 letters at the cost of one additional letter.
Well the phoneme model didn't really exist then, though it is obviously based on intuition and the alphabet, I hardly think it's some underlying structure. Following it closely can be too limiting if you want to make a reduced writing system, but you have the right idea. You can just do whatever as long as words remain distinct enough. There are a lot of options as there's no need to adhere to a set system, you can use any spellings or patterns as long as the words are distinct enough to be functional (it's also fine if context is sufficient to distinguish words). It could be as elaborate as you like. And indeed to make such a reduced inventory of letters, it's likely not such a simple system with simple rules. Another way to distinguish words is how you use spaces. If you force/interpret certain words to be written as one, then it is disambiguated by what it is put with or what it isn't put with. Just as a single letter used an abbreviation is made clear with spaces where it might otherwise cause conflicts.
I have made some reduced letter pseudo-phonemic writing systems for my dialect of English before. Last one I made had letters that cover a group of vowels and diacritics added to those letters for specifying as needed. But even then I conflated two vowels to reduce the number of diacritics since strictly phonemic means you need dedicated letters that have barely any utility in distinguishing words.
I'm still inclined to believe Voynich doesn't encode a natural language. At best a half-baked conlang, more likely just nothing.
@@skyworm8006Or maybe ut encodes an artifical language instead?
this is called a featural system, and is essentially the underlying principle of Korean hangul, which was created in the 15th century.
I still think it's likely a manufactured language for an alchemical text, which is why it is its only example. It likely only needed to be read by a very small handful of people.
With a limited numbet of sounds, it would sacrifice brevity, but it could still communicate information. Or it could just be gibberish constructed with a set of rules produced for some wealrhy collector of unusual manuscripts 🤷
yes lord, look at this rare and exotic obviously real chinese manuscript, that will be only few gold coins my lord
lol someone did a prank in the 1400s and it still echoes
My favorite theory is this manuscript is a very elaborate prank and whoever wrote it is still laughing from beyond the grave.
What about a situation like with pre-modern kana in Japanese during the Edō-period, where the same sound can not only be written with completely different kana derived from different kanji with a shared reading, they also each come in multiple variants of how exactly they are realized. And oftentimes, a single given text wouldn't just stick to one of the possible kana variants/expressions, not even inside a single sentence. It is common to find multiple kana chosen seemingly at random or based on aesthetic preference or a better flow of uninterrupted writing.
This is true for many different kinds of texts, hand-written as well as woodblock printed mass-produced texts - like cheap proto-manga based on folk tales or historic heroes.
Pardon, we have a whole industry running on 0 and 1.
And it is possible to try writing only consonants.
yeah but in order to have proper ascii text you would need to make the values have fixed spacing.
so if A is 1, you would have to make it something like 00000001 and b 00000010, which might as well be different symbols, because how would you know that 11111111 is actually that and not 1,1,1,1,1,1 or 11,1,111,11
@@eduardopupuconIT guy here, you may want to see into Shannon encoding :D It's what powers most of our compression these days
Also, you have Morse code, which has technically two symbols, but in reality three: dot, dash and space.
ooh, and what if voynich text is actually a word list randomized from a small set, encoding meanings kabalah style, with the meaning itself derived from sum of numerical values?
a very bad but simple example, using a numeric value of 1 for all letters, would be:
a lun d' y sole il
13 11 42
polybius square for "car" encoded on what looks like a part of the sentence
Only one thing is certain: Voynichese is *pretending* to be a simple alphabet, and anyone who reads it as one falls for the trick. Any letter-for-letter deciphering just yields babble.
Yes!
The written text isn't my area, but I've always wished someone would investigate the ways that medieval western missionaries were taught to write the languages of any regions they were going to, especially between the 1290s and 1350. Obviously no-one had time to learn to write Chinese the way an educated Chinese did (etc.), but is there any record of what languages were taught and of how a Latin missionary or trader might write foreign languages? Not necessarily to write or pronounce perfectly - they might ignore tones for example - but to pronounce words at the level of, say, a trader's pidgin or something of that kind. *Sigh* It's the sort of research that needs deep-digging, would probably mean spending weeks on end in France or the Vatican archives, and even then could turn up a blank, so I'll just have to hope someone else - someone with skills I lack - develops the same bee in their bonnet one day and finds time to look into it.
Hmmm, anyone consider a possible double cypher yet?
Like maybe the glyphs represent nondecimal numbers which then combine in two for each letter?
Add in the odd unique word start/final forms and merged glyphs (eg uj > y) and you could get something Voynichy.
Are you suggesting that despite the limited letters it may actually have just as many phonemes? Somewhat like how computers can save full english text using only binary? I suppose that can work in theory but it seems impractical as it requires more text for the same amount of information. The advantage of something like chinese writing, where there are thousands of symbols, is that you can potentially write a lot of information with very few symbols, the disadvantage of course is that it is difficult to memorize all those symbols. I suppose the impracticality can be explained as intentional obfuscation but I am not sure why we should assume the writer had any intent to obfuscate the information when there are such clear (even if not high quality) illustrations that seem detrimental to obfuscation of information if we assume those images directly relate to whatever the book may have recorded.
I remember listening to a podcast a few years ago where they implied that the statistical variance prove that there was something there, because the knowledge of how stats worked wasn't available at the time so it must be based on something.
But damn yeah, if it were pairs and you thus suddenly have hundreds of words or the like to pick from...
I've learned new things about the codex and manuscripts in general, and I've been following the story for 10 years now. Great video thank you !
I have a photo copy of the voynich manuscript.
I have a lot of respect for you and others who still keep working to crack what the heck this book is.
Thank you for still searching, and thank you for making this video. 🌞
13 letters is in no way "much too small to function". Take out all vowels, write "TS" instead of "C" or "Z" and "KS" instead of "X" etc., combine B + P and D + T and Q + K + G into one symbol, and then count. I got 12 distinct sounds from our alphabet after this operation. That remaining symbol could be used to "harden" B to P and D to T and G to K., just as an example
N ths csmpl m ttmpt t rprsnt rglr nglsh cmmncshtn wth thrtn lttrs.
Gvn ths mnt f dt fr tcst ds mc cmmncshtn dffclt bt ths cn b lrnd prtt cvvcl.
It is absolutely possible.
That's a good point. In my original draft of this video, I also mentioned that systems like Morse and binary code work with two symbols. Some people, especially in the olden days, learned to understand Morse code by ear.
Now I originally made this video with the typical "I solved the Voynich!" people in mind, and they never think about it the way you do - they just kind of start reading it like a regular alphabet, which doesn't work. Additionally, even in a system like the one you describe, Voynichese would still have a huge entropy problem (too high predictability) - but I will have to revisit this in another video.
@@voynichtalk Thanks for the reply! I was never really interested in the Voynich Manuscript, to be honest,, but I grew up with two alphabets (Latin and Cyrillic), learned two more later in life (Arabic and Georgian), plus I am from a culture with a very strong oral tradition (I'm Serbian), so naturally the claim that 13 letters would be "way too small to function" just sounds very wrong to me :D I would argue that vowels are overestimated, it's consonants that make the difference, and we basically have the above mentioned 12 consonants as a base line, if you will, with each of these consonants being pronounceable in a variety of ways
Maybe the symbolic meaning of 12 and 13 (which probably derives from the fact there are 12 moons in a solar years, with 13 moons every 33 years) also feeds into this. "The number twelve carries religious, mythological and magical symbolism, generally representing perfection, entirety, or cosmic order in traditions since antiquity." (wiki)
I wish you many more viral hits and a lot of success in general! All the best!
@@voynichtalk I just realized that it might be confusing to talk about 12 consonants, without defining them, so here they are:
B + P
D + T and combinations like TS and DZ
F + V
G + K + Q
H - with huge variety that would include glottal stop and Ayin
J
L
M
N
R
S + Z (like in "sun" and "zodiac") but also sounds like the English TH
Š + Ž and combinations like TŠ and DŽ
Maybe I'm wrong, but to me it seems that those are the basic sounds. The specific pronounciation of each has a lot of variety, not only if it is voiceless or not.
Looks more like a musical annotation than a language.
For example the characters in the first line could be settings for the strings like E, A, D, G, B, E or D,A,D,G,B,E tunings for a guitar - or like a modern clef.
You have characters in music notation that tell you to repeat a certain section, which could be the same as the characters you mentioned at the end of the lines.
The characters with long tails could simply be longer played version of the same note or the same note on a different string ....... etc etc.
And if you look at the illustrations they look more like the sort of thing you would find in a book of tunes. They could be illustrative of the musical style or the lyrics to be sung with the melody - which is why they appear random ( different illustrations for each tune ) - and sometimes mystical ( a song of an imaginary place etc.).
I don't expect to crack the code by any means, but I wanted to mention something on the off chance that it helps liguists explore a new area that hasn't been thought of yet.
There is a pseudo whistling language in the Canary Islands. It isn't a true language because all it is is just whistling in a pattern reminiscent of an actual spoken language (think humming lyrics rather than singing them and others being able to understand what the song is). There are 12 tones to a scale in western music. Outside of western music there can be many more than that. Of interest is that arabic music had a 17 tone scale during the 13th century. Perhaps as to keep the information hidden, the writer of the Voynich manuscript could have simply utilized an 18 tone scale, assigned letters to them, and wrote them out to help them memorize what the actual words were. Song is one of the oldest ways to memorize information after all.
Loved the video and loved how you broke down a lot of the nuance in the problem of figuring out what language it was written in.
It's toki pona!
That may not be as crazy as you think - I know several scholars who are considering some kind of constructed language as the most likely scenario. This would certainly explain why we haven't cracked it yet.
sina toki e seme aaa
Apparently this means it’s a laptop
@@rowandunning6877 ala. ni li ilo pi nanpa mute
mi kama tawa sitelen ni la, mi pilin e ni: "sitelen luka luka tu wan? toki pona li jo e sitelen luka luka tu tu la, ona li pona." mi wile pana e sona ni, taso mi lukin e toki sina :)
the ones that only occur at the end of a line could just be a hyphen that indicated a word was broken across two lines.
It's insane that the voynich script is so clearly structured in a way that would be hard (though hardly impossible) to fake if you weren't trying to convey actual information, but also struggles to plausibly respresent language. I genuinely can't even begin to guess what set of constraints would lead someone to invent something like that.
I feel the same way about it. That's why I have my doubts when people think it was just made for a quick buck or any theory assuming low effort. It's a well-developed system. Admittedly, it evolves throughout the text, but the overall principles remain the same.
@@voynichtalk so more like a personal cypher between in-group or minimalistic language used for specific technical purpose? Notice how a lot of pictures show herbs, so it might be doctors handwriting, which, to my knowledge, nobody was able to decipher to this day.
@@KasumiRINA we really don't know yet at this point :/ until we can read it, both options of some low-entropy but meaningful system and nonsense filler remain open. I do hope people don't get too hung up on the 13 letters now though - this was just my attempt to really drive home the fact that you can't just go simple substitution on it without specifically targeting statistical oddities.
I wonder if that final S form comes from the Greek S.
Something you seem to be missing in your understanding of writing is that they don't always try to accurately represent speech and it's completely arbitrary what speakers of a language consider different sounds and variants of the same sounds. For example Younger Futhark reduced the 24 Elder Futhark symbols to 16. Or in Arabic many characters are built on other characters with diacritics, or in Hebrew there are characters that don't have corresponding sounds anymore, they just help with distinguishing homophones in writing.
There's also the script they use for Tigrinya, where there are major motives that depict consonants and minor variations of those depict vowels.
Context does a lot of work. English uses 26 letters for about 45 sounds, Danish uses 29 letters for maybe 56 sounds, yet both are perfectly legible. You're rarely ever in doubt how you should read words like "read" or "lead" when they are in context.
You are right, I did not go too much into cases where letters and sounds don't match one to one. This was a conscious decision; there was already so much to explain, and these videos really work better if they're not 40 minutes long. More importantly though, the issue you describe was much less prominent in the Middle Ages. For English, you may have heard about the Great Vowel Shift: en.wikipedia.org/wiki/Great_Vowel_Shift . Basically, The language underwent significant change during/after the process of spelling standardization, which is why *modern* English is notorious for not sounding like its spelling.
Also notice that digraphs are usually employed because the language needs to represent sounds *in addition* to the phoneme/sound combinations inherited from the Latin alphabet. They are generally not used to reduce the size of the Latin alphabet.
The point I was trying to make is that especially in the Middle Ages, the general rule of alphabets was a 1 to 1 correspondence between sound and shape. Any additions, like diacritics and digraphs can be seen as expanding upon this system for the needs of the language in question.
@@voynichtalk It also depends on the sounds of the language it's supposed to represent. Something you missed in your alphabet shooting (great animation btw.) is that in the Latin alphabet G is not the third letter because when they first made the Latin alphabet, they didn't distinguish a voiced velar plosive yet, therefore C (which stood for a voiceless velar plosive) was close enough and G is a later addition (basically the letter C with a little Γ added to it). It could be possible to analize voiced/voiceless (or unaspirated/aspirated) pairs as variants of the same sound, that one letter is enough to mark. You could also think of R and L as variants of a single fluid sound. This way you could use for example A, B, D, E, F, G, H, I, L, M, N, O, S, V, vig vvd admidedli lvk vild (which would admittedly look weird).
For example Old Norse dropped one third of their alphabet and still managed. You also brought up Arabic, where many letters are modifications of the same symbol, for example ح ج خ or س ش etc.
Fascinating video, I think you're onto something. Have you considered how the first circular diagram in the book (page f57v, or page 114 of the pdf file) has a repeating 17-character sequence in its second circle? This suggests to me that we are dealing with a 17-letter alphabet rather than an 18-letter one here.
Such sequences are certainly fascinating and might indeed offer a clue. But It won't be straightforward. Several of the glyphs in the sequence you refer to are extremely rare or even unique, and conversely other common glyphs of the MS are missing :/
@@voynichtalk So could the 17-glyph sequence be an abjad and the other common missing glyphs represent vowels? Just a thought.
In Poland we have sounds that are writed by two or three leters like:
zi
si
ci
dz
dź
dż
dzi
drz
rz
ch
cz
sz
But also "normal" 1 leter / 1 sound like
z
i
s
c
d
z
ź
ż
r
h
So by 10 leters i got 22 sounds! (Some of them no konger have diffrent sound but had them in the past)
Also we have variation of leters like i show before
z - ź - ż
Other leters like this are
ó, ś, ć that looks like o, s, c but have diffrent sound than them.
Diffrent language have letters like
ö ô õ œ ø ō ò ó just on leter o
So ther are posibilities
What about digraphs? After collapsing to 13 by positional variation, are there glyph pairs that occur with frequencies typical of digraphs?
@@Axacqk that's an excellent question. Strangely, most theorists think exactly the other way around, having glyphs represent syllables rather than having digraphs represent phonemes.
There are a few things to unpack about digraphs. One is that they kick the can down the road for the entropy question. Where first you had one glyph frequently following another glyph (the suspected digraph), you now have a digraph that *still* occurs in a predictable context.
Second problem is that words would get incredibly short. So then you'd need to wonder if spaces are real. Which would mean words aren't words. But then what are they and where do we even begin... Either way, these kinds of questions are much more useful than "could it be Judeo-French?" :)
I played around with this a couple of years ago, in case you haven't seen this yet:
herculeaf.wordpress.com/2020/10/19/entropy-hunting-bigger-and-better/
@@voynichtalk Words could be short if the language is isolating. Also, I may be wrong on this, but my intuition is that in languages that use digraphs, sounds represented by digraphs are often variations of sounds represented by single glyphs and differ from their single-glyph counterparts by one phonological feature, so it's not surprising that they are similarly contextually restricted.
Why digraphs though? Usually digraphs arise when the writing system doesn't have enough symbols. But as Voynich is not borrowing a script, it's counter intuitive to use digraphs.
@@voynichtalk Spaces could just as likely be separating syllables; look at a language like Tibetan, where the space-equivalent 'tseg' demarcates syllables, with no way to demarcate words themselves.
@@BryanLu0 It's intuitive if your native language already does it, e.g. you know of "sh", "ch", "zh" in English or "sz", "cz" in Polish or "mp" in Greek (for b in positions where beta would be pronounced as v).
There are so many possibilities of what this document could be. It doesn't necessarily have to be language in the traditional sense. It could be notes or a code like hexadecimal numbers. Or it could be that the actual information is hidden in a big pile of random characters and images to distract people, and only those who have the key or understand it can read it, similar to a book cipher. Different sentence endings could mean that different keys are used, or that only one type of line needs to be considered, etc.
Or it could be a scam.
Is it possible that some letters could make several different sounds, like in the Arabic script prior to the introduction of i'jam?
Thank you for taking the time to make this video and share it with us.
(sorry; meant to click Comment yesterday)
What a great video! The analysis is excellent and the explanation is crystalline! Really looking forward to see what else you can unravel!
to me this almost seems like a case of medieval conlanging to make something like a universal language, something with so few sounds as to be pronounceable by just about any medieval language, but written in this script as perhaps a cypher to keep it from being read for whatever reason.
A while ago I was at a "conference" of Ido and Esperanto speakers. We had a couple of students from Korea and one from China there, too. After a bit of healthy debate and using French and English as Lingua Francas we talked a lot about which sounds are easy to understand in all the languages and established a basic set of sounds and phonemes that are nice to do, easy to understand and reproduce and we came up with like 16 or so "letters" to build an alphabet from and plenty of exclusionary rules.
Of course nothing came out of it, but the idea to be able to teach something to Xhosa speakers and Swedes, to Mandarin speakers and Koreans was fun.
Well, a system such as the Younger Futhark had only 16 letters, and 4 of those were vowels. If the Voynich script is an abjad, it should work for some languages with small consonant inventories, right? Maybe it's even written in a language that was not native to the author, in which case he might have failed to distinguish some of the phonemes (such as the uvular and pharyngeal fricatives in Semitic languages).
i really think this is interesting but your voice is so calm i've fallen asleep to this video multiple times now hahah
@@marmoth9786 Welcome to Voynich manuscript ASMR
Fascinating video, this is my first foray into the Voynich manuscript but this is pretty interesting stuff. I myself am a Linguistics major with an interest in phonology, morphology, sociolinguistics, and historical linguistics, and while my interest in writing systems is purely just a hobby I couldn't help but create a theory while watching this video. I by no means think these ideas haven't been thought up by someone else so I'm curious what the counter arguments are.
First of all I think the problem with the data on the Abjads is that semitic languages tend to have quite a few consonants anyways, for example including vowels in the ISO romanization of Arabic I believe puts the count at 38 letters, which is on the higher but not highest at all end for alphabets. A problem for this of course is that we can ask if Abjads are so popular amongst Semitic languages because of their phonologies, their significant consonants inventories and their vowel inventories that don't from my knowledge get higher than 6, or is their popularity just because that's where they were used. And either way the Perso Arabic abjad in particular has been used for many non semitic languages with *very* different phonologies (like my own Punjabi). Additionally in Arabic long vowels are written with letters otherwise used for consonants, so that can bring down the number without reducing as much information as no vowels at all.
Let's play around with modern standard Italian for this. If I write it with the version of the Perso Arabic script using the version used to write Punjabi known as Shahmukhi (not for any historical reason, just because it's a functional abjad used for a language with a phonology closer to Italian than Arabic's and I know it). Basing it off of the Italian alphabet when it would lessen the number, not it's phonology (so /tʃ/ and /k/ are both written with and /ɡ/ and /dʒ/ are both written with ) I ended up with 16 letters, with and being and , , and being .
Now the first thought I had actually was "what if they've just merged voiceless and voiced consonants for some reason". And I don't actually mean that the spoken language it represents did, but just that the writing system doesn't differentiate between /p/ and /b/ for some reason. Now this would be weird and I'm not sure the motivation would be but in most spoken languages while this is a big merger it's not one that makes communication impossible. In fact many spoken languages have undergone this sound change and been fine. Two examples I can think of right now are the Tocharian languages which are Indo European and the Oceanic branch of Austronesian. Once again I don't mean that the spoken language this represented had this change, just that this orthographic merger wouldn't render the text unreadable. Doing this merger with the previous Shahmukhized Modern Italian we bring it down to exactly 13 (I genuinely didn't expect it to be 13 when I started writing this).
Lastly if we go back to version 1 of this but instead use what I know of the Perso Arabic script for Arabic itself we do actually still get some opportunities for some mergers, for example Arabic doesn't have any letters for /p/, /ɡ/, and /tʃ/ so that's more mergers we can do, though this only actually brings it down by 1, with /p/ merging either with /b/ or/f/. If we also merge the sounds of the letter (/ts/ and /dz/) with the other affricates that now brings us down to 14. I don't actually think that it's modern Italian written in a cipher of Italian written in modern Arabic, but I just wanted to show that an Abjad writing a *European* language can get a lot closer to that 13 letter goal.
Hi! My background is also in language (MA in Historical Linguistics) but it's been a while since I graduated :)
The thing with abjads is that we sometimes see Voynich solutions along the lines of "what if it's vowelless Latin?". These ignore the fact that you can't just omit the vowels from any language. Well, you can, but it wouldn't be a good idea. My naive understanding is that vowelless systems work exceptionally well for Semitic languages, because those have a recognizable consonant "frame" for words, and the required vowel can be filled in easily from context. How would you say Punjabi functions as an abjad without too much ambiguity? Are vowels represented in some way? (I know nothing about this).
To be honest, when I was making this video, I had a limited audience in mind of people who were already a bit aware of the Voynich issues. Most of my other videos got like a 100 views, from enthusiasts. Somehow the algorithm picked this one up though. If I had known that this would happen, I would have included more context. There are more issues than what I mention here. See Rene Zandbergen's site on the entropy issue for example: www.voynich.nu/extra/sol_ent.html
@@voynichtalk thanks for the response. My family is from Cáṛdā Punjab (the India side) where Gurmukhi, an abugida is used instead so just warning that while I did take lessons for learning Shahmukhi, the adapted form of Perso Arabic, I didn't grow up using it so for me it naturally feels unintuitive and like it creates a lot of ambiguity, especially when Gurmukhi is kinda hyper adapted for Punjabi (to the point that it's actually pretty bad at writing Sanskrit, unlike Devanagari).
But of course the truth is that Láíndā Punjab (the Pakistan side) has a population of 127 million, and while not everyone is literate, a lot are, so more people use Shahmukhi than Gurmukhi, so it definitely works, and it's definitely grown on me.
Punjabi has a 10 vowel system of
iː uː
ɪ ʊ
eː ə oː
ɛː ɔː
äː
In Classical Arabic there's a six vowel system of [i u a] that can be long or short and it's actually only short vowels that aren't written, long vowels are written, but using the letters for the consonants /j/ for [iː], /w/ for [uː], and [ʔ] for [aː]. Also short vowels technically can be written with diacritics but they're very rarely used, usually only in education or for calligraphy. In Punjab instead all the long front vowels ([iː eː ɛː]) are written with the letter for /j/, back vowels with the letter for /ʋ/ (there is no /w/), and [äː] with the letter for the glottal stop which neither Punjabi nor Italian have and I totally forgot that so my count should be 1 higher. The system definitely works even if I find it a bit clunky, and if you'll notice Punjabi's long vowels are pretty much identical to modern Standard Italian's vowel inventory.
So my thinking is it seems that it *could* be possible that the manuscript is a replacement cipher but for writing a medieval European language in an Abjad such as the Arabic script *or* the Hebrew script which I can't really talk on because I don't know it at all. Medieval European languages definitely were written in the Hebrew script a lot and while moddern Yiddish has spelling reforms that turns Hebrew into an alphabet that's modern Yiddish and it doesn't have to be Yiddish. Alternatively just from the very quick skim of the wikipedia page that I did it seems that the manuscript is believed to be from Italy (which is half of why I chose Modern Standard Italian for the example, the other half being the similarity of its vowels to Punjabi) and there was a Muslim presence in Southern Italy in the earlier medieval era and Malta still speaks a Semitic language to this day, and trade between Italy and North Africa still continued, so it's not unreasonable for an Italian to know the Arabic script.
Either way I'm definitely gonna check that site out as well as the rest of your channel because all this seems very interesting and I don't expect that I just solved this problem in 5 seconds, I just think abjads can't be so quickly counted out.
This is my new favorite channel!
in danish we have 3 vowels that are special to Danish and Norwegian (and to some degree Swedish ) they are Æ, Ø and Å or æ, ø and å - Now this is the interesting bit : å used to be written aa - so an alphabet can have more letters than it has in the "font" . Actually the reason that Danish is so very hard to learn is, that it has so many vowels, it has many more than it has letters for them, an a can be pronounced in many ways short, long and so on. You could use a language with fewer letters than it has sounds ! Younger Runes - the futhharken had 16 letters but 2 where R's and 2 where a's - but the rune Úr represents the letters u, o, ø and w and the letter or sound of æ can be represented by the runes Óss, Ís and Ár.
I know the value of this younger rune alphabet, was that it didn't matter if what dialect of Norse you spoke, all could read the text (if you could read) . But it does imply, that it is possible to get along with a smaller alphabet, than the letters you need, if you either can combine letters to ad letters to the alphabet, or if you have a conventions that lets you use one letter for many sounds (I as I og j) and so on.
I don't have to mention (OK2, I am mentioning) that sound (basic=phoneme) and letters of the alphabet used to represent the phonemes (the script) are two different things though very much related.
We must also consider that, ancient languages could have used one symbol (letter) to represent more than one sound/phoneme.
Example: Arabic language. Though the alphabet has 28 letters, this is relatively new because it is easier to have 28 unique letters to represent 28 unique phonemes. In the classical Arabic orthography, no letters are used to represent short vowels (or vowelless i.e. sukun). In addition, there are no dots used. Hence, a س (seen) could be a seen (س) or a sheen (ش). In modern script س is always seen because ش = sheen. But in ancient Arabic script a س could be a seen or a sheen (not dots in ancient script)! How did the ancients know which one is the right phoneme? Context! There is also a letter that has 3 phonemes! This is also where the positioning variants could help to identify which phoneme is meant!
So, if you count the ancient Arabic script, discarding those that have dots you end up with 15 or 16 or 17 (not 28) depending on how you count them...
OK2 - not the unlucky 13!
As someone who's interested in this subject, I've got a few questions forthcoming,
number one, why is everyone convinced voynichese is a written text, what if it's simply a very long poem? Which leads me to question number two,
why are researchers so distracted by the drawings? They could be there to throw us off from the real meaning behind the context right? Or maybe the illustrations are also a part of the meaning of the scripture?
question number three, why don't these researchers try to figure out how these symbols came about? I mean, if - a 21 Century (wo)man like myself- were to create a coding or a language, he/she would be influenced by a writing system that he/she would already know right? And the influence would arguably depend on the psychological attributes of exampled person.
What I'm trying to say is, wouldn't we accomplish more by trying to find where most characters come from? An example being that, we identify a certain amount of similarities between Voynichese and Alphabet X, say alphabet X was only used at a certain geographical location, we could hypothesize that whoever wrote or created these symbols was in contact with the certain geographical location where alphabet X was used.
Like honestly, some of the symbols are giving "Pahlavi alphabet" vibes, an alphabet used by the Iranians during the Sassanid rule. I'm just saying maybe, whoever devised these symbols was at one point exposed to the Pahlavi alphabet. Now we can look for the people at that time who might have been exposed to such an alphabet, who were also exposed by medieval European art. Or we can choose another alphabet which looks like voynichese's, and try to figure links between people and places. Could this idea possibly work?
I'm just trying to ask questions here and I'm not proposing any solution.
@@Chevalier.D.Artagnan question 1: poetry is often mentioned, but I haven't seen anyone take it further than broad speculation. This text is huge, so a poem of this size is probably narrative, in which case we would still expect text-like behavior on the glyph level.
Question 2. It's the other way around: if you want to get published academically in Voynich research, you have to research the text. Top level imagery research is lacking. I am convinced that the images deserve just as much attention as the text, but they are currently not getting it.
Question 3. This has been researched, but I have not yet delved into it myself. You are right that this is an important question. Maybe a topic for another video.
@ Thanks for taking the time and actually responding!
I’d love to see a new video on Voynich alphabet and its possible origin.
I always thought that since researchers tend to categorize the manuscript based on the illustrations, the pictures got more attention. I didn’t realize actual academical research is always done on the scripture.
Another question which I forgot to ask, it’s highly unlikely, but have you ever considered the voynich manuscript being a translation of sorts? A very old book/context translated and hidden into whatever the pages are hiding from us?
I’m so enthralled to have found such a wonderful TH-cam channel.
I have only taken a few casual glances at some of the manuscript's pages, but it seems to me that certain "words" or groupings of symbols appear many times in the same sentence, creating a nonsensical effect akin to what is seen in James Joyce's Finnigan's Wake - "The sun basket basket goes basket the basket basket basket nadir basket to." and such. Has an inventory of "words" been made and then, giving these simple labels like "A, B, C" (etc), sentence shapes studied? I bet that if one did this, there will be pages and pages of sentences like "AABAAACBDAACAAABBEDAA", i.e., gibberish or, more charitably, some form of ritual incantation, even song.
I like how the voynich manuscript inadvertently illustrates how connected to the language itself the writing system can be. Makes me think of Yuri Knorozov, having to manually put together the glyphs, words and semantic meaning from accompanying pictures while deciphering maya
8:05 I remember watching a video explaining that such pairs as B/P were seen as the same in some language through which the Latin alphabet had gone, but the distinction was restored except C, which originally was voiced K, remained a variant of /k/ and G was invented to distinguish
Here is my first attempt at a constructed 13 letter alphabet. There are 9 consonants which generally have a hard and a soft sound., There are 3 vowels which are ronounced either at the front of the mouth or the back. The remaining symbol is a modifier which alters both consonants and vowels.
Here are the consonants
b / b* (b/v)
p / p* (p/f)
d / d* (d/z)
t / t* (t/s)
g / g* (g/l)
k / k* (k/r)
s / s* (sh/zh)
h / h* (th/dh)
n (n/m/ng)
The vowels are
o / o* (bog/bag)
e / e* (bug/beg)
y / y* (book/big)
The n is special. It is treated as an n by default. Context clues can include; a following consonant, a silent preceding consonant or simply doubling the letter to "nn". "gnome" could be variously "noenn", "noebn" or "dnoebn".
The h sound is missing. You could let hard h serve as both dh and th letting soft h* stand for the h sound.
Ch is a compound "t-sh", here spelled ts.
J as in "jump" is "d-zh", spelled "ds*enp" (or simply "dsenp").
I am not sure how vague a system of vowels can be. Enlish spelling probably a terrible model.
This is very similar to Japanese phonetic systems.
it can have aspirated consonants marked by -h, the earliest Greek alphabet (16-letter, by the legend) is thought to be of this kind.
If there are some signs which we know are combinations of two other signs, when perhaps some of the alphabetic signs are themselves combinations? such as, perhaps, the \ and \\ in the bottom row at 10:45 (which may or may not correspond to part of the bottom of the angled 2s on the right-hand-side of the top row at that screen capture of 10:45.
...which may or may not suggest one or both of the 2s is a variant or typo - since they're so rare - of the \) of the bottom row.
i hope that made sense
I had no idea English had so many syllables... That's crazy. I want to read more of this
Ps, what a delightfully fried voice. Subbed.
Could it be an early shorthand? Pitman and Gregg shorthands have about 30 characters each. But in one of them it appears to be only 15, as each symbol is written either bolded or regular. In the other to the untrained eye it would appear there are only 10 characters, as each of those has three variants that are shown by their length.
Maybe. The low entropy makes this unlikely though. Voynichese glyphs are very predictable (if I give you one glyph, you have a good chance of predicting its neighbors). This is the kind of stuff that _asks_ for a shorthand or abbreviation treatment, rather than looking like the result of it.
That comment about the alphabet being too small is a non-starter, there are smaller ones in use today.
Can you provide some examples?
Rotokas has 12 letters (and only 11 sounds, so they could have 11, but they write an allophone of one sound differently)
/p/
/t~t͡s~s/
/k/
/b~β/
/d~ɾ~l/
/g~ɣ/
/i/
/u/
/e/
/o/
/a/
Hawaiian uses either 13 or 18 if you count the long vowels writen with a mark above the vowel as seperate (which they are not, but even if counted it’s still small)
/p/
/k~t/
/m/
/n/
/h/
/l~ɾ~ɹ/
/w~v/
/ʔ/
/i/ /iː/
/u/ /uː/
/e/ /eː/
/o/ /oː/
/a/ /aː/
@@the_linguist_llPart of my argument in the video is that these languages are all found in the same part of the world, and their small alphabet is the result of the unique combination of having a very limited phoneme inventory *and* the alphabetical system of Western missionaries. But yes, in theory you are right. Note that my video was mostly prompted by the "I solved it!" crowd, who never take any of these things into account.
@@elio7610 Piraha - 10 or 11, if you allow for two phones being different for men and women.
The theory that I always thought was the coolest (and I have absolutely no expertise in this area so it’s almost certainly not correct) is that it’s the work of a monk trying to copy a script he has absolutely no knowledge of like Chinese.
My favorite is that it’s just someone’s fun project, I heard of that theory in some TH-cam video, and the person who brought it up mentioned some elaborate but nonsensical project of their own. That it’s equivalent to finding fanfiction on some obscure book or experimental music.
I don't have any personal opinion about the Voynich script in particular, but seeing how tiny book pahlavi script of the zoroastrian priests and kufic arabic (without i'jam) was, I don't see how it couldn't be possible to have such a small number of symbols. The way that these letters can only appear in very oddly specific positions in the word though, is a significantly more valid point.
@koengheuens
Has anypne tried the Hangul (Korean) / phonetic angle?
Theoretically that could require a very simple alphabet: "voiced labial fricative" (Eng. V) is only three categories with naturally around two to three values each, contrasting with "unvoiced velar plosive" (Eng. K) or "voiced dental nasal plosive" (Eng. N) for example.
And you can get stupidly specific, e.g. a "unvoiced labio-palatal nasal fricative" in context which would take several paragraphs to explain to a lay person.
Has anyone considered fatigue for the gallows that appear in the first lines only? Maybe they got tired of distinguishing whatever features they differ from the others on? Or maybe these are foreign sounds and only occur in discussions of what the thing is called in other languages?
i think its possible that this was shorthand meant intentionally to obfuscate the meaning a little, but leave enough that people familiar w the text and the types of thing the material was likely to say could read it. in that case, we can accept a bit of loss of precision. taking the latin alphabet, discarding the new or redundant letters (something like q, w, x, j, v, c), leaves us with 20. then if you begin merging similar letters, you can fairly quickly get to something workable. (p t k s - b d g z) and you get 16. (e o - i u) and you get 14. that leaves you writing english like this: "that leaues yuu (uu)reten(k) enklesh leke thes". a language which has less use for the new/weird letters fares even better. "eso es lo que le voy a decir" "eso es lo (ke, koe, even k) le uoy a teser". it seems possible that this was some kind of cypher of the latin alphabet but it may also have been derived from some other system. depending on the language, they also may have been able to get away with not writing vowels, as is the case in some languages, and seems more likely if it's intentionally a bit hard to read. "'s 's th cs 'n sm lngwgs, 'nd sms mr lkly f ts 'ntntnlly ' bt hrd t rd"
A symbol can be an abbreviation and a normal letter. Like 'c' can mean 'the speed of light' and 'Celsius', or 'x' can mean 'an as-yet-unknown number'.
The other day i tried transcribing a section of the voynich manuscript, and yeah ngl it definitely feels like the author would occasionally invent new letters and then forget to use them and then throw some in after remembering they hadnt used them in a while.
Is it too rigid and predictable have been written as a form of speaking in tongues?
I figured out the Voynich, but your language understanding is off, Spanish can be minimalized to 7 cons and 2 vowels. Lots of languages have rules and short cuts. then the writer can also have a artistic design to the text writing. If the rules of the language are done right you can have a language that has less then 1000 words and can create words that have meaning, like Ancient Greek, not new Greek. but the Voynich language has some odd today ideas of language, which i figured out because my made-up language followed the same rules.
My theory is that a group of monks who had too much free time on their hands decided to write a giant shitpost to troll future linguists
I've only ever known of the Voynich Manuscript as an historically peripheral oddity.
It's existence took up residence in that small back corner of my brain set aside for such things, eg. bananas with seeds, piezoelectricity, and, well, Voynich's manuscript.
This discussion was in equal parts illuminating and mind-expanding - bedankt!
I do have a question, though, more personal that professional, perhaps: why 'zee' and not 'zed'? (ta.)
Thanks! I'm planning to make more videos like this, so make sure to follow the channel if you're interested. On the pronunciation of "Z", my way of speaking English leans closer to North American - zed is more of a British thing. In Dutch we do say "zet" :)
Why not duble letters being capable of mean different sounds like with ma means a and am meaning b, there would be thousands of possibilities of words but has anyone tried to categorize the language standard? Like what letters are more plausible to come after other letters?
Thanks for putting this together, the lack of characters really does cause problems and the observation that one of them appears mostly as an 'end of line' character is very odd. This isn't a file, it's a written document where 'end of line' is implicit unless it's doing something like a dash to indicate a word is broken over multiple lines. I'm not a linguist, but one observation of a very limited character set language is Arabic rusum or even English written without the vowels. This could bring the 18 down to 14 by dropping A, E, I and O? Edit: I see you addressed this in your first video. Subscribed.
It's possible there are more glyphs than you realize: perhaps height and width have a differentiation that isn't reflected in the common character tallies.
True, and some people have suggested ideas along these lines. However, I feel madness creeping in when I actually try something like this for myself. We're looking at an artefact where someone 600 years ago took an animal part (feather) to write on another animal part (skin). Variation is everywhere. But determining whether it's meaningful is not straightforward.
I haven’t looked into this, but is there room to assess for a ton of digraphs?
If there's so few letters, then maybe it's a tonal language? I can't think of any that would have used an alphabet at the time though, I know Vietnamese does it, but that came from French rule. I wonder what language contact would have allowed it...
Regarding the positional variation, I can see why it would be the case: Perhaps the top-of-page characters were used to denote new paragraphs/points, and final letters were used to help distinguish where writing begins and ends. Hebrew script does this a looooot, for example.
@@PlaguevonKarma if a tonal language used a small alphabet, wouldn't you expect a lot of diacritics though?
Maybe a tonal language as understood by non-native speakers, so with a lot of information loss. I don't know though, the whole system is _so_ structured and rigid. I really feel like the clue will be to first understand how the system works and only then move on to identify the peculiarities of any linguistic content.
@@voynichtalk Not necessarily, Burmese is tonal but it is not indicated in informal Burmese orthography. That being said though, the chance of the 'language' in which the Voynich manuscript was written being tonal is pretty low considering where it's located
@@voynichtalk mmm, I don't know. There are languages (as the other commenter mentioned) that eschew diacritics; it's not unheard of at all. Dzongkha and Tibetan don't imply tones in their syllabic writing, for example - if my idea is correct then I'd say it's most similar to Dzhongkha (literally, not linguistically, goodness no!) in how it is written, as Dzongkha uses Tibetan script to write syllables without denoting either of its two tones. Moving outside of tones, though, not all stress-timed languages note said stress; Spanish does, English doesn't, Yiddish doesn't, so on, so forth.
Speaking as someone who speaks Mandarin, I can read Hanzì just fine, even though, on their own, they mean very little - the phonetic components are outdated and the radicals are very abstract. Despite that, I can still remember what a text is meant to say. The same applies with English - it's not very representative, it doesn't note the word stress at all, but I can still say it. I don't think the messaging in Voynichese would be lost - it seems like it would give the same amount of memory-jogging as English, if anything.
Given what you've said, with structures this constrained, with such a small set of ways to say...anything...I think tones are worth considering, if it makes it make sense.
The only question, then, would be...why was it written in Europe? As the other commenter said, we don't usually have tonal languages, and I don't think we'd have travellers going around writing things if there wasn't a "home fort" with more samples.
Anyway, I hope this means something to you, your work is amazing! ❤
the ones that only (is it only? you say mostly and im interested in what the exceptions are) occur at the end of the line look very similar, and a bit like an ampersand, which makes me think they're actually variants of each other, and function like a hyphen, for marking that the word continues over the linebreak.
There are definitely lots of exceptions. If you want to have a look, I suggest voynichese.com/
First video of yours I've seen. Some guy posted a video series that made a pretty convincing case that the Voynich manuscript is written in A sister language to Romani (the language of the "Gypsies" who called themselves Romani). This video series was unfortunately taken down because he wasn't very happy with the quality of it, and is now focusing on making a bunch of videos on topics that are related to his methods before redoing the series, I barely has any videos still, which is A real shame.
But he built his work off the back of Steven Bax. The methodology use was to look through the astrological sections of the book, and identify potential candidates for constellations that are labeled. Since these constellations are likely to have the same name in multiple languages, you can make predictions as to which sounds each letter likely represents by seeing what sounds that name has in different languages. This is the type of method that was used to decoder linear B, where linear B was decoded by identifying sets of words that were only found on certain islands, too much then the assumption was these words must represent the names of cities and towns, since those are the most likely words to only have relevance in specific geographical locations.
I bring this up to you to see what your thoughts on this would be.
Must have been Derek Vogt (Volder Z). I am not too familiar with his work, so I can't talk specifics. What I do know is that there was this period in Voynich research (around 2016 or so?) when Stephen Bax' theory attracted a lot of attention and other researchers, and Derek was one of those closely associated to him. However, nowadays most if not all researchers I talked to have abandoned belief in the usefulness of Bax' theories.
The reason why Romani is attractive is that it has low entropy (relatively predictable). It also offers a (somewhat vague) cultural reason for the manuscripts' existence. Traveling outsider culture, no own written tradition... So I understand the incentive to look into this.
However, what I expect is that it still won't be enough, and that many issues will remain unaddressed. It will be just another theory cherry picking words that happen to work without offering a structural solution. Moreover, since the target language is not well attested, there will be a lot of wiggle room for the solver.
I'm saying this without having seen his theory though. When any new video comes out, we'll discuss specifics.
the letter that forms the iiii group can only be i (Latin examples are known) or u. So, ijji or uwwu.
Ive used voynich script to write salishan languages before. Its not too small, just the end text deviates substantially from voynich. Unless I separate syllables via spaces. Then it looks like voynich text.
Anyway I absolutely used positional variation when adapting the script. It looks great!
It's just there is another class of writing systems that separates phonological features into discrete symbols, its not featural but not an alphabet. "Kalis" is a great example of that type of script. I wouldn't be surprised if the original voynich used a similar strategy.
The problem we have with Voynich, at least as I remember from a few years ago, is that it obeys statistical characteristics of written languages (that is, the most-frequent glyph appears so many times relative to the next-most-frequent, etc.) that the authors of the manuscript couldn't have known about. So it has very strong arguments that it _is_ some form of written language, just not a substitution cipher for an alphabet. So if it's not a substitution cipher, what might it be?
The Younger Futhark has 16 letters, and in old Norse we pretty much only see ᛦ (usually in English and in Gothic written as z, in Scandinavia usually transcribed as Ʀ) at the very end of words, and by roughly 1000 AD it started to fall out of favour, and we see R replace it entirely in some runic writings, giving some inscriptions only 15 letters to play with. I could see Thors (th) and Tyr (t/d) being simplified into one, and As(A) Ár(Á/o) I am not saying this because I have a theory. I am just a bit surprised that this rather small(but also highly unstable) alphabet is glossed over
12:10 aren't the short and long s switched? i thought long s was in the beginning or middle of words
@@notwithouttext yeah I know this is so dumb! Too bad I can't change it anymore. Luckily some people are paying attention :)
@@voynichtalk yeah the point still stands there, very interesting video (i'd heard of the voynich manuscript like one time but this video and the part 1 are very interesting)
@@voynichtalk oh also, one thing you can do is put a correction in the description with a timestamp. not perfect but it's something
I assume it has, but Has the repetition of words in combination/relation been looked at? Kinda like how mailbox doesn't have it's own word, but instead is a combination of words.
@@AlexandHuman that doesn't happen very much at all. One of the reasons is precisely that glyphs tend to occur in the same position within the word. Simplifying a bit too much, one could say that words have a beginning, a middle and an end, an each is picked from a fixed category (or can be left open). The reality is more complex, but this should give you a feel for how it is. There are certain character groups that can be repeated, but it's all pretty unnatural and insufficient. If you're interested in this, I'd recommend Marco Ponzi's post here: medium.com/viridisgreen/two-voynich-word-models-c10a89e8ea01
what if many characters are ambiguous to (eg) voicing or other features? my first thought was the Younger Fuþark alphabet which i've just confirmed has 16 characters after having eliminated the more typical alphabetic distinctions of Elder Fuþark.
i disclose that this is my first encounter with your channel and i have only nominal familiarity with the Voynich Manuscript, so my suggestion is likely unfounded.
interesting topic though, great analysis. as a linguistics nerd i have subscribed :)
That's cool, I didn't know there was a runic script with only 16 characters. The main problem with Voynichese is its low conditional character entropy; this means that when I give you one character, you have a pretty good chance to predict the next one. Now of course if we collapse the alphabet like I did in the video by assuming positional variation, this situation might improve. It will make certain glyphs equivalent to one another, allowing them to appear in a wider context. The problem is that nobody has come up with an optimal way yet to do this. What I did in the video was just collapse the empty spaces in the table, without thinking about which glyphs are most likely variants of each other. So to test your idea, we would first need to find the best way to rewrite Voynichese, then obtain a long text in Futhark, and then compare their entropy values. It sounds interesting, but also a lot of work :)
I made a similar comment above about Mongolian, just to suggest that this is still plausible as an alphabet. I'm curious to learn more about the low positional entropy. It could be that frequent combinations are essentially a single letter-or collocates in the mind of a doodling artist.
I'm wondering whether it's the nature of the text itself that is responsible for these problems.
What if it's a bunch of prayers, chants, recitations, etc., with words being repeated over and over within sentences that repeat over and over in structure? Certain words might then only be used when introducing a new topic. Certain words might indeed only appear at the end of lines, assuming those are the ends of sentences.
It might be an alphabet with the few glyphs capable of appearing in multiple positions functioning as modifiers, producing very different changes depending which glyph they modify.
The writing system of the Voynich manuscript may not even be a complete writing system, capable of writing any text. It may only be capable of writing... the Voynich Manuscript.
Has anyone tried turning all the different chunks of text (what appear to be words) into unique symbols, so that 'xhun' becomes ^ and 'xgun' becomes & (I hope that makes sense), and then using a computer to check the frequency of those against the frequency of words within texts like the Hail Mary, Lord's Prayer, Prayer of St. Francis, etc.? A kind of mass lexical comparison, if you will.
@@MrNyathi1 regarding bigraphs, I did experiment a bit with that some years ago: herculeaf.wordpress.com/2020/10/19/entropy-hunting-bigger-and-better/
As for the idea that the text is poor rather than the writing system... Well there are many different words, a normal amount even. And they are distributed similarly to a normal text, with new vocabulary being introduced at a normal rate. So I guess that would argue against such a proposal, at least without further modifications.
Can the Latin alphabet be constructed from the Voynich characters?
Is it possible a non-latin-reading person got their hands on some letter construction templates broken down into individual strokes or elements, thought they were full letters, then combined them in a way that they thought looked like latin writing?
Koen: "German? I never heard of German!"
Did someone already do a similar analysis, but based on words and sentences instead of letters? In my naive understanding, words would have way less options for different meanings than characters. It could then be used to further narrow the options for alphabets (if it resembles an actual language) or further disprove character substitution.
If Voynich is written in reverse?
If v. Works like those cloister numerals?
My theory is it’s a Nestorian translation for the Bible written in a script for a Tibetan or Mongolic language. It just so happened to have ended up in Europe.
How many "words" are there? I wonder whether the whole word is used like a logographic scrypt? A distribution graph of the word frequency might be able to tell that. Or perhaps the words are letters if you get my meaning, like each collection fo characters might map to iust a single phoneme?
If I recall correctly, the number of different word types is somewhat on the high side but within a normal range. Certainly way too much to represent any form of alphabet.
I guess they could be like logographic script, where for example each word is like one Chinese character. To be honest I don't know enough about this. Wouldn't it essentially equal the use of a code book?
If some letters appears only at the end, are there any possibility that letter next to it (penultimate) does NOT have same value?
Same goes for the letter at the begining of the word (and second one).
Are there any doubled leTTers at aLL?
@@cocobill doubled letters are notoriously rare. Although this depends on what we see as a letter! You often get a bunch of i-shapes at the end of words, but it is unclear whether those are single glyphs or not. Most letters generally don't double, and it is possible to read the glyphs in such a way that there is hardly any doubling.
About the penultimate letter, that's usually pretty easy to predict if you know the last letter. There are always only a few common options. It's all much too rigid. Check Voynichese.com if you want to have a look.
Have you looked at the Thai writing system? It’s highly regulated, but there are basically no exceptions to those rules, like a handful. And a *lot* of letters depends their sound on their position in the word and proximity to other letters and tone signs. Might be something to consider. But yes, probably one of the languages with most individual consonant and vowel glyphs.
Can’t some symbols be used in combination (digraphs?) to make additional sounds from a smaller alphabet?
It's certainly a good idea to think in this direction. I looked a bit into digraphs here: herculeaf.wordpress.com/2020/10/19/entropy-hunting-bigger-and-better/
In my opinion, at least two problems remain. One is that entropy is still really on the low side even if you go wild with n-grams. Another is that if you assume enough n-grams to approach a semblance of normalcy, your words get very small. So that means abandoning spaces, which opens up a whole new can of worms. There's also the question to what extent something like this is practically feasible.
Greek has 24 letters. Four η ω ψ ξ lack a sound of their own. Two θ δ are highly specific to greek. Six σ ζ χ γ φ β are fricatives, and it's rare for a language to have all of those.
A greek speaker could use very few letters to write a foreign language. They would write basque with 13-14 letters (of course it wouldn't be great but that's how they would do it.)
Hebrew has 22 letters. Four ע ח כ ת are rarely used for non hebrew words. Four ז ס צ ש are sibilants.
A language with only one sibilant, like finnish, would be written by an hebrew speaker with less than 13 letters. In fact one could do with only 12.
Arabic has 28 letters. Of these, nine ق ح ع ص ث ض ذ ظ ط are very rare in non arabic words. That leaves 19 letters, including ه ف س ش ج ز غ خ eight fricative letters, which again is a lot. A language with few fricatives, like finnish again, would be written by an arabic speaker using around 13 letters. Basque would need the same amount.
Pretty much any language without /z/, /x/ or /f/ could be reduced that way.
Interesting. When I started studying the Voynich years ago, something like this was my first idea: one language "reduced" through the lens of another language. It explains some of the poverty of the system and provides a real-world scenario where it could have emerged. Since then though, I have learned more about the statistics behind Voynichese, and I don't think that would work anymore. Even when positional rigidity is improved like I did by assuming positional variants, the system is still much too rigid. If we were to rewrite Voynichese in the way I discuss in the video and then compare it to "Finnish as written by an Arabic speaker", we'd see that the letters in the Finnish would still have a much greater freedom of movement.
I think this reduction of the Latin alphabet would still be functional:
A 1
E 2
F 3
I 4
K=G 5
L 6
M 7
N 8
O 9
P=B 10
R 11
S 12
T=D 13
V 14
One more reduction, either A=O or M=N would reduce it to 13. Use both, and you could afford U =/=V.
you can easily combine F and P with how Ph works, and R with L too as in Japanese... or you can just use cryptography shorthands.
Thoughts on some of these acting like recitation marks or something? Kinda like what’s seen in the Torah or Quran where there are special symbols every now and then which are there to limit the recitation to a more exact way ? Such as pitch, small changes to words etc which wouldn’t happen normally in speech
Something like that is possible, especially since certain glyphs (the one-legged "gallows") are so frequent top line and in the first position, kind of like paragraph markers. So it is possible to come up with explanations for strange distributions. But the point I'm trying to make is that all of this takes away from your regular alphabet, leading to a problematically poor glyph set for the actual "work".
Hildegard von Bingen's language was often mixed in with normal Latin, right? So perhaps a lot of the text is meant to be gibberish [for whatever reason] and only a few words are meant to be decipherable. So, for example, and it doesn't have to be like this but perhaps the characters on 8:44 are actually the signal for a "real word". Apparently this is one of the few surviving passages of her language: "O *orzchis* Ecclesia, armis divinis praecincta, et hyacinto ornata, tu es *caldemia* stigmatum *loifolum* et urbs scienciarum." The marked words are in the unknown language, the rest is readable Latin. What if something like that is at work?
Could it be music melody? Like music notes on staff?
People mention this option sometimes, but I'm not aware of any serious research into it. Now if you're creative enough, anything can be converted into music, so the challenge would be to find a plausible conversion method that produces period-appropriate music.
Pardon what's Hangul, does it count as syllabic? It's way elegant.
I haven't read up on the Manuscript in a decade or two. My recollection is that the glyph set is small but the word assortment matches a typical European language in frequency, and further has some words only in some subject pages? If that recollection is right, then it must be a system whereby each set of glyphs maps to a word without glyphs having necessarily any phonic meaning. I'm no linguist but I don't know a writing system like that. But nothing else would fit.
Those two symbols that are always at the right margin might be something like a hyphen, splitting a word into syllables, with the word starting on one line and being finished on the next.