This lecture is super easy to understand. Thank you for this legendary lecture, Michael. 1. Parallelism(Union) shows branching in programming. 2. Sequential process(Concatenation) shows just the process of programming itself. 3. Star (*) means the loop in programming.
Great lecture. I am learning this material on my own. Reading lectures from another university was confusing but your video clarifies quite a bit. Thanks!
Introduction, outline, mechanics, expectations 2. Finite Automata, formal definition, regular languages 3. Regular Operations and Regular Expressions 4. Proved: Class of regular languages is closed under ∪ 5. Started: Closure under ∘ , to be continued…
That symbol that looks like a zero with a line through it (∅) means the empty set (the set of no states). He could have also written the empty set using the bracket notation that he uses elsewhere as: {}, they are equivalent (∅ = {}).
@@GAPQ-xb6di I'm not familiar with the term "dead state" in this context, but maybe. I asked ChatGPT and this is what it said: Empty set vs dead state The concepts of "empty set" and "dead state" come from different fields: mathematics and computer science, respectively. Here’s an explanation of both: Empty Set Field: Mathematics (specifically set theory) Definition: The empty set, denoted by ∅ or {}, is a set that contains no elements. It is a fundamental concept in set theory and serves as the foundation for building more complex sets. Properties: It is unique; there is only one empty set. The empty set is a subset of every set. Its cardinality (the number of elements in the set) is 0. Dead State Field: Computer Science (specifically automata theory) Definition: In the context of finite automata (like a state machine), a dead state is a state from which no sequence of input symbols can lead to an accepting state. Once the automaton enters a dead state, it will never reach an accepting state, no matter what inputs follow. Properties: It indicates a failure or an error in the system. Transitioning to a dead state means the process is essentially halted or stuck in a non-productive loop. Comparison Empty Set: A fundamental concept in set theory, representing the absence of elements. Dead State: A specific state in an automaton where no further productive progress can be made toward acceptance. While both concepts deal with the idea of "nothingness" or lack of productive elements, they are used in entirely different contexts and have distinct implications in their respective fields.
Okay, at 52:25 we are beginning the argument that regular expressions imply regular languages. Well, Regular Expressions were defined in lecture 1 from languages or alphabet symbols. Just before that definition, L(M) was defined to be a language containing all strings recognized by the automaton M. So, how do we make sense of the notation L(R), where R is a regular expression? Please don't say I'm supposed to interpret the Regular expression in terms of its associated automaton, because this is supposed to be the proof that a Regular expression has an associated automaton. Please, someone help me understand exactly what L(R) represents.
Yeah, I think so. But, the regular expression is already a language, so it's completely superfluous. Without actually having a conversation with the instructor and asking him why he wrote it like this, we might never know his thinking and motivation. It's possible to simply write the statement as, "If R is a regular expression, then R is regular."
generically, when constructing the DFA you will have 2^q states. However, once you construct the DFA you may find that some states have no arrows pointing at them, so they can be removed from the DFA without affecting anything.
For A* we are basically solving palindromic partition kind of leetcode problem where we we find a possible to place to cut the string and check check recursively the rest part I'd accepted or not
The q1 branch dies off. Remember that as long as one of the branches ends up in an accepting state by the end of the string, then the NFA accepts. An NFA rejects only when all the branches end up in a non-accepting state.
aa doesn't take you to the final state or acceptor state. But aab can. When it is just aa, first a can take you to q1 again or to q2. After going to q1 you can go to q2 with the next a. But, once you are in q2, you have no where to go, since a is not a valid input for q2. On the other hand, when there is aab, 1st a takes you to q1 (self) and 2nd a takes to q2 and b takes you to q3. Now still this isnt final state. But with empty string we can go to q4. Which is accept node or final state. So it is enough to reach q3, you can always be in q4. Sorry if I made you more confused.
Correct me if I'm wrong, it seems to me that in order to account for the empty transitions in an NFA, the proof that NFA languages are regular would need to utilize the result that a union of regular languages is regular. If so, using the regularity of NFA languages to prove that a union of regular languages is regular is perhaps circular, and perhaps the most important aspect of the proof of regularity for NFA was brushed over.
> the proof that NFA languages are regular would need to utilize the result that a union of regular languages is regular. Would you mind expanding on where that result is relied on?
1. A language is regular if some "finite automation" recognizes it -> this is the definition from previous video lecture. 2.we have proved (as we saw in this video lecture) that an NFA can be converted to an equivalent DFA ! 3.and, we know that any language that is recognized by DFA is regular (from 1) so, as NFA === DFA (2), we can say that - any language that is recognized by NFA will also be regular. so the whole circular dependency thing falls ! please do correct me if I'm wrong. really hoping to get a reply.
In closure in concat, why we don't know where to split? x is a string in M1, so it has a finite length. Then I think the end of the string x must be the split point.
If you are given e.g. "abbabab" as input to a machine that wants to reconize [L1 concatenated with L2], is x=a? Is x=abb? Is x=abbab? How do you tell? What's a way of telling, independent of my particular choice of L1 and L2? My answer: you don't tell. The independent way, which is only almost independent, is to do the concatenation construction from the video (or else something very much like it).
Consider two languages A1 and A2 over the alphabet {0, 1}, A1 is the language containing all strings that end with 1 (e.g. 1, 01, 011, 01001 etc.), A2 is the language containing all strings that start with 11 (e.g. 11, 110, 1101101 etc.). These are two regular languages (we can make Finite-State Automata for both of these). Now consider the regular language A which is the concatenation: A1A2. Consider the input string w = "1001110". Where would we split x and y? Is x = "100111"? This is an accepted string in A1, this would make y = "0", however this is not an accepted string in A2. Is x = "10011"? This would make y = "10", which is still not accepted! The correct answer here is x = "1001" and y = "110". But how do we know when x has ended? There is no way of telling. This is why we don't know where to split in the initial approach
What I gather is that without taking many possible paths, the machines cannot know where to split the string so they can accept each of their respective pieces, and a DFA machine cannot take multiple paths from a single state. However, nondeterminism (as introduced with NFAs) provides the notion of many possible paths and now all the machines have to do is guess the right path where the input string is split properly. Prof. Sipser proves this with the closure properties.
Say you have {0} as the alphabet and construct an automaton accepting an odd number of 0s, so the set of accepting states will be A = {0, 000, 00000, ...} . One possible automaton will consist of two states q0 and q1, a cycle of length 2, and q0 as the starting state, q1 as the accepting state. Now, note that A* will be {epsilon, 0, 000, 00000 ...}. Say you add the epsilon transition from q1 to q0 and at the same time making q0 an accepting state. Suddenly, your NFA for A* will accept the string 00 (by going q0 to q1 using 0, from q1 back to q0 using the epsilon transition, then from q0 to q1 using 0, and finally from q1 to q0 using the epsilon transition), which it shouldn't. This is a very simple counterexample.
@@danielghenghea7104 Hi, thanks for your explanation. I have a few other questions if you wouldn't mind. If you have a language that only accepts an odd number of 0 and the only symbol is 0, then wouldn't 00 be part of the A* set, and it should accept 00 too, right? I assume all even 0 strings should be part of A*. And let us assume even 0 strings are not a part of your language A* and following the lecture we add a new state behind the initial start state that is now the accepting state and start state q0, and it is connected to our initial start state with epsilon which now is q1 and as you mentioned our final state is now q2. We connect q2 with q1 for epsilon transition, then this automaton follows closure under * and now if the string 00 comes, will it not accept? It can go from q0 to q1 via epsilon transition, then a 0 from q1 to q2, again come back from q2 to q1 via epsilon transition and then finally from q1 to q2 when final 0 comes. If I remember correctly professor said as long as there is a state which accepts in the set of states an NFA could be in after the final transition, NFA will accept that string. Wouldn't that hold here too?
Sorry about the above example. Here’s one I thin works: say you have a DFA with two states q0 and q1, with a transition of 0 from q0 to q1 and a transition of 1 from q1 to q0, with q1 as the accepting state and q0 the start state. Also, let A be the language recognised by this DFA. Then surely A* won’t have a string ending in 1 since gluing together from A won’t give us this. However, if we make the initial state an accept state, the modified automaton will accept strings with 1.
I wondered the same thing, but it was answered at around 54:50, if there is another input and the machine has nowhere to go, it "is just going to die", which is presumably a reject state. While you can get to an accept state with abb, the second b moves the machine away from an accept state to a reject state because it has nowhere to go.
An NFA does not have to accept the empty string, for example, if the starting state of the NFA is a non-accepting state and there aren’t any empty string transitions out of that state, it won’t accept the empty string. DFAs are usually considered NFAs, as they are a subset of NFAs, but it seems like it can sometimes depend on what definition of NFA you are using. This is from the Wikipedia article for "Nondeterministic finite automaton": "… In particular, every DFA is also an NFA. Sometimes the term NFA is used in a narrower sense, referring to an NFA that is not a DFA, but not in this article."
Why is abb not accepted considering that after ab, and you are on state q3, you can take an epsilon transition to q4, then read b again, or is this not allowed?
That is not allowed. After reading ab, you are on q3. Let's say you take the epsilon transition. You end up at q4. But you are still not done reading the entire string. So you read b, and there is no transition. Your branch dies off, in the professors words. So you can't accept it, even though it is on q4. TLDR: We check acceptance by seeing if we're at an accept state only after the entire string has been read.
@@karanahlawat9106 So that means even though the machine is at an accept state, but still hasn't read the string completely, and when it tries to read the next string but there is nowhere to go, so the branch just dies off( even though it was at the accept state just before reading the last string). Am I right?
There are two possible branches: 1) stay in q3 and 2) reach q4 with an epsilon transition. Since 2nd branch lands the machine in an accepting state, the machine accepts as a whole. Remember all you need is a single accepting state for the NFA to accept.
This course has: Syllabus, Calendar, Instructor Insights, Readings, Lecture Notes, Video Lectures, Assignments (no solutions), Exams (no solutions). See the course on MIT OpenCourseWare for more info and materials: ocw.mit.edu/18-404JF20. Best wishes on your studies!
If you have any problems understanding this lecture, don't blame yourself. It is a somewhat careless, ambiguous and even erroneous description. The wikipedia page is better. That's on top of the misleading, though conventional, use of the term 'Nondeterministic'. Perhaps 'Multi-deterministic' might be a more descriptive term.
Just skimmed through Thompson's method, and it looks like it highly prioritizes the ability to implement that conversion by simply following the rules. As for intuitiveness - I think it's a matter of preference :)
The teacher explained complicated concepts in a most intuitive, easy to understand way! Thank you.
This lecture is super easy to understand. Thank you for this legendary lecture, Michael.
1. Parallelism(Union) shows branching in programming.
2. Sequential process(Concatenation) shows just the process of programming itself.
3. Star (*) means the loop in programming.
How does concatanation represents programming?
Sequential processing is that.
Mind. blown....
This comment was pretty much an epiphany.
@@wedernoch6441 What is it useful for though ? I cant think of any use of this
* is combinatorial, loops linear, sequential?
Proving closures using NFAs feels like a cheat code
Great lecture. I am learning this material on my own. Reading lectures from another university was confusing but your video clarifies quite a bit. Thanks!
Its always exciting to understand non determinism.
Outstanding lectures this far, thank you! Love how every detail is explained, and with simple examples.
wow this is leaps and bounds better explained than my university teacher
Very helpful to future students! Great work! Thank you!
The example at the end is so useful.
Introduction, outline, mechanics, expectations
2. Finite Automata, formal definition, regular languages
3. Regular Operations and Regular Expressions
4. Proved: Class of regular languages is closed under ∪
5. Started: Closure under ∘ , to be continued…
in the lecture, the Coffee break is the best time to learn 😀😃
16:55 What about transition q2->q1 after reading ab?
He forgot that. That branch dies, too, because q1 has no outgoing arrow for b
Thanks Michael, very cool!
HELLO UNCLE SIPSER.... Good to see you ... Keep it up good work ... Stay Fit and take care of yourself... ❤
24:00 what does the capital after the transition function stand for? new variable for states (Q)? And is it the same variable as 54:00?
That symbol that looks like a zero with a line through it (∅) means the empty set (the set of no states). He could have also written the empty set using the bracket notation that he uses elsewhere as: {}, they are equivalent (∅ = {}).
@@elliotwaiteCan this be called "Dead State" as well.?
@@GAPQ-xb6di I'm not familiar with the term "dead state" in this context, but maybe. I asked ChatGPT and this is what it said:
Empty set vs dead state
The concepts of "empty set" and "dead state" come from different fields: mathematics and computer science, respectively. Here’s an explanation of both:
Empty Set
Field: Mathematics (specifically set theory)
Definition: The empty set, denoted by ∅ or {}, is a set that contains no elements. It is a fundamental concept in set theory and serves as the foundation for building more complex sets.
Properties:
It is unique; there is only one empty set.
The empty set is a subset of every set.
Its cardinality (the number of elements in the set) is 0.
Dead State
Field: Computer Science (specifically automata theory)
Definition: In the context of finite automata (like a state machine), a dead state is a state from which no sequence of input symbols can lead to an accepting state. Once the automaton enters a dead state, it will never reach an accepting state, no matter what inputs follow.
Properties:
It indicates a failure or an error in the system.
Transitioning to a dead state means the process is essentially halted or stuck in a non-productive loop.
Comparison
Empty Set: A fundamental concept in set theory, representing the absence of elements.
Dead State: A specific state in an automaton where no further productive progress can be made toward acceptance.
While both concepts deal with the idea of "nothingness" or lack of productive elements, they are used in entirely different contexts and have distinct implications in their respective fields.
@@elliotwaite Thanks.! You cleared my doubts.
Okay, at 52:25 we are beginning the argument that regular expressions imply regular languages. Well, Regular Expressions were defined in lecture 1 from languages or alphabet symbols. Just before that definition, L(M) was defined to be a language containing all strings recognized by the automaton M. So, how do we make sense of the notation L(R), where R is a regular expression?
Please don't say I'm supposed to interpret the Regular expression in terms of its associated automaton, because this is supposed to be the proof that a Regular expression has an associated automaton.
Please, someone help me understand exactly what L(R) represents.
Is the instructor using this notation in a slippery way to simply represent the language obtained from the regular expression?
Yeah, I think so. But, the regular expression is already a language, so it's completely superfluous. Without actually having a conversation with the instructor and asking him why he wrote it like this, we might never know his thinking and motivation.
It's possible to simply write the statement as, "If R is a regular expression, then R is regular."
At 31:20
does a dfa converted from nfa always contain 2^q states ?
I mean , Q' is equal to P(Q) or Q' is a subset of P(Q) ?
generically, when constructing the DFA you will have 2^q states. However, once you construct the DFA you may find that some states have no arrows pointing at them, so they can be removed from the DFA without affecting anything.
@@isaacbowser6023 thanks, that was my doubt for a long time
what about in qbb time 18:00, the last b can jump to q4 using free one
same doubt .
Any answer to this?
For A* we are basically solving palindromic partition kind of leetcode problem where we we find a possible to place to cut the string and check check recursively the rest part I'd accepted or not
Respect, 45 from Cairo, Egypt
Great lecture! Thanks from Ukraine 🥰🙌🌞❤️
are u ok
18:26 isn’t N1 with ab not only in q3/q4 but also in q1?
The q1 branch dies off. Remember that as long as one of the branches ends up in an accepting state by the end of the string, then the NFA accepts. An NFA rejects only when all the branches end up in a non-accepting state.
20:17: How is aa, rejected but aab is accepted? I'm confused
aa doesn't take you to the final state or acceptor state. But aab can.
When it is just aa, first a can take you to q1 again or to q2. After going to q1 you can go to q2 with the next a. But, once you are in q2, you have no where to go, since a is not a valid input for q2.
On the other hand, when there is aab, 1st a takes you to q1 (self) and 2nd a takes to q2 and b takes you to q3. Now still this isnt final state. But with empty string we can go to q4. Which is accept node or final state. So it is enough to reach q3, you can always be in q4.
Sorry if I made you more confused.
Correct me if I'm wrong, it seems to me that in order to account for the empty transitions in an NFA, the proof that NFA languages are regular would need to utilize the result that a union of regular languages is regular. If so, using the regularity of NFA languages to prove that a union of regular languages is regular is perhaps circular, and perhaps the most important aspect of the proof of regularity for NFA was brushed over.
> the proof that NFA languages are regular would need to utilize the result that a union of regular languages is regular.
Would you mind expanding on where that result is relied on?
@@jonaskoelker let me review this and get back to you! It’s been a few months, and I may be wrong.
1. A language is regular if some "finite automation" recognizes it -> this is the definition from previous video lecture.
2.we have proved (as we saw in this video lecture) that an NFA can be converted to an equivalent DFA !
3.and, we know that any language that is recognized by DFA is regular (from 1)
so, as NFA === DFA (2), we can say that - any language that is recognized by NFA will also be regular.
so the whole circular dependency thing falls !
please do correct me if I'm wrong.
really hoping to get a reply.
In closure in concat, why we don't know where to split?
x is a string in M1, so it has a finite length. Then I think the end of the string x must be the split point.
If you are given e.g. "abbabab" as input to a machine that wants to reconize [L1 concatenated with L2], is x=a? Is x=abb? Is x=abbab? How do you tell? What's a way of telling, independent of my particular choice of L1 and L2?
My answer: you don't tell. The independent way, which is only almost independent, is to do the concatenation construction from the video (or else something very much like it).
Consider two languages A1 and A2 over the alphabet {0, 1}, A1 is the language containing all strings that end with 1 (e.g. 1, 01, 011, 01001 etc.), A2 is the language containing all strings that start with 11 (e.g. 11, 110, 1101101 etc.). These are two regular languages (we can make Finite-State Automata for both of these). Now consider the regular language A which is the concatenation: A1A2.
Consider the input string w = "1001110". Where would we split x and y? Is x = "100111"? This is an accepted string in A1, this would make y = "0", however this is not an accepted string in A2. Is x = "10011"? This would make y = "10", which is still not accepted! The correct answer here is x = "1001" and y = "110". But how do we know when x has ended? There is no way of telling. This is why we don't know where to split in the initial approach
What I gather is that without taking many possible paths, the machines cannot know where to split the string so they can accept each of their respective pieces, and a DFA machine cannot take multiple paths from a single state. However, nondeterminism (as introduced with NFAs) provides the notion of many possible paths and now all the machines have to do is guess the right path where the input string is split properly. Prof. Sipser proves this with the closure properties.
Very clear. Thank you
Why the NFA added the null symbol \epsilon compared to the DFA?
Can someone help me with an example of why the first state can't be the accepting state, in closure under *?
Say you have {0} as the alphabet and construct an automaton accepting an odd number of 0s, so the set of accepting states will be A = {0, 000, 00000, ...} . One possible automaton will consist of two states q0 and q1, a cycle of length 2, and q0 as the starting state, q1 as the accepting state. Now, note that A* will be {epsilon, 0, 000, 00000 ...}. Say you add the epsilon transition from q1 to q0 and at the same time making q0 an accepting state. Suddenly, your NFA for A* will accept the string 00 (by going q0 to q1 using 0, from q1 back to q0 using the epsilon transition, then from q0 to q1 using 0, and finally from q1 to q0 using the epsilon transition), which it shouldn't. This is a very simple counterexample.
@@danielghenghea7104 Hi, thanks for your explanation. I have a few other questions if you wouldn't mind. If you have a language that only accepts an odd number of 0 and the only symbol is 0, then wouldn't 00 be part of the A* set, and it should accept 00 too, right? I assume all even 0 strings should be part of A*. And let us assume even 0 strings are not a part of your language A* and following the lecture we add a new state behind the initial start state that is now the accepting state and start state q0, and it is connected to our initial start state with epsilon which now is q1 and as you mentioned our final state is now q2. We connect q2 with q1 for epsilon transition, then this automaton follows closure under * and now if the string 00 comes, will it not accept? It can go from q0 to q1 via epsilon transition, then a 0 from q1 to q2, again come back from q2 to q1 via epsilon transition and then finally from q1 to q2 when final 0 comes. If I remember correctly professor said as long as there is a state which accepts in the set of states an NFA could be in after the final transition, NFA will accept that string. Wouldn't that hold here too?
Sorry about the above example. Here’s one I thin works: say you have a DFA with two states q0 and q1, with a transition of 0 from q0 to q1 and a transition of 1 from q1 to q0, with q1 as the accepting state and q0 the start state. Also, let A be the language recognised by this DFA. Then surely A* won’t have a string ending in 1 since gluing together from A won’t give us this. However, if we make the initial state an accept state, the modified automaton will accept strings with 1.
for the rejected case (abb) why we didn't use "empty string" similar to the accepted case "aab"?
I wondered the same thing, but it was answered at around 54:50, if there is another input and the machine has nowhere to go, it "is just going to die", which is presumably a reject state. While you can get to an accept state with abb, the second b moves the machine away from an accept state to a reject state because it has nowhere to go.
7:50 Nondeterministic behavior.
Is every DFA an NFA? My initial guess was yes but not every DFA accepts the empty string. NFA must accept the empty string.
Thanks in advance.
An NFA does not have to accept the empty string, for example, if the starting state of the NFA is a non-accepting state and there aren’t any empty string transitions out of that state, it won’t accept the empty string.
DFAs are usually considered NFAs, as they are a subset of NFAs, but it seems like it can sometimes depend on what definition of NFA you are using. This is from the Wikipedia article for "Nondeterministic finite automaton": "… In particular, every DFA is also an NFA. Sometimes the term NFA is used in a narrower sense, referring to an NFA that is not a DFA, but not in this article."
Adding onto what Elliot said, the first NFA introduced by Professor Sipser does not accept the empty string :)
every DFA is NFA but not every NFA is DFA
Why is abb not accepted considering that after ab, and you are on state q3, you can take an epsilon transition to q4, then read b again, or is this not allowed?
That is not allowed. After reading ab, you are on q3. Let's say you take the epsilon transition. You end up at q4. But you are still not done reading the entire string. So you read b, and there is no transition. Your branch dies off, in the professors words. So you can't accept it, even though it is on q4.
TLDR: We check acceptance by seeing if we're at an accept state only after the entire string has been read.
@@karanahlawat9106 So that means even though the machine is at an accept state, but still hasn't read the string completely, and when it tries to read the next string but there is nowhere to go, so the branch just dies off( even though it was at the accept state just before reading the last string). Am I right?
@@praagyadhungel1357 Yeah. you only worry about the accept state after the string is read in its entirety.
why epsilon inputs are not represented in DFAs and only in NFAs?
sorry, it is answered at 13:13
Wait! How string ab be accepted since q3 not accept state/final state? 14:
Q4 can be reached with null/empty transition
There are two possible branches: 1) stay in q3 and 2) reach q4 with an epsilon transition. Since 2nd branch lands the machine in an accepting state, the machine accepts as a whole. Remember all you need is a single accepting state for the NFA to accept.
40:25 So True
Thanks sir!
are these courses incomplete, if so to what extent?
This course has: Syllabus, Calendar, Instructor Insights, Readings, Lecture Notes, Video Lectures, Assignments (no solutions), Exams (no solutions). See the course on MIT OpenCourseWare for more info and materials: ocw.mit.edu/18-404JF20. Best wishes on your studies!
@@mitocw ok, although my question was precisely if the content has been edited, but I'll figure it out for now
2:50
7:12
If you have any problems understanding this lecture, don't blame yourself. It is a somewhat careless, ambiguous and even erroneous description. The wikipedia page is better.
That's on top of the misleading, though conventional, use of the term 'Nondeterministic'. Perhaps 'Multi-deterministic' might be a more descriptive term.
Where's my coffee??😠😤
Regarding transformation of REs to NFAs, while I like the method used in his book, I think Thompson's construction is slightly more intuitive...
Just skimmed through Thompson's method, and it looks like it highly prioritizes the ability to implement that conversion by simply following the rules. As for intuitiveness - I think it's a matter of preference :)
It would be even better if there are some practical examples