Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> does a CS regular expression without stars have no matches?

It's worth elaborating this one, because testing whether an ordinary star-free regular expression has no matches is super easy. Not sure what the author means by "CS" (maybe just "computer science"?), but the actual problem is "Does a generalized regular expression without stars have no matches?" Here, "generalized" means that we have two new operators, beyond union (alternation) and concatenation: they are intersection and negation. [0]

You may recall that constructing an intersection automaton involves a cross product of NFAs, and constructing the negation automaton involves just flipping the accepting/rejecting states of a DFA. However, the NFA -> DFA construction can incur an exponential blowup of states.

So at the very least, intuition shows that the automata we're working with are enormous w.r.t. the expression size. But at this point my intuition stops, so I won't be able to explain how we get from here to a TOWER-complete decision procedure for whether the accepting set is empty.

[0] https://planetmath.org/generalizedregularexpression



I would assume CS regular expression means the computer sci version vs the (non-regular) regexes in popular use.


I find the need for this differentiation annoying. Referring to the class of languages is often useful, but I find most software engineers don’t have a basic understanding (and mine is only basic!) of classes of language. I always have to clarify what I mean by regular expression in these discussions. It’s disappointing that the term has been borrowed by frequently Turing complete string processing libraries.


Languages change. Meanings of words drift. Regex has been used this way since the mid-70s. That is a long time now.

However, software regexes aren't turing complete. I think they are just context sensitive.


I believe Perl regexes are Turing complete, although this is second hand info, I’ve never tried myself. Most are context sensitive though you’re right.


My guess would be context sensitive?


> You may recall that constructing an intersection automaton involves a cross product of NFAs

I'm not sure what you mean by "cross product" here, but there's nothing multiplicative involved. You just run both NFAs simultaneously. This gives you a number of states to track equal to the sum, not the product, of the two NFAs being intersected.


Most likely they mean Cartesian product instead of cross product.

Also whatever you are describing is not an intersection per se as it’s not a permutation of the two possible combined states, it’s a construction for doing certain computations on intersected NFAs given its constituents. If each of two intersected NFA has three states the intersection can be up to 9 (the cardinality of the Cartesian product) states


That's one way to perform the computation, but that's not an NFA. The product construction involves making a single new NFA to do the job.


It is an NFA in every way except that you no longer have a single list of accepting states. "Doing two different things simultaneously" is the whole concept of nondeterminism.

If you loosen the definition of "NFA" that you're working with from requiring a set of final states to requiring a function from a set of states to {0, 1}, everything will still work exactly the same way, all of your theorems will still hold, but intersecting two NFAs will consist of adding one state and adjusting the accept function.


> It is an NFA in every way except

So not an NFA. That's fine, you can always define "extended NFAs", and some variants are practically useful. For example, most practical "regex" implementations have features like backreferences that are very convenient, but strictly more powerful than DFAs. You just have to be aware that you lose all the well-established theory around NFAs if you do something like that.


> You just have to be aware that you lose all the well-established theory around NFAs if you do something like that.

As I explicitly observed above:

>> everything will still work exactly the same way, all of your theorems will still hold

you don't lose any of the established theory by doing this.


You are thinking of the union, where you can indeed just put the two NFAs “side-by-side”. This does not work for the intersection though, there you need the product construction.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: