I really liked this brief overview of type systems overall, but I especially liked that the author took the effort to move away from the terms "strongly typed" and "weakly typed". Too often these terms cause confusion. Even worse is when people confuse them for "statically typed" and "dynamically typed" (respectively), leading to people making claims like "Python is weakly typed" and "C is strongly typed" (which, depending on your definitions, are not true statements).
This material wasn't covered in my core curriculum for my CS degrees, which I think is a shame. Some students might find this stuff interesting, and I think introducing it to undergrads at least a little bit and saying "By the way, this is a thing you can research" could help some students with breaking into the academic side of CS.
2. Can I make this type system map to concepts in the problem domain?
In C, the answer to 1 is "yes" and the answer to 2 is "to a limited extent". In fact, in C, you can subvert the type system so easily that it's considered impossible not to, to the extent that using the standard library requires throwing type information away, by using pointers-to-void or generic "char" buffers, and as for the second... well, Apps Hungarian notation exists for a reason. (Systems Hungarian exists for a reason, too, but I'll try to keep the insults out of this post.)
The first question is interesting because it tells you if the type system can be used to enforce invariants, and the second question is interesting because it tells you if those invariants will be worth enforcing. If you're writing in Standard Pascal, and your type system enforces array length as a part of array type, well, to quote a metal album: "So Far, So Good... So What?" What does that tell me about whether my program is semantically correct as regards the problem domain? Similarly with enforcing size specification types: Differentiating between a short, an int, and a long tells me Sweet Fanny Adams about what values of those sizes mean in my program. Typing isn't "strong" if it's enforcing rules which ultimately make no sense, and if it's "sound", well, mathematical proofs of something I'm not interested in are themselves uninteresting.
I'd extend 1 to include "can I easily avoid subverting this type system?" Guarantees are better, but as long as I can get help that's enough for a tool to be useful assuming 2.
Regarding 2, I've long complained that the conception of types embedded in many systems (and many heads) is very representational and most of the time we don't actually care about representation. The exceptions I typically mention are where we care a lot about performance, and at system boundaries. We can fit those into your analysis by asserting (reasonably, I think) that in those instances representation starts to be a part of your problem domain.
I'd also extend 1 to include "can I subvert this type system when I really need to?"
(And to those who would say "You never need to", I don't think you can be quite that dogmatic. The real world includes a lot of unforeseen scenarios...)
Aha! I will counter that you never actually want or need to break the type comprehension of your executable but the static analysis of the compiler or dynamic analysis at runtime may be inaccurate with regards to the truth and motivate you to try and subvert it. This may be about what you were thinking when you wrote your comment but I wanted to highlight a small but important difference. I may find a useful bitwise function that takes arguments of type int and really want to execute the same bitwise function on a pair of doubles, in this case the actual type of data I want the function to operate on is "a series of bytes" but the compiler/whatever's typing system may force me to declare the function as taking an IEEE 754 or two's complement integer instead.
In these cases the typing system is constraining you from declaring a type safe operation the right way by forcing you to play within a subset of the universe of valid types.
Imagine that you have a binary-only third-party library, one function of which returns an opaque type. Imagine that you want to unpack that and observe the state of some data inside it (perhaps guided by online hints), and the vendor won't give you the ability to do so because 1) they're busy, or 2) they want a lot of money, or 3) they're out of business. You don't want to have to subvert the type system, but you still kind of need to.
And if you're going to tell me that the only reason you need to in this scenario is because the third-party vendor didn't do their job (including creating the types) right, I'd agree. That doesn't actually change what you need to do, though.
I agree, but in that case you're fighting against the typing system more than typing itself, whatever you are doing with that thing and however you would abstractly define that type (maybe as "We expect this thing to have the type of the thing that terrible third party library usually gives us back") you will make assumptions based on the fact that that value is actually one of those and you never want those assumptions violated. In this case the typing system is too restrictive and it might be a case where duck-typing would come in handy (since you can usually post-apply types in that sort of a setting).
It's an important question. Systems where you can't ever subvert the type system may be useful in some places (sandboxing) where systems with an escape hatch aren't, and maybe vice-versa (plausibly "in theory", certainly "in practice" IME). The two might be variants of the same underlying language, though - Safe Haskell is at least an attempt in this space; I don't know how well it delivers.
I've always felt type coercion is a pragmatic recognition of a lack of time, but also a lack of analysis and abstraction. If you are type coercing in an existing body of code, you probably walked into an ill-understood quirk in the key information model, or a change in circumstance.
If you are routinely coercing types in new code, you haven't done enough design and analysis. Something is off-key.
>"well, Apps Hungarian notation exists for a reason. (Systems Hungarian exists for a reason, too, but I'll try to keep the insults out of this post.)"
Can you elaborate? I read the wiki page on Apps Hungarian. I am not following. How does/would this address issues with the type system in C?
>"Can I make this type system map to concepts in the problem domain?"
I didn't really follow your Pascal example explanation of this, you first mentioned "because it tells you if the type system can be used to enforce invariants." But the went on to talk about array lengths. How does whether or not the invariants are enforced by the type system help you answer the question of whether or not it maps to the problem domain? Might you be able to provide another example? Thanks.
Apps Hungarian is, to a first approximation, a way for your program to use a type system not supported by the language. It's an ugly hack, in no small part because the type-checking is done visually by the programmer, rather than automatically by the compiler. People only use it if there's substantial value in the type system they're bringing in... which generally only happens if the language's type system is inexpressive.
Regarding Pascal, it's important to note that the array length is not an optional part of the type. An array of length 10 is type-incompatible with an array of length 20. A function whose input is an array must specify the length of the array it accepts, and the function cannot be called with an argument of different length. I assume the author's overall point is that this type system, while strong, is not useful for meaningful work.
> Can you elaborate? I read the wiki page on Apps Hungarian. I am not following. How does/would this address issues with the type system in C?
Apps Hungarian allows you to encode information about the value a variable contains which you can't capture in the type system.
For example, imagine you're writing a program which handles user input. You need to distinguish between Sanitized Strings and Unsanitized Strings because if you don't, you open yourself up to security problems. Absent a way to extend the type system to put this information in a type, you do this:
char *usStr; /* An Unsanitized String */
char *snStr; /* A SaNitized string */
You add a couple letters to the beginnings of the variable names to encode what the type system doesn't.
You can see that if you have a line of code which says:
snStr = usStr;
it is wrong, and your brain can print a full-color warning message with highlighting.
Bottom Line: Apps Hungarian makes it easier for you to be the type checker.
> I didn't really follow your Pascal example explanation of this, you first mentioned "because it tells you if the type system can be used to enforce invariants." But the went on to talk about array lengths. How does whether or not the invariants are enforced by the type system help you answer the question of whether or not it maps to the problem domain? Might you be able to provide another example? Thanks.
Array lengths are an invariant. They're just one I don't care about, because nothing in the problem domain is modeled by how long a given array is. The contents of those arrays are much more important, and the types should vary based on that, instead.
A somewhat simplistic example:
You have a program which prints travel itineraries for people who must be addressed correctly along the way, where correctly means using their prenomial and postnomial titles. For example "Dr. Mary Richards, Ph.D." would be insulted to be merely "Mary Richards" or "Dr. Mary Richards, MD". You also have to print route information, which is a list of towns and states, like "Baltimore, MD".
Therefore, you have to ensure three things: Prenomials are always prepended to names, postnomials are always appended to names, and town names are always suffixed with state abbreviations. Oh, and if you print a name without prenomials and postnomials, you'll not escape unscathed.
Wouldn't it be nice if the type system were capable of keeping track of which strings were prenomials, postnomials, personal names, town names, and state abbreviations, to ensure you never try to append a state abbreviation onto a personal name, and never try to print a personal name alone? Those are the kinds of interesting invariants I'm talking about.
This material wasn't covered in my core curriculum for my CS degrees, which I think is a shame. Some students might find this stuff interesting, and I think introducing it to undergrads at least a little bit and saying "By the way, this is a thing you can research" could help some students with breaking into the academic side of CS.