
I've always seen "sanitization" as more of an output-encoding problem.

People love to think about sanitizing inputs, but how you do so depends not on the inputs themselves but on where they are used - more or less the output of your program.

Rather than trying to think of all the ways the inputs to your program could be abused to cause harm, I find it safer to start where the output occurs - database calls, system calls, etc. The most commonly used of these tend to have encoding capabilities to ensure that when you want to stick a string in a particular place it does exactly that, regardless of whether the string came from user input or elsewhere. For example, bind parameters for databases, or proper escaping functions for shell commands.
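A minimal sketch of bind parameters using Python's built-in sqlite3 module (the table and column names here are hypothetical). The driver sends the value separately from the SQL text, so a hostile string is stored as plain data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

user_input = "Robert'); DROP TABLE users;--"

# The ? placeholder binds the value out-of-band; the string is never
# interpreted as SQL, no matter what characters it contains.
conn.execute("INSERT INTO users (name) VALUES (?)", (user_input,))

row = conn.execute("SELECT name FROM users").fetchone()
print(row[0])  # the hostile string comes back verbatim, as data
```

The same idea applies to shell commands (argument arrays instead of string concatenation) and any other sink with a structured API.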

If you think about it as sanitizing input, you tend to misplace your attention and consider only the entry points to your application. A single input is often used to do multiple things throughout a program, so you cannot properly handle sanitization at input time.

The real push should be for proper output encoding, not input sanitization.



The purpose of sanitizing input is not to prevent security vulnerabilities; it is to make sure the values taken by your program are valid. If you accept a number range and the user inputs a word, that's invalid input for your parameter and your program will crash. Input sanitizing validates that the input is correct for your use. It indirectly improves security, but it is not itself a practice for making an app more secure.


The term "sanitizing" isn't usually used for this; as others have commented, what you are describing is "validating" the user input. That should, of course, happen. Many validations will result in only accepting input that happens to be safe for many uses - e.g., if it's a valid number between 1 and 100 you could of course send it to an integer field in a database without any special encoding - but I wouldn't rely on my input validation in my model layer doing this.

Encoding a "safe" value doesn't make things any less safe. Failure to encode it, however, leaves potential holes in your application. Something may bypass input validation and be given to the database as an unsafe, unvalidated value. Usage of the value may change (new functionality using it differently, changed storage in database, etc) and in the new usage the value may not be safe.

Input validation is obviously something you want to do, but it should never be relied upon for protecting from injection attacks.
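To illustrate the point that encoding a "safe" value costs nothing while failing to encode an unsafe one leaves a hole, here is a sketch using Python's standard html module for an HTML output context:

```python
import html

# A value that passed validation: encoding it is effectively a no-op.
safe = "42"

# A value that slipped past validation (or whose usage changed later):
# encoding at the point of output still neutralizes it.
unsafe = "<script>alert(1)</script>"

print(html.escape(safe))    # unchanged: 42
print(html.escape(unsafe))  # &lt;script&gt;alert(1)&lt;/script&gt;
```

Encoding unconditionally at the output means safety no longer depends on every upstream code path having validated correctly.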


You actually said it yourself, which is funny: the right word for this is "validating".

Here's the chain:

1. Get raw input.

2. Validate it (number, not number, in range, not in range?)

3. Optionally format it to a canonical form (e.g. trim whitespace).

... later....

4. Encode it for where you want to use it (SQL, HTML etc.).

Sometimes steps 2 and 3 are done in the opposite order, or as an atomic single operation, but point is, we have perfectly reasonable words for all that: validating, formatting, encoding.
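The chain above can be sketched as a small Python function; the range check and the HTML output context are hypothetical examples, and canonicalization (trimming) is done before validation here, one of the two orderings mentioned:

```python
import html

def handle(raw: str) -> str:
    # 1. Raw input arrives as a string.
    # 3. Canonicalize first: trim surrounding whitespace.
    canonical = raw.strip()
    # 2. Validate: must be an integer in the range 1-100.
    if not canonical.isdigit() or not 1 <= int(canonical) <= 100:
        raise ValueError("expected a number between 1 and 100")
    # ... later ...
    # 4. Encode for the context where it will be used - here, HTML.
    return html.escape(canonical)

print(handle(" 42 "))  # 42
```

Each step has its own name - validating, formatting, encoding - and each lives at a different point in the program's flow.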


This is why I just call it encoding and decoding. Proper words, and assume context (encoding for what... decoding from what).



