<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: escaping</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/escaping.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2010-07-04T18:23:00+00:00</updated><author><name>Simon Willison</name></author><entry><title>Escaping regular expression characters in JavaScript (updated)</title><link href="https://simonwillison.net/2010/Jul/4/escaping/#atom-tag" rel="alternate"/><published>2010-07-04T18:23:00+00:00</published><updated>2010-07-04T18:23:00+00:00</updated><id>https://simonwillison.net/2010/Jul/4/escaping/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://simonwillison.net/2006/Jan/20/escape/#p-6"&gt;Escaping regular expression characters in JavaScript (updated)&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The JavaScript regular expression meta-character escaping code I posted back in 2006 has some serious flaws—I’ve just posted an update to the original post.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/escaping"&gt;escaping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/regular-expressions"&gt;regular-expressions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;&lt;/p&gt;



</summary><category term="escaping"/><category term="javascript"/><category term="regular-expressions"/><category term="recovered"/></entry><entry><title>Unicode code converter</title><link href="https://simonwillison.net/2009/Dec/15/unicode/#atom-tag" rel="alternate"/><published>2009-12-15T22:10:29+00:00</published><updated>2009-12-15T22:10:29+00:00</updated><id>https://simonwillison.net/2009/Dec/15/unicode/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://rishida.net/tools/conversion/"&gt;Unicode code converter&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Fantastically useful tool to convert strings of characters into every Unicode and/or escaping syntax you can possibly imagine.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://open.blogs.nytimes.com/2009/12/15/what-were-trolling/"&gt;NYTimes Open Blog&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/escaping"&gt;escaping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tools"&gt;tools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/unicode"&gt;unicode&lt;/a&gt;&lt;/p&gt;



</summary><category term="escaping"/><category term="tools"/><category term="unicode"/></entry><entry><title>Reducing XSS by way of Automatic Context-Aware Escaping in Template Systems</title><link href="https://simonwillison.net/2009/Apr/14/contextaware/#atom-tag" rel="alternate"/><published>2009-04-14T09:26:04+00:00</published><updated>2009-04-14T09:26:04+00:00</updated><id>https://simonwillison.net/2009/Apr/14/contextaware/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://googleonlinesecurity.blogspot.com/2009/03/reducing-xss-by-way-of-automatic.html"&gt;Reducing XSS by way of Automatic Context-Aware Escaping in Template Systems&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The Google Online Security Blog reminds us that simply HTML-escaping everything isn’t enough: the type of escaping needed depends on the current markup context. For example, variables inside JavaScript blocks should be escaped differently from variables in HTML text. Google’s open source Ctemplate library uses an HTML parser to keep track of the current context and apply the correct escaping function automatically.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://www.reddit.com/r/programming/comments/8c6os/hacker_targets_twitter_to_teach_the_company_a/c08u1k8"&gt;taviso&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ctemplate"&gt;ctemplate&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/escaping"&gt;escaping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/html"&gt;html&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xss"&gt;xss&lt;/a&gt;&lt;/p&gt;



</summary><category term="ctemplate"/><category term="django"/><category term="escaping"/><category term="google"/><category term="html"/><category term="open-source"/><category term="security"/><category term="xss"/></entry><entry><title>Escaping regular expression characters in JavaScript</title><link href="https://simonwillison.net/2006/Jan/20/escape/#atom-tag" rel="alternate"/><published>2006-01-20T12:19:13+00:00</published><updated>2006-01-20T12:19:13+00:00</updated><id>https://simonwillison.net/2006/Jan/20/escape/#atom-tag</id><summary type="html">
    &lt;p id="p-0"&gt;JavaScript's support for regular expressions is generally pretty good, but there is one notable omission: an escaping mechanism for literal strings. Say for example you need to create a regular expression that removes a specific string from the end of a string. If you know the string you want to remove when you write the script this is easy:&lt;/p&gt;

&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;var&lt;/span&gt; &lt;span class="pl-s1"&gt;newString&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;oldString&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;replace&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-pds"&gt;&lt;span class="pl-c1"&gt;/&lt;/span&gt;&lt;span class="pl-s"&gt;R&lt;/span&gt;&lt;span class="pl-s"&gt;e&lt;/span&gt;&lt;span class="pl-s"&gt;m&lt;/span&gt;&lt;span class="pl-s"&gt;o&lt;/span&gt;&lt;span class="pl-s"&gt;v&lt;/span&gt;&lt;span class="pl-s"&gt;e&lt;/span&gt;&lt;span class="pl-s"&gt; &lt;/span&gt;&lt;span class="pl-s"&gt;f&lt;/span&gt;&lt;span class="pl-s"&gt;r&lt;/span&gt;&lt;span class="pl-s"&gt;o&lt;/span&gt;&lt;span class="pl-s"&gt;m&lt;/span&gt;&lt;span class="pl-s"&gt; &lt;/span&gt;&lt;span class="pl-s"&gt;e&lt;/span&gt;&lt;span class="pl-s"&gt;n&lt;/span&gt;&lt;span class="pl-s"&gt;d&lt;/span&gt;&lt;span class="pl-cce"&gt;$&lt;/span&gt;&lt;span class="pl-c1"&gt;/&lt;/span&gt;&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;''&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p id="p-1"&gt;But what if the string to be removed comes from a variable? You'll need to construct a regular expression from the variable, using the RegExp constructor function:&lt;/p&gt;

&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;var&lt;/span&gt; &lt;span class="pl-s1"&gt;re&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-k"&gt;new&lt;/span&gt; &lt;span class="pl-v"&gt;RegExp&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;stringToRemove&lt;/span&gt; &lt;span class="pl-c1"&gt;+&lt;/span&gt; &lt;span class="pl-s"&gt;'$'&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-k"&gt;var&lt;/span&gt; &lt;span class="pl-s1"&gt;newString&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;oldString&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;replace&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;re&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;''&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p id="p-2"&gt;But what if the string you want to remove may contain regular expression metacharacters - characters like $ or . that affect the behaviour of the expression? Languages such as Python provide functions for escaping these characters (see &lt;a href="https://docs.python.org/2/library/re.html#re.escape" title="Python re module contents"&gt;re.escape&lt;/a&gt;); with JavaScript you have to write your own.&lt;/p&gt;
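
&lt;p&gt;To make the problem concrete, here is a minimal sketch of what can go wrong when the string is used unescaped. The values are hypothetical and chosen only for illustration: the &lt;code&gt;.&lt;/code&gt; in the string acts as "any character", so more text is removed than intended.&lt;/p&gt;

```javascript
// Hypothetical values, used only to illustrate the problem.
var stringToRemove = '.com';

// The "." is not escaped, so the pattern is effectively /.com$/.
var re = new RegExp(stringToRemove + '$');

// "Xcom" is not the literal string ".com", but it still matches and is removed.
var result = 'example-Xcom'.replace(re, '');
```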

&lt;p id="p-3"&gt;Here's mine:&lt;/p&gt;

&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-v"&gt;RegExp&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;escape&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-k"&gt;function&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;text&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-c1"&gt;!&lt;/span&gt;&lt;span class="pl-smi"&gt;arguments&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;callee&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;sRE&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-k"&gt;var&lt;/span&gt; &lt;span class="pl-s1"&gt;specials&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-kos"&gt;[&lt;/span&gt;
      &lt;span class="pl-s"&gt;'/'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;'.'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;'*'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;'+'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;'?'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;'|'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
      &lt;span class="pl-s"&gt;'('&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;')'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;'['&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;']'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;'{'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;'}'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;'\\'&lt;/span&gt;
    &lt;span class="pl-kos"&gt;]&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
    &lt;span class="pl-smi"&gt;arguments&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;callee&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;sRE&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-k"&gt;new&lt;/span&gt; &lt;span class="pl-v"&gt;RegExp&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
      &lt;span class="pl-s"&gt;'(\\'&lt;/span&gt; &lt;span class="pl-c1"&gt;+&lt;/span&gt; &lt;span class="pl-s1"&gt;specials&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;join&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;'|\\'&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;+&lt;/span&gt; &lt;span class="pl-s"&gt;')'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;'g'&lt;/span&gt;
    &lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;
  &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-s1"&gt;text&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;replace&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-smi"&gt;arguments&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;callee&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;sRE&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;'\\$1'&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p id="p-4"&gt;This deals with another common problem in JavaScript: compiling a regular expression once (rather than every time you use it) while keeping it local to a function. &lt;code&gt;arguments.callee&lt;/code&gt; inside a function always refers to the function itself, and since JavaScript functions are objects you can store properties on them. In this case, the first time the function is run it compiles a regular expression and stashes it in the &lt;code&gt;sRE&lt;/code&gt; property. On subsequent calls the pre-compiled expression can be reused.&lt;/p&gt;

&lt;p id="p-5"&gt;In the above snippet I've added my function as a property of the &lt;code&gt;RegExp&lt;/code&gt; constructor. There's no pressing reason to do this other than a desire to keep generic functionality relating to regular expression handling in the same place. If you rename the function it will still work as expected, since the use of &lt;code&gt;arguments.callee&lt;/code&gt; eliminates any coupling between the function definition and the rest of the code.&lt;/p&gt;
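
&lt;p&gt;The same compile-once idea can also be sketched with a closure, which avoids &lt;code&gt;arguments.callee&lt;/code&gt; (accessing it throws a TypeError in strict mode). This is an illustrative variant under stated assumptions, not the code from this post: &lt;code&gt;escapeRegExp&lt;/code&gt; and &lt;code&gt;specialsRE&lt;/code&gt; are hypothetical names, the expression is compiled when the function is defined rather than on first call, and the character set is an assumption that adds &lt;code&gt;^&lt;/code&gt;, &lt;code&gt;$&lt;/code&gt; and &lt;code&gt;-&lt;/code&gt; to the list of specials above.&lt;/p&gt;

```javascript
// Sketch only: escapeRegExp is a hypothetical name, not part of the snippet above.
var escapeRegExp = (function () {
  // Compiled once, when this wrapper runs, and kept private in the closure.
  // Assumed character set: the specials listed above plus ^, $ and -.
  var specialsRE = /[\-\/\\^$*+?.()|\[\]{}]/g;

  return function (text) {
    // Prefix every matched metacharacter with a backslash.
    return String(text).replace(specialsRE, function (ch) {
      return '\\' + ch;
    });
  };
})();

// Usage: remove a literal string from the end of another string.
var stringToRemove = '$9.99';
var re = new RegExp(escapeRegExp(stringToRemove) + '$');
var newString = 'Total: $9.99'.replace(re, '');
```

&lt;p&gt;Because the compiled expression lives in the closure rather than on the function object, renaming the function still has no effect on how it works.&lt;/p&gt;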

&lt;p&gt;&lt;strong&gt;Update 18th Feb 2025&lt;/strong&gt;: 19 years after I published this, &lt;code&gt;RegExp.escape()&lt;/code&gt; has &lt;a href="https://simonwillison.net/2025/Feb/18/tc39proposal-regex-escaping/"&gt;made it into the language&lt;/a&gt;!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/escaping"&gt;escaping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/regular-expressions"&gt;regular-expressions&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="escaping"/><category term="javascript"/><category term="regular-expressions"/></entry></feed>