No visible cause for “Unexpected token ILLEGAL”

The error

When the JavaScript interpreter parses code, it breaks it into pieces called “tokens”. When a token cannot be classified into one of the four basic token types, most implementations label it “ILLEGAL” and throw this error.

The same error is raised if, for example, you try to run a JS file that contains a rogue @ character, a misplaced curly brace or bracket, “smart quotes”, an unterminated single-quoted string (e.g. this.run('dev1)), and so on.

A lot of different situations can cause this error. But if you don’t have any obvious syntax error or illegal character, it may be caused by an invisible illegal character. That’s what this answer is about.

But I can’t see anything illegal!

There is an invisible character in the code, right after the semicolon. It’s the Unicode U+200B Zero-width space character (a.k.a. ZWSP, HTML entity &#8203;). That character is known to cause the Unexpected token ILLEGAL JavaScript syntax error.
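A minimal sketch to reproduce the problem, assuming a modern engine such as Node.js or a current browser. The helper name tryParse is hypothetical, and newer V8 versions word the message as “Invalid or unexpected token” rather than “Unexpected token ILLEGAL”, so the sketch checks only the error type:

```javascript
// Hypothetical helper: compiles a source string without running it and
// returns the name of the error thrown, or null if the source parses.
function tryParse(source) {
  try {
    new Function(source); // parses the function body; does not execute it
    return null;
  } catch (err) {
    return err.name;
  }
}

const clean = "var a = 1;";
const dirty = "var a = 1;\u200B"; // the same code followed by a ZWSP

console.log(tryParse(clean)); // null
console.log(tryParse(dirty)); // "SyntaxError"
```

Both strings look identical when printed, which is exactly why the error seems to have no visible cause.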

And where did it come from?

I can’t tell for sure, but my bet is on jsfiddle. If you paste code from there, it’s very likely to include one or more U+200B characters. It seems the tool uses that character to control word-wrapping on long strings.

UPDATE 2013-01-07

After the latest jsfiddle update, it’s now showing the character as a red dot like codepen does. Apparently, it’s also not inserting U+200B characters on its own anymore, so this problem should be less frequent from now on.

UPDATE 2015-03-17

Vagrant appears to sometimes cause this issue as well, due to a bug in VirtualBox. The solution, as per this blog post, is to set sendfile off; in your nginx config, or EnableSendfile Off if you use Apache.

It’s also been reported that code pasted from the Chrome developer tools may include that character, but I was unable to reproduce that with the current version (22.0.1229.79 on OSX).

How can I spot it?

The character is invisible, so how do we know it’s there? You can ask your editor to show invisible characters. Most text editors have this feature. Vim, for example, displays them by default, and the ZWSP shows as <u200b>. You can also debug it online: jsbin displays the character as a red dot on its code panes (but seems to remove it after saving and reloading the page). CodePen.io also displays it as a dot, and keeps it even after saving.
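If your editor can’t show it, you can also search for the character programmatically. This is only a sketch: the function name findZeroWidthSpaces and the sample string are made up for illustration, and it looks for U+200B only.

```javascript
// Hypothetical helper: lists every U+200B in `source`
// with its 1-based line and column.
function findZeroWidthSpaces(source) {
  const hits = [];
  const lines = source.split(/\r?\n/);
  lines.forEach((line, i) => {
    for (let col = 0; col < line.length; col++) {
      if (line.charCodeAt(col) === 0x200b) {
        hits.push({ line: i + 1, column: col + 1 });
      }
    }
  });
  return hits;
}

// Sample input with a ZWSP right after the first semicolon.
const sample = "var a = 1;\u200B\nvar b = 2;";
console.log(findZeroWidthSpaces(sample)); // [ { line: 1, column: 11 } ]
```

In practice you would read the suspect file into a string and pass it in place of the sample.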

Related problems

That character is not something bad; it can actually be quite useful. This example on Wikipedia demonstrates how it can be used to control where a long string should be wrapped to the next line. However, if you are unaware of the character’s presence in your markup, it may become a problem. If you have it inside a string (e.g., the nodeValue of a DOM element that has no visible content), you might expect that string to be empty, when in fact it’s not (even after applying String.trim).
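A short illustration of the trim behaviour, as a sketch rather than a recommended cleanup routine. The variable names are arbitrary, and the replace call removes U+200B only, not other invisible characters:

```javascript
// A string that looks empty but holds a single ZWSP between two spaces.
const value = " \u200B ";

// trim() removes the ordinary spaces but leaves the ZWSP in place.
const trimmed = value.trim();
console.log(trimmed.length); // 1

// Removing the character explicitly yields a truly empty string.
const stripped = trimmed.replace(/\u200B/g, "");
console.log(stripped.length); // 0
```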

ZWSP can also cause extra whitespace to be displayed on an HTML page, for example when it’s found between two <div> elements (as seen on this question). This case is not even reproducible on jsfiddle, since the character is ignored there.

Another potential problem: if the web page’s encoding is not recognized as UTF-8, the character may actually be displayed (as â€‹ in latin1, for example).
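The reason is that U+200B takes three bytes in UTF-8, and a single-byte decoder shows each byte as its own character. A small sketch, assuming an environment where TextEncoder is a global (current browsers and recent Node.js versions):

```javascript
// Encode the ZWSP as UTF-8 and print its bytes in hex.
const bytes = Array.from(new TextEncoder().encode("\u200B"));
const hex = bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");
console.log(hex); // e2 80 8b
```

Read as Windows-1252, which is what browsers generally use when a page is labelled latin1, those three bytes are â, € and ‹.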

If ZWSP is present in CSS code (inline code, or an external stylesheet), the styles may not be parsed properly either, so some of them don’t get applied (as seen on this question).

The ECMAScript Specification

I couldn’t find any mention of that specific character in the ECMAScript Specification (versions 3 and 5.1). The current version mentions similar characters (U+200C and U+200D) in Section 7.1, which says they should be treated as IdentifierParts when “outside of comments, string literals, and regular expression literals”. Those characters may, for example, be part of a variable name (and var x\u200c; indeed works).
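The difference between those characters and U+200B can be checked with a quick experiment. This is a sketch, assuming an engine that follows the identifier rules quoted above; the variable names are arbitrary:

```javascript
// U+200C (ZWNJ) is accepted as part of an identifier...
const withZwnj = new Function("var x\u200C = 1; return x\u200C;");
console.log(withZwnj()); // 1

// ...while U+200B in the same position is rejected at parse time.
let zwspError = null;
try {
  new Function("var x\u200B = 1;");
} catch (err) {
  zwspError = err.name;
}
console.log(zwspError); // "SyntaxError"
```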

Section 7.2 lists the valid White space characters (such as tab, space, no-break space, etc.), and vaguely mentions that any other Unicode “space separator” (category “Zs”) should be treated as white space. I’m probably not the best person to discuss the specs in this regard, but it seems to me that U+200B should be considered white space according to that, when in fact the implementations (at least Chrome and Firefox) appear to treat it as an unexpected token (or part of one), causing the syntax error.
