List of all Unicode’s open/close brackets?

The Unicode Consortium publishes a plain-text database of information about every Unicode character; the format is described in Unicode Standard Annex #44. The primary information is contained in UnicodeData.txt. Opening and closing punctuation characters are marked Ps (Open_Punctuation; think "punctuation start") and Pe (Close_Punctuation; "punctuation end") in the General_Category field, which is the third field on each line (fields are delimited by ;). Look for those characters, and you’ll find what you’re looking for.
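As a minimal sketch of that field layout, the snippet below splits each line on ; and prints only the code point and the General_Category. The function name list_categories is hypothetical, and the three sample lines are written in the UnicodeData.txt format for illustration; in practice you would feed it the real file with `list_categories <UnicodeData.txt`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read UnicodeData.txt-format lines on stdin and print
# "CODEPOINT CATEGORY" for each one (fields 1 and 3 of the record).
list_categories() {
    local codepoint name category rest
    while IFS=';' read -r codepoint name category rest
    do
        printf '%s %s\n' "$codepoint" "$category"
    done
}

# Sample lines in the UnicodeData.txt format (semicolon-delimited fields).
sample='0028;LEFT PARENTHESIS;Ps;0;ON;;;;;Y;OPENING PARENTHESIS;;;;
0029;RIGHT PARENTHESIS;Pe;0;ON;;;;;Y;CLOSING PARENTHESIS;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;'

list_categories <<<"$sample"
```

The fourth variable, rest, soaks up every remaining field, so only the first three need names.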

Note that not every character you consider a bracket may be listed; for instance, most quotation marks (including “«»”) are indicated with Pi and Pf (initial and final punctuation), so you might want to include those as well. And some characters, like < and >, are used as brackets in some contexts (such as HTML/XML) but are classified as math symbols (Sm) in UnicodeData.txt. Those you will have to find by hand; there is no predetermined listing of them.
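One way to combine the category test with a hand-maintained list is sketched below. The helper name is_bracket_like and the contents of extra_brackets are assumptions for illustration, not anything defined by Unicode; extend the list with whatever characters your context treats as brackets.

```shell
#!/usr/bin/env bash
# Hand-picked extras: code points used as brackets in some contexts even though
# their General_Category is Sm. This list is an assumption; extend it as needed.
extra_brackets=(003C 003E)   # < and >

# Hypothetical helper: succeed if the character should count as bracket-like.
# $1 = code point (hex, as written in UnicodeData.txt), $2 = General_Category.
is_bracket_like() {
    local codepoint=$1 category=$2 extra
    case $category in
        Ps|Pe|Pi|Pf) return 0 ;;
    esac
    for extra in "${extra_brackets[@]}"
    do
        [[ $codepoint == "$extra" ]] && return 0
    done
    return 1
}

is_bracket_like 0028 Ps && echo "U+0028 counts"
is_bracket_like 003C Sm && echo "U+003C counts"
is_bracket_like 002B Sm || echo "U+002B does not count"
```

Keeping the extras in a separate array makes the manual choices easy to spot and review, since they are not derived from the database.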

Here’s a quick Bash script to get this information, followed by its output. I’ve included both brackets and quotes. (Note: some Bash versions have a UTF-8 printing bug that causes U+00AB “«” and U+00BB “»” to print as “?”, and some terminals cannot render all characters correctly.)

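# Print every character whose General_Category is Ps, Pe, Pi, or Pf.
# Requires Bash 4.2 or later for the \U escape in printf.
while IFS=';' read -r number name category rest
do
    if [[ "$category" =~ ^(Ps|Pe|Pi|Pf)$ ]]
    then
        # \U accepts up to 8 hex digits, so any code points above U+FFFF print too.
        printf "%s (U+%s, %s): \U$number\n" "$name" "$number" "$category"
    fi
done <UnicodeData.txt

Output: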
LEFT PARENTHESIS (U+0028, Ps): (
RIGHT PARENTHESIS (U+0029, Pe): )
LEFT SQUARE BRACKET (U+005B, Ps): [
RIGHT SQUARE BRACKET (U+005D, Pe): ]
LEFT CURLY BRACKET (U+007B, Ps): {
RIGHT CURLY BRACKET (U+007D, Pe): }
LEFT-POINTING DOUBLE ANGLE QUOTATION MARK (U+00AB, Pi): «
RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK (U+00BB, Pf): »
TIBETAN MARK GUG RTAGS GYON (U+0F3A, Ps): ༺
TIBETAN MARK GUG RTAGS GYAS (U+0F3B, Pe): ༻
TIBETAN MARK ANG KHANG GYON (U+0F3C, Ps): ༼
TIBETAN MARK ANG KHANG GYAS (U+0F3D, Pe): ༽
OGHAM FEATHER MARK (U+169B, Ps): ᚛
OGHAM REVERSED FEATHER MARK (U+169C, Pe): ᚜
LEFT SINGLE QUOTATION MARK (U+2018, Pi): ‘
RIGHT SINGLE QUOTATION MARK (U+2019, Pf): ’
SINGLE LOW-9 QUOTATION MARK (U+201A, Ps): ‚
SINGLE HIGH-REVERSED-9 QUOTATION MARK (U+201B, Pi): ‛
LEFT DOUBLE QUOTATION MARK (U+201C, Pi): “
RIGHT DOUBLE QUOTATION MARK (U+201D, Pf): ”
DOUBLE LOW-9 QUOTATION MARK (U+201E, Ps): „
DOUBLE HIGH-REVERSED-9 QUOTATION MARK (U+201F, Pi): ‟
SINGLE LEFT-POINTING ANGLE QUOTATION MARK (U+2039, Pi): ‹
SINGLE RIGHT-POINTING ANGLE QUOTATION MARK (U+203A, Pf): ›
LEFT SQUARE BRACKET WITH QUILL (U+2045, Ps): ⁅
RIGHT SQUARE BRACKET WITH QUILL (U+2046, Pe): ⁆
SUPERSCRIPT LEFT PARENTHESIS (U+207D, Ps): ⁽
SUPERSCRIPT RIGHT PARENTHESIS (U+207E, Pe): ⁾
SUBSCRIPT LEFT PARENTHESIS (U+208D, Ps): ₍
SUBSCRIPT RIGHT PARENTHESIS (U+208E, Pe): ₎
LEFT-POINTING ANGLE BRACKET (U+2329, Ps): 〈
RIGHT-POINTING ANGLE BRACKET (U+232A, Pe): 〉
MEDIUM LEFT PARENTHESIS ORNAMENT (U+2768, Ps): ❨
MEDIUM RIGHT PARENTHESIS ORNAMENT (U+2769, Pe): ❩
MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT (U+276A, Ps): ❪
