The Unicode Consortium publishes a plain-text database of information about every Unicode character; the format is described in Unicode Standard Annex #44. The primary information is contained in UnicodeData.txt. Opening and closing punctuation characters are denoted with Ps (punctuation start, formally Open_Punctuation) and Pe (punctuation end, formally Close_Punctuation) in the General_Category field (the third field, delimited by ;). Look for those characters, and you’ll find what you’re looking for.
Note that not all characters that you might consider brackets are listed under Ps and Pe. For instance, quotation marks (including “«»”) are indicated with Pi and Pf (initial and final punctuation), so you might want to include those as well. And some characters, like < and >, are used as brackets in some contexts (such as HTML/XML) but are classified as math symbols (Sm) in UnicodeData.txt. Those you are going to have to find by hand; there is no predetermined listing of them.
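If you want to check how a particular character is classified before deciding whether to add it by hand, a small lookup along these lines may help. This is only a sketch: the function name category_of and the four-line sample file are made up for illustration (the sample lines follow the UnicodeData.txt format), and the code point has to be written the way the file writes it (uppercase hex, at least four digits).

```shell
# category_of HEX [FILE]: print the General_Category (third field) of one code point.
# Hypothetical helper; FILE defaults to UnicodeData.txt in the current directory.
category_of() {
    # Split each record on ';' and compare the first field with the requested code point.
    while IFS=';' read -r number name category rest; do
        if [ "$number" = "$1" ]; then
            printf '%s\n' "$category"
            return 0
        fi
    done <"${2:-UnicodeData.txt}"
    return 1   # not listed as an individual record
}

# Small excerpt in the UnicodeData.txt format, so the sketch runs without the full file.
sample=$(mktemp)
cat >"$sample" <<'EOF'
0028;LEFT PARENTHESIS;Ps;0;ON;;;;;Y;OPENING PARENTHESIS;;;;
003C;LESS-THAN SIGN;Sm;0;ON;;;;;Y;;;;;
003E;GREATER-THAN SIGN;Sm;0;ON;;;;;Y;;;;;
00AB;LEFT-POINTING DOUBLE ANGLE QUOTATION MARK;Pi;0;ON;;;;;Y;LEFT POINTING GUILLEMET;;;;
EOF

category_of 003C "$sample"   # prints Sm
category_of 0028 "$sample"   # prints Ps
```

Against the real file, you would drop the second argument (or pass the path to your copy of UnicodeData.txt).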
Here’s a quick Bash script that lists these characters from UnicodeData.txt, followed by its output. I’ve included both brackets and quotes. (Note: in some Bash versions, UTF-8 printing has a bug that causes U+00AB “«” and U+00BB “»” to be printed as “?”, and some terminals can’t render all of the characters correctly.)
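# Print every character whose General_Category is Ps, Pe, Pi, or Pf.
# The \u escape in printf needs Bash 4.2 or later.
while IFS=';' read -r number name category rest
do
    if [[ "$category" =~ Ps|Pe|Pi|Pf ]]
    then
        printf "%s (U+%s, %s): \u$number\n" "$name" "$number" "$category"
    fi
done <UnicodeData.txt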
LEFT PARENTHESIS (U+0028, Ps): (
RIGHT PARENTHESIS (U+0029, Pe): )
LEFT SQUARE BRACKET (U+005B, Ps): [
RIGHT SQUARE BRACKET (U+005D, Pe): ]
LEFT CURLY BRACKET (U+007B, Ps): {
RIGHT CURLY BRACKET (U+007D, Pe): }
LEFT-POINTING DOUBLE ANGLE QUOTATION MARK (U+00AB, Pi): «
RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK (U+00BB, Pf): »
TIBETAN MARK GUG RTAGS GYON (U+0F3A, Ps): ༺
TIBETAN MARK GUG RTAGS GYAS (U+0F3B, Pe): ༻
TIBETAN MARK ANG KHANG GYON (U+0F3C, Ps): ༼
TIBETAN MARK ANG KHANG GYAS (U+0F3D, Pe): ༽
OGHAM FEATHER MARK (U+169B, Ps): ᚛
OGHAM REVERSED FEATHER MARK (U+169C, Pe): ᚜
LEFT SINGLE QUOTATION MARK (U+2018, Pi): ‘
RIGHT SINGLE QUOTATION MARK (U+2019, Pf): ’
SINGLE LOW-9 QUOTATION MARK (U+201A, Ps): ‚
SINGLE HIGH-REVERSED-9 QUOTATION MARK (U+201B, Pi): ‛
LEFT DOUBLE QUOTATION MARK (U+201C, Pi): “
RIGHT DOUBLE QUOTATION MARK (U+201D, Pf): ”
DOUBLE LOW-9 QUOTATION MARK (U+201E, Ps): „
DOUBLE HIGH-REVERSED-9 QUOTATION MARK (U+201F, Pi): ‟
SINGLE LEFT-POINTING ANGLE QUOTATION MARK (U+2039, Pi): ‹
SINGLE RIGHT-POINTING ANGLE QUOTATION MARK (U+203A, Pf): ›
LEFT SQUARE BRACKET WITH QUILL (U+2045, Ps): ⁅
RIGHT SQUARE BRACKET WITH QUILL (U+2046, Pe): ⁆
SUPERSCRIPT LEFT PARENTHESIS (U+207D, Ps): ⁽
SUPERSCRIPT RIGHT PARENTHESIS (U+207E, Pe): ⁾
SUBSCRIPT LEFT PARENTHESIS (U+208D, Ps): ₍
SUBSCRIPT RIGHT PARENTHESIS (U+208E, Pe): ₎
LEFT-POINTING ANGLE BRACKET (U+2329, Ps): 〈
RIGHT-POINTING ANGLE BRACKET (U+232A, Pe): 〉
MEDIUM LEFT PARENTHESIS ORNAMENT (U+2768, Ps): ❨
MEDIUM RIGHT PARENTHESIS ORNAMENT (U+2769, Pe): ❩
MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT (U+276A, Ps): ❪