If you don’t know whether a string-like object is a Python 2 string (bytes) or a Python 3 string (unicode), you could use a generic converter.
Python 3 shell:
>>> import codecs, sys
>>> def to_bytes(s):
...     if type(s) is bytes:
...         return s
...     elif type(s) is str or (sys.version_info[0] < 3 and type(s) is unicode):
...         return codecs.encode(s, 'utf-8')
...     else:
...         raise TypeError("Expected bytes or string, but got %s." % type(s))
...
>>> to_bytes("hello")
b'hello'
>>> to_bytes("hello".encode('utf-8'))
b'hello'
On Python 2, both of these expressions evaluate to True: type("hello") == bytes and type("hello") == str. And type(u"hello") == str evaluates to False, while type(u"hello") == unicode is True.
On Python 3, type("hello") == bytes is False, and type("hello") == str is True. And type("hello") == unicode raises a NameError exception, since unicode isn’t defined on Python 3.
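The comparisons described above can be checked with a short script. This is only a minimal sketch: the helper name describe_string_types and the PY2 flag are hypothetical and not part of the original answer. The name unicode is looked up only when running on Python 2, which is the same guard that keeps to_bytes from raising NameError on Python 3.

```python
import sys

PY2 = sys.version_info[0] < 3  # True only on a Python 2 interpreter


def describe_string_types():
    """Return the results of the type comparisons (hypothetical helper)."""
    results = {
        'type("hello") == bytes': type("hello") == bytes,
        'type("hello") == str': type("hello") == str,
        'type(u"hello") == str': type(u"hello") == str,
    }
    # `and` short-circuits, so `unicode` is only looked up on Python 2.
    results['PY2 and type(u"hello") is unicode'] = (
        PY2 and type(u"hello") is unicode
    )
    return results


for expression, value in sorted(describe_string_types().items()):
    print("%s -> %s" % (expression, value))
```

The results differ between the two interpreters, so it is worth running the script on each one you need to support.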
Python 2 shell:
>>> to_bytes(u"hello")
'hello'
>>> to_bytes("hello")
'hello'
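One possible variant of the converter uses isinstance instead of exact type comparisons, so subclasses of the string types are accepted too. This is a sketch under assumptions, not the answer's own method: the names text_type and to_bytes_compat, and the encoding parameter, are hypothetical.

```python
import sys

# Pick the text type once; `unicode` is referenced only on Python 2.
if sys.version_info[0] < 3:
    text_type = unicode  # noqa: F821 (this name exists on Python 2 only)
else:
    text_type = str


def to_bytes_compat(s, encoding="utf-8"):
    """Return s as bytes, encoding text if needed (hypothetical helper)."""
    if isinstance(s, bytes):
        # On Python 2, bytes is an alias for str, so plain strings pass through.
        return s
    if isinstance(s, text_type):
        return s.encode(encoding)
    raise TypeError("Expected bytes or string, but got %s." % type(s))
```

Calling to_bytes_compat("hello") or to_bytes_compat(u"hello") should give the same bytes on either interpreter, and any other type still raises TypeError, as in the original function.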