How to remove \xa0 from string in Python?

I am currently using Beautiful Soup to parse an HTML file and calling get_text(), but I’m left with a lot of \xa0 characters (Unicode non-breaking spaces) where spaces should be. Is there an efficient way to replace all of them with regular spaces in Python 2.7? The more general question would be: is there a way to remove Unicode formatting?

I tried line = line.replace(u'\xa0', ' '), as suggested in another thread, but that changed the \xa0’s to u’s, so now I have “u”s everywhere instead. ):
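For reference, here is a minimal sketch of what the replacement is expected to do on a unicode string. The sample text is hypothetical and only stands in for the output of get_text(). The unicodedata.normalize call is an alternative not mentioned above; with the NFKC form it should also fold non-breaking spaces into regular ones, along with other compatibility characters, which may or may not be wanted.

```python
# -*- coding: utf-8 -*-
import unicodedata

# Hypothetical sample standing in for the text returned by get_text().
line = u"Price:\xa0100\xa0USD"

# Replace every non-breaking space (U+00A0) with a regular space.
# Both arguments are unicode literals so no implicit conversion happens.
cleaned = line.replace(u"\xa0", u" ")

# Alternative: NFKC normalization maps U+00A0 to a regular space as well.
# Note that it also rewrites other compatibility characters.
normalized = unicodedata.normalize("NFKC", line)

print(repr(cleaned))
print(repr(normalized))
```

If this prints clean text but the real data still shows stray “u”s, the cause is probably somewhere else in the pipeline (for example, printing the repr of a list of unicode strings), though that is only a guess.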

EDIT: The problem seems to be resolved by str.replace(u'\xa0', ' ').encode('utf-8'), but just doing .encode('utf-8') without replace() seems to cause it to spit out even weirder characters, \xc2 for instance. Can anyone explain this?
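A likely explanation for the \xc2: in UTF-8 the non-breaking space U+00A0 is stored as two bytes, 0xC2 followed by 0xA0. So encoding without replacing first leaves \xc2\xa0 in the byte string wherever a non-breaking space was. The sketch below illustrates this with a made-up sample string; the variable names are only for illustration.

```python
# -*- coding: utf-8 -*-

# A single non-breaking space as a unicode string.
nbsp = u"\xa0"

# Encoding to UTF-8 yields two bytes: 0xC2 0xA0.
encoded = nbsp.encode("utf-8")
byte_values = list(bytearray(encoded))  # integer value of each byte

# Hypothetical sample text containing a non-breaking space.
text = u"a\xa0b"

# Encoding alone keeps the character, now as the byte pair \xc2\xa0.
raw = text.encode("utf-8")

# Replacing first and then encoding leaves only a plain space.
fixed = text.replace(u"\xa0", u" ").encode("utf-8")

print(repr(encoded))
print(repr(raw))
print(repr(fixed))
```

In other words, .encode('utf-8') does not remove the character; it only changes how it is represented, which is why the replace() step is still needed.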

