I am currently using Beautiful Soup to parse an HTML file and calling get_text(), but it seems like I’m being left with a lot of \xa0 Unicode characters representing spaces. Is there an efficient way to replace all of them with ordinary spaces in Python 2.7? I guess the more general question is: is there a way to remove Unicode formatting?
I tried using line = line.replace(u'\xa0', ' '), as suggested by another thread, but that changed the \xa0’s to u’s, so now I have “u”s everywhere instead. ):
EDIT: The problem seems to be resolved by str.replace(u'\xa0', ' ').encode('utf-8'), but just doing .encode('utf-8') without replace() seems to cause it to spit out even weirder characters, \xc2 for instance. Can anyone explain this?
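For reference, here is a minimal sketch of what seems to be going on, assuming the text from get_text() is a unicode string. The sample string below is made up for illustration and is not real Beautiful Soup output. \xa0 is U+00A0 (NO-BREAK SPACE), and UTF-8 encodes that code point as the two bytes 0xC2 0xA0, which would explain the \xc2 showing up after a bare .encode('utf-8'). The unicodedata.normalize call is one possible way to handle the more general case, not the only one.

```python
# -*- coding: utf-8 -*-
import unicodedata

# Hypothetical stand-in for the unicode text returned by get_text().
text = u'Hello\xa0world'

# Option 1: replace the no-break space (U+00A0) with a plain space.
cleaned = text.replace(u'\xa0', u' ')

# Option 2: NFKC normalization folds compatibility characters,
# including U+00A0, into their plain equivalents.
normalized = unicodedata.normalize('NFKC', text)

# Encoding without replacing first: U+00A0 becomes the two UTF-8
# bytes 0xC2 0xA0, which is where the \xc2 comes from.
encoded = text.encode('utf-8')

print(repr(cleaned))
print(repr(normalized))
print(repr(encoded))
```

Note that NFKC also rewrites other compatibility characters (ligatures, full-width forms and so on), so the plain replace() is the safer choice if only \xa0 needs to go.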