It is simple and straightforward 🙂
Just follow the official Scrapy documentation on running a spider from a script. I would make one small change so that the spider runs only when you execute python myscript.py, and not every time you import something from that file. Just add an if __name__ == "__main__": guard:
    import scrapy
    from scrapy.crawler import CrawlerProcess


    class MySpider(scrapy.Spider):
        # Your spider definition
        pass


    if __name__ == "__main__":
        process = CrawlerProcess({
            'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
        })

        process.crawl(MySpider)
        process.start()  # the script will block here until the crawling is finished
Now save the file as myscript.py and run `python myscript.py`.
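If you want to see what the `__name__` guard buys you without installing Scrapy, here is a rough, standard-library-only sketch. The `SCRIPT` text and the `run_spider` function are hypothetical stand-ins for myscript.py and for the `process.crawl(...)` / `process.start()` calls; they are not part of Scrapy. The sketch imports the file once and then runs it once as a script, and records whether the "crawl" was triggered in each case.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# Hypothetical stand-in for myscript.py: it records a "crawl"
# only when the file is executed as a script.
SCRIPT = '''
started = []

def run_spider():
    # Stand-in for process.crawl(MySpider); process.start()
    started.append("crawl")

if __name__ == "__main__":
    run_spider()
'''


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "myscript.py"
        path.write_text(SCRIPT)

        # Importing the file: __name__ is "myscript", so the guard is skipped.
        spec = importlib.util.spec_from_file_location("myscript", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        imported = list(module.started)

        # Running the file as a script: __name__ is "__main__", so the guard fires.
        executed = runpy.run_path(str(path), run_name="__main__")["started"]

    return imported, executed


imported, executed = demo()
print(imported)  # []
print(executed)  # ['crawl']
```

The same thing happens with the real spider file: `from myscript import MySpider` gives you the class without starting a crawl, while `python myscript.py` starts it.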
Enjoy!