I need an XPATH to extract data from www.gsmarena.com

I am researching how mobile phones have evolved over the years, so I need to build a database with the specifications of as many phones as possible. I am trying to scrape the data from the GSM Arena website.

Example page: http://www.gsmarena.com/samsung_galaxy_note7-8082.php

I am using an XPath that matches on the label preceding each value, for example: //tr[contains(., "Sensors")]/td[2]
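For reference, here is a minimal sketch of that label-based lookup in Python. It uses only the standard library's ElementTree, which has no contains(), so it matches the label cell exactly instead. The markup, the cell values, and the helper name value_for_label are made up for illustration; the real GSM Arena pages may be structured differently.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed fragment standing in for a spec table.
# The real page markup is an assumption and may differ.
SAMPLE = """
<table>
  <tr><td>Sensors</td><td>sensor list here</td></tr>
  <tr><td>Messaging</td><td>messaging list here</td></tr>
</table>
"""


def value_for_label(root, label):
    """Return the text of the 2nd cell in the row that has a cell equal to `label`."""
    # Exact-match stand-in for //tr[contains(., "Sensors")]/td[2]
    cell = root.find(f".//tr[td='{label}']/td[2]")
    return cell.text if cell is not None else None


root = ET.fromstring(SAMPLE)
print(value_for_label(root, "Sensors"))
```

This works as long as every row has a label to match on, which is exactly what breaks for the rows described next.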

But some values, the last one in each category, have no preceding label.

How do I pick this info:

Non-removable Li-Po 3500 mAh battery

or this info:

Fast battery charging Qi wireless charging (market dependent) ANT+ support S-Voice natural language commands and dictation MP4/DivX/XviD/WMV/H.265 player MP3/WAV/WMA/eAAC+/FLAC player Photo/video editor Document editor

Note that different phones have a different number of rows on the page, so using [number] in the XPath would pick different info from page to page:

http://www.gsmarena.com/samsung_galaxy_note7-8082.php – need to pick 5th row of features

http://www.gsmarena.com/samsung_sgh_600-49.php – need to pick 8th row of features
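To illustrate the problem, here is a small sketch on made-up markup: two tables with a different number of rows, each ending in an unlabelled row. A fixed index picks the unlabelled row on one page but a labelled row on the other. In this toy layout tr[last()] happens to reach the unlabelled row in both, but whether the real pages group rows this way is an assumption I have not verified.

```python
import xml.etree.ElementTree as ET

# Hypothetical tables; only the differing row counts matter here.
PAGE_A = """
<table>
  <tr><td>Sensors</td><td>row 1</td></tr>
  <tr><td>Messaging</td><td>row 2</td></tr>
  <tr><td></td><td>unlabelled extras A</td></tr>
</table>
"""
PAGE_B = """
<table>
  <tr><td>Sensors</td><td>row 1</td></tr>
  <tr><td>Messaging</td><td>row 2</td></tr>
  <tr><td>Browser</td><td>row 3</td></tr>
  <tr><td></td><td>unlabelled extras B</td></tr>
</table>
"""


def pick(page, path):
    """Parse `page` and return the text of the first element matching `path`."""
    cell = ET.fromstring(page).find(path)
    return cell.text if cell is not None else None


# Fixed position: correct on page A, wrong on page B.
fixed = [pick(p, "tr[3]/td[2]") for p in (PAGE_A, PAGE_B)]

# Position relative to the end: reaches the unlabelled row in both toy tables.
from_end = [pick(p, "tr[last()]/td[2]") for p in (PAGE_A, PAGE_B)]

print(fixed)
print(from_end)
```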
