Files
sci-gui-agent-benchmark/mm_agents/gui_som/data_preparation

  1. Get the URLs from the Majestic Million list and save them to majestic_million.csv
python3 majestic_million.py
  2. Run the Scrapy spider to get the data from the URLs
python scrapy_crawler.py
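
The hand-off between the two steps is a CSV file of sites that the crawler turns into start URLs. The sketch below shows one way that conversion could look; it is not taken from scrapy_crawler.py. The function name load_start_urls, the "Domain" column name, and the https:// prefix are assumptions made for illustration, so adjust them to whatever majestic_million.py actually writes.

```python
import csv
import io


def load_start_urls(csv_text, domain_column="Domain", limit=None):
    """Turn CSV text with a domain column into a list of https:// URLs.

    Hypothetical helper: the column name and URL scheme are assumptions,
    not something the repository scripts are known to use.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        if limit is not None and len(urls) >= limit:
            break  # stop once enough URLs have been collected
        domain = (row.get(domain_column) or "").strip()
        if not domain:
            continue  # skip rows with no domain value
        urls.append(f"https://{domain}")
    return urls


# Small in-memory sample standing in for majestic_million.csv.
sample = "GlobalRank,Domain\n1,google.com\n2,facebook.com\n3,\n4,youtube.com\n"
print(load_start_urls(sample, limit=3))
```

To use it on the real file, read it with open("majestic_million.csv", newline="", encoding="utf-8").read() and pass the text in; the resulting list could then serve as a spider's start URLs.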