Data Spider
2025-08-01

Data Scraping Requirements

1. Data Sources

Data can be obtained from the following platforms:

  • Desktop web version (such as websites accessed through browsers)
  • Mobile web version (such as pages accessed through mobile browsers)
  • Android applications (Apps)
  • iOS applications (Apps)

Most sites today offer a desktop web version, a mobile web version, and Apps, and the data is generally consistent across them. The difficulty of scraping varies, however:

  • Desktop web and mobile web: Simplest to scrape, lowest cost.
  • Android App: Medium difficulty, more comprehensive data.
  • iOS App: Highest difficulty, suitable for specific needs (such as geographic location data).

Recommendation: Unless there are special requirements (such as restaurant coordinates from food delivery platforms), we usually prioritize scraping from desktop web for higher efficiency.

2. What data needs to be scraped?

Clearly defining the data you need is very important: the more fields you request, the more scraping time and cost you should expect. For example, a product page on an e-commerce website may show price, reviews, and store information, but these can come from different sections of the site, each requiring a different scraping method.

Taking the JD desktop web version as an example, common data includes:

Figure: JD product page showing price and reviews

  • Product Link: e.g. https://item.jd.com/100162191634.html
  • Product ID: e.g. 100162191634
  • Category: e.g. "运动户外 > 运动鞋 > 阿迪达斯 GW3774" (Sports & Outdoors > Sports Shoes > Adidas GW3774)
  • Store Name: e.g. "Adidas 京东自营旗舰店" (Adidas JD self-operated flagship store)
  • Main Image Link: URL of the first product image
  • Review Count: e.g. "5万+" (50,000+)
  • Positive Rating: e.g. "97% 买家好评" (97% positive buyer reviews)
  • Product Title: e.g. "阿迪达斯 Yeezy350 暴龙兽椰子 42.5"
  • Original Price: e.g. 835.36 元 (yuan)
  • Current Price: e.g. 708.93 元 (yuan)
  • Color: e.g. GW3774
  • Size: e.g. 42.5
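The product fields listed above can be captured in a single record type so that both sides agree on names and types. The following is only a minimal sketch: the class name `ProductRecord`, its attribute names, and the helper `product_id_from_link` are hypothetical, and the helper assumes links always follow the `https://item.jd.com/<id>.html` shape seen in the example.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class ProductRecord:
    # Hypothetical field names mirroring the list above.
    product_link: str
    product_id: str
    category: str
    store_name: str
    main_image_link: str
    review_count: str      # kept as displayed text, e.g. "5万+"
    positive_rating: str   # kept as displayed text, e.g. "97% 买家好评"
    product_title: str
    original_price: float  # in 元
    current_price: float   # in 元
    color: str
    size: str


def product_id_from_link(link: str) -> Optional[str]:
    """Return the numeric ID from a link shaped like /<digits>.html, else None."""
    path = urlparse(link).path
    match = re.fullmatch(r"/(\d+)\.html", path)
    return match.group(1) if match else None


product_id = product_id_from_link("https://item.jd.com/100162191634.html")
```

Keeping display values such as the review count as text avoids losing information (for instance the trailing "+") before the requirements are confirmed.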

Review Data (requires separate scraping):

Figure: JD review page showing user comments

  • Review Tags: e.g. "穿起来超舒服 320" (very comfortable to wear, 320) and "尺码很准确 24" (size is accurate, 24)
  • Reviewer: e.g. "依***q"
  • Review Content: e.g. "这双 Yeezy 350 真的太戳我了..."
  • Review Time: e.g. 2025-08-01
  • Rating: e.g. 5 stars
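Review fields usually need light parsing before delivery. As a rough sketch, the code below splits a review tag into its label and trailing number and converts the rating and date into typed values. The function and class names are hypothetical, and it assumes that a tag always ends with a whitespace-separated integer (presumably the number of reviews carrying that tag) and that ratings run from 1 to 5 stars.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Tuple


def parse_review_tag(text: str) -> Tuple[str, int]:
    """Split a tag such as '穿起来超舒服 320' into (label, count)."""
    match = re.fullmatch(r"(.+?)\s+(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unexpected review tag format: {text!r}")
    return match.group(1), int(match.group(2))


def parse_star_rating(text: str) -> int:
    """Convert text such as '5 stars' into an integer (assumed range 1-5)."""
    match = re.fullmatch(r"([1-5]) stars?", text.strip())
    if match is None:
        raise ValueError(f"unexpected rating format: {text!r}")
    return int(match.group(1))


@dataclass
class ReviewRecord:
    # Hypothetical field names mirroring the list above.
    reviewer: str
    content: str
    review_time: date
    rating: int


review = ReviewRecord(
    reviewer="依***q",
    content="这双 Yeezy 350 真的太戳我了...",
    review_time=date.fromisoformat("2025-08-01"),
    rating=parse_star_rating("5 stars"),
)
tags = [parse_review_tag(t) for t in ["穿起来超舒服 320", "尺码很准确 24"]]
```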

Store Data (requires separate scraping):

Figure: JD store page showing store information

  • Store Name: e.g. "Adidas 京东自营旗舰店" (Adidas JD self-operated flagship store)
  • Store Review Count: e.g. "5万+" (50,000+)
  • Store Followers: e.g. "1011.2万" (10.112 million)
  • Product Details: e.g. 品牌、货号、功能 (brand, item number, function)
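Counts such as "5万+" and "1011.2万" are display strings, where 万 means 10,000 and a trailing "+" marks a lower bound. If numeric values are wanted in the delivered data, a conversion along these lines could be agreed on. This is a sketch only: `parse_display_count` is a hypothetical helper, and it assumes the only unit suffix that appears is 万.

```python
import re
from decimal import Decimal
from typing import Tuple


def parse_display_count(text: str) -> Tuple[int, bool]:
    """Convert a displayed count into (value, is_lower_bound).

    Assumes the form <number>[万][+]; anything else raises ValueError.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(万)?(\+)?", text.strip())
    if match is None:
        raise ValueError(f"unexpected count format: {text!r}")
    value = Decimal(match.group(1))
    if match.group(2):
        value *= 10000  # 万 = ten thousand
    return int(value), match.group(3) is not None


followers, followers_is_lower_bound = parse_display_count("1011.2万")
reviews, reviews_is_lower_bound = parse_display_count("5万+")
```

`Decimal` is used instead of `float` so that values like 1011.2 are scaled exactly.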

JD iOS App Example:

Figure: JD iOS App product page

Figure: JD iOS App review page

Figure: JD iOS App store page

Figure: JD iOS App product details

Web and App data is largely the same, but App data is more comprehensive. In particular, geographic coordinates (location information) for map or food delivery services can only be scraped from Apps.

3. Data Specifications

Once you have decided what to scrape, we recommend listing the data fields and example values in an Excel spreadsheet so that both parties can confirm the requirements easily. You can prepare the spreadsheet yourself and send it to us, or we can draft it and confirm it with you. Download Data Specification Example to view the template.

Recommendation: Before scraping begins, make sure the spreadsheet includes every field (such as product title, price, reviews) with clear example data, to avoid later changes.
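For illustration, a field specification of this kind can also be drafted as a CSV file, which Excel opens directly. The sketch below is not the downloadable template: the column headers, the `SPEC_ROWS` list, and both helper functions are hypothetical, and only a few of the JD fields are shown.

```python
import csv
import io

# Hypothetical specification rows: (field name, example value, notes).
SPEC_ROWS = [
    ("Product Link", "https://item.jd.com/100162191634.html", "Product page URL"),
    ("Product ID", "100162191634", "Numeric ID taken from the link"),
    ("Product Title", "阿迪达斯 Yeezy350 暴龙兽椰子 42.5", ""),
    ("Original Price", "835.36", "元"),
    ("Current Price", "708.93", "元"),
    ("Review Count", "5万+", "As displayed on the page"),
]


def spec_to_csv(rows):
    """Render the specification as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["Field", "Example", "Notes"])
    writer.writerows(rows)
    return buffer.getvalue()


def missing_fields(record, rows):
    """List the specified fields that a scraped record does not contain."""
    return [name for name, _example, _notes in rows if name not in record]


spec_text = spec_to_csv(SPEC_ROWS)
gaps = missing_fields({"Product Link": "https://item.jd.com/100162191634.html"}, SPEC_ROWS)
```

When saving the text to a file for Excel, writing it with the `utf-8-sig` encoding helps Excel detect UTF-8 so the Chinese example values display correctly.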

4. Data Delivery Methods

After scraping, data can be delivered through various methods, depending on your technical capabilities and requirements:

Excel/CSV

Suitable for users familiar with Excel, simple and easy to use.

JSON

Suitable for users with basic programming skills, flexible and universal.

Database (such as MySQL)

Suitable for large data volumes and professional teams, requires programming skills.

Backend Management System

Suitable for users without programming background who need visualization.

Others

For example, file downloads or an API (interface service).

For detailed explanations, please see Data Delivery Methods.
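To make the differences between the delivery formats concrete, the sketch below writes the same sample record as CSV, JSON, and a database table. It is an illustration under stated assumptions rather than the actual delivery pipeline: the record keys and function names are hypothetical, and SQLite (from the Python standard library) stands in for MySQL so the example stays self-contained.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample record using values from the JD example.
RECORDS = [
    {"product_id": "100162191634", "current_price": 708.93, "review_count": "5万+"},
]


def to_csv(records):
    """CSV text with a header row; opens directly in Excel."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """JSON text; ensure_ascii=False keeps Chinese characters readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_sqlite(records, connection):
    """Insert records into a table, replacing rows with the same product_id."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS products "
        "(product_id TEXT PRIMARY KEY, current_price REAL, review_count TEXT)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO products "
        "VALUES (:product_id, :current_price, :review_count)",
        records,
    )
    connection.commit()


connection = sqlite3.connect(":memory:")
to_sqlite(RECORDS, connection)
csv_text = to_csv(RECORDS)
json_text = to_json(RECORDS)
```

Using the product ID as the primary key means a repeated scrape updates the existing row instead of creating duplicates, which matters once data is collected on a schedule.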

5. Data Collection Frequency

According to project requirements, data can be scraped at the following frequencies:

Daily

Suitable for time-sensitive scenarios, such as price monitoring.

Weekly

Suitable for regular analysis, such as market trends.

Monthly

Suitable for long-term data collection, such as industry reports.
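The three frequencies can be expressed as a simple rule for computing the next collection date. The following is a minimal sketch: `next_run` is a hypothetical helper, and it assumes that a monthly job repeats on the same day of the month, falling back to the last day when the next month is shorter.

```python
import calendar
from datetime import date, timedelta


def next_run(last_run: date, frequency: str) -> date:
    """Return the next collection date for 'daily', 'weekly', or 'monthly'."""
    if frequency == "daily":
        return last_run + timedelta(days=1)
    if frequency == "weekly":
        return last_run + timedelta(weeks=1)
    if frequency == "monthly":
        # Roll over to January of the following year after December.
        year = last_run.year + (last_run.month // 12)
        month = last_run.month % 12 + 1
        # Clamp the day for shorter months (e.g. 31 January -> 28 February).
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, min(last_run.day, last_day))
    raise ValueError(f"unknown frequency: {frequency!r}")


start = date(2025, 8, 1)
schedule = {name: next_run(start, name) for name in ("daily", "weekly", "monthly")}
```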

Summary and Recommendations

Clearly defining data scraping requirements is the key to successful collaboration. Here are some recommendations:

  • Choose data sources: Prioritize desktop web for simplicity and efficiency; choose Apps when special data is needed (such as coordinates).
  • Define data fields clearly: Use Excel to list required data to avoid omissions or duplicate work.
  • Choose delivery method: Select Excel, JSON, database, or backend system based on technical capabilities.
  • Determine frequency: Choose daily, weekly, or monthly scraping based on requirements.