How to Obtain Google Maps Review Data: A Practical Analysis Based on Request Generation and Simulation

In the era of big data, review data is an extremely valuable type of information. For businesses, researchers, or data analysts, Google Maps review data can help them understand genuine user feedback on businesses, attractions, hotels, hospitals, and more, enabling them to make well-informed decisions.

1. The Value of Google Maps Review Data

1. Business Analysis: Uncover user satisfaction points and pain points through reviews to optimize services.
2. Travel Recommendations: Reviews for hotels, restaurants, and attractions serve as crucial references for user choices.
3. Public Opinion Monitoring: The volume and sentiment of reviews can reflect market trends.
4. Data Mining: Enables sentiment analysis and keyword extraction.

2. Target API Analysis

When accessing a specific location’s page on Google Maps, you’ll notice that reviews are not loaded all at once but are fetched in batches via asynchronous requests. The core API endpoint for this is:

https://www.google.com/maps/rpc/listugcposts?authuser=0&hl=el&pb=…

The pb parameter here is crucial. It contains necessary information such as the Place ID, pagination token, and request ID. If we can correctly construct this pb string, we can simulate frontend requests to obtain complete review data.
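When studying a request captured in the browser's network panel, it can help to pull the pb value out of the URL first. The following is a minimal sketch using only the standard library; the function name `extract_pb` and the sample URL (with a shortened placeholder pb value) are assumptions for illustration, not real captured data.

```python
from urllib.parse import urlsplit, parse_qs


def extract_pb(request_url: str) -> str:
    """Return the pb query parameter of a listugcposts request URL."""
    query = parse_qs(urlsplit(request_url).query)
    values = query.get("pb")
    if not values:
        raise ValueError("URL has no pb parameter")
    return values[0]


# Hypothetical captured URL; the pb value is a shortened placeholder.
sample = (
    "https://www.google.com/maps/rpc/listugcposts"
    "?authuser=0&hl=el&pb=!1m6!1sPLACE_ID!2m2!1i20!2s"
)
print(extract_pb(sample))
```

Note that parse_qs also percent-decodes the value, so the result is the pb string as the frontend composed it.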

3. Core Parameter Parsing

Through reverse engineering of frontend requests, the URL construction rules can be summarized as follows:

placeID: Extracted from the Google Maps share link, typically found in the !1sxxxx segment.

pageToken: A token used for pagination. It is empty for the first page and returned in the API response for subsequent pages.

pageSize: The number of reviews returned per request, e.g., 20.

requestID: A session request ID, usually a randomly generated string.
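As a rough illustration of the first and last parameters, the sketch below extracts a place ID from a share link and generates a random request ID. The share link is a made-up example, the helper names are hypothetical, and the assumption that any URL-safe random string is accepted as a request ID has not been verified against the API.

```python
import re
import secrets

# The place ID is the text between "!1s" and the next "!".
PLACE_ID_RE = re.compile(r"!1s([^!]+)")


def extract_place_id(map_url: str) -> str:
    """Return the place ID found in the !1s segment of a Google Maps URL."""
    match = PLACE_ID_RE.search(map_url)
    if match is None:
        raise ValueError(f"no !1s segment in URL: {map_url}")
    return match.group(1)


def new_request_id(num_bytes: int = 16) -> str:
    """Return a random URL-safe string (assumed to be an acceptable request ID)."""
    return secrets.token_urlsafe(num_bytes)


# Hypothetical share link, not a real place.
share_link = "https://www.google.com/maps/place/Example/data=!4m6!3m5!1s0x0:0x1!8m2!3d0!4d0"
print(extract_place_id(share_link))
print(new_request_id())
```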

The concatenation logic for the pb parameter resembles:
!1m6!1s{placeID}
!6m4!4m1!1e1!4m1!1e3
!2m2!1i{pageSize}!2s{pageToken}
!5m2!1s{requestID}!7e81
!8m9!2b1!3b1!5b1!7b1
!12m4!1b1!2b1!4m1!1e1!11m0!13m1!1e1
Joining these segments into a single string and appending it to the API URL as the pb parameter yields a complete request URL.
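To make the structure of such a string easier to read, it can be split into its `!`-delimited segments. The sketch below assumes each segment has the form field number, one type letter, then a value, where m is commonly read as a nested group with an element count, s as a string, i as an integer, b as a boolean, and e as an enum. This reading comes from reverse engineering, not from an official specification, and `tokenize_pb` is a hypothetical helper.

```python
import re

# Each segment looks like !<field number><type letter><value>.
SEGMENT_RE = re.compile(r"!(\d+)([a-z])([^!]*)")


def tokenize_pb(pb: str) -> list[tuple[int, str, str]]:
    """Split a pb string into (field number, type letter, value) triples."""
    return [(int(num), kind, value) for num, kind, value in SEGMENT_RE.findall(pb)]


# Shortened placeholder string following the pattern of the template above.
pb = "!1m6!1sPLACE_ID!2m2!1i20!2s"
for field, kind, value in tokenize_pb(pb):
    print(field, kind, repr(value))
```

The empty value on the final `2s` segment corresponds to the empty page token of a first-page request.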

4. Python Implementation: Generating the URL

import re
import urllib.parse


# Method of the scraper class; the class body is not shown in this article.
def _generate_url(self, map_url, page_token, page_size, request_id):
    """Build the listugcposts request URL for one page of reviews."""
    # The place ID is the text between "!1s" and the next "!" in the map URL.
    match = re.search(r"!1s([^!]+)", map_url)
    if not match:
        raise ValueError(f"Could not extract place ID from URL: {map_url}")
    # Decode first so an already percent-encoded ID is not encoded twice.
    raw_place_id = urllib.parse.unquote(match.group(1))
    encoded_place_id = urllib.parse.quote(raw_place_id)
    # The page token is an empty string for the first page.
    encoded_page_token = urllib.parse.quote(page_token)
    pb_components = [
        f"!1m6!1s{encoded_place_id}",
        "!6m4!4m1!1e1!4m1!1e3",
        f"!2m2!1i{page_size}!2s{encoded_page_token}",
        f"!5m2!1s{request_id}!7e81",
        "!8m9!2b1!3b1!5b1!7b1",
        "!12m4!1b1!2b1!4m1!1e1!11m0!13m1!1e1",
    ]
    pb_string = "".join(pb_components)
    return f"https://www.google.com/maps/rpc/listugcposts?authuser=0&hl=el&pb={pb_string}"
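The unquote-then-quote step in the function above is worth a closer look: a place ID copied from a URL may or may not already be percent-encoded, and decoding before encoding gives the same result in both cases. A small sketch with made-up place IDs (the helper name `normalize` is hypothetical):

```python
from urllib.parse import quote, unquote


def normalize(value: str) -> str:
    """Percent-encode a value exactly once, whether or not it was already encoded."""
    return quote(unquote(value))


print(normalize("0x0:0x1"))      # plain input
print(normalize("0x0%3A0x1"))    # already encoded input, same result
print(quote("0x0%3A0x1"))        # quoting without decoding double-encodes the "%"
```

The first two calls print `0x0%3A0x1`, while the last prints `0x0%253A0x1`, which is the double-encoding the decode step avoids.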

5. Pagination Handling

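The token for the next page is read from each response. The body starts with the prefix )]}' followed by a newline, which guards against JSON hijacking and must be stripped before parsing.

import json


def extract_next_page_token(data):
    """Return the next page token from a raw response body, or "" if none is found."""
    text = data.decode("utf-8", errors="ignore")
    # Strip the anti-hijacking prefix that precedes the JSON payload.
    prefix = ")]}'\n"
    if text.startswith(prefix):
        text = text[len(prefix):]
    try:
        result = json.loads(text)
    except json.JSONDecodeError:
        return ""
    # get_nested_element is a helper defined elsewhere in the scraper;
    # here it reads index 1 of the parsed response, where the token is stored.
    token = get_nested_element(result, 1)
    return token if isinstance(token, str) else ""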
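The get_nested_element helper used above is not defined in this article. One possible version, written here as an assumption about its behavior, follows a sequence of list indexes and returns None when any step is missing. The response body in the example is fabricated purely to show the shape (prefix plus a JSON array with the token at index 1).

```python
import json
from typing import Any

XSSI_PREFIX = ")]}'\n"


def get_nested_element(data: Any, *indexes: int) -> Any:
    """Follow list indexes into nested lists; return None if the path does not exist."""
    current = data
    for index in indexes:
        if not isinstance(current, list) or not 0 <= index < len(current):
            return None
        current = current[index]
    return current


# Fabricated response body: only the overall shape matters here.
raw = (XSSI_PREFIX + json.dumps([[], "NEXT_TOKEN"])).encode("utf-8")
payload = json.loads(raw.decode("utf-8")[len(XSSI_PREFIX):])

print(get_nested_element(payload, 1))     # the token at index 1
print(get_nested_element(payload, 5, 0))  # missing path
```

Returning None instead of raising keeps the caller's `isinstance(token, str)` check as the single place that decides whether a usable token was found.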

6. Simulating Requests

Using the URL generation and pagination handling described above, we simulate sending requests.

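import httpx


# Method of the scraper class; self.http_client is assumed to be an httpx.Client.
def _fetch_review_page(self, url):
    """Fetch one page of reviews and return the raw response body as bytes."""
    try:
        resp = self.http_client.get(url, timeout=10)
        resp.raise_for_status()
        return resp.content
    except httpx.RequestError as e:
        # Network-level failure such as a DNS error, refused connection, or timeout.
        raise RuntimeError(f"Fetch error for {url}: {e}") from e
    except httpx.HTTPStatusError as e:
        # The server responded, but with a 4xx or 5xx status code.
        raise RuntimeError(f"{url}: unexpected status code: {e.response.status_code}") from e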

Ultimately, we obtain the raw review data returned by the API.
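To collect more than one page, the pieces can be combined in a loop that requests a page, reads the next token from the response, and stops when the token is empty. The sketch below is one way to structure that loop; `iter_review_pages` and its two callables are hypothetical names, and the dictionary of fake pages stands in for real network calls so the example runs offline.

```python
from typing import Callable, Iterator


def iter_review_pages(
    fetch_page: Callable[[str], bytes],
    next_token: Callable[[bytes], str],
    max_pages: int = 50,
) -> Iterator[bytes]:
    """Yield raw page bodies until the next page token is empty or max_pages is hit."""
    token = ""  # the first page is requested with an empty token
    for _ in range(max_pages):
        body = fetch_page(token)
        yield body
        token = next_token(body)
        if not token:
            break


# Stand-ins for the real fetch and token extraction, keyed by page token.
fake_pages = {"": b"page1|t1", "t1": b"page2|t2", "t2": b"page3|"}
pages = list(
    iter_review_pages(
        fetch_page=lambda tok: fake_pages[tok],
        next_token=lambda body: body.decode().split("|")[1],
    )
)
print(len(pages))
```

In real use, fetch_page would build the URL for the given token and fetch it, and next_token would be extract_next_page_token. The max_pages cap is a safety limit in case the API keeps returning a token.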