Pagination

What is Pagination?

Pagination is the practice of dividing large datasets or content lists into discrete pages, requiring sequential navigation to access all records. In data extraction, handling pagination means automatically traversing all pages to collect the complete dataset.
Pagination is a user interface and data delivery pattern that splits large collections of items across multiple pages. Instead of returning 10,000 search results at once, a website might show 20 per page with navigation controls to move between pages. APIs similarly limit response sizes and provide mechanisms to request subsequent batches.

For data extraction and web scraping, pagination is a critical challenge. If your scraper only reads the first page, you capture a fraction of the available data. Handling pagination correctly means your extraction process automatically navigates through every page to build the complete dataset.

Types of Pagination

Different websites and APIs implement pagination in distinct ways, each requiring a different handling strategy:

  • Page number pagination: URLs contain a page parameter (e.g., ?page=2, ?page=3). The scraper increments the page number until no more results are returned.
  • Offset/limit pagination: The API accepts offset and limit parameters (e.g., ?offset=40&limit=20). Common in REST APIs.
  • Cursor-based pagination: Each response includes a cursor token pointing to the next batch. The client passes this token in the subsequent request. Used by modern APIs (Slack, Stripe, Facebook Graph API) because it handles real-time data changes gracefully.
  • Infinite scroll: Content loads dynamically as the user scrolls down the page. There are no page links — JavaScript fetches more items when the scroll position nears the bottom. Requires browser-based scraping to trigger the scroll events.
  • "Load More" buttons: Similar to infinite scroll, but requires clicking a button to fetch the next batch. The scraper must locate and click this element repeatedly.
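The simplest of these patterns, page-number pagination, can be sketched as a loop that increments the page parameter until an empty page comes back. This is an illustrative sketch, not a real client: `fetch_page` here slices a fake in-memory dataset, standing in for an HTTP call such as `requests.get(f"{url}?page={n}")`.

```python
# Minimal sketch of page-number pagination: keep incrementing the page
# parameter until a request returns no results. fetch_page() is a
# stand-in for a real HTTP call against a ?page=N endpoint.
DATASET = list(range(1, 48))  # 47 fake records, 20 per "page"

def fetch_page(page, per_page=20):
    start = (page - 1) * per_page
    return DATASET[start:start + per_page]

def scrape_all():
    records, page = [], 1
    while True:
        batch = fetch_page(page)
        if not batch:          # empty page => no more data
            break
        records.extend(batch)
        page += 1
    return records

print(len(scrape_all()))  # 47
```

The same loop shape works for offset/limit pagination by advancing `offset` by `limit` each iteration instead of incrementing a page counter.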
Pagination in Web Scraping

Handling pagination in a scraper involves:

  • Detection: Identifying which pagination pattern the target site uses. This may involve inspecting network requests, examining URL patterns, or looking for "next page" links in the HTML.
  • Iteration logic: Implementing the loop that requests each subsequent page. For numbered pagination, increment a counter. For cursor-based, extract the next cursor from each response.
  • Termination conditions: Knowing when to stop — no "next" link, an empty result set, reaching a known total count, or encountering a duplicate record.
  • Rate management: Adding delays between page requests to avoid triggering rate limits or bot detection. Many sites will block IPs that request pages too rapidly.
  • Deduplication: Some pagination implementations return overlapping results between pages. The scraper should track seen records and skip duplicates.
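The steps above can be combined into one loop. The sketch below is illustrative: `get_page` is a hypothetical stand-in that returns a batch of records plus the next page (or `None` when done), with pages 2 and 3 deliberately overlapping by one record to show deduplication at work.

```python
# Sketch of a pagination loop combining iteration, a termination
# condition, a politeness delay, and deduplication by record id.
import time

PAGES = {1: ([{"id": 1}, {"id": 2}], 2),
         2: ([{"id": 3}, {"id": 4}], 3),
         3: ([{"id": 4}, {"id": 5}], None)}  # id 4 repeats across pages

def get_page(page):
    """Stand-in for fetching one page; returns (records, next_page)."""
    return PAGES[page]

def crawl(delay=0.0):
    seen, out, page = set(), [], 1
    while page is not None:            # terminate when no next page
        records, page = get_page(page)
        for rec in records:
            if rec["id"] in seen:      # skip duplicates between pages
                continue
            seen.add(rec["id"])
            out.append(rec)
        time.sleep(delay)              # rate management between pages
    return out

print([r["id"] for r in crawl()])  # [1, 2, 3, 4, 5]
```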
API Pagination Best Practices

When consuming paginated APIs:

  • Always respect the API's rate limits between page requests.
  • Use cursor-based pagination when available — it's more reliable than offset-based for data that changes between requests.
  • Implement exponential backoff for rate limit errors (HTTP 429).
  • Log pagination progress so you can resume from the last successful page if the process fails partway through.
  • Validate total record counts against expected values to ensure no pages were missed.
Why It Matters

Failing to handle pagination means collecting incomplete data. A price monitoring scraper that only reads the first page of results will miss most products. Proper pagination handling is the difference between a partial sample and a complete dataset.

How Autonoly Solves It

Autonoly's AI agent automatically detects and handles pagination when extracting data from websites. Whether the site uses page numbers, infinite scroll, or load-more buttons, the agent navigates through all pages and collects the complete dataset without manual configuration.


Examples

• Scraping all 500 product listings from an e-commerce category that shows 20 items per page across 25 pages

• Extracting complete job listings from a career site that uses infinite scroll to load positions as you scroll down

• Collecting all records from a paginated REST API that returns 100 items per request with cursor-based navigation

Frequently Asked Questions

How do you scrape a site that uses infinite scroll?

Infinite scroll requires a headless browser (like Playwright or Puppeteer) that can execute JavaScript and simulate scrolling. The scraper programmatically scrolls to the bottom of the page, waits for new content to load, extracts the newly visible items, and repeats until no more items appear. Monitoring network requests for the AJAX calls that fetch new data can also provide a more reliable approach.

How does cursor-based pagination work?

Cursor-based pagination uses an opaque token (cursor) returned with each response to identify the starting point for the next request. Unlike offset-based pagination, cursors remain valid even when items are added or removed between requests. This prevents the common offset problem of skipping or duplicating records when the underlying dataset changes during pagination.
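The offset problem described above can be demonstrated with a toy in-memory example: the same newest-first list is paginated by offset and by cursor (here, the last-seen id) while a new record arrives between the two requests. Real cursors are opaque server-issued tokens; using the record id directly is a simplification for illustration.

```python
# Toy demonstration of why cursors stay stable when data changes:
# a new item is inserted at the front of a newest-first list between
# page requests, which shifts offsets but not cursor positions.
items = [{"id": i} for i in range(5, 0, -1)]  # newest first: 5,4,3,2,1

def by_offset(offset, limit=2):
    return items[offset:offset + limit]

def by_cursor(after_id, limit=2):
    ids = [it["id"] for it in items]
    start = ids.index(after_id) + 1 if after_id in ids else 0
    return items[start:start + limit]

first = by_offset(0)                  # ids 5, 4
items.insert(0, {"id": 6})            # a new record arrives
offset_page2 = by_offset(2)           # ids 4, 3 -> id 4 is duplicated
cursor_page2 = by_cursor(after_id=4)  # ids 3, 2 -> no duplicate
print([r["id"] for r in offset_page2], [r["id"] for r in cursor_page2])
```

Offset-based page 2 re-delivers id 4 because the insertion shifted every record down one slot; the cursor-based request picks up exactly where the last page ended.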
