Screen Scraping

Screen scraping is a technique for extracting data from an application's visual display rather than its underlying data source. It captures what appears on screen, translating visual output into structured data.

What is Screen Scraping?

Screen scraping is the process of extracting data by reading what is displayed on a computer screen, rather than accessing the data through APIs, databases, or file exports. The term originates from the era of mainframe terminals, when the only practical way to get data out of many legacy systems was to programmatically read the characters displayed on the terminal screen.

Today, screen scraping has evolved to encompass several related techniques:

  • Terminal screen scraping: Reading text from IBM 3270 (mainframe) and 5250 (IBM i, formerly AS/400) terminal emulators. Still used in banking, insurance, and government systems that run on legacy mainframe and midrange platforms.
  • Desktop application scraping: Using accessibility APIs or UI automation frameworks (like Microsoft UI Automation or AutoIt) to read data from Windows or macOS desktop applications.
  • Web screen scraping: Rendering a web page in a browser and reading the visible content, particularly for JavaScript-heavy sites where the data isn't available in the raw HTML.
  • Visual/OCR scraping: Capturing a screenshot and using optical character recognition to convert the image to text. Used for applications that render content as images or canvas elements.
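
Terminal screen scraping, the oldest of these techniques, works by reading characters at known positions on a fixed-size screen. The Python sketch below assumes the emulator has already handed over the screen as 24 rows of 80-character text (how that capture happens depends on the emulator and is omitted here); the screen layout, the `FIELDS` coordinate map, and the helper names are hypothetical.

```python
# Hypothetical terminal screen: 24 rows x 80 columns (the standard 3270 Model 2 size).
ROWS, COLS = 24, 80

# Hypothetical field map for one screen: name -> (row, start column, length), 0-indexed.
FIELDS = {
    "account": (2, 20, 10),
    "name": (3, 20, 30),
    "balance": (4, 20, 12),
}


def make_screen(lines: dict[int, str]) -> list[str]:
    """Build a full screen buffer, padding each row to exactly COLS characters."""
    screen = [" " * COLS for _ in range(ROWS)]
    for row, text in lines.items():
        screen[row] = text[:COLS].ljust(COLS)
    return screen


def read_field(screen: list[str], row: int, col: int, length: int) -> str:
    """Read a field by its fixed screen coordinates and trim the padding."""
    return screen[row][col:col + length].strip()


def scrape(screen: list[str]) -> dict[str, str]:
    """Turn the characters on one screen into a structured record."""
    if len(screen) != ROWS or any(len(line) != COLS for line in screen):
        raise ValueError("unexpected screen size; the layout may have changed")
    return {name: read_field(screen, *pos) for name, pos in FIELDS.items()}


if __name__ == "__main__":
    # Simulated capture of an account-inquiry screen; labels occupy columns 0-19.
    screen = make_screen({
        0: "ACCOUNT INQUIRY".center(COLS),
        2: f"{'ACCOUNT NUMBER:':<20}0012345678",
        3: f"{'CUSTOMER NAME:':<20}JANE DOE",
        4: f"{'CURRENT BALANCE:':<20}1,234.56",
    })
    print(scrape(screen))
    # {'account': '0012345678', 'name': 'JANE DOE', 'balance': '1,234.56'}
```

Reading by fixed coordinates is simple and fast, but it is also why such scrapers are fragile: if a field shifts by even one column after a system update, `read_field` quietly returns the wrong characters. The size check in `scrape` only catches the coarsest layout changes.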

Screen Scraping vs. Web Scraping

While the terms are sometimes used interchangeably, they have distinct meanings:

  • Web scraping parses HTML source code to extract data. It works at the markup level, using CSS selectors or XPath to target elements.
  • Screen scraping reads the rendered visual output. It works at the display level, reading what the user actually sees.

Web scraping is generally faster and more reliable because it works with structured markup. Screen scraping is used when the underlying data source isn't accessible: legacy systems with no API, applications that render content as images, or heavily obfuscated pages where the HTML structure doesn't cleanly map to the visible data.
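
The markup-level side of this distinction can be shown with nothing but Python's standard library. The sketch below uses `html.parser.HTMLParser` to collect the text of elements carrying a given `class`, roughly what the CSS selector `.price` does in libraries such as BeautifulSoup or lxml. The HTML snippet, the class names, and the `extract_by_class` helper are made up for illustration, and the parser assumes matched elements contain only text (no nested tags).

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect text from elements whose class list contains target_class.

    Assumes matched elements hold plain text only: the capture stops at the
    first end tag seen after the matching start tag.
    """

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.results: list[str] = []
        self._capturing = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._capturing = True
            self._buffer = []

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._capturing:
            self.results.append("".join(self._buffer).strip())
            self._capturing = False


def extract_by_class(html: str, target_class: str) -> list[str]:
    """Parse raw HTML markup and return the text of every matching element."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


HTML = """
<ul>
  <li class="item"><span class="name">Widget</span> <span class="price">$9.99</span></li>
  <li class="item"><span class="name">Gadget</span> <span class="price sale">$4.50</span></li>
</ul>
"""

print(extract_by_class(HTML, "price"))  # ['$9.99', '$4.50']
```

If the same prices were drawn onto a `<canvas>` or baked into an image, the markup would contain no text for this parser to find; that is the situation where rendering the page or falling back to OCR becomes necessary.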

Modern Screen Scraping with Headless Browsers

Headless browsers have blurred the line between web scraping and screen scraping. When a scraper uses Playwright or Puppeteer to render a page, execute its JavaScript, and then read the resulting DOM, it is performing a hybrid of both techniques: the browser renders the page as a user would see it, but the scraper extracts data from the rendered DOM rather than from a screenshot.

This approach is particularly valuable for:

  • Single-page applications where data is loaded dynamically via API calls
  • Pages that use anti-scraping techniques to obscure data in the HTML source
  • Interactive applications where data only appears after specific user actions (clicking tabs, expanding sections, scrolling)

Use Cases and Limitations

Screen scraping remains relevant in specific scenarios:

  • Legacy system integration: Extracting data from mainframe applications that lack modern APIs.
  • Desktop automation: Reading data from desktop applications (ERP systems, proprietary tools) for cross-system integration.
  • Testing and QA: Verifying that applications display the correct information to users.

Limitations include fragility (any UI change can break the scraper), performance (rendering is slower than parsing HTML), and accuracy (OCR-based approaches can misread characters).
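
One common mitigation for the accuracy problem is to validate OCR output against what a field is allowed to contain and to repair only well-known misreads. The sketch below assumes an OCR engine (for example Tesseract) has already returned raw text for a currency field; the confusion table and the `parse_ocr_amount` helper are illustrative assumptions, not a standard list.

```python
import re
from decimal import Decimal

# Illustrative (not exhaustive) letter-to-digit confusions in OCR output.
# "S" -> "5" is deliberately left out so a misread "$" is rejected, not turned into a digit.
CONFUSABLES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "|": "1", "B": "8"})

# Digits with optional thousands separators and an optional two-digit fraction.
AMOUNT = re.compile(r"-?(\d{1,3}(,\d{3})+|\d+)(\.\d{2})?")


def parse_ocr_amount(raw: str) -> Decimal:
    """Repair common OCR misreads in a currency field, then validate it.

    Raises ValueError instead of guessing when the text still is not a
    well-formed amount.
    """
    text = raw.replace(" ", "").lstrip("$").translate(CONFUSABLES)
    if not AMOUNT.fullmatch(text):
        raise ValueError(f"unreadable amount: {raw!r}")
    return Decimal(text.replace(",", ""))


print(parse_ocr_amount("$1,2O4.5l"))  # 1204.51
```

Raising on anything that still fails validation, rather than guessing, keeps a misread balance from flowing silently into downstream systems; rejected values can be routed to manual review instead.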

Why It Matters

Screen scraping is a last-resort integration method for systems that offer no API, database access, or file export. For organizations with legacy systems or locked-down applications, it may be the only way to automate data extraction without manual re-entry.

How Autonoly Solves It

Autonoly uses a real browser to interact with applications exactly as a user would, which makes it effective for screen scraping scenarios. The AI agent reads rendered page content, handles dynamic loading, and extracts data from applications that resist traditional scraping techniques.


Examples

  • Extracting account balances from a legacy banking portal that draws its data onto HTML canvas elements with JavaScript
  • Reading invoice data from a supplier portal that was migrated from Flash to HTML5 and uses non-standard DOM structures
  • Pulling data from a government reporting system built on 1990s web technology with frames and dynamic content

Frequently Asked Questions

When should I use screen scraping instead of web scraping?

Use screen scraping when the data you need is not available in the HTML source: for example, content rendered by JavaScript frameworks, data displayed as images or canvas elements, or information in legacy desktop applications with no API. Web scraping is preferred when data is accessible in the HTML markup, as it is faster and more reliable.

Is screen scraping the same as RPA?

Screen scraping is one technique used within RPA (Robotic Process Automation), but the two are not the same. RPA is a broader category that covers any automated interaction with applications (clicking buttons, filling forms, navigating menus) in addition to reading screen data. Screen scraping focuses specifically on extracting data from what is displayed on screen.
