What is Data Processing?
Data Processing is the bridge between raw extracted data and polished, actionable output. When you scrape a website or pull data from an API, the result is rarely ready to use directly. Duplicates, inconsistent formats, missing fields, and irrelevant rows are common. Data Processing gives you the tools to clean, transform, and enrich that data — all within the same automation pipeline, without needing a separate ETL tool or spreadsheet.
Autonoly offers two approaches that work together: no-code transforms for common operations like filtering and deduplication, and full Python execution for custom logic, statistical analysis, and machine learning. Both run in secure, isolated cloud environments and integrate seamlessly with every other Autonoly feature.
Why Process Data Inside Your Automation?
Many teams extract data with one tool, clean it in a spreadsheet, and then manually upload it somewhere else. Each handoff adds delay, invites copy-paste errors, and doesn't scale. By processing data inside the automation pipeline, you get a fully hands-off workflow from extraction to delivery.
No-Code Transforms
For the most common data operations, no code is needed. Autonoly provides built-in transforms that you can apply through the AI Agent Chat or the Visual Workflow Builder:
Deduplication
Remove duplicate rows based on one or more key fields. Useful when scraping overlapping pages, merging data from multiple sources, or cleaning up datasets where items appear more than once.
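The same idea can be sketched in pandas (the field names and rows below are illustrative, not part of any specific pipeline):

```python
import pandas as pd

# Hypothetical scraped rows; "url" is the key field for deduplication.
rows = [
    {"url": "https://example.com/a", "price": 10},
    {"url": "https://example.com/b", "price": 12},
    {"url": "https://example.com/a", "price": 10},  # duplicate
]

df = pd.DataFrame(rows)
# Keep the first occurrence of each unique "url".
deduped = df.drop_duplicates(subset=["url"], keep="first")
print(len(deduped))  # 2
```

Passing multiple fields to `subset` deduplicates on the combination of keys instead of a single one.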
Filtering and Sorting
Keep only the rows that match your criteria — filter by price range, date, status, keyword presence, or any custom condition. Sort results by any field in ascending or descending order.
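As a sketch of the same operation in pandas, assuming hypothetical product rows:

```python
import pandas as pd

df = pd.DataFrame([
    {"name": "Widget", "price": 19.99, "status": "active"},
    {"name": "Gadget", "price": 4.50, "status": "discontinued"},
    {"name": "Gizmo", "price": 32.00, "status": "active"},
])

# Keep active products priced between 5 and 40, sorted by price descending.
result = df[(df["status"] == "active") & (df["price"].between(5, 40))]
result = result.sort_values("price", ascending=False)
print(result["name"].tolist())  # ['Gizmo', 'Widget']
```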
Format Conversion
Standardize messy data:
Dates — convert between formats (MM/DD/YYYY to ISO 8601, relative dates like "2 days ago" to absolute)
Currencies — strip or normalize currency symbols and separators into a consistent numeric format
Phone numbers — standardize to international format
Text — trim whitespace, fix capitalization, remove HTML tags
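A minimal sketch of date standardization, covering both fixed formats and relative dates (the `now` anchor is pinned here only so the example is reproducible):

```python
import re
from datetime import datetime, timedelta

def to_iso(date_str, now=None):
    """Convert MM/DD/YYYY or relative dates like '2 days ago' to ISO 8601."""
    now = now or datetime(2024, 6, 1)
    m = re.match(r"(\d+) days? ago", date_str)
    if m:
        return (now - timedelta(days=int(m.group(1)))).date().isoformat()
    return datetime.strptime(date_str, "%m/%d/%Y").date().isoformat()

print(to_iso("03/15/2024"))  # 2024-03-15
print(to_iso("2 days ago"))  # 2024-05-30
```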
Text Manipulation
Apply regex patterns, split strings into fields, join multiple values, and use templates to construct new fields from existing data. This is particularly useful when extracted data needs restructuring before it reaches its destination.
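Combining those operations might look like this (the record and field names are hypothetical):

```python
import re

record = {"full_name": "  DOE, jane  ", "sku": "AB-1234-XL"}

# Trim whitespace, split "Last, First" into fields, and fix capitalization.
last, first = [p.strip().title() for p in record["full_name"].strip().split(",")]

# Regex: pull the numeric part out of the SKU.
product_id = re.search(r"\d+", record["sku"]).group()

# Template: construct a new field from existing data.
display = f"{first} {last} (#{product_id})"
print(display)  # Jane Doe (#1234)
```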
JSON Parsing and Restructuring
When working with API responses or complex nested data, you can parse JSON structures, extract specific nested fields, and flatten hierarchies into tabular formats suitable for spreadsheets and databases.
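For example, a nested API response can be flattened into spreadsheet-ready columns with `pandas.json_normalize` (the payload below is invented for illustration):

```python
import json
import pandas as pd

payload = json.loads("""
{"results": [
  {"id": 1, "company": {"name": "Acme", "location": {"city": "Austin"}}},
  {"id": 2, "company": {"name": "Globex", "location": {"city": "Berlin"}}}
]}
""")

# Flatten the nested objects into dot-separated columns.
flat = pd.json_normalize(payload["results"], sep=".")
print(sorted(flat.columns))
# ['company.location.city', 'company.name', 'id']
```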
Combine no-code transforms with Data Extraction to build complete scrape-and-clean pipelines.
Python Execution
When built-in transforms aren't enough, switch to Python. Autonoly provides a full Python 3 environment with popular libraries pre-installed:
pandas — dataframe operations, groupby, pivot tables, merges
numpy — numerical computation, statistical functions
requests — make HTTP calls to external APIs for data enrichment
scikit-learn — machine learning, clustering, classification
BeautifulSoup — additional HTML parsing if needed
You can also install any package with pip at runtime. Need a specialized library for geocoding, NLP, or financial calculations? Just include the pip install in your script.
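The exact install mechanism is platform-specific; one common pattern is a small helper that falls back to pip only when the import fails (`geopy` below is just an example package name):

```python
import importlib
import subprocess
import sys

def ensure_package(name):
    """Import a package, installing it at runtime if it isn't available."""
    try:
        return importlib.import_module(name)
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", name])
        return importlib.import_module(name)

# e.g. a geocoding library not in the default environment (hypothetical):
# geopy = ensure_package("geopy")
```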
How Python Scripts Work
- Your script receives input data from the previous step (extracted data, API response, or file contents)
- You process it using any Python logic — from a three-line dedup to a 200-line ML pipeline
- The script outputs results that flow to the next step in the workflow
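As a sketch of that contract (the names `input_data` and `output` are assumptions for illustration; the actual variable names depend on the step configuration):

```python
# Hypothetical step contract: `input_data` holds the previous step's rows,
# and whatever is assigned to `output` flows to the next step.
input_data = [
    {"email": "a@example.com", "score": 3},
    {"email": "a@example.com", "score": 3},
    {"email": "b@example.com", "score": 7},
]

# A "three-line dedup": keep the first row seen for each email.
seen = {}
for row in input_data:
    seen.setdefault(row["email"], row)
output = list(seen.values())
print(len(output))  # 2
```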
This runs in a secure, isolated environment. Your scripts can't affect other users or access anything outside the designated input and output channels.
Common Python Use Cases
Custom scoring models — score leads, rank products, or classify items using business-specific logic
Statistical analysis — calculate averages, medians, standard deviations, correlations across extracted datasets
Data enrichment — call external APIs to add geocoding, company info, or market data to your records
Machine learning — run classification, clustering, or prediction models on collected data
Custom formatting — generate complex reports, build structured outputs, or prepare data for specific downstream systems
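A custom scoring model can be as simple as a rule function applied to each record; the fields and weights below are illustrative business logic, not a prescribed formula:

```python
def score_lead(lead):
    """Hypothetical lead-scoring rules; the weights are illustrative."""
    score = 0
    if lead.get("employees", 0) > 50:
        score += 30
    if lead.get("industry") == "saas":
        score += 40
    if lead.get("has_contact_email"):
        score += 20
    return score

leads = [
    {"name": "Acme", "employees": 120, "industry": "saas", "has_contact_email": True},
    {"name": "Smallco", "employees": 8, "industry": "retail", "has_contact_email": False},
]
ranked = sorted(leads, key=score_lead, reverse=True)
print([(l["name"], score_lead(l)) for l in ranked])
# [('Acme', 90), ('Smallco', 0)]
```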
Building ETL Pipelines
Data Processing is most powerful when chained with other steps to create full ETL (Extract, Transform, Load) pipelines. Here's a real example:
- Extract — Browser Automation visits 50 competitor websites and Data Extraction scrapes current product prices
- Transform — Data Processing deduplicates the results, calculates average price per product category, and flags items where the price changed more than 10%
- Load — Results push to Google Sheets for the team to review, and a summary alert fires to Slack
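The Transform step above could be sketched in pandas like this, assuming each scraped row carries the previous price for comparison (field names and values are invented):

```python
import pandas as pd

df = pd.DataFrame([
    {"product": "A", "category": "tools", "price": 23.0, "prev_price": 20.0},
    {"product": "A", "category": "tools", "price": 23.0, "prev_price": 20.0},  # dup
    {"product": "B", "category": "tools", "price": 18.0, "prev_price": 18.0},
    {"product": "C", "category": "toys",  "price": 9.0,  "prev_price": 10.0},
])

# Deduplicate, average per category, and flag changes of more than 10%.
df = df.drop_duplicates(subset=["product"])
avg_by_category = df.groupby("category")["price"].mean()
df["flagged"] = (df["price"] - df["prev_price"]).abs() / df["prev_price"] > 0.10

print(avg_by_category.to_dict())              # {'tools': 20.5, 'toys': 9.0}
print(df[df["flagged"]]["product"].tolist())  # ['A']
```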
You design these pipelines visually in the Visual Workflow Builder or let the AI Agent Chat build them from a natural language description.
Variable Passing Between Steps
Each processing step can output data that the next step consumes. This variable passing happens automatically — the output of a Python script becomes the input of the next transform, which feeds into the export step. Use Logic & Flow to add conditional branches (e.g., "if the dataset has more than 1000 rows, split into batches").
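The batching condition quoted above might look like this inside a Python step (the 1000-row threshold comes from the example, not from any platform limit):

```python
# Hypothetical rows passed in from the previous step.
rows = [{"id": i} for i in range(2500)]

BATCH_SIZE = 1000
if len(rows) > BATCH_SIZE:
    # Split into chunks so each downstream run handles a bounded amount of data.
    batches = [rows[i:i + BATCH_SIZE] for i in range(0, len(rows), BATCH_SIZE)]
else:
    batches = [rows]

print([len(b) for b in batches])  # [1000, 1000, 500]
```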
Data Validation
Before data reaches its destination, you can add validation rules:
Type checking — ensure numeric fields contain numbers, dates are valid, URLs are properly formatted
Required fields — flag or remove rows with missing critical data
Range constraints — prices must be positive, dates must be in the future, quantities within expected bounds
Custom rules — any validation logic you can express in a Python condition
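All four kinds of rule fit naturally into one validation function; the field names below are hypothetical, and `today` is pinned so the example is reproducible:

```python
from datetime import date

def validate(row, today=date(2024, 6, 1)):
    """Return a list of problems; an empty list means the row is valid."""
    problems = []
    price = row.get("price")
    # Type checking: price must be numeric.
    if not isinstance(price, (int, float)):
        problems.append("price is not numeric")
    # Range constraint: price must be positive.
    elif price <= 0:
        problems.append("price must be positive")
    # Required field: email must be present and non-empty.
    if not row.get("email"):
        problems.append("missing email")
    # Range constraint: delivery date must be in the future.
    if row.get("deliver_by") and row["deliver_by"] <= today:
        problems.append("deliver_by must be in the future")
    return problems

rows = [
    {"price": 19.99, "email": "a@example.com", "deliver_by": date(2024, 7, 1)},
    {"price": -5, "email": "", "deliver_by": date(2024, 1, 1)},
]
valid = [r for r in rows if not validate(r)]
print(len(valid))  # 1
```

Invalid rows can be dropped, routed to a separate sheet for review, or used to trigger an alert, depending on how strict the pipeline needs to be.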
Catching data quality issues inside the pipeline prevents bad data from reaching your spreadsheets, databases, or downstream systems.
Explore the templates library for pre-built data processing pipelines, or check the pricing page for processing limits on each plan.