Data Optimization
A complete pipeline for raw-to-ready data preparation. Clean, transform, aggregate, search and visualise your datasets before feeding them into statistical models or machine learning workflows. Each submodule works with CSV data loaded directly in the browser; no server is required.
Submodules
Data Cleaning
Detect and resolve data quality issues
Identify missing values, duplicate records, outliers and type mismatches. Apply automated or rule-based imputation, deduplication and normalisation strategies to prepare datasets for analysis.
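As a rough illustration of the cleaning steps named above, the sketch below counts missing values, drops exact duplicate rows and fills gaps with the column mean. The Row type and the function names are hypothetical, and mean imputation is only one possible strategy; none of this is taken from the module's own implementation.

```typescript
// Hypothetical row type: each CSV cell parsed to a number, or null when missing.
type Row = Record<string, number | null>;

// Count rows where the given column is missing.
function countMissing(rows: Row[], column: string): number {
  return rows.filter((r) => r[column] === null || r[column] === undefined).length;
}

// Drop rows that are exact duplicates of an earlier row (same keys, same values).
function dropDuplicates(rows: Row[]): Row[] {
  const seen = new Set<string>();
  return rows.filter((r) => {
    const key = JSON.stringify(Object.keys(r).sort().map((k) => [k, r[k]]));
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Replace missing values in one column with the mean of the observed values.
// Returns new row objects; the input is not modified.
function imputeMean(rows: Row[], column: string): Row[] {
  const observed = rows
    .map((r) => r[column])
    .filter((v): v is number => typeof v === "number");
  if (observed.length === 0) return rows.map((r) => ({ ...r }));
  const mean = observed.reduce((a, b) => a + b, 0) / observed.length;
  return rows.map((r) => {
    const v = r[column];
    return { ...r, [column]: typeof v === "number" ? v : mean };
  });
}
```

Deduplicating before imputing keeps repeated records from skewing the mean used to fill the gaps.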
Data Mapping
Transform and align schema structures
Define field-to-field mappings between source and target schemas. Apply transformations, lookups and computed columns. Validate mapping coverage and preview transformed output before export.
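A field mapping can be pictured as a list of rules, each naming a target field and a function that computes it from a source row. The sketch below shows a computed column, a lookup and a coverage check under that assumption; FieldMapping, applyMappings, unmappedTargets and the example field names are all hypothetical.

```typescript
// Hypothetical source row: raw CSV cells as strings.
type SourceRow = Record<string, string>;

// One mapping rule: which target field to fill and how to derive it.
interface FieldMapping {
  target: string;
  from: (row: SourceRow) => string;
}

// Build target rows by applying every mapping rule to every source row.
function applyMappings(rows: SourceRow[], mappings: FieldMapping[]): SourceRow[] {
  return rows.map((row) => {
    const out: SourceRow = {};
    for (const m of mappings) out[m.target] = m.from(row);
    return out;
  });
}

// Coverage check: target fields that no mapping rule fills.
function unmappedTargets(targetSchema: string[], mappings: FieldMapping[]): string[] {
  const mapped = new Set(mappings.map((m) => m.target));
  return targetSchema.filter((field) => !mapped.has(field));
}

// Example rules: a computed column and a lookup (illustrative data only).
const countryNames: Record<string, string> = { GB: "United Kingdom", DE: "Germany" };
const exampleMappings: FieldMapping[] = [
  { target: "full_name", from: (r) => `${r["first"]} ${r["last"]}` },
  { target: "country", from: (r) => countryNames[r["cc"] ?? ""] ?? "Unknown" },
];
```

Calling unmappedTargets before export is one way to surface gaps, such as a target field that no rule populates.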
Data Aggregation
Group, summarise and reshape datasets
Apply GROUP BY operations with multiple aggregation functions: SUM, COUNT, MEAN, MIN, MAX and STDDEV. Pivot and reshape data, build multi-level groupings and export summary tables.
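The sketch below illustrates a single-level GROUP BY that computes all six aggregates for one numeric column. The names DataRow, Summary and groupSummarise are hypothetical, and the standard deviation here is the sample form (dividing by n - 1); whether the module uses the sample or population form is not stated, so treat that choice as an assumption.

```typescript
// Hypothetical row type: cells are strings or numbers after CSV parsing.
type DataRow = Record<string, string | number>;

interface Summary {
  count: number;
  sum: number;
  mean: number;
  min: number;
  max: number;
  stddev: number; // sample standard deviation (assumed); 0 for single-row groups
}

// Group rows by one column and summarise one numeric column per group.
function groupSummarise(rows: DataRow[], groupCol: string, valueCol: string): Map<string, Summary> {
  const groups = new Map<string, number[]>();
  for (const row of rows) {
    const value = Number(row[valueCol]);
    if (Number.isNaN(value)) continue; // skip cells that are not numeric
    const key = String(row[groupCol]);
    const bucket = groups.get(key) ?? [];
    bucket.push(value);
    groups.set(key, bucket);
  }

  const result = new Map<string, Summary>();
  groups.forEach((values, key) => {
    const count = values.length;
    const sum = values.reduce((a, b) => a + b, 0);
    const mean = sum / count;
    const squaredDiffs = values.reduce((acc, v) => acc + (v - mean) ** 2, 0);
    result.set(key, {
      count,
      sum,
      mean,
      min: Math.min(...values),
      max: Math.max(...values),
      stddev: count > 1 ? Math.sqrt(squaredDiffs / (count - 1)) : 0,
    });
  });
  return result;
}
```

A multi-level grouping could reuse the same function by building the group key from several columns joined with a separator.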
Advanced Search
Query datasets with powerful filter expressions
Build multi-column filters with AND / OR logic, regex patterns, numeric ranges and date filters. Sort, paginate and export filtered results. Run full-text search across all string columns.
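Filters combined with AND / OR logic can be modelled as a small expression tree that is evaluated against each row. The sketch below covers numeric ranges and regex patterns only; the Filter type and the matches function are hypothetical and not the module's own filter format.

```typescript
// Hypothetical record type: cells are strings or numbers after CSV parsing.
type Rec = Record<string, string | number>;

// A filter is either a leaf condition or an AND / OR combination of filters.
type Filter =
  | { kind: "range"; column: string; min: number; max: number }
  | { kind: "regex"; column: string; pattern: RegExp }
  | { kind: "and"; filters: Filter[] }
  | { kind: "or"; filters: Filter[] };

// Evaluate a filter tree against a single row.
function matches(row: Rec, f: Filter): boolean {
  switch (f.kind) {
    case "range": {
      const v = Number(row[f.column]);
      // Non-numeric cells become NaN, and NaN fails both comparisons.
      return v >= f.min && v <= f.max;
    }
    case "regex": {
      const cell = row[f.column];
      // Avoid the `g` flag on patterns: it makes RegExp.test stateful.
      return cell !== undefined && f.pattern.test(String(cell));
    }
    case "and":
      return f.filters.every((sub) => matches(row, sub));
    case "or":
      return f.filters.some((sub) => matches(row, sub));
  }
}

// Keep only the rows that satisfy the filter.
function applyFilter(rows: Rec[], f: Filter): Rec[] {
  return rows.filter((row) => matches(row, f));
}
```

Because the tree is recursive, nested conditions such as "price in range AND (name matches OR city matches)" need no special handling.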
Data Plot on Map
Visualise geographic data on an interactive map
Plot datasets containing latitude / longitude coordinates on an interactive world map. Colour-code by category, scale point size by a numeric column, and apply filters to explore spatial distributions.
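Scaling point size by a numeric column usually comes down to validating the coordinates and then mapping each value onto a radius range. The sketch below assumes a linear scale and a hypothetical default range of 4 to 20 pixels; GeoPoint, isValidCoordinate and scaleRadii are illustrative names, not part of the module.

```typescript
// Hypothetical parsed point: coordinates in decimal degrees plus the sizing value.
interface GeoPoint {
  lat: number;
  lon: number;
  value: number;
}

// Latitude must lie in [-90, 90] and longitude in [-180, 180].
function isValidCoordinate(lat: number, lon: number): boolean {
  return (
    Number.isFinite(lat) &&
    Number.isFinite(lon) &&
    lat >= -90 && lat <= 90 &&
    lon >= -180 && lon <= 180
  );
}

// Map each point's value linearly onto [minRadius, maxRadius].
// When all values are equal, every point gets the midpoint radius.
function scaleRadii(points: GeoPoint[], minRadius = 4, maxRadius = 20): number[] {
  const values = points.map((p) => p.value);
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  if (hi === lo) return points.map(() => (minRadius + maxRadius) / 2);
  return points.map(
    (p) => minRadius + ((p.value - lo) / (hi - lo)) * (maxRadius - minRadius)
  );
}
```

For heavily skewed columns a square-root or logarithmic scale may read better than a linear one; the linear form is used here only because it is the simplest to follow.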
About Data Optimization
Data quality is the foundation of every reliable analysis. The Data Optimization module provides a structured pipeline that turns raw, messy datasets into clean, well-structured inputs. All tools operate client-side on uploaded CSV files and integrate with the Statistics, ML and Professional modules.