CleanSmart Support
Introduction
CleanSmart is an enterprise-grade AI-powered data cleaning platform that transforms messy, inconsistent data into clean, standardized datasets. Whether you're dealing with duplicate records, inconsistent formatting, missing values, or data from multiple sources, CleanSmart automates the tedious work of data cleaning so you can focus on what matters.
Key Features
- AI-Powered Cleaning: Advanced algorithms detect duplicates, standardize formats, fill missing values, and flag anomalies
- Multi-Source Merging: Combine data from multiple sources with intelligent conflict resolution
- Full Audit Trail: Track every change with the ability to approve or reject modifications
- Flexible Export: Download cleaned data in CSV or JSON formats with optional metadata
Two Main Workflows
- Single-Source Cleaning: Upload a single CSV file and run it through our 4-step automated cleaning pipeline
- Multi-Source Processing: Combine multiple data sources into a unified master dataset with enriched spoke files
Getting Started
Creating Your Account
- Visit the CleanSmart registration page
- Choose your plan (Starter, Pro, or Business)
- Enter your details and create your account
- All plans include a 7-day free trial with full access to features
Your First Data Cleaning Project
Quick Start (Single-Source):
- Navigate to Data Sources
- Click Upload CSV and select your file
- Review the field mapping and click Confirm
- Go to Data Cleaning and click Start Cleaning
- Review changes in the Change Log
- View results in Analytics
- Download your cleaned data from Export
Data Sources
The Data Sources page is your central hub for managing all uploaded data files and source groups.
Uploading Files
To upload a single file:
- Click the Upload CSV button or drag and drop a file
- CleanSmart analyzes your file and displays a preview
- Review the detected columns and data types
- Click Confirm to add the source to your workspace
Supported file formats:
- CSV (Comma-Separated Values)
File size limits depend on your plan.
Field Mapping
After uploading, CleanSmart automatically detects field types (name, email, phone, address, etc.). You can customize this mapping:
- Click the Configure button on any source
- Review each column's assigned field type
- Use the dropdown to change field types if needed
- Click Save Mapping to apply changes
Accurate field mapping ensures better cleaning results, especially for:
- Phone number formatting
- Email validation
- Name capitalization
- Address standardization
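The exact detection logic is internal to CleanSmart, but you can picture field-type detection as simple pattern checks over a sample of column values. The hedged sketch below is illustrative only; the `guess_field_type` helper and its regexes are assumptions, not CleanSmart internals.

```python
import re

# Illustrative regexes only; CleanSmart's real detector is more sophisticated.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,}$")

def guess_field_type(values):
    """Guess a field type from a sample of column values."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return "unknown"
    if all(EMAIL_RE.match(v) for v in sample):
        return "email"
    if all(PHONE_RE.match(v) for v in sample):
        return "phone"
    if all(any(ch.isdigit() for ch in v) for v in sample):
        return "numeric_or_id"
    return "name_or_text"

print(guess_field_type(["jane@example.com", "BOB@MAIL.COM"]))  # email
print(guess_field_type(["(555) 010-2000", "555-010-2001"]))    # phone
```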
Managing Sources
Each uploaded source shows:
- File name and upload date
- Record count (number of rows)
- Mapping status (configured or needs mapping)
- Last sync time (for integrations)
Actions available:
- Configure: Edit field mappings
- Delete: Remove the source (this cannot be undone)
- Clear All: Remove all sources from your workspace
Single-Source Data Cleaning
The single-source workflow is a 4-step automated pipeline that cleans your data using AI-powered algorithms.
The 4-Step Pipeline
Step 1: SmartMatch (Duplicate Detection & Merging)
SmartMatch uses AI-powered semantic similarity to find duplicate records that traditional exact-matching would miss.
What it detects:
- Exact duplicates (identical records)
- Near-duplicates (e.g., "John Smith" vs "Jon Smith")
- Semantic duplicates (e.g., "IBM" vs "International Business Machines")
Configuration options:
- Composite Key Selection: Choose which fields to use for matching
- Sensitivity Threshold: Adjust how strict the matching is
- Data Type: Select entity, transactional, or line-item data
Results include:
- Total duplicates found
- Similarity scores (0-100%)
- Matching reasons (why records were flagged)
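CleanSmart's matching model is proprietary, but the idea behind near-duplicate detection can be illustrated with plain string similarity over a composite key. In the sketch below, the `find_near_duplicates` helper and the 85% threshold are illustrative assumptions, not CleanSmart internals.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two normalized strings."""
    return 100 * SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def find_near_duplicates(records, key_fields, threshold=85.0):
    """Flag record pairs whose composite-key similarity exceeds the threshold."""
    def composite(rec):
        return " ".join(str(rec.get(f, "")) for f in key_fields)
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(composite(a), composite(b))
        if score >= threshold:
            pairs.append((i, j, round(score, 1)))
    return pairs

records = [
    {"name": "John Smith", "email": "john@example.com"},
    {"name": "Jon Smith", "email": "john@example.com"},
    {"name": "Jane Doe", "email": "jane@example.com"},
]
print(find_near_duplicates(records, key_fields=["name", "email"]))
# The first two records are flagged as near-duplicates with a high score.
```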
Step 2: AutoFormat (Format Standardization)
AutoFormat fixes inconsistent formatting across your data.
What it fixes:
- Name capitalization: "john smith" → "John Smith"
- Phone numbers: Removes letters, standardizes format
- Email addresses: Lowercase, typo detection
- Dates: Consistent date formatting
- Addresses: Standardized abbreviations (St., Ave., etc.)
- Common typos: Corrects frequent spelling errors
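As a rough illustration of the kinds of fixes AutoFormat applies, the sketch below standardizes the name, email, and phone fields of a single record. The `standardize_record` helper and the US-style phone format are assumptions for illustration only.

```python
import re

def standardize_record(rec):
    """Apply a few common format fixes; a simplified stand-in for AutoFormat."""
    out = dict(rec)
    if out.get("name"):
        # "john smith" -> "John Smith"
        out["name"] = " ".join(w.capitalize() for w in out["name"].split())
    if out.get("email"):
        out["email"] = out["email"].strip().lower()
    if out.get("phone"):
        digits = re.sub(r"\D", "", out["phone"])   # keep digits only
        if len(digits) == 10:                      # assume US-style numbers
            out["phone"] = f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return out

print(standardize_record({"name": "john  smith",
                          "email": " JOHN@Example.COM ",
                          "phone": "555.010.2000"}))
# {'name': 'John Smith', 'email': 'john@example.com', 'phone': '(555) 010-2000'}
```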
Step 3: SmartFill (Missing Value Imputation)
SmartFill uses machine learning to intelligently fill missing values based on patterns in your existing data.
How it works:
- Analyzes relationships between fields
- Predicts missing values using existing patterns
- Assigns confidence scores to each prediction
- Never fills fields where prediction confidence is too low
Example:
If most customers from "90210" zip code are in "Beverly Hills, CA", SmartFill can predict the city/state for records with only a zip code.
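Conceptually, this is similar to a group-based imputation with a confidence cutoff. The pandas sketch below is a minimal illustration; the `smart_fill` helper, the column names, and the 90% cutoff are assumptions, not CleanSmart's actual model.

```python
import pandas as pd

def smart_fill(df, target, by, min_confidence=0.9):
    """Fill missing `target` values from the dominant value in each `by` group,
    but only when that value's share meets the confidence threshold."""
    filled = df.copy()
    for key, group in df.dropna(subset=[target]).groupby(by):
        counts = group[target].value_counts()
        confidence = counts.iloc[0] / counts.sum()
        if confidence >= min_confidence:
            mask = filled[by].eq(key) & filled[target].isna()
            filled.loc[mask, target] = counts.index[0]
    return filled

df = pd.DataFrame({
    "zip":  ["90210", "90210", "90210", "90210"],
    "city": ["Beverly Hills", "Beverly Hills", "Beverly Hills", None],
})
print(smart_fill(df, target="city", by="zip"))
# The missing city is filled with "Beverly Hills" because that value dominates the group.
```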
Step 4: LogicGuard (Anomaly Detection)
LogicGuard identifies outliers and impossible values that may indicate data quality issues.
What it detects:
- Numerical outliers: Ages over 150, negative prices
- Statistical anomalies: Values far from the mean
- Pattern violations: Phone numbers with wrong digit counts
- Impossible values: Future dates for birthdays
Detection methods:
- Z-score analysis
- Interquartile Range (IQR)
- Isolation Forest algorithm
- Pattern-based detection
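To make the Z-score and IQR rules concrete, here is a minimal sketch that flags numeric outliers using both rules. The `flag_outliers` helper and its default thresholds are illustrative; the Isolation Forest and pattern-based checks are omitted.

```python
import numpy as np

def flag_outliers(values, z_threshold=3.0, iqr_multiplier=1.5):
    """Return indices whose value is an outlier by either the z-score or IQR rule."""
    x = np.asarray(values, dtype=float)
    z = np.abs((x - x.mean()) / x.std(ddof=0))                     # z-score rule
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - iqr_multiplier * iqr, q3 + iqr_multiplier * iqr  # IQR fences
    return sorted(set(np.where((z > z_threshold) | (x < lo) | (x > hi))[0].tolist()))

ages = [34, 29, 41, 38, 27, 450, 33]   # 450 is an impossible age
print(flag_outliers(ages))             # -> [5]
```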
Running the Pipeline
- Navigate to Data Cleaning
- Select your data source from the dropdown
- Click Start Cleaning to run all steps, or click individual steps
- Watch the real-time progress bar
- Review results for each step as they complete
Duplicate Resolution Options
When duplicates are found, you have several resolution options:
Automatic Resolution:
- Accept all AI-suggested merges with one click
- System chooses the most complete record as master
Manual Resolution:
- Review each duplicate cluster one by one
- Select which record should be the master
- Choose field-by-field which values to keep
- Reject false positives
Resolution Strategies:
- Master Record: One record becomes the source of truth
- Field-Level Merge: Combine best values from each duplicate
- Keep Both: Mark as not duplicates if incorrectly matched
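A hedged sketch of the "most complete record becomes master" idea combined with a field-level merge is shown below; the `merge_duplicates` helper and its completeness heuristic are illustrative, not the exact logic CleanSmart applies.

```python
def merge_duplicates(cluster):
    """Merge a duplicate cluster: the most complete record becomes the master,
    and its empty fields are back-filled from the other records."""
    def completeness(rec):
        return sum(1 for v in rec.values() if v not in (None, ""))
    master = max(cluster, key=completeness)
    merged = dict(master)
    for rec in cluster:
        for field, value in rec.items():
            if merged.get(field) in (None, "") and value not in (None, ""):
                merged[field] = value
    return merged

cluster = [
    {"name": "John Smith", "email": "john@example.com", "phone": ""},
    {"name": "Jon Smith",  "email": "",                 "phone": "(555) 010-2000"},
]
print(merge_duplicates(cluster))
# {'name': 'John Smith', 'email': 'john@example.com', 'phone': '(555) 010-2000'}
```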
Multi-Source Data Processing
Multi-Source Processing allows you to merge and enrich data from multiple sources using a hub-and-spoke architecture.
Key Concepts
Source Group: A container for multiple related data sources that should be merged together.
Relationships: Define how sources connect to each other (e.g., Customer ID in Source A matches CustomerID in Source B).
Hub (Master Dataset): The unified, merged dataset created by combining all sources.
Spokes: Individual source files enriched with data from other sources in the group.
Creating a Source Group
- Go to Data Sources
- Click Create Source Group
- Enter a group name (e.g., "Customer 360")
- Click Create
Adding Sources to a Group
- Open your source group
- Click Add Source
- Either upload a new file or select an existing source
- Configure field mapping for each source
- Repeat for all sources you want to merge
Defining Relationships
Relationships tell CleanSmart how to connect records across sources.
- Go to the Relationships tab in your source group
- Review Suggested Relationships (AI-detected connections)
- Click Accept to use a suggestion, or Customize to modify it
To create a manual relationship:
1. Click Add Relationship
2. Select the first source and field (e.g., customers.customer_id)
3. Select the second source and field (e.g., orders.customer_id)
4. Choose the relationship type:
- 1-to-1: One record in Source A matches one in Source B
- 1-to-many: One record in Source A matches multiple in Source B
- Many-to-many: Multiple records can match in both directions
5. Configure matching options (exact match or fuzzy matching)
6. Click Save
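If you think of a relationship as a join key, a 1-to-many relationship behaves like the pandas merge below. The table and column names are made up for illustration; they are not a CleanSmart schema.

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2],
    "name": ["Jane Doe", "John Smith"],
})
orders = pd.DataFrame({
    "order_id": [101, 102, 103],
    "customer_id": [1, 1, 2],
    "total": [40.0, 15.5, 99.9],
})

# A 1-to-many relationship: one customer row can match many order rows.
# validate="one_to_many" makes pandas raise if the data violates that assumption.
linked = customers.merge(orders, on="customer_id", how="left", validate="one_to_many")
print(linked)
```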
Merge Strategies
Choose how conflicts between sources should be resolved:
- MASTER_SLAVE: One source is always trusted over others. Best when you have a primary system of record.
- CONSENSUS: Values appearing in multiple sources win. Best when no single source is authoritative.
- WEIGHTED: Sources have trust scores that determine priority. Best when sources have varying reliability.
- RECENT: The most recently updated value wins. Best when newer data is more accurate.
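To make the strategies concrete, here is a hedged sketch of resolving one conflicting field under the RECENT and WEIGHTED strategies. The `resolve_conflict` helper, the source names, and the trust weights are illustrative assumptions only.

```python
from datetime import date

def resolve_conflict(candidates, strategy="RECENT", weights=None):
    """Pick a winning value for one field from (source, value, last_updated) tuples.
    Only the RECENT and WEIGHTED strategies are sketched here."""
    if strategy == "RECENT":
        return max(candidates, key=lambda c: c[2])[1]              # newest value wins
    if strategy == "WEIGHTED":
        weights = weights or {}
        return max(candidates, key=lambda c: weights.get(c[0], 0))[1]
    raise ValueError(f"unsupported strategy: {strategy}")

candidates = [
    ("crm",     "555-010-2000", date(2024, 1, 10)),
    ("billing", "555-010-9999", date(2024, 6, 2)),
]
print(resolve_conflict(candidates, "RECENT"))                                   # 555-010-9999
print(resolve_conflict(candidates, "WEIGHTED", {"crm": 0.9, "billing": 0.4}))   # 555-010-2000
```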
The 6-Step Processing Pipeline
- Schema Validation: Aligns column structures across all sources
- Relationship Detection: Verifies and applies defined relationships
- Data Merging: Combines records from multiple sources
- Conflict Resolution: Handles conflicting values using your chosen strategy
- Data Enrichment: Adds fields from other sources to each dataset
- Quality Verification: Validates the merged data quality
Running Multi-Source Processing
- Go to Multi-Source Processing
- Select your source group
- Review the configuration (merge strategy, relationships)
- Click Start Processing
- Monitor progress through each step
- Review and resolve any conflicts that require manual attention
Conflict Resolution
When the same field has different values across sources, CleanSmart flags it as a conflict.
Automatic Resolution: Based on your merge strategy, most conflicts are resolved automatically.
Manual Resolution: Some conflicts may require your review:
- Click Review Conflicts
- See side-by-side comparison of conflicting values
- Choose which value to keep
- Optionally apply the same rule to similar conflicts
- Click Apply Resolution
Cross-Source Duplicate Detection
CleanSmart can find duplicates that exist across your sources:
- Customer in Source A is also in Source B with slight variations
- Identifies and suggests merging these cross-source duplicates
Change Log & Review
The Change Log provides a complete audit trail of every modification made during cleaning.
Understanding the Change Log
Every change is categorized by type:
- Duplicate_Resolved (SmartMatch): Duplicate records merged
- Format_Standardized (AutoFormat): Format corrections applied
- Value_Imputed (SmartFill): Missing values filled
- Anomaly_Detected (LogicGuard): Outliers flagged
- Conflict_Resolved (Multi-source): Cross-source conflicts resolved
Change Details
Each change shows:
- Record Number: Which row was affected
- Field Name: Which column was modified
- Original Value: What the data was before
- New Value: What the data is now
- Confidence Score: How confident the AI is (color-coded)
- Change Type: Which step made this change
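Conceptually, each entry is a small structured record like the sketch below. The `ChangeEntry` class and its field names are illustrative, not CleanSmart's export schema.

```python
from dataclasses import dataclass

@dataclass
class ChangeEntry:
    record_number: int
    field_name: str
    original_value: str
    new_value: str
    confidence: float        # 0-100
    change_type: str         # e.g. "Format_Standardized"

entry = ChangeEntry(
    record_number=42,
    field_name="phone",
    original_value="555.010.2000",
    new_value="(555) 010-2000",
    confidence=97.5,
    change_type="Format_Standardized",
)
needs_review = entry.confidence < 90   # mirrors the color-coded review bands
print(entry, needs_review)
```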
Reviewing Changes
Filtering Options:
- By status (All, Pending Review, Approved, Rejected)
- By field name
- By change type
- Search by value
Actions:
- Approve: Accept the change (keeps new value)
- Reject/Revert: Reject the change (restores original value)
Bulk Actions:
- Approve All: Accept all pending changes
- Reject All: Revert all pending changes
Confidence Scores
Changes are color-coded by confidence:
- Green (90-100%): High confidence; likely correct
- Yellow (70-89%): Medium confidence; review recommended
- Orange (50-69%): Lower confidence; review carefully
- Red (below 50%): Low confidence; manual verification needed
Workflow Requirement
You must address all pending reviews before proceeding to Analytics. This ensures you've verified all changes before finalizing your cleaned data.
Analytics Dashboard
The Analytics page shows the impact of your data cleaning operations.
Summary Metrics
Four main cards show your cleaning results:
- SmartMatch Merges: Number of duplicate records merged
- AutoFormat Fixes: Number of format corrections made
- SmartFill Predictions: Number of missing values filled
- LogicGuard Flags: Number of anomalies detected
Data Quality Score
A before/after comparison shows your data quality improvement:
- Before: Original data quality percentage
- After: Cleaned data quality percentage
- Visual bar chart showing the improvement
Key Insights
CleanSmart provides actionable insights about your data:
Success Insights (Green):
"Merged 45 duplicate customer records"
"Standardized 230 phone numbers"
Warning Insights (Amber):
"12 anomalies detected in 'age' field"
"5 conflicts required manual resolution"
Recommendations:
Suggestions for improving data quality
Tips for better results on future uploads
Multi-Source Analytics
For multi-source processing, additional metrics show:
- Sources processed count
- Relationships applied
- Total records enhanced
- Total fields in merged dataset
- Cross-source duplicate statistics
Exporting Your Data
The Export page allows you to download your cleaned data in various formats.
Single-Source Export
Step 1: Select Dataset
Choose which cleaned dataset you want to export.
Step 2: Choose Export Mode
- All AI Changes Applied: Exports data with all cleaning applied
- With Accept/Reject Choices: Respects your Change Log approvals/rejections
Step 3: Configure Options
- Include Change Tracking Metadata: Adds columns showing what changed
Step 4: Select Format
- CSV: Comma-separated values. Best for spreadsheets (Excel, Google Sheets).
- JSON: JavaScript Object Notation. Best for web applications and APIs.
- Change Log (CSV): Detailed change history. Best for audit trails and compliance.
- Summary Log (CSV): Aggregated change summary. Best for reports and analysis.
Step 5: Download
- Click Export to generate and download your file.
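If you export the Change Log (CSV), a few lines of pandas can summarize it by change type. The column names in this sketch are assumptions; check the header row of your actual export before running it.

```python
import pandas as pd

# "change_type", "field_name", and "confidence" are illustrative column names.
changes = pd.read_csv("change_log.csv")
summary = (
    changes.groupby("change_type")
           .agg(changes=("field_name", "count"), avg_confidence=("confidence", "mean"))
           .sort_values("changes", ascending=False)
)
print(summary)
```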
Multi-Source Export
For multi-source groups, additional options are available:
Dataset Selection:
- Choose which source datasets to include
- Option to include the Customer Master Hub
Export Settings:
- Include Customer Master Hub: The merged master dataset
- Include Additional Fields: Fields enriched from other sources
- Include Audit Trail: Record of conflicts resolved and values chosen
- Include Change Tracking Metadata: Column-level change indicators
Spoke Export:
Export individual enriched source files that can be imported back into their original systems (e.g., CRM import files).
Export Preview
Before downloading, review:
- Total records to be exported
- Columns included
- File size estimate
- Sample of the data
Settings & Configuration
The Settings page allows you to customize CleanSmart's anomaly detection behavior.
Anomaly Detection Parameters
Fine-tune how LogicGuard detects outliers and anomalies:
- Statistical Threshold (2.5-5.0): Sensitivity for Modified Z-score detection. Lower = more sensitive.
- IQR Multiplier (0.5-2.5): Width of the acceptable range based on quartiles. Lower = stricter.
- Z-Score Threshold (0.5-5.0): Standard deviations from the mean before flagging. Lower = more sensitive.
- Isolation Contamination (0-50%): Expected percentage of outliers in the data.
- Categorical Rare Threshold (0-10%): Minimum occurrence percentage for category values.
- Pattern Std Multiplier (0.5-5.0): Sensitivity for text pattern anomalies.
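For reference, the Modified Z-score behind the Statistical Threshold can be sketched with the standard MAD-based formula below; the example prices and the 3.5 cutoff are illustrative only.

```python
import numpy as np

def modified_z_scores(values):
    """Modified Z-score based on the median absolute deviation (MAD);
    the 0.6745 constant makes MAD comparable to a standard deviation."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    if mad == 0:
        return np.zeros_like(x)
    return 0.6745 * (x - median) / mad

prices = [19.99, 21.50, 20.25, 18.75, 240.00]
threshold = 3.5                                  # plays the role of the Statistical Threshold slider
flags = np.abs(modified_z_scores(prices)) > threshold
print(list(zip(prices, flags)))                  # only 240.00 is flagged
```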
Quick Presets
Choose a preset configuration:
- Strict (High Sensitivity): Catches more potential issues. Good for critical data where false positives are acceptable.
- Balanced (Default): Recommended for most use cases. Balances detection with avoiding false positives.
- Relaxed (Low Sensitivity): Only flags obvious outliers. Good for data where some variation is expected.
Applying Settings
- Adjust sliders or enter values directly
- Preview how changes affect detection (if available)
- Click Save Settings to apply
- Use Reset to Defaults if needed
Profile Management
The Profile page lets you manage your account information and preferences.
Personal Information
Edit your account details:
- Full Name: Your display name
- Email Address: Your login email
- Account Role: Admin or Member (read-only)
For Organization Admins:
- Organization Name: Your company name (editable)
Email Preferences
Control which notifications you receive:
- Processing Notifications: Alerts when cleaning jobs complete
- Weekly Reports: Summary of your data cleaning activity
- Product Updates: News about new features and updates
Saving Changes
- Click Save Changes to update your profile. Changes take effect immediately.
Team Management
The Team page (Pro and Business plans) lets organization admins manage team members.
Team Overview
View your team status:
- Seats Used: Number of active team members
- Seats Available: Remaining seats on your plan
- Progress Bar: Visual representation of seat usage
Team Members
Each member shows:
- Name and email
- Role (Admin or Member)
- Join date
- Actions (remove, for admins)
Inviting Team Members
- Click Invite Member
- Enter the email address
- Choose a role (Admin or Member)
- Click Send Invitation
The invitee will receive an email to join your organization.
Pending Invitations
View and manage outstanding invitations:
- See invited email addresses
- View expiration dates
- Cancel invitations if needed
Managing Members (Admin Only)
- Remove Member: Revoke access (frees up a seat)
- Change Role: Promote/demote between Admin and Member
Seat Limits
If you've used all seats:
- Click Purchase Additional Seats to add more
- Each additional seat has a monthly cost based on your plan
Subscription & Billing
The Subscription page manages your plan, billing, and usage.
Current Usage
View your current billing period usage:
- Data uploaded (MB used / limit)
- Records processed
- Team seats (used / total)
Upgrading Your Plan
- Click Upgrade on your desired plan
- Review the new features and pricing
- Complete payment via Stripe
- New features activate immediately
Adding Seats
- Go to Manage Billing
- Increase the seat count
- Review the price change
- Confirm and pay for additional seats
Downgrading
- Click Downgrade on a lower plan
- Review what features you'll lose
- Confirm the downgrade
- Change takes effect at the end of your billing period
Note: If your team size exceeds the new plan's seats, additional seats will be charged.
Billing History
View all past invoices:
- Invoice date and number
- Amount charged
- Payment status
- Download PDF or view online
Cancelling Your Subscription
- Click Cancel Subscription
- Confirm your cancellation
- Account remains active until period end
- After the cancellation date, you can no longer log in
- Data retained for 30 days
Reactivating
If you've scheduled a downgrade or cancellation:
- Click Keep Current Plan or Cancel Scheduled Change
- Confirm to continue with your current plan
Integrations
CleanSmart integrates with popular marketing and CRM platforms (Pro and Business plans).
Mailchimp Integration
Import your email audiences directly from Mailchimp:
Setup:
- Go to Data Sources
- Click Connect on the Mailchimp card
- Authorize CleanSmart to access your Mailchimp account
- Select the audience to import
Importing:
- Choose an audience from your Mailchimp account
- Review the fields that will be imported
- Click Import
- Contacts are added as a new data source
Synced Fields:
- Email address
- First name, Last name
- Custom fields
- Tags and segments
Klaviyo Integration
Import customer lists from Klaviyo:
Setup:
- Go to Data Sources
- Click Connect on the Klaviyo card
- Enter your Klaviyo API key
- Authorize the connection
Importing:
- Select a list from your Klaviyo account
- Review the customer profile fields
- Click Import
- Profiles are added as a new data source
Synced Fields:
- Email, phone
- Name fields
- Custom properties
- Profile attributes
Troubleshooting & FAQ
Q: How long does data cleaning take?
A: Processing time depends on your dataset size. Most files under 10,000 records process in under 2 minutes. Larger files may take longer.
Q: Can I undo changes after cleaning?
A: Yes! Use the Change Log to reject specific changes, which reverts them to original values. You can also re-export with "Accept/Reject Choices" to include only approved changes.
Q: Why weren't duplicates detected in my data?
A: Check your field mapping. Duplicate detection works best when fields are correctly identified (e.g., marking a column as "Name" helps SmartMatch compare names). Also, verify your composite key selection includes the right fields.
Q: How does SmartFill decide what values to predict?
A: SmartFill analyzes patterns in your existing data. It only fills values when it has high confidence based on relationships between fields. Low-confidence predictions are not applied automatically.
Q: Can I process the same file multiple times?
A: Yes. Upload the file again or re-run cleaning on the existing source. Previous changes in the Change Log will be preserved.
Q: What happens to my data after I cancel?
A: Your data is retained for 30 days after cancellation. After that, it is permanently deleted.
Common Issues
Upload fails:
- Ensure your file is in CSV format
- Check that the file isn't corrupted or empty
- Verify you haven't exceeded your plan's upload limit
Processing stuck or slow:
- Large files take longer to process
- Check your internet connection
- Try refreshing the page and checking job status
Field mapping incorrect:
- Re-configure field mapping on the source
- Manually set the correct field types
- Check that column headers are clear and descriptive
Duplicates not merging correctly:
- Review your composite key selection
- Adjust sensitivity thresholds
- Use manual resolution for complex cases
Export file is empty:
- Ensure cleaning has completed
- Check that you selected a dataset
- Verify changes weren't all rejected

