CleanSmart Support
Introduction
CleanSmart is an enterprise-grade AI-powered data cleaning platform that transforms messy, inconsistent data into clean, standardized datasets. Whether you're dealing with duplicate records, inconsistent formatting, missing values, or data from multiple sources, CleanSmart automates the tedious work of data cleaning so you can focus on what matters.
Key Features
- AI-Powered Cleaning: Advanced algorithms detect duplicates, standardize formats, fill missing values, and flag anomalies
- Multi-Source Merging: Combine data from multiple sources with intelligent conflict resolution
- Full Audit Trail: Track every change with the ability to approve or reject modifications
- Flexible Export: Download cleaned data in CSV or JSON formats with optional metadata
Two Main Workflows
- Single-Source Cleaning: Upload a single CSV file and run it through our 4-step automated cleaning pipeline
- Multi-Source Processing: Combine multiple data sources into a unified master dataset with enriched spoke files
Getting Started
Creating Your Account
- Visit the CleanSmart registration page
- Choose your plan (Starter, Pro, or Business)
- Enter your details and create your account
- All plans include a 7-day free trial with full access to features
Your First Data Cleaning Project
Quick Start (Single-Source):
- Navigate to Data Sources
- Click Upload CSV and select your file
- Review the field mapping and click Confirm
- Go to Data Cleaning and click Start Cleaning
- Review changes in the Change Log
- View results in Analytics
- Download your cleaned data from Export
Data Sources
The Data Sources page is your central hub for managing all uploaded data files and source groups.
Uploading Files
To upload a single file:
- Click the Upload CSV button or drag and drop a file
- CleanSmart analyzes your file and displays a preview
- Review the detected columns and data types
- Click Confirm to add the source to your workspace
Supported file formats:
- CSV (Comma-Separated Values)
File size limits depend on your plan.
Field Mapping
After uploading, CleanSmart automatically detects field types (name, email, phone, address, etc.). You can customize this mapping:
- Click the Configure button on any source
- Review each column's assigned field type
- Use the dropdown to change field types if needed
- Click Save Mapping to apply changes
Accurate field mapping ensures better cleaning results, especially for:
- Phone number formatting
- Email validation
- Name capitalization
- Address standardization
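The exact detection logic is internal to CleanSmart, but you can picture field-type detection as simple pattern checks over a sample of column values. The hedged sketch below is illustrative only; the `guess_field_type` helper and its regexes are assumptions, not CleanSmart internals.

```python
import re

# Illustrative regexes only; CleanSmart's real detector is more sophisticated.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,}$")

def guess_field_type(values):
    """Guess a field type from a sample of column values."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return "unknown"
    if all(EMAIL_RE.match(v) for v in sample):
        return "email"
    if all(PHONE_RE.match(v) for v in sample):
        return "phone"
    if all(any(ch.isdigit() for ch in v) for v in sample):
        return "numeric_or_id"
    return "name_or_text"

print(guess_field_type(["jane@example.com", "BOB@MAIL.COM"]))  # email
print(guess_field_type(["(555) 010-2000", "555-010-2001"]))    # phone
```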
Managing Sources
Each uploaded source shows:
- File name and upload date
- Record count (number of rows)
- Mapping status (configured or needs mapping)
- Last sync time (for integrations)
Actions available:
- Configure: Edit field mappings
- Delete: Remove the source (this cannot be undone)
- Clear All: Remove all sources from your workspace
Single-Source Data Cleaning
The single-source workflow is a 4-step automated pipeline that cleans your data using AI-powered algorithms.
The 4-Step Pipeline
Step 1: SmartMatch (Duplicate Detection & Merging)
SmartMatch uses AI-powered semantic similarity to find duplicate records that traditional exact-matching would miss.
What it detects:
- Exact duplicates (identical records)
- Near-duplicates (e.g., "John Smith" vs "Jon Smith")
- Semantic duplicates (e.g., "IBM" vs "International Business Machines")
Configuration options:
- Composite Key Selection: Choose which fields to use for matching
- Sensitivity Threshold: Adjust how strict the matching is
- Data Type: Select entity, transactional, or line-item data
Results include:
- Total duplicates found
- Similarity scores (0-100%)
- Matching reasons (why records were flagged)
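CleanSmart's matching model is proprietary, but the idea behind near-duplicate detection can be illustrated with plain string similarity over a composite key. In the sketch below, the `find_near_duplicates` helper and the 85% threshold are illustrative assumptions, not CleanSmart internals.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two normalized strings."""
    return 100 * SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def find_near_duplicates(records, key_fields, threshold=85.0):
    """Flag record pairs whose composite-key similarity exceeds the threshold."""
    def composite(rec):
        return " ".join(str(rec.get(f, "")) for f in key_fields)
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(composite(a), composite(b))
        if score >= threshold:
            pairs.append((i, j, round(score, 1)))
    return pairs

records = [
    {"name": "John Smith", "email": "john@example.com"},
    {"name": "Jon Smith", "email": "john@example.com"},
    {"name": "Jane Doe", "email": "jane@example.com"},
]
print(find_near_duplicates(records, key_fields=["name", "email"]))
# The first two records are flagged as near-duplicates with a high score.
```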
Step 2: AutoFormat (Format Standardization)
AutoFormat fixes inconsistent formatting across your data.
What it fixes:
- Name capitalization: "john smith" → "John Smith"
- Phone numbers: Removes letters, standardizes format
- Email addresses: Lowercase, typo detection
- Dates: Consistent date formatting
- Addresses: Standardized abbreviations (St., Ave., etc.)
- Common typos: Corrects frequent spelling errors
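As a rough illustration of the kinds of fixes AutoFormat applies, the sketch below standardizes the name, email, and phone fields of a single record. The `standardize_record` helper and the US-style phone format are assumptions for illustration only.

```python
import re

def standardize_record(rec):
    """Apply a few common format fixes; a simplified stand-in for AutoFormat."""
    out = dict(rec)
    if out.get("name"):
        # "john smith" -> "John Smith"
        out["name"] = " ".join(w.capitalize() for w in out["name"].split())
    if out.get("email"):
        out["email"] = out["email"].strip().lower()
    if out.get("phone"):
        digits = re.sub(r"\D", "", out["phone"])   # keep digits only
        if len(digits) == 10:                      # assume US-style numbers
            out["phone"] = f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return out

print(standardize_record({"name": "john  smith",
                          "email": " JOHN@Example.COM ",
                          "phone": "555.010.2000"}))
# {'name': 'John Smith', 'email': 'john@example.com', 'phone': '(555) 010-2000'}
```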
Step 3: SmartFill (Missing Value Imputation)
SmartFill uses machine learning to intelligently fill missing values based on patterns in your existing data.
How it works:
- Analyzes relationships between fields
- Predicts missing values using existing patterns
- Assigns confidence scores to each prediction
- Never fills fields where prediction confidence is too low
Example:
If most customers from "90210" zip code are in "Beverly Hills, CA", SmartFill can predict the city/state for records with only a zip code.
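Conceptually, this is similar to a group-based imputation with a confidence cutoff. The pandas sketch below is a minimal illustration; the `smart_fill` helper, the column names, and the 90% cutoff are assumptions, not CleanSmart's actual model.

```python
import pandas as pd

def smart_fill(df, target, by, min_confidence=0.9):
    """Fill missing `target` values from the dominant value in each `by` group,
    but only when that value's share meets the confidence threshold."""
    filled = df.copy()
    for key, group in df.dropna(subset=[target]).groupby(by):
        counts = group[target].value_counts()
        confidence = counts.iloc[0] / counts.sum()
        if confidence >= min_confidence:
            mask = filled[by].eq(key) & filled[target].isna()
            filled.loc[mask, target] = counts.index[0]
    return filled

df = pd.DataFrame({
    "zip":  ["90210", "90210", "90210", "90210"],
    "city": ["Beverly Hills", "Beverly Hills", "Beverly Hills", None],
})
print(smart_fill(df, target="city", by="zip"))
# The missing city is filled with "Beverly Hills" because that value dominates the group.
```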
Step 4: LogicGuard (Anomaly Detection)
LogicGuard identifies outliers and impossible values that may indicate data quality issues.
What it detects:
- Numerical outliers: Ages over 150, negative prices
- Statistical anomalies: Values far from the mean
- Pattern violations: Phone numbers with wrong digit counts
- Impossible values: Future dates for birthdays
Detection methods:
- Z-score analysis
- Interquartile Range (IQR)
- Isolation Forest algorithm
- Pattern-based detection
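To make the Z-score and IQR rules concrete, here is a minimal sketch that flags numeric outliers using both rules. The `flag_outliers` helper and its default thresholds are illustrative; the Isolation Forest and pattern-based checks are omitted.

```python
import numpy as np

def flag_outliers(values, z_threshold=3.0, iqr_multiplier=1.5):
    """Return indices whose value is an outlier by either the z-score or IQR rule."""
    x = np.asarray(values, dtype=float)
    z = np.abs((x - x.mean()) / x.std(ddof=0))                     # z-score rule
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - iqr_multiplier * iqr, q3 + iqr_multiplier * iqr  # IQR fences
    return sorted(set(np.where((z > z_threshold) | (x < lo) | (x > hi))[0].tolist()))

ages = [34, 29, 41, 38, 27, 450, 33]   # 450 is an impossible age
print(flag_outliers(ages))             # -> [5]
```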
Running the Pipeline
- Navigate to Data Cleaning
- Select your data source from the dropdown
- Click Start Cleaning to run all steps, or click individual steps
- Watch the real-time progress bar
- Review results for each step as they complete
Duplicate Resolution Options
When duplicates are found, you have several resolution options:
Automatic Resolution:
- Accept all AI-suggested merges with one click
- System chooses the most complete record as master
Manual Resolution:
- Review each duplicate cluster one by one
- Select which record should be the master
- Choose field-by-field which values to keep
- Reject false positives
Resolution Strategies:
- Master Record: One record becomes the source of truth
- Field-Level Merge: Combine best values from each duplicate
- Keep Both: Mark as not duplicates if incorrectly matched
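A hedged sketch of the "most complete record becomes master" idea combined with a field-level merge is shown below; the `merge_duplicates` helper and its completeness heuristic are illustrative, not the exact logic CleanSmart applies.

```python
def merge_duplicates(cluster):
    """Merge a duplicate cluster: the most complete record becomes the master,
    and its empty fields are back-filled from the other records."""
    def completeness(rec):
        return sum(1 for v in rec.values() if v not in (None, ""))
    master = max(cluster, key=completeness)
    merged = dict(master)
    for rec in cluster:
        for field, value in rec.items():
            if merged.get(field) in (None, "") and value not in (None, ""):
                merged[field] = value
    return merged

cluster = [
    {"name": "John Smith", "email": "john@example.com", "phone": ""},
    {"name": "Jon Smith",  "email": "",                 "phone": "(555) 010-2000"},
]
print(merge_duplicates(cluster))
# {'name': 'John Smith', 'email': 'john@example.com', 'phone': '(555) 010-2000'}
```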
Multi-Source Data Processing
Multi-Source Processing allows you to merge and enrich data from multiple sources using a hub-and-spoke architecture.
Key Concepts
Source Group: A container for multiple related data sources that should be merged together.
Relationships: Define how sources connect to each other (e.g., Customer ID in Source A matches CustomerID in Source B).
Hub (Master Dataset): The unified, merged dataset created by combining all sources.
Spokes: Individual source files enriched with data from other sources in the group.
Creating a Source Group
- Go to Data Sources
- Click Create Source Group
- Enter a group name (e.g., "Customer 360")
- Click Create
Adding Sources to a Group
- Open your source group
- Click Add Source
- Either upload a new file or select an existing source
- Configure field mapping for each source
- Repeat for all sources you want to merge
Defining Relationships
Relationships tell CleanSmart how to connect records across sources.
- Go to the Relationships tab in your source group
- Review Suggested Relationships (AI-detected connections)
- Click Accept to use a suggestion, or Customize to modify it
To create a manual relationship:
1. Click Add Relationship
2. Select the first source and field (e.g., customers.customer_id)
3. Select the second source and field (e.g., orders.customer_id)
4. Choose the relationship type:
- 1-to-1: One record in Source A matches one in Source B
- 1-to-many: One record in Source A matches multiple in Source B
- Many-to-many: Multiple records can match in both directions
5. Configure matching options (exact match or fuzzy matching)
6. Click Save
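If you think of a relationship as a join key, a 1-to-many relationship behaves like the pandas merge below. The table and column names are made up for illustration; they are not a CleanSmart schema.

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2],
    "name": ["Jane Doe", "John Smith"],
})
orders = pd.DataFrame({
    "order_id": [101, 102, 103],
    "customer_id": [1, 1, 2],
    "total": [40.0, 15.5, 99.9],
})

# A 1-to-many relationship: one customer row can match many order rows.
# validate="one_to_many" makes pandas raise if the data violates that assumption.
linked = customers.merge(orders, on="customer_id", how="left", validate="one_to_many")
print(linked)
```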
Merge Strategies
Choose how conflicts between sources should be resolved:
- MASTER_SLAVE: One source is always trusted over others. Best when you have a primary system of record.
- CONSENSUS: Values appearing in multiple sources win. Best when no single source is authoritative.
- WEIGHTED: Sources have trust scores that determine priority. Best when sources have varying reliability.
- RECENT: The most recently updated value wins. Best when newer data is more accurate.
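To make the strategies concrete, here is a hedged sketch of resolving one conflicting field under the RECENT and WEIGHTED strategies. The `resolve_conflict` helper, the source names, and the trust weights are illustrative assumptions only.

```python
from datetime import date

def resolve_conflict(candidates, strategy="RECENT", weights=None):
    """Pick a winning value for one field from (source, value, last_updated) tuples.
    Only the RECENT and WEIGHTED strategies are sketched here."""
    if strategy == "RECENT":
        return max(candidates, key=lambda c: c[2])[1]              # newest value wins
    if strategy == "WEIGHTED":
        weights = weights or {}
        return max(candidates, key=lambda c: weights.get(c[0], 0))[1]
    raise ValueError(f"unsupported strategy: {strategy}")

candidates = [
    ("crm",     "555-010-2000", date(2024, 1, 10)),
    ("billing", "555-010-9999", date(2024, 6, 2)),
]
print(resolve_conflict(candidates, "RECENT"))                                   # 555-010-9999
print(resolve_conflict(candidates, "WEIGHTED", {"crm": 0.9, "billing": 0.4}))   # 555-010-2000
```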
The 6-Step Processing Pipeline
- Schema Validation: Aligns column structures across all sources
- Relationship Detection: Verifies and applies defined relationships
- Data Merging: Combines records from multiple sources
- Conflict Resolution: Handles conflicting values using your chosen strategy
- Data Enrichment: Adds fields from other sources to each dataset
- Quality Verification: Validates the merged data quality
Running Multi-Source Processing
- Go to Multi-Source Processing
- Select your source group
- Review the configuration (merge strategy, relationships)
- Click Start Processing
- Monitor progress through each step
- Review and resolve any conflicts that require manual attention
Conflict Resolution
When the same field has different values across sources, CleanSmart flags it as a conflict.
Automatic Resolution: Based on your merge strategy, most conflicts are resolved automatically.
Manual Resolution: Some conflicts may require your review:
- Click Review Conflicts
- See side-by-side comparison of conflicting values
- Choose which value to keep
- Optionally apply the same rule to similar conflicts
- Click Apply Resolution
Cross-Source Duplicate Detection
CleanSmart can find duplicates that exist across your sources:
- Customer in Source A is also in Source B with slight variations
- Identifies and suggests merging these cross-source duplicates
Change Log & Review
The Change Log provides a complete audit trail of every modification made during cleaning.
Understanding the Change Log
Every change is categorized by type:
- Duplicate_Resolved (SmartMatch): Duplicate records merged
- Format_Standardized (AutoFormat): Format corrections applied
- Value_Imputed (SmartFill): Missing values filled
- Anomaly_Detected (LogicGuard): Outliers flagged
- Conflict_Resolved (Multi-source): Cross-source conflicts resolved
Change Details
Each change shows:
- Record Number: Which row was affected
- Field Name: Which column was modified
- Original Value: What the data was before
- New Value: What the data is now
- Confidence Score: How confident the AI is (color-coded)
- Change Type: Which step made this change
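Conceptually, each entry is a small structured record like the sketch below. The `ChangeEntry` class and its field names are illustrative, not CleanSmart's export schema.

```python
from dataclasses import dataclass

@dataclass
class ChangeEntry:
    record_number: int
    field_name: str
    original_value: str
    new_value: str
    confidence: float        # 0-100
    change_type: str         # e.g. "Format_Standardized"

entry = ChangeEntry(
    record_number=42,
    field_name="phone",
    original_value="555.010.2000",
    new_value="(555) 010-2000",
    confidence=97.5,
    change_type="Format_Standardized",
)
needs_review = entry.confidence < 90   # mirrors the color-coded review bands
print(entry, needs_review)
```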
Reviewing Changes
Filtering Options:
- By status (All, Pending Review, Approved, Rejected)
- By field name
- By change type
- Search by value
Actions:
- Approve: Accept the change (keeps new value)
- Reject/Revert: Reject the change (restores original value)
Bulk Actions:
- Approve All: Accept all pending changes
- Reject All: Revert all pending changes
Confidence Scores
Changes are color-coded by confidence:
- Green (90-100%): High confidence; likely correct
- Yellow (70-89%): Medium confidence; review recommended
- Orange (50-69%): Lower confidence; review carefully
- Red (below 50%): Low confidence; manual verification needed
Workflow Requirement
You must address all pending reviews before proceeding to Analytics. This ensures you've verified all changes before finalizing your cleaned data.
Analytics Dashboard
The Analytics page shows the impact of your data cleaning operations.
Summary Metrics
Four main cards show your cleaning results:
- SmartMatch Merges: Number of duplicate records merged
- AutoFormat Fixes: Number of format corrections made
- SmartFill Predictions: Number of missing values filled
- LogicGuard Flags: Number of anomalies detected
Data Quality Score
A before/after comparison shows your data quality improvement:
- Before: Original data quality percentage
- After: Cleaned data quality percentage
- Visual bar chart showing the improvement
Key Insights
CleanSmart provides actionable insights about your data:
Success Insights (Green):
"Merged 45 duplicate customer records"
"Standardized 230 phone numbers"
Warning Insights (Amber):
"12 anomalies detected in 'age' field"
"5 conflicts required manual resolution"
Recommendations:
Suggestions for improving data quality
Tips for better results on future uploads
Multi-Source Analytics
For multi-source processing, additional metrics show:
- Sources processed count
- Relationships applied
- Total records enhanced
- Total fields in merged dataset
- Cross-source duplicate statistics
Exporting Your Data
The Export page allows you to download your cleaned data in various formats.
Single-Source Export
Step 1: Select Dataset
Choose which cleaned dataset you want to export.
Step 2: Choose Export Mode
- All AI Changes Applied: Exports data with all cleaning applied
- With Accept/Reject Choices: Respects your Change Log approvals/rejections
Step 3: Configure Options
- Include Change Tracking Metadata: Adds columns showing what changed
Step 4: Select Format
- CSV: Comma-separated values. Best for spreadsheets (Excel, Google Sheets).
- JSON: JavaScript Object Notation. Best for web applications and APIs.
- Change Log (CSV): Detailed change history. Best for audit trails and compliance.
- Summary Log (CSV): Aggregated change summary. Best for reports and analysis.
Step 5: Download
- Click Export to generate and download your file.
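If you export the Change Log (CSV), a few lines of pandas can summarize it by change type. The column names in this sketch are assumptions; check the header row of your actual export before running it.

```python
import pandas as pd

# "change_type", "field_name", and "confidence" are illustrative column names.
changes = pd.read_csv("change_log.csv")
summary = (
    changes.groupby("change_type")
           .agg(changes=("field_name", "count"), avg_confidence=("confidence", "mean"))
           .sort_values("changes", ascending=False)
)
print(summary)
```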
Multi-Source Export
For multi-source groups, additional options are available:
Dataset Selection:
- Choose which source datasets to include
- Option to include the Customer Master Hub
Export Settings:
- Include Customer Master Hub: The merged master dataset
- Include Additional Fields: Fields enriched from other sources
- Include Audit Trail: Record of conflicts resolved and values chosen
- Include Change Tracking Metadata: Column-level change indicators
Spoke Export:
Export individual enriched source files that can be imported back into their original systems (e.g., CRM import files).
Export Preview
Before downloading, review:
- Total records to be exported
- Columns included
- File size estimate
- Sample of the data
Settings & Configuration
The Settings page allows you to customize CleanSmart's anomaly detection behavior.
Anomaly Detection Parameters
Fine-tune how LogicGuard detects outliers and anomalies:
- Statistical Threshold (2.5-5.0): Sensitivity for Modified Z-score detection. Lower = more sensitive.
- IQR Multiplier (0.5-2.5): Width of the acceptable range based on quartiles. Lower = stricter.
- Z-Score Threshold (0.5-5.0): Standard deviations from the mean before flagging. Lower = more sensitive.
- Isolation Contamination (0-50%): Expected percentage of outliers in the data.
- Categorical Rare Threshold (0-10%): Minimum occurrence percentage for category values.
- Pattern Std Multiplier (0.5-5.0): Sensitivity for text pattern anomalies.
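For reference, the Modified Z-score behind the Statistical Threshold can be sketched with the standard MAD-based formula below; the example prices and the 3.5 cutoff are illustrative only.

```python
import numpy as np

def modified_z_scores(values):
    """Modified Z-score based on the median absolute deviation (MAD);
    the 0.6745 constant makes MAD comparable to a standard deviation."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    if mad == 0:
        return np.zeros_like(x)
    return 0.6745 * (x - median) / mad

prices = [19.99, 21.50, 20.25, 18.75, 240.00]
threshold = 3.5                                  # plays the role of the Statistical Threshold slider
flags = np.abs(modified_z_scores(prices)) > threshold
print(list(zip(prices, flags)))                  # only 240.00 is flagged
```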
Quick Presets
Choose a preset configuration:
- Strict (High Sensitivity): Catches more potential issues. Good for critical data where false positives are acceptable.
- Balanced (Default): Recommended for most use cases. Balances detection with avoiding false positives.
- Relaxed (Low Sensitivity): Only flags obvious outliers. Good for data where some variation is expected.
Applying Settings
- Adjust sliders or enter values directly
- Preview how changes affect detection (if available)
- Click Save Settings to apply
- Use Reset to Defaults if needed
Profile Management
The Profile page lets you manage your account information and preferences.
Personal Information
Edit your account details:
- Full Name: Your display name
- Email Address: Your login email
- Account Role: Admin or Member (read-only)
For Organization Admins:
- Organization Name: Your company name (editable)
Email Preferences
Control which notifications you receive:
- Processing Notifications: Alerts when cleaning jobs complete
- Weekly Reports: Summary of your data cleaning activity
- Product Updates: News about new features and updates
Saving Changes
- Click Save Changes to update your profile. Changes take effect immediately.
Team Management
The Team page (Pro and Business plans) lets organization admins manage team members.
Team Overview
View your team status:
- Seats Used: Number of active team members
- Seats Available: Remaining seats on your plan
- Progress Bar: Visual representation of seat usage
Team Members
Each member shows:
- Name and email
- Role (Admin or Member)
- Join date
- Actions (remove, for admins)
Inviting Team Members
- Click Invite Member
- Enter the email address
- Choose a role (Admin or Member)
- Click Send Invitation
The invitee will receive an email to join your organization.
Pending Invitations
View and manage outstanding invitations:
- See invited email addresses
- View expiration dates
- Cancel invitations if needed
Managing Members (Admin Only)
- Remove Member: Revoke access (frees up a seat)
- Change Role: Promote/demote between Admin and Member
Seat Limits
If you've used all seats:
- Click Purchase Additional Seats to add more
- Each additional seat has a monthly cost based on your plan
Subscription & Billing
The Subscription page manages your plan, billing, and usage.
Current Usage
View your current billing period usage:
- Data uploaded (MB used / limit)
- Records processed
- Team seats (used / total)
Upgrading Your Plan
- Click Upgrade on your desired plan
- Review the new features and pricing
- Complete payment via Stripe
- New features activate immediately
Adding Seats
- Go to Manage Billing
- Increase the seat count
- Review the price change
- Confirm and pay for additional seats
Downgrading
- Click Downgrade on a lower plan
- Review what features you'll lose
- Confirm the downgrade
- Change takes effect at the end of your billing period
Note: If your team size exceeds the new plan's seats, additional seats will be charged.
Billing History
View all past invoices:
- Invoice date and number
- Amount charged
- Payment status
- Download PDF or view online
Cancelling Your Subscription
- Click Cancel Subscription
- Confirm your cancellation
- Account remains active until period end
- After the cancellation date, you can no longer log in
- Data retained for 30 days
Reactivating
If you've scheduled a downgrade or cancellation:
- Click Keep Current Plan or Cancel Scheduled Change
- Confirm to continue with your current plan
Integrations
CleanSmart integrates with popular marketing and CRM platforms (Pro and Business plans).
Mailchimp Integration
Import your email audiences directly from Mailchimp:
Setup:
- Go to Data Sources
- Click Connect on the Mailchimp card
- Authorize CleanSmart to access your Mailchimp account
- Select the audience to import
Importing:
- Choose an audience from your Mailchimp account
- Review the fields that will be imported
- Click Import
- Contacts are added as a new data source
Synced Fields:
- Email address
- First name, Last name
- Custom fields
- Tags and segments
Klaviyo Integration
Import customer lists from Klaviyo:
Setup:
- Go to Data Sources
- Click Connect on the Klaviyo card
- Enter your Klaviyo API key
- Authorize the connection
Importing:
- Select a list from your Klaviyo account
- Review the customer profile fields
- Click Import
- Profiles are added as a new data source
Synced Fields:
- Email, phone
- Name fields
- Custom properties
- Profile attributes
Troubleshooting & FAQ
Q: How long does data cleaning take?
A: Processing time depends on your dataset size. Most files under 10,000 records process in under 2 minutes. Larger files may take longer.
Q: Can I undo changes after cleaning?
A: Yes! Use the Change Log to reject specific changes, which reverts them to original values. You can also re-export with "Accept/Reject Choices" to include only approved changes.
Q: Why weren't duplicates detected in my data?
A: Check your field mapping. Duplicate detection works best when fields are correctly identified (e.g., marking a column as "Name" helps SmartMatch compare names). Also, verify your composite key selection includes the right fields.
Q: How does SmartFill decide what values to predict?
A: SmartFill analyzes patterns in your existing data. It only fills values when it has high confidence based on relationships between fields. Low-confidence predictions are not applied automatically.
Q: Can I process the same file multiple times?
A: Yes. Upload the file again or re-run cleaning on the existing source. Previous changes in the Change Log will be preserved.
Q: What happens to my data after I cancel?
A: Your data is retained for 30 days after cancellation. After that, it is permanently deleted.
Common Issues
Upload fails:
- Ensure your file is in CSV format
- Check that the file isn't corrupted or empty
- Verify you haven't exceeded your plan's upload limit
Processing stuck or slow:
- Large files take longer to process
- Check your internet connection
- Try refreshing the page and checking job status
Field mapping incorrect:
- Re-configure field mapping on the source
- Manually set the correct field types
- Check that column headers are clear and descriptive
Duplicates not merging correctly:
- Review your composite key selection
- Adjust sensitivity thresholds
- Use manual resolution for complex cases
Export file is empty:
- Ensure cleaning has completed
- Check that you selected a dataset
- Verify changes weren't all rejected

