Shopify + Salesforce + HubSpot: A Practical Guide to Unified Customer Data
You've got three platforms. Each one holds a piece of your customer puzzle. Shopify knows what they bought. Salesforce tracks the sales conversations. HubSpot manages the marketing touches.
And none of them agree on who "John Smith" actually is.
This is the reality for most growing businesses. The tools work great individually. But getting them to share a single, accurate view of your customer? That's where things get messy.
Here's a practical guide to unifying customer data across these three platforms—without writing custom code or hiring a data engineer.

Why These Three Systems Fight Each Other
Before diving into solutions, it helps to understand why the problem exists in the first place.
Each platform was built with different priorities.
Shopify cares about transactions. A customer is defined by their email at checkout. Maybe their shipping address. It doesn't care much about company hierarchies or lead scoring. Someone buys something, Shopify captures the sale.
Salesforce lives in a different world entirely. Contacts belong to Accounts. Accounts have hierarchies. Opportunities tie to specific people who influence purchase decisions. The whole structure assumes complex B2B sales cycles.
HubSpot sits somewhere in between. Contacts have properties. Those contacts can belong to companies. Marketing campaigns create new contacts constantly—webinar signups, ebook downloads, demo requests. Volume matters here.
Three different philosophies. Three different data models. One customer trying to exist in all of them simultaneously.
The Schema Conflicts You'll Actually Hit
Let's get specific about what goes wrong.
Email as identifier (sounds simple, isn't)
- Shopify: customer.email
- Salesforce: Contact.Email
- HubSpot: email
Easy match, right? Until someone uses their work email in HubSpot, personal email in Shopify, and their assistant's email got entered in Salesforce. Same person. Three different identities.
Name field variations
- Shopify stores first_name and last_name separately. Clean, predictable.
- Salesforce has FirstName, LastName, plus Suffix, MiddleName, and Salutation. More fields means more opportunities for inconsistency.
- HubSpot uses firstname and lastname (lowercase, no underscore). It also has hs_full_name that sometimes gets populated, sometimes doesn't.
Phone number formatting
- Shopify: +1 (555) 123-4567
- Salesforce: 555.123.4567
- HubSpot: 5551234567
Same number. Completely different strings. A naive merge will create three records for one customer.
Address components
- Shopify breaks addresses into address1, address2, city, province, country, zip.
- Salesforce has MailingStreet, MailingCity, MailingState, MailingCountry, MailingPostalCode. Plus separate "Other Address" fields on the Contact and "Billing Address" fields on the related Account.
- HubSpot stores address, city, state, country, zip. Similar to Shopify, but the field names don't match.
Merging address data manually means mapping every field. Miss one, lose data.
Identity Resolution: Finding the Same Person Across Platforms
This is the core challenge. You have records from three systems. Some represent the same person. Some don't. How do you figure out which is which?
Method 1: Exact email matching
The simplest approach. Match records where emails are identical.
Works well when: Customers use consistent emails everywhere. B2B contexts where corporate emails are standard. Clean, maintained databases.
Falls apart when: People use multiple email addresses. Personal vs. work email situations. Typos in email entry. Partner or assistant emails entered instead of the actual contact.
Email matching will find maybe 60-70% of your true duplicates if you're lucky. It's a start, not a solution.
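If you would rather script this pass than do it in a spreadsheet, here is a minimal sketch of exact email matching. It assumes each exported row has already been loaded as a dict with an "email" key; the "source" key and the sample rows are made up for illustration.

```python
from collections import defaultdict

def group_by_email(records):
    """Group records from any platform by trimmed, lowercased email."""
    groups = defaultdict(list)
    for rec in records:
        email = (rec.get("email") or "").strip().lower()
        if email:  # rows without an email cannot be matched this way
            groups[email].append(rec)
    # Keep only emails that appear on more than one record
    return {email: recs for email, recs in groups.items() if len(recs) > 1}

# Illustrative rows, not real export data
records = [
    {"source": "shopify", "email": "john@gmail.com"},
    {"source": "hubspot", "email": "John@Gmail.com "},
    {"source": "salesforce", "email": "jsmith@acme.com"},
]
matches = group_by_email(records)
```

Here the Shopify and HubSpot rows land in the same group, while the Salesforce row with the work email is left unmatched. That leftover is exactly the case the next two methods exist for.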
Method 2: Fuzzy name + company matching
When emails don't match, look at name and company combinations.
"Jon Smith at Acme Corp" and "Jonathan Smith at ACME Corporation" are probably the same person. Traditional string matching won't catch this. You need fuzzy matching that understands "Jon" and "Jonathan" are related. That "Corp" and "Corporation" mean the same thing.
This approach catches another 15-20% of duplicates that exact matching misses. But it also introduces false positives. "John Smith at Acme" and "John Smith at Acme Tools" might be different people entirely.
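One way to sketch this kind of fuzzy comparison uses Python's standard-library difflib. The nickname table, the company-suffix list, and the 0.9 threshold are all assumptions chosen for this example; a real project would need much larger lookup tables and a threshold tuned on its own data.

```python
from difflib import SequenceMatcher

# Hypothetical lookup tables, deliberately tiny
NICKNAMES = {"jon": "jonathan", "bill": "william", "bob": "robert"}
COMPANY_SUFFIXES = {"corp", "corporation", "inc", "incorporated", "llc", "ltd"}

def normalize_name(name):
    """Lowercase the name and expand known nicknames."""
    return " ".join(NICKNAMES.get(part, part) for part in name.lower().split())

def normalize_company(company):
    """Lowercase, drop punctuation, and remove legal suffixes."""
    words = company.lower().replace(".", "").replace(",", "").split()
    return " ".join(w for w in words if w not in COMPANY_SUFFIXES)

def similarity(a, b):
    """Character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()

def is_probable_match(rec_a, rec_b, threshold=0.9):
    """Require both name and company to clear the (assumed) threshold."""
    name_score = similarity(normalize_name(rec_a["name"]), normalize_name(rec_b["name"]))
    company_score = similarity(
        normalize_company(rec_a["company"]), normalize_company(rec_b["company"])
    )
    return name_score >= threshold and company_score >= threshold

same_person = is_probable_match(
    {"name": "Jon Smith", "company": "Acme Corp"},
    {"name": "Jonathan Smith", "company": "ACME Corporation"},
)
different_company = is_probable_match(
    {"name": "John Smith", "company": "Acme"},
    {"name": "John Smith", "company": "Acme Tools"},
)
```

With these settings the Jon/Jonathan pair matches and the Acme/Acme Tools pair does not. A lower threshold would start merging them, which is the false-positive risk described above.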
Method 3: Semantic similarity
The most sophisticated approach uses AI to understand meaning, not just strings.
Instead of comparing characters, semantic matching compares the overall meaning of records. It considers multiple fields together—name, company, email domain, phone area code, location. A record that matches on three of five fields might score higher than one that matches perfectly on just email.
This is how modern data cleaning tools find the duplicates humans miss. And it's the only reliable method when dealing with messy, real-world data from multiple sources.
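The multi-field idea can be illustrated without any AI model. The sketch below is not semantic matching; it is a plain weighted field-agreement score, swapped in to show why agreement on several weak signals can outrank a single match. The weights, field names, and 10-digit North American phone assumption are all invented for the example.

```python
# Assumed weights; they sum to 1.0 but are otherwise arbitrary
WEIGHTS = {"name": 0.30, "company": 0.25, "email_domain": 0.20, "city": 0.15, "area_code": 0.10}

def features(rec):
    """Derive the five comparison fields from a raw record dict."""
    email = (rec.get("email") or "").strip().lower()
    digits = "".join(ch for ch in (rec.get("phone") or "") if ch.isdigit())
    return {
        "name": (rec.get("name") or "").strip().lower(),
        "company": (rec.get("company") or "").strip().lower(),
        "email_domain": email.split("@")[-1] if "@" in email else "",
        # Assumes North American numbers: area code = first 3 of the last 10 digits
        "area_code": digits[-10:-7] if len(digits) >= 10 else "",
        "city": (rec.get("city") or "").strip().lower(),
    }

def match_score(a, b, weights=WEIGHTS):
    """Sum the weights of fields that are non-empty and equal on both records."""
    fa, fb = features(a), features(b)
    return sum(w for field, w in weights.items() if fa[field] and fa[field] == fb[field])

a = {"name": "Jonathan Smith", "company": "Acme", "email": "jsmith@acme.com",
     "phone": "555.123.4567", "city": "Boston"}
b = {"name": "Jonathan Smith", "company": "Acme", "email": "john@gmail.com",
     "phone": "+1 (555) 123-4567", "city": "Chicago"}
c = {"name": "Mary Jones", "company": "Other Co", "email": "mary@acme.com",
     "phone": "2125550000", "city": "Denver"}

score_ab = match_score(a, b)  # name + company + area code agree
score_ac = match_score(a, c)  # only the email domain agrees
```

Records a and b share no email yet score well above a and c, which share only a domain. A real semantic matcher would replace the exact-equality checks with learned similarity, but the ranking logic is the same shape.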
A Practical Merge Workflow
Here's a step-by-step process that works without custom code.
Step 1: Export your data
Pull customer/contact data from all three platforms.
From Shopify: Admin → Customers → Export → CSV
From Salesforce: Reports → Contacts → Export (or Data Export if you have bulk access)
From HubSpot: Contacts → Export → All contacts
You'll end up with three files. Different columns, different formats, same underlying people (hopefully).
Step 2: Decide on your master source
Before merging, choose which system wins when data conflicts. This matters more than you'd think.
If Salesforce is your CRM of record for the sales team, make it the master for company and contact relationship data.
If HubSpot is running your marketing, it should be authoritative for email preferences and subscription status.
If Shopify tracks purchases, it's the master for transaction history and lifetime value.
You can't have three masters. Pick one primary source per field type.
Step 3: Map your fields
Create a mapping document. Here's what it might look like for basic contact info:
| Unified Field | Shopify Source | Salesforce Source | HubSpot Source |
|---|---|---|---|
| email | customer.email | Contact.Email | email |
| first_name | first_name | FirstName | firstname |
| last_name | last_name | LastName | lastname |
| phone | phone | Phone | phone |
| company | — | Account.Name | company |
Notice Shopify doesn't have a company field at all. That's a gap you'll need to fill from another source.
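If you keep the mapping document as data rather than a spreadsheet, applying it can be scripted. This is a rough sketch: the source field names follow the table above, but the column headers in your actual CSV exports will likely differ, so treat every key in FIELD_MAP as a placeholder to adjust.

```python
UNIFIED_FIELDS = ["email", "first_name", "last_name", "phone", "company"]

# source field -> unified field, per platform (names taken from the mapping table;
# real export headers may differ)
FIELD_MAP = {
    "shopify": {"customer.email": "email", "first_name": "first_name",
                "last_name": "last_name", "phone": "phone"},
    "salesforce": {"Contact.Email": "email", "FirstName": "first_name",
                   "LastName": "last_name", "Phone": "phone", "Account.Name": "company"},
    "hubspot": {"email": "email", "firstname": "first_name",
                "lastname": "last_name", "phone": "phone", "company": "company"},
}

def to_unified(row, source):
    """Rename one exported row into the unified schema; missing values become None."""
    unified = {field: None for field in UNIFIED_FIELDS}
    for source_field, unified_field in FIELD_MAP[source].items():
        value = row.get(source_field)
        if value not in (None, ""):
            unified[unified_field] = value
    unified["source"] = source  # remember where the row came from
    return unified

shopify_row = {"customer.email": "john@gmail.com", "first_name": "John",
               "last_name": "Smith", "phone": ""}
unified_row = to_unified(shopify_row, "shopify")
```

The Shopify row comes out with company set to None, which makes the gap visible instead of silently dropping the column.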
Step 4: Standardize formats before matching
This step gets skipped way too often. Before trying to find duplicates, normalize your data.
- Phone numbers should all follow the same format. E.164 international format (+15551234567) works across all three platforms and eliminates formatting discrepancies.
- Email addresses should be lowercase. "John@Company.com" and "john@company.com" should match.
- Names should have consistent capitalization. "JOHN SMITH" and "John Smith" should merge, not create duplicates.
- Dates need a standard format. Shopify uses ISO dates. Salesforce might have MM/DD/YYYY. Pick one, convert everything.
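The four rules above can be sketched with the standard library alone. This is a simplified version under stated assumptions: 10-digit phone numbers are treated as North American (+1), names are capitalized naively, and only two date formats are recognized. A dedicated phone-parsing library would be more robust for international data.

```python
import re
from datetime import datetime

def normalize_phone(raw, default_country_code="1"):
    """Return an E.164-style string; assumes bare 10-digit numbers are +1."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits if digits else None

def normalize_email(raw):
    """Trim whitespace and lowercase."""
    return (raw or "").strip().lower() or None

def normalize_name(raw):
    """Capitalize each word. Naive: 'McDonald' becomes 'Mcdonald'."""
    return " ".join(part.capitalize() for part in (raw or "").split()) or None

def normalize_date(raw, formats=("%Y-%m-%d", "%m/%d/%Y")):
    """Convert a date-only string to ISO YYYY-MM-DD; None if no format fits."""
    for fmt in formats:
        try:
            return datetime.strptime((raw or "").strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

phones = [normalize_phone(p) for p in ("+1 (555) 123-4567", "555.123.4567", "5551234567")]
```

All three phone strings from the earlier example collapse to the same value, so they can be compared directly in the matching step.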
Step 5: Run duplicate detection
With standardized data, find your matches.
- Start with exact email matching. That catches the obvious duplicates.
- Then run fuzzy matching on name + company for records without email matches.
- Finally, use semantic similarity for the remaining unmatched records.
Review the suggested matches before merging. Automated matching is smart, but human review catches edge cases.
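Each matching pass produces pairs of records, and one person can show up in several pairs. One possible way to turn those pairs into groups is a small union-find, sketched below. The record IDs are made-up strings, and grouping transitively (A matches B, B matches C, so all three merge) is an assumption you may not want for low-confidence fuzzy matches.

```python
def cluster_matches(record_ids, matched_pairs):
    """Group record ids into clusters, linking any ids joined by a matched pair."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Walk up to the root, shortening the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for rid in record_ids:
        clusters.setdefault(find(rid), []).append(rid)
    return list(clusters.values())

# Hypothetical ids: "<platform>:<row number>"
ids = ["shopify:1", "salesforce:7", "hubspot:3", "hubspot:9"]
pairs = [
    ("shopify:1", "hubspot:3"),     # e.g. from exact email matching
    ("hubspot:3", "salesforce:7"),  # e.g. from fuzzy name + company matching
]
clusters = cluster_matches(ids, pairs)
```

The first three ids end up in one cluster and hubspot:9 stays on its own. Because chained matches can snowball, large clusters are good candidates for the human review mentioned above.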
Step 6: Merge and resolve conflicts
When you find a match, combine the records using your master source hierarchy.
- Record A (Shopify): email = john@gmail.com, name = John Smith
- Record B (Salesforce): email = jsmith@acme.com, name = Jonathan Smith, company = Acme Corp
- Record C (HubSpot): email = john@gmail.com, name = Jon Smith
- Merged record (using Salesforce as name master, preserving all emails):
  - Primary email: jsmith@acme.com (work)
  - Secondary email: john@gmail.com (personal)
  - Name: Jonathan Smith
  - Company: Acme Corp
All three platform identities now point to one unified customer record.
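The same merge can be expressed as a per-field priority list. This is a sketch under assumptions: the priority order below mirrors the example (Salesforce first), and the dict keys "source", "name", "email", and "company" are illustrative names, not platform fields.

```python
# Which source wins for each field, in order (an assumed hierarchy for this example)
FIELD_PRIORITY = {
    "name": ["salesforce", "hubspot", "shopify"],
    "company": ["salesforce", "hubspot"],
    "email": ["salesforce", "hubspot", "shopify"],
}

def merge_records(records):
    """Merge matched records, taking each field from the highest-priority source that has it."""
    by_source = {rec["source"]: rec for rec in records}
    merged = {}
    for field, sources in FIELD_PRIORITY.items():
        merged[field] = next(
            (by_source[s][field] for s in sources
             if s in by_source and by_source[s].get(field)),
            None,
        )
    # Preserve every other distinct email instead of discarding it
    secondary = []
    for rec in records:
        email = rec.get("email")
        if email and email != merged["email"] and email not in secondary:
            secondary.append(email)
    merged["secondary_emails"] = secondary
    return merged

merged = merge_records([
    {"source": "shopify", "email": "john@gmail.com", "name": "John Smith"},
    {"source": "salesforce", "email": "jsmith@acme.com", "name": "Jonathan Smith",
     "company": "Acme Corp"},
    {"source": "hubspot", "email": "john@gmail.com", "name": "Jon Smith"},
])
```

Keeping the losing emails in a secondary list means future imports from Shopify or HubSpot can still be matched back to the unified record.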
Field Mapping Examples That Actually Work
Here are practical mappings for common scenarios.
B2B SaaS company mapping:
| Purpose | Shopify | Salesforce | HubSpot | Notes |
|---|---|---|---|---|
| Primary identifier | email | Email | email | Match on all three |
| Revenue data | total_spent | Total Revenue (formula) | hs_lifecyclestage_customer_date | Shopify is authoritative |
| Engagement score | — | Lead Score | HubSpot Score | HubSpot is authoritative |
| Sales stage | — | Opportunity.Stage | lifecyclestage | Salesforce is authoritative |
E-commerce company mapping:
| Purpose | Shopify | Salesforce | HubSpot | Notes |
|---|---|---|---|---|
| Order count | orders_count | — | number_of_orders | Shopify is authoritative |
| Last purchase | last_order_date | — | recent_conversion_date | Shopify is authoritative |
| Email opt-in | accepts_marketing | HasOptedOutOfEmail (inverse) | hs_email_optout (inverse) | HubSpot is authoritative |
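The "(inverse)" notes in the last row are easy to get backwards, so here is a small sketch of the conversion. It assumes the three values have already been parsed into Python booleans (or None when a platform has no value). HubSpot comes first because the table marks it authoritative; the fallback order after that is an assumption.

```python
def unified_opt_in(shopify_accepts=None, salesforce_opted_out=None, hubspot_opted_out=None):
    """Return True (opted in), False (opted out), or None if no platform has a value.

    Shopify stores an opt-IN flag; Salesforce and HubSpot store opt-OUT flags,
    so those two are negated.
    """
    if hubspot_opted_out is not None:
        return not hubspot_opted_out      # authoritative per the mapping table
    if salesforce_opted_out is not None:
        return not salesforce_opted_out   # assumed second choice
    return shopify_accepts                # assumed last resort

# HubSpot says opted out, Shopify says accepts marketing: HubSpot wins
result = unified_opt_in(shopify_accepts=True, hubspot_opted_out=True)
```

If your compliance rules say an opt-out anywhere should win, replace the priority order with a check for any opt-out before returning True.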
Testing Your Merged Data
Before pushing unified data back to any system, validate it.
Sample check (quick validation)
Pull 50 random merged records. Manually verify 10 of them against the source systems. If more than 1 has errors, your process needs adjustment.
Edge case review
Look specifically at:
- Records that matched on fuzzy criteria (not exact email)
- Customers with multiple email addresses
- High-value customers where errors cost more
- Recently created records (most likely to have issues)
Duplicate count comparison
If you started with 10,000 records across three systems and ended with 8,500 unified records, that's a 15% deduplication rate. Reasonable for moderately clean data.
If you're seeing 40%+ deduplication, either your data was really messy or your matching is too aggressive. Review the matches before proceeding.
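The arithmetic behind those percentages is simple enough to check in a few lines. The 40% review threshold comes from the paragraph above; the function names are just for this sketch.

```python
def dedup_rate(total_source_records, unified_records):
    """Fraction of source records removed by merging."""
    return (total_source_records - unified_records) / total_source_records

def needs_review(rate, threshold=0.40):
    """Flag runs where the matching may be too aggressive."""
    return rate >= threshold

rate = dedup_rate(10_000, 8_500)  # (10,000 - 8,500) / 10,000
flagged = needs_review(rate)
```

For the example numbers the rate works out to 15%, well under the review threshold.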
When Things Go Wrong: Rollback Planning
Always keep your original exports. Don't delete them after merging.
Before pushing merged data back to any platform, document:
- What data existed before the merge
- What changes you're making
- How to reverse those changes if needed
Most platforms don't have a true "undo" for bulk data changes. Your rollback plan is reimporting the original data and manually fixing any records that got touched.
This is tedious. Which is why you validate before pushing.
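One way to document "what changes you're making" is a field-level change log built before anything is pushed. The sketch below assumes you hold the original and merged data as dicts keyed by a record ID; the IDs and field names shown are hypothetical.

```python
def build_change_log(before, after):
    """List every field whose value differs between the original and merged data.

    before/after: {record_id: {field: value}}
    """
    changes = []
    for record_id, new_fields in after.items():
        old_fields = before.get(record_id, {})
        for field, new_value in new_fields.items():
            old_value = old_fields.get(field)
            if old_value != new_value:
                changes.append(
                    {"id": record_id, "field": field, "old": old_value, "new": new_value}
                )
    return changes

before = {"salesforce:7": {"phone": "555.123.4567", "name": "Jonathan Smith"}}
after = {"salesforce:7": {"phone": "+15551234567", "name": "Jonathan Smith"}}
change_log = build_change_log(before, after)
```

Each entry keeps the old value next to the new one, so reversing a bad push means writing the "old" column back rather than hunting through the original exports.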
The Faster Path
Everything I've described works. It's also time-consuming. Manual exports, spreadsheet mapping, careful review—it adds up to hours of work for a few thousand records. Days for larger datasets.
That's exactly why we built CleanSmart.
Upload your exports. CleanSmart handles the standardization, runs semantic duplicate detection across all three files, and shows you proposed matches before anything changes. You review, approve, and download a unified dataset.
The manual process takes 4-8 hours for a mid-sized dataset. CleanSmart does it in minutes.
Ready to unify your customer data?
Upload your Shopify, Salesforce, and HubSpot exports with our Business plan and see exactly how many duplicates are hiding in your data.
William Flaiz is a digital transformation executive and former Novartis Executive Director who has led consolidation initiatives saving enterprises over $200M in operational costs. He holds MIT's Applied Generative AI certification and specializes in helping pharmaceutical and healthcare companies align MarTech with customer-centric objectives. Connect with him on LinkedIn or at williamflaiz.com.











