Shopify + Salesforce + HubSpot: A Practical Guide to Unified Customer Data

William Flaiz • January 14, 2026

You've got three platforms. Each one holds a piece of your customer puzzle. Shopify knows what they bought. Salesforce tracks the sales conversations. HubSpot manages the marketing touches.


And none of them agree on who "John Smith" actually is.


This is the reality for most growing businesses. The tools work great individually. But getting them to share a single, accurate view of your customer? That's where things get messy.



Here's a practical guide to unifying customer data across these three platforms—without writing custom code or hiring a data engineer.

Diagram showing data flow from Shopify, Salesforce, and HubSpot to a verified user profile.

Why These Three Systems Fight Each Other

Before diving into solutions, it helps to understand why the problem exists in the first place.



Each platform was built with different priorities.


Shopify cares about transactions. A customer is defined by their email at checkout. Maybe their shipping address. It doesn't care much about company hierarchies or lead scoring. Someone buys something, Shopify captures the sale.


Salesforce lives in a different world entirely. Contacts belong to Accounts. Accounts have hierarchies. Opportunities tie to specific people who influence purchase decisions. The whole structure assumes complex B2B sales cycles.


HubSpot sits somewhere in between. Contacts have properties. Those contacts can belong to companies. Marketing campaigns create new contacts constantly—webinar signups, ebook downloads, demo requests. Volume matters here.


Three different philosophies. Three different data models. One customer trying to exist in all of them simultaneously.


The Schema Conflicts You'll Actually Hit

Let's get specific about what goes wrong.


Email as identifier (sounds simple, isn't)

  • Shopify:  customer.email
  • Salesforce:  Contact.Email
  • HubSpot:  email


Easy match, right? Until someone uses their work email in HubSpot, personal email in Shopify, and their assistant's email got entered in Salesforce. Same person. Three different identities.


Name field variations

  • Shopify stores first_name and last_name separately. Clean, predictable.
  • Salesforce has FirstName and LastName, plus Suffix, MiddleName, and Salutation. More fields means more opportunities for inconsistency.
  • HubSpot uses firstname and lastname (lowercase, no underscore). It also has hs_full_name that sometimes gets populated, sometimes doesn't.


Phone number formatting

  • Shopify:  +1 (555) 123-4567
  • Salesforce:  555.123.4567
  • HubSpot:  5551234567


Same number. Completely different strings. A naive merge will create three records for one customer.


Address components

  • Shopify breaks addresses into address1, address2, city, province, country, and zip.
  • Salesforce has MailingStreet, MailingCity, MailingState, MailingCountry, and MailingPostalCode on the Contact. Contacts also carry a separate "Other Address," and the parent Account holds its own "Billing Address" fields.
  • HubSpot stores address, city, state, country, and zip. Similar to Shopify, but the field names don't match.


Merging address data manually means mapping every field. Miss one, lose data.

Identity Resolution: Finding the Same Person Across Platforms

This is the core challenge. You have records from three systems. Some represent the same person. Some don't. How do you figure out which is which?


Method 1: Exact email matching

The simplest approach. Match records where emails are identical.


Works well when: Customers use consistent emails everywhere. B2B contexts where corporate emails are standard. Clean, maintained databases.


Falls apart when: People use multiple email addresses. Personal vs. work email situations. Typos in email entry. Partner or assistant emails entered instead of the actual contact.


Email matching will find maybe 60-70% of your true duplicates if you're lucky. It's a start, not a solution.
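Nothing in this guide requires code, but if you would rather script this step, exact matching amounts to grouping records by a normalized email. Here is a minimal Python sketch. The record layout (`source`, `id`, `email` keys) is a hypothetical shape chosen for illustration, not any platform's export format.

```python
from collections import defaultdict


def group_by_email(records):
    """Group records whose emails are identical after trimming and lowercasing."""
    groups = defaultdict(list)
    for rec in records:
        email = (rec.get("email") or "").strip().lower()
        if email:  # records with no email cannot be matched this way
            groups[email].append(rec)
    # Keep only emails that appear on more than one record
    return {email: recs for email, recs in groups.items() if len(recs) > 1}


# Hypothetical sample rows, one per platform
records = [
    {"source": "shopify", "id": "s1", "email": "John@Gmail.com"},
    {"source": "hubspot", "id": "h1", "email": "john@gmail.com "},
    {"source": "salesforce", "id": "c1", "email": "jsmith@acme.com"},
]
matches = group_by_email(records)
```

The Salesforce row is left out of `matches` because its email differs, which is exactly the gap the next two methods try to close.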


Method 2: Fuzzy name + company matching

When emails don't match, look at name and company combinations.


"Jon Smith at Acme Corp" and "Jonathan Smith at ACME Corporation" are probably the same person. Traditional string matching won't catch this. You need fuzzy matching that understands "Jon" and "Jonathan" are related. That "Corp" and "Corporation" mean the same thing.


This approach catches another 15-20% of duplicates that exact matching misses. But it also introduces false positives. "John Smith at Acme" and "John Smith at Acme Tools" might be different people entirely.
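As a rough illustration of the idea, the sketch below normalizes nicknames and company suffixes, then compares the results with Python's standard `difflib`. The `NICKNAMES` and `COMPANY_SUFFIXES` tables are tiny hypothetical samples, and the 0.9 threshold is an arbitrary starting point, not a recommended value.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup tables; a real list would be much longer
NICKNAMES = {"jon": "jonathan", "bill": "william", "bob": "robert"}
COMPANY_SUFFIXES = {"corp", "corporation", "inc", "incorporated", "llc", "ltd", "co"}


def normalize_name(name):
    """Lowercase, drop punctuation, and expand known nicknames."""
    words = re.findall(r"[a-z]+", name.lower())
    return " ".join(NICKNAMES.get(w, w) for w in words)


def normalize_company(company):
    """Lowercase, drop punctuation, and remove legal suffixes."""
    words = re.findall(r"[a-z0-9]+", company.lower())
    return " ".join(w for w in words if w not in COMPANY_SUFFIXES)


def similarity(a, b):
    """Character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def is_probable_match(rec_a, rec_b, threshold=0.9):
    """Both the name and the company must clear the threshold."""
    name_score = similarity(normalize_name(rec_a["name"]), normalize_name(rec_b["name"]))
    company_score = similarity(
        normalize_company(rec_a["company"]), normalize_company(rec_b["company"])
    )
    return name_score >= threshold and company_score >= threshold
```

With these settings, "Jon Smith at Acme Corp" matches "Jonathan Smith at ACME Corporation," while "John Smith at Acme" and "John Smith at Acme Tools" stay separate because the company strings differ too much. Note that two empty company values compare as identical, so records without a company need separate handling.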


Method 3: Semantic similarity

The most sophisticated approach uses AI to understand meaning, not just strings.


Instead of comparing characters, semantic matching compares the overall meaning of records. It considers multiple fields together—name, company, email domain, phone area code, location. A record that matches on three of five fields might score higher than one that matches perfectly on just email.


This is how modern data cleaning tools find the duplicates humans miss. And it's the only reliable method when dealing with messy, real-world data from multiple sources.
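True semantic matching relies on learned models, which is beyond a short example. The sketch below is a deliberately simplified stand-in that shows only the "multiple fields together" part: each field that agrees adds a weight to the score. The five fields, the weights, and the record keys are all assumptions made up for illustration.

```python
# Hypothetical weights; they sum to 1.0 so the score reads as a fraction
WEIGHTS = {
    "name": 0.30,
    "company": 0.25,
    "email_domain": 0.20,
    "area_code": 0.10,
    "city": 0.15,
}


def features(rec):
    """Reduce a record to the five comparable fields."""
    email = rec.get("email", "").lower()
    digits = "".join(ch for ch in rec.get("phone", "") if ch.isdigit())
    return {
        "name": rec.get("name", "").strip().lower(),
        "company": rec.get("company", "").strip().lower(),
        "email_domain": email.split("@")[-1] if "@" in email else "",
        # Assumes 10-digit national numbers, optionally with a country code
        "area_code": digits[-10:-7] if len(digits) >= 10 else "",
        "city": rec.get("city", "").strip().lower(),
    }


def match_score(rec_a, rec_b):
    """Sum the weights of every non-empty field that agrees exactly."""
    fa, fb = features(rec_a), features(rec_b)
    score = 0.0
    for field, weight in WEIGHTS.items():
        if fa[field] and fa[field] == fb[field]:
            score += weight
    return round(score, 2)
```

Under these weights, two records that agree on company, email domain, and city score 0.6, while a pair that shares only an email domain scores 0.2. A production tool would replace the exact comparisons with similarity models; the scoring structure is the part this sketch is meant to show.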

A Practical Merge Workflow

Here's a step-by-step process that works without custom code.


Step 1: Export your data

Pull customer/contact data from all three platforms.

From Shopify: Admin → Customers → Export → CSV

From Salesforce: Reports → Contacts → Export (or Data Export if you have bulk access)

From HubSpot: Contacts → Export → All contacts

You'll end up with three files. Different columns, different formats, same underlying people (hopefully).


Step 2: Decide on your master source

Before merging, choose which system wins when data conflicts. This matters more than you'd think.


If Salesforce is your CRM of record for the sales team, make it the master for company and contact relationship data.


If HubSpot is running your marketing, it should be authoritative for email preferences and subscription status.

If Shopify tracks purchases, it's the master for transaction history and lifetime value.


You can't have three masters for the same field. Pick one primary source per field type.
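If you script the merge, the "one master per field type" decision can be written down as a small lookup table. This is a sketch under assumptions: the unified field names and the fallback order used when the master has no value are hypothetical choices, not something any of the three platforms prescribes.

```python
# Which system wins for each field type (from the rules above)
FIELD_MASTER = {
    "company": "salesforce",
    "email_opt_in": "hubspot",
    "lifetime_value": "shopify",
}

# Assumed order to try when the master is empty or no master is defined
FALLBACK_ORDER = ["salesforce", "hubspot", "shopify"]


def resolve_field(field, values_by_source):
    """Return (value, source) using the master first, then the fallbacks."""
    master = FIELD_MASTER.get(field)
    order = ([master] if master else []) + [s for s in FALLBACK_ORDER if s != master]
    for source in order:
        value = values_by_source.get(source)
        if value not in (None, ""):
            return value, source
    return None, None
```

Returning the winning source alongside the value makes it easier to audit later why a merged record looks the way it does.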



Step 3: Map your fields

Create a mapping document. Here's what it might look like for basic contact info:

Unified Field | Shopify Source | Salesforce Source | HubSpot Source
email         | customer.email | Contact.Email     | email
first_name    | first_name     | FirstName         | firstname
last_name     | last_name      | LastName          | lastname
phone         | phone          | Phone             | phone
company       | (none)         | Account.Name      | company

Notice Shopify doesn't have a company field at all. That's a gap you'll need to fill from another source.
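The same mapping can be expressed as data if you want to apply it programmatically. In this sketch the column names are taken from the mapping above; the headers in your actual export files may differ, so treat them as placeholders to adjust.

```python
UNIFIED_FIELDS = ["email", "first_name", "last_name", "phone", "company"]

# Unified field -> source column. None marks a field the platform lacks.
FIELD_MAP = {
    "shopify": {
        "email": "customer.email",
        "first_name": "first_name",
        "last_name": "last_name",
        "phone": "phone",
        "company": None,
    },
    "salesforce": {
        "email": "Contact.Email",
        "first_name": "FirstName",
        "last_name": "LastName",
        "phone": "Phone",
        "company": "Account.Name",
    },
    "hubspot": {
        "email": "email",
        "first_name": "firstname",
        "last_name": "lastname",
        "phone": "phone",
        "company": "company",
    },
}


def to_unified(source, row):
    """Translate one exported row into the unified schema."""
    mapping = FIELD_MAP[source]
    unified = {"source": source}
    for field in UNIFIED_FIELDS:
        column = mapping.get(field)
        unified[field] = row.get(column, "") if column else ""
    return unified
```

Every row comes out with the same five keys, so Shopify's missing company shows up as an empty string rather than a missing column.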


Step 4: Standardize formats before matching

This step gets skipped way too often. Before trying to find duplicates, normalize your data.

  • Phone numbers should all follow the same format. E.164 international format (+15551234567) works across all three platforms and eliminates formatting discrepancies.
  • Email addresses should be lowercase. "John@Company.com" and "john@company.com" should match.
  • Names should have consistent capitalization. "JOHN SMITH" and "John Smith" should merge, not create duplicates.
  • Dates need a standard format. Shopify uses ISO dates. Salesforce might have MM/DD/YYYY. Pick one, convert everything.
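The four rules above can be sketched as small Python helpers. Two assumptions are baked in and labeled in the comments: 10-digit phone numbers are treated as North American, and incoming dates are assumed to be MM/DD/YYYY unless you pass a different format.

```python
import re
from datetime import datetime


def normalize_phone(raw, default_country_code="1"):
    """Convert a phone string to E.164-style digits with a leading plus."""
    digits = re.sub(r"\D", "", raw or "")
    if not digits:
        return ""
    if len(digits) == 10:  # assumption: national number without a country code
        digits = default_country_code + digits
    return "+" + digits


def normalize_email(raw):
    """Trim whitespace and lowercase."""
    return (raw or "").strip().lower()


def normalize_name(raw):
    """Capitalize each word. Too simple for names like McDonald or O'Brien."""
    return " ".join(part.capitalize() for part in (raw or "").split())


def normalize_date(raw, input_format="%m/%d/%Y"):
    """Parse a date in the given format and return ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(raw, input_format).date().isoformat()
```

All three phone formats shown earlier (+1 (555) 123-4567, 555.123.4567, and 5551234567) come out as +15551234567. For international data, a dedicated phone-parsing library is usually a safer choice than the 10-digit shortcut used here.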


Step 5: Run duplicate detection

With standardized data, find your matches.

  • Start with exact email matching. That catches the obvious duplicates.
  • Then run fuzzy matching on name + company for records without email matches.
  • Finally, use semantic similarity for the remaining unmatched records.


Review the suggested matches before merging. Automated matching is smart, but human review catches edge cases.
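The staged order matters: each pass only looks at records the previous pass left unmatched, and nothing is merged until someone reviews the proposals. The sketch below shows that control flow. To stay short it uses exact keys in both stages, so the second stage is only a simplified stand-in for real fuzzy or semantic scoring, and the record keys are hypothetical.

```python
from collections import defaultdict


def run_cascade(records, stages):
    """Run matching stages in order; return (proposals, unmatched records).

    Each stage is a (label, key_fn) pair. Records sharing a non-empty key
    become one proposed group and are excluded from later stages.
    """
    proposals = []
    remaining = list(records)
    for label, key_fn in stages:
        buckets = defaultdict(list)
        for rec in remaining:
            key = key_fn(rec)
            if key:
                buckets[key].append(rec)
        matched_ids = set()
        for key, group in buckets.items():
            if len(group) > 1:
                ids = [r["id"] for r in group]
                proposals.append({"method": label, "key": key, "ids": ids})
                matched_ids.update(ids)
        remaining = [r for r in remaining if r["id"] not in matched_ids]
    return proposals, remaining


def email_key(rec):
    return rec.get("email", "").strip().lower()


def name_company_key(rec):
    name = rec.get("name", "").strip().lower()
    company = rec.get("company", "").strip().lower()
    return f"{name}|{company}" if name and company else ""


STAGES = [("exact_email", email_key), ("name_company", name_company_key)]
```

Each proposal records which method produced it, so a reviewer can wave through the exact-email groups quickly and spend more time on the looser matches.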


Step 6: Merge and resolve conflicts

When you find a match, combine the records using your master source hierarchy.

  • Record A (Shopify): email = john@gmail.com, name = John Smith
  • Record B (Salesforce): email = jsmith@acme.com, name = Jonathan Smith, company = Acme Corp
  • Record C (HubSpot): email = john@gmail.com, name = Jon Smith

Merged record (using Salesforce as name master, preserving all emails):

  • Primary email: jsmith@acme.com (work)
  • Secondary email: john@gmail.com (personal)
  • Name: Jonathan Smith
  • Company: Acme Corp


All three platform identities now point to one unified customer record.
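Here is the same example as a short Python sketch. The `id` values are invented for illustration, and treating the master's email as the primary one is an assumption that happens to fit this example; your own rule may differ.

```python
def merge_records(records, name_master="salesforce"):
    """Merge matched records, taking the name from the master source."""
    by_source = {rec["source"]: rec for rec in records}
    master = by_source.get(name_master, records[0])

    # Collect distinct emails, master's first, keeping every address
    emails = []
    for rec in [master] + [r for r in records if r is not master]:
        email = rec.get("email", "").strip().lower()
        if email and email not in emails:
            emails.append(email)

    # First non-empty company, preferring the master
    company = next((r["company"] for r in [master] + records if r.get("company")), "")

    return {
        "name": master["name"],
        "primary_email": emails[0] if emails else "",
        "secondary_emails": emails[1:],
        "company": company,
        # Keep a pointer back to each platform's record
        "source_ids": {r["source"]: r["id"] for r in records},
    }


records = [
    {"source": "shopify", "id": "s1", "email": "john@gmail.com", "name": "John Smith"},
    {"source": "salesforce", "id": "c1", "email": "jsmith@acme.com",
     "name": "Jonathan Smith", "company": "Acme Corp"},
    {"source": "hubspot", "id": "h1", "email": "john@gmail.com", "name": "Jon Smith"},
]
merged = merge_records(records)
```

The `source_ids` entry is what lets each platform identity point back to the one unified record. The sketch assumes at most one record per platform in a group; handling several from the same system would need a list per source.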

Field Mapping Examples That Actually Work

Here are practical mappings for common scenarios.



B2B SaaS company mapping:

Purpose            | Shopify     | Salesforce              | HubSpot                         | Notes
Primary identifier | email       | Email                   | email                           | Match on all three
Revenue data       | total_spent | Total Revenue (formula) | hs_lifecyclestage_customer_date | Shopify is authoritative
Engagement score   | (none)      | Lead Score              | HubSpot Score                   | HubSpot is authoritative
Sales stage        | (none)      | Opportunity.Stage       | lifecyclestage                  | Salesforce is authoritative

E-commerce company mapping:

Purpose       | Shopify           | Salesforce                   | HubSpot                   | Notes
Order count   | orders_count      | (none)                       | number_of_orders          | Shopify is authoritative
Last purchase | last_order_date   | (none)                       | recent_conversion_date    | Shopify is authoritative
Email opt-in  | accepts_marketing | HasOptedOutOfEmail (inverse) | hs_email_optout (inverse) | HubSpot is authoritative
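The "(inverse)" notes on the email opt-in row are easy to get wrong: Shopify's field is true when the person accepts email, while the Salesforce and HubSpot fields are true when the person has opted out. A small sketch of the conversion follows. It assumes the values have already been parsed into booleans (CSV exports will contain strings), and the Salesforce-then-Shopify fallback order is an assumption.

```python
def unified_opt_in(shopify=None, salesforce=None, hubspot=None):
    """Return True if we may email this person, False if not, None if unknown.

    HubSpot is authoritative when it has a value; the other two are fallbacks.
    """
    if hubspot is not None and "hs_email_optout" in hubspot:
        return not hubspot["hs_email_optout"]  # inverse: opt-out flag
    if salesforce is not None and "HasOptedOutOfEmail" in salesforce:
        return not salesforce["HasOptedOutOfEmail"]  # inverse: opt-out flag
    if shopify is not None and "accepts_marketing" in shopify:
        return bool(shopify["accepts_marketing"])  # already an opt-in flag
    return None
```

Returning None for "unknown" keeps a missing value from being silently treated as consent.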

Testing Your Merged Data

Before pushing unified data back to any system, validate it.


Sample check (quick validation)

Pull 50 random merged records. Manually verify 10 of them against the source systems. If more than 1 has errors, your process needs adjustment.
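If your merged records live in a file rather than a spreadsheet, the sample can be drawn with Python's standard `random` module. The function names and the list-of-IDs input are hypothetical; the 50, 10, and 1 defaults come from the check described above.

```python
import random


def pick_review_sample(merged_ids, pool_size=50, review_size=10, seed=None):
    """Draw a random pool, then take the first few of it for manual review."""
    rng = random.Random(seed)  # pass a seed to make the draw repeatable
    pool = rng.sample(merged_ids, min(pool_size, len(merged_ids)))
    to_review = pool[:review_size]  # a slice of a random sample is also random
    return pool, to_review


def sample_passes(error_count, max_errors=1):
    """The process needs adjustment when more than max_errors records are wrong."""
    return error_count <= max_errors
```

Recording the seed alongside your results lets someone else pull the same 10 records and confirm what you found.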


Edge case review

Look specifically at:

  • Records that matched on fuzzy criteria (not exact email)
  • Customers with multiple email addresses
  • High-value customers where errors cost more
  • Recently created records (most likely to have issues)


Duplicate count comparison

If you started with 10,000 records across three systems and ended with 8,500 unified records, that's a 15% deduplication rate. Reasonable for moderately clean data.


If you're seeing 40%+ deduplication, either your data was really messy or your matching is too aggressive. Review the matches before proceeding.
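The arithmetic is simple enough to do by hand, but a tiny helper keeps it consistent across runs. The 40% review threshold is the rule of thumb from this section, not a universal constant.

```python
def dedup_rate(records_before, records_after):
    """Fraction of records removed by merging, between 0.0 and 1.0."""
    if records_before <= 0:
        raise ValueError("records_before must be positive")
    return (records_before - records_after) / records_before


def needs_review(rate, threshold=0.40):
    """Flag runs where the matching may be too aggressive."""
    return rate >= threshold
```

For the example above, `dedup_rate(10_000, 8_500)` works out to 0.15, which stays under the review threshold.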


When Things Go Wrong: Rollback Planning

Always keep your original exports. Don't delete them after merging.


Before pushing merged data back to any platform, document:

  • What data existed before the merge
  • What changes you're making
  • How to reverse those changes if needed


Most platforms don't have a true "undo" for bulk data changes. Your rollback plan is reimporting the original data and manually fixing any records that got touched.
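One way to make that documentation concrete is a field-level change log built by comparing the data before and after the merge. The sketch below works on plain in-memory dictionaries keyed by record ID, which is a hypothetical layout; pushing the reverted values back into each platform remains a separate, manual step.

```python
def build_change_log(before, after):
    """List every field whose value differs between two snapshots.

    Both arguments map record_id -> {field: value}.
    """
    changes = []
    for record_id, new_fields in after.items():
        old_fields = before.get(record_id, {})
        for field, new_value in new_fields.items():
            old_value = old_fields.get(field)
            if old_value != new_value:
                changes.append(
                    {"id": record_id, "field": field, "old": old_value, "new": new_value}
                )
    return changes


def revert(current, change_log):
    """Return a copy of the current data with each logged change undone."""
    restored = {rid: dict(fields) for rid, fields in current.items()}
    for change in change_log:
        restored.setdefault(change["id"], {})[change["field"]] = change["old"]
    return restored
```

The log answers all three questions at once: the `old` values show what existed, the `new` values show what changed, and `revert` shows how to undo it. Fields that did not exist before the merge are restored as None rather than removed.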


This is tedious. Which is why you validate before pushing.


The Faster Path

Everything I've described works. It's also time-consuming. Manual exports, spreadsheet mapping, careful review—it adds up to hours of work for a few thousand records. Days for larger datasets.


That's exactly why we built CleanSmart.


Upload your exports. CleanSmart handles the standardization, runs semantic duplicate detection across all three files, and shows you proposed matches before anything changes. You review, approve, and download a unified dataset.


The manual process takes 4-8 hours for a mid-sized dataset. CleanSmart does it in minutes.

Ready to unify your customer data?


Upload your Shopify, Salesforce, and HubSpot exports with our Business plan and see exactly how many duplicates are hiding in your data.

Start cleaning for free →

William Flaiz is a digital transformation executive and former Novartis Executive Director who has led consolidation initiatives saving enterprises over $200M in operational costs. He holds MIT's Applied Generative AI certification and specializes in helping pharmaceutical and healthcare companies align MarTech with customer-centric objectives. Connect with him on LinkedIn or at williamflaiz.com.
