Using Siamese Networks to Power Accurate Product Matching in eCommerce


26th Jun, 2024


By Abhishek Gibbidi

Retailers often compete on price to gain market share in high-performing product categories. Brands, too, must ensure that their in-demand assortment is competitively priced across retailers. Commerce and digital shelf analytics solutions offer competitive pricing insights down to the SKU level. Central to this intelligence gathering is a vital process: product matching.

Product matching, or product mapping, involves associating identical or similar products across diverse online platforms and marketplaces. The process uses Artificial Intelligence (AI) to automatically link the different representations of identical or similar products. AI models group products into clusters that are either exactly the same or “similar” (based on objectively defined similarity criteria) to serve different use cases for retailers and consumer brands.

Accurate product matching offers several key benefits for brands and retailers:

  • Competitive Pricing: By identifying identical products across platforms, businesses can compare prices and adjust their strategies to remain competitive.
  • Market Intelligence: Product matching enables brands to track their products’ performance across various retailers, providing valuable insights into market trends and consumer preferences.
  • Assortment Planning: Retailers can analyze their product range against competitors, identifying gaps or opportunities in their offerings.

Why Product Matching is Incredibly Hard

But product matching stands out as one of the most demanding technical processes for commerce intelligence tools. Here’s why:

Data Complexity

Product information comes in various (multimodal) formats – text, images, and sometimes video. Each format presents its own set of challenges, from inconsistent naming conventions to varying image quality.

Data Variance

Data quality and quantity fluctuate considerably across product categories, geographical regions, and websites, adding another layer of complexity to the product matching process.

Industry-Specific Nuances

Each industry brings its own product matching challenges. Exact matching makes sense in certain verticals, such as matching part numbers for industrial equipment or identifying substitute products in pharmaceuticals. In other industries, however, exactly matched products may not offer accurate comparisons.

  • In Fashion and Apparel, style-to-style matching, handling variants, and distinguishing between core and non-core sizes and between age groups are essential for accurate results.
  • In Home Improvement, unbranded products, private labels, and a preference for matching sets rather than individual items complicate the process.
  • In grocery, product matching is complicated by the distinction between item pricing and unit pricing, and the wide range of pack sizes, quantities, and packaging adds further layers of complexity.

Diverse Downstream Use Cases

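The wide range of downstream business applications gives rise to different flavors of product matching, each tailored to specific needs and objectives. For example, competitive price comparison depends on identifying identical products, whereas assortment analysis may also draw on similar products.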

In essence, while product matching is a critical component in eCommerce, its intricacies demand sophisticated solutions that address the above challenges.

To solve these challenges, at DataWeave, we’ve developed an advanced product matching system using Siamese Networks, a type of machine learning model particularly suited for comparison tasks.

Siamese Networks for Product Matching

Our methodology uses ensemble deep learning architectures, in which multiple AI models are trained and used together to ensure highly accurate matches. These models tackle natural language processing (NLP) and computer vision challenges specific to eCommerce. This approach helps us efficiently narrow millions of product candidates down to just 5-15 highly relevant matches.

The Tech Powering Siamese Networks

The key to our approach is creating “embeddings”: think of these as unique digital fingerprints for each product. Embeddings are designed to capture the essence of a product so that similar products are easy to identify, even when they look slightly different or have different names.

Our system learns to create these embeddings by looking at millions of product pairs. It learns to place the embeddings of similar products close together while keeping the embeddings of different products far apart. This process, known as metric learning, allows the system to recognize product similarities without forcing every product into a rigid category.

This approach is particularly powerful for eCommerce, where we often need to match products across different websites that might use different names or images for the same item. By focusing on the key features that make each product unique, our system can accurately match products even in challenging situations.

How Do Siamese Networks Work?

Imagine having a pair of identical twins who are experts at spotting similarities and differences. That’s essentially what a Siamese network is – a pair of identical AI systems working together to compare things.

How it works:

  • Twin AI systems: Two identical AI systems look at two different products.
  • Creating ‘fingerprints’ or ‘embeddings’: Each system creates a unique ‘fingerprint’ of the product it is looking at.
  • Comparison: These ‘fingerprints’ are then compared to see how similar the products are.
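The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DataWeave's implementation: `SharedEncoder` and `siamese_score` are hypothetical names, the encoder is a single untrained random linear layer standing in for a deep image or text model, products are made-up numeric feature vectors, and cosine similarity is used for the comparison. What the sketch does show faithfully is the Siamese idea: one encoder, with one set of weights, fingerprints both products.

```python
import math
import random


class SharedEncoder:
    """Hypothetical stand-in for the shared network (e.g. a CNN or ViT).

    A single random linear layer followed by L2 normalisation. It is untrained,
    so it illustrates the structure of a Siamese network, not its accuracy.
    """

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        # One weight matrix, shared by both "twin" branches.
        self.weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)]
                        for _ in range(out_dim)]

    def embed(self, features):
        # Linear projection of the input features into the embedding space.
        out = [sum(w * x for w, x in zip(row, features)) for row in self.weights]
        # L2-normalise so every "fingerprint" has unit length.
        norm = math.sqrt(sum(v * v for v in out)) or 1.0
        return [v / norm for v in out]


def cosine_similarity(a, b):
    # Both embeddings are unit length, so their dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def siamese_score(encoder, product_a, product_b):
    # Twin branches: the SAME encoder (same weights) embeds both products.
    return cosine_similarity(encoder.embed(product_a), encoder.embed(product_b))


encoder = SharedEncoder(in_dim=4, out_dim=3)
shoe_site_1 = [0.9, 0.1, 0.4, 0.7]  # made-up feature vectors
shoe_site_2 = [0.9, 0.1, 0.4, 0.7]  # same listing scraped from another site
dress = [0.1, 0.8, 0.9, 0.0]

print(siamese_score(encoder, shoe_site_1, shoe_site_2))  # close to 1.0
print(siamese_score(encoder, shoe_site_1, dress))        # untrained, so not yet meaningful
```

Because both branches call the same `embed` method, identical listings always receive identical fingerprints and swapping the two products cannot change the score. Training is what makes the score meaningful for products that differ.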


The architecture of a Siamese network typically consists of three main components: the shared network, the similarity metric, and the loss function (commonly a contrastive loss).

  • Shared Network: This is the ‘brain’ that creates the product ‘fingerprints’ or ‘embeddings.’ It extracts meaningful feature representations from the input samples using stacked layers of neural units. Because the twin branches share the same weights, both inputs are mapped by the same function into the same embedding space, so similar inputs yield comparable features that can be compared directly.
  • Similarity Metric: After the shared network processes the inputs, a similarity metric decides how alike two ‘fingerprints’ or ‘embeddings’ are. The choice of metric depends on the task and the characteristics of the input data. Commonly used metrics include Euclidean distance, cosine similarity, and the correlation coefficient, each chosen for its suitability to the context and the desired outcome.
  • Loss Function: Training a Siamese network requires a specialized loss function, which helps the system improve its comparison skills over time. It trains the network to produce similar embeddings for similar inputs and dissimilar embeddings for dissimilar inputs.

    This is achieved by penalizing the model when the distance between a similar pair exceeds a designated threshold, or when the distance between a dissimilar pair falls below another predefined threshold. This training strategy ensures that the network learns to encode the desired degree of similarity or dissimilarity in its embeddings.
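The similarity metrics and the threshold-based penalty described above can be written down directly. In this sketch the function names are hypothetical, the margins `pos_margin=0.5` and `neg_margin=1.5` are assumed values, and squaring the hinge is one common choice rather than something the article specifies. Pearson correlation is computed as the cosine similarity of mean-centred vectors, which is mathematically equivalent.

```python
import math


def euclidean_distance(a, b):
    # Straight-line distance between two embeddings (0 means identical).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    # Angle-based similarity in [-1, 1]; ignores vector length.
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom else 0.0  # treat zero vectors as unrelated


def pearson_correlation(a, b):
    # Pearson correlation is cosine similarity after centring each vector.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


def double_margin_contrastive_loss(distance, is_similar, pos_margin=0.5, neg_margin=1.5):
    """Penalty for one pair, following the description above.

    Similar pairs are penalised only when farther apart than pos_margin;
    dissimilar pairs only when closer than neg_margin. Margins are assumed.
    """
    if is_similar:
        return max(0.0, distance - pos_margin) ** 2
    return max(0.0, neg_margin - distance) ** 2


a, b = [1.0, 0.0], [0.0, 1.0]
print(euclidean_distance(a, b), cosine_similarity(a, b), pearson_correlation(a, b))
print(double_margin_contrastive_loss(1.0, is_similar=True))   # 0.25: similar pair too far apart
print(double_margin_contrastive_loss(2.0, is_similar=False))  # 0.0: dissimilar pair already far enough
```

Pairs that already satisfy their threshold contribute zero loss, so training effort concentrates on the pairs the network currently gets wrong.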

How DataWeave Uses Siamese Networks for Product Matching

At DataWeave, we use Siamese Networks to match products across different retailer websites. Here’s how it works:

Pre-processing (Image Preparation)

  • We collect product images from various websites.
  • We clean these images up to make them easier for our AI to understand.
  • We use techniques like cropping, flipping, and adjusting colors to help our AI recognize products even if the images are slightly different.
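The augmentations listed above (cropping, flipping, adjusting colors) can be sketched on a toy grayscale image stored as nested lists. This is an illustrative assumption, not DataWeave's pipeline: the function names are hypothetical, real pipelines operate on RGB tensors with an image library, and brightness scaling here stands in for the broader family of color adjustments.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    # Cut a crop_h x crop_w window at a random position.
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - crop_h)
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def horizontal_flip(image):
    # Mirror the image left to right.
    return [list(reversed(row)) for row in image]


def adjust_brightness(image, factor):
    # Scale pixel intensities and clip them to the valid 0-255 range.
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def augment(image, rng, crop_h, crop_w):
    """Produce one randomly perturbed view of a product image."""
    view = random_crop(image, crop_h, crop_w, rng)
    if rng.random() < 0.5:
        view = horizontal_flip(view)
    return adjust_brightness(view, rng.uniform(0.8, 1.2))


rng = random.Random(0)
# Toy 8x8 grayscale "product image" with pixel values 0-255.
product_image = [[(r * 16 + c * 8) % 256 for c in range(8)] for r in range(8)]
view_a = augment(product_image, rng, crop_h=6, crop_w=6)
view_b = augment(product_image, rng, crop_h=6, crop_w=6)
```

Each call draws a new crop position, flip decision, and brightness factor, so two calls on the same image give two slightly different views. One common way to use such views, assumed here rather than stated in the article, is as a positive pair that the model should embed close together.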

Training The AI

  • We show our AI system millions of product images, teaching it to recognize similarities and differences.
  • We train with a loss function called “Triplet Loss”, which compares an anchor product with a matching (positive) example and a non-matching (negative) example, so that our AI learns which products are the same and which are different.
  • We’ve tested several model architectures, including ResNet, EfficientNet, NFNet, and ViT, to find the one that works best for product matching.
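Triplet loss compares three embeddings at once: an anchor, a positive (the same product) and a negative (a different product). The sketch below implements the standard formulation, max(0, d(anchor, positive) - d(anchor, negative) + margin), with Euclidean distance. The margin of 0.2 and the toy 2-D embeddings are assumptions for illustration; in a real setup the embeddings would come from a backbone such as one of the architectures listed above, and the loss would be minimized by gradient descent.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The anchor and positive are the same product (e.g. one listing on two
    sites); the negative is a different product. The margin is assumed.
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def batch_triplet_loss(triplets, margin=0.2):
    # Average over a batch of (anchor, positive, negative) embedding triplets.
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)


anchor = [0.0, 0.0]
positive = [0.1, 0.0]
easy_negative = [1.0, 0.0]  # already far from the anchor
hard_negative = [0.2, 0.0]  # too close to the anchor
print(triplet_loss(anchor, positive, easy_negative))  # 0.0
print(triplet_loss(anchor, positive, hard_negative))  # roughly 0.1
```

The easy negative is already more than a margin farther away than the positive, so it contributes nothing; only the hard negative produces a training signal, which is why triplet pipelines often mine hard negatives.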

Image Retrieval 

  • Once trained, our AI creates a unique “fingerprint” for each product image.
  • We store these fingerprints in a database built for fast similarity search.
  • When we need to find a match for a product, we:
    • Create a fingerprint for the new product.
    • Quickly search our database for the most similar fingerprints.
    • Return the top matching products.
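The retrieval steps above amount to nearest-neighbour search over stored fingerprints. The sketch below uses a hypothetical in-memory `FingerprintIndex` with brute-force cosine similarity and `heapq.nlargest` for the top-k; the product IDs and 3-D embeddings are made up. At the scale of millions of products, a production system would more likely rely on an approximate nearest-neighbour index, which trades a little accuracy for much faster search.

```python
import heapq
import math


class FingerprintIndex:
    """Hypothetical in-memory fingerprint store with brute-force cosine search."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    @staticmethod
    def _normalise(vector):
        norm = math.sqrt(sum(v * v for v in vector)) or 1.0
        return [v / norm for v in vector]

    def add(self, product_id, embedding):
        # Store unit-length vectors so a dot product equals cosine similarity.
        self._ids.append(product_id)
        self._vectors.append(self._normalise(embedding))

    def search(self, query_embedding, k=10):
        query = self._normalise(query_embedding)
        scored = (
            (pid, sum(q * v for q, v in zip(query, vec)))
            for pid, vec in zip(self._ids, self._vectors)
        )
        # Keep the k highest-scoring (product_id, score) pairs, best first.
        return heapq.nlargest(k, scored, key=lambda item: item[1])


index = FingerprintIndex()
index.add("site_a/shoe-white", [0.9, 0.1, 0.0])  # made-up IDs and embeddings
index.add("site_b/shoe-white", [0.88, 0.12, 0.01])
index.add("site_b/shoe-pink", [0.6, 0.5, 0.3])
index.add("site_c/dress", [0.0, 0.2, 0.95])

results = index.search([0.9, 0.1, 0.0], k=2)
print([pid for pid, _ in results])  # ['site_a/shoe-white', 'site_b/shoe-white']
```

Brute force scores every stored fingerprint for each query, which is simple and exact but grows linearly with the catalogue size.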

Each match is then assigned a similarity score, and pairs are segregated into “Exact Matches” (high scores) or “Similar Matches” (lower scores). For example, consider the image of the white shoe on the left. It has a low similarity score with the pink shoe (below), so these SKUs are categorized as a “Similar Match.” Meanwhile, the shoe on the right is categorized as an “Exact Match.”

Similarly, in the following image of a young girl’s dress, the matched SKU has a high similarity score, so this pair is categorized as an “Exact Match.”
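Splitting retrieved candidates into “Exact Matches” and “Similar Matches” can be as simple as thresholding the similarity score. The 0.9 cut-off, the function names, and the candidate scores below are hypothetical; the article does not state the values used, and in practice a threshold would be tuned on labelled pairs, possibly per category given the industry nuances discussed earlier.

```python
def categorise_match(score, exact_threshold=0.9):
    """Label a candidate pair by its similarity score (threshold is assumed)."""
    return "Exact Match" if score >= exact_threshold else "Similar Match"


def categorise_candidates(candidates, exact_threshold=0.9):
    # candidates: (product_id, similarity_score) pairs, e.g. from a top-k search.
    return {pid: categorise_match(score, exact_threshold) for pid, score in candidates}


candidates = [("site_b/shoe-white", 0.97), ("site_b/shoe-pink", 0.78)]
print(categorise_candidates(candidates))
# {'site_b/shoe-white': 'Exact Match', 'site_b/shoe-pink': 'Similar Match'}
```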

Siamese Networks play a pivotal role in DataWeave’s Product Matching Engine. Amid the millions of images and product descriptions online, our Siamese Networks efficiently narrow down millions of candidates to a curated selection of 10-15 potential matches.

These networks are also applied in several other contexts at DataWeave, such as learning from text-only data in product titles and from joint multimodal content in product descriptions.

Leverage Our AI-Driven Product Matching To Get Insightful Data

In summary, accurate and efficient product matching is no longer a luxury – it’s a necessity. DataWeave’s advanced product matching solution provides brands and retailers with the tools they need to navigate this complex landscape, turning the challenge of product matching into a competitive advantage.

By leveraging cutting-edge technology and simplifying it for practical use, we empower businesses to make informed decisions, optimize their operations, and stay ahead in the ever-evolving eCommerce market. To learn more, reach out to us today!
