In retail, competition is fierce, and in its ever-evolving landscape, consumer expectations are higher than ever.
For years, our AI-driven solutions have been the foundation that empowers businesses to sharpen their competitive pricing and optimize digital shelf performance. But in today’s world, evolution is constant—so is innovation. We now find ourselves at the frontier of a new era in AI. With the dawn of Generative AI and the rise of Large Language Models (LLMs), the possibilities for eCommerce companies are expanding at an unprecedented pace.
These technologies aren’t just a step forward; they’re a leap—propelling our capabilities to new heights. The insights are deeper, the recommendations more precise, and the competitive and market intelligence we provide is sharper than ever. This synergy between our legacy of AI expertise and the advancements of today positions DataWeave to deliver even greater value, thus helping businesses thrive in a fast-paced, data-driven world.
This article marks the beginning of a series where we will take you through these transformative AI capabilities, each designed to give retailers and brands a competitive edge.
In this first piece, we’ll offer a snapshot of how DataWeave aggregates and analyzes billions of publicly available data points to help businesses stay agile, informed, and ahead of the curve. These fall into four broad categories:
Dynamic pricing is an indispensable tool for eCommerce stores to remain competitive. A blessing—and a curse—of online shopping is that users can compare prices of similar products in a few clicks, with most shoppers gravitating toward the lowest price. Consequently, retailers can lose sales over minor discrepancies of $1–2 or even less.
All major eCommerce platforms compare product prices—especially their top selling products—across competing players and adjust prices to match or undercut competitors. A typical product undergoes 20.4 price changes annually, or roughly once every 18 days. Amazon takes it to the extreme, changing prices approximately every 10 minutes. It helps them maintain a healthy price perception among their consumers.
However, accurate product matching at scale is a prerequisite for the above, and that poses significant challenges. There is no standardized approach to product cataloging, so even identical products bear different product titles, descriptions, and attributes. Information is often incomplete, noisy, or ambiguous. Image data contains even more variability—the same product can be styled using different backgrounds, lighting, orientations, and quality; images can have multiple overlapping objects of interest or extraneous objects, and at times the images and the text on a single page might belong to completely different products!
DataWeave leverages advanced technologies, including computer vision, natural language processing (NLP), and deep learning, to achieve highly accurate product matching. Our pricing intelligence solution accurately matches products across hundreds of websites and automatically tracks competitor pricing data.
Here’s how it works:
It identifies relevant text features essential for accurate comparison.
Image processing algorithms use feature extraction to define visual attributes. For example, when comparing images of a red T-shirt, the algorithm might extract features such as “crew neck,” “red,” or “striped.”
Image hashing techniques create a unique representation (or “hash”) of an image, allowing for efficient comparison and matching of product images. This process transforms an image into a concise string or sequence of numbers that captures its essential features even if the image has been resized, rotated, or edited.
Before we perform these activities there is a need to preprocess images to prepare them for downstream operations. These include object detection to identify objects of interest, background removal, face/skin detection and removal, pose estimation and correction, and so forth.
We have built a hybrid or a multimodal product-matching engine that uses image features, text features, and domain heuristics. For every product we process we create and store multiple text and image embeddings in a vector database. These include a combination of basic feature vectors (e.g. tf-idf based, colour histograms, share vectors) to more advanced deep learning algorithms-based embeddings (e.g., BERT, CLIP) to the latest LLM-based embeddings.
Classification algorithms enhance product attribute tagging by designating match types. For example, the product might be identified as an “exact match”, “variant”, “similar”, or “substitute.” The algorithm can also identify identical product combinations or “baskets” of items typically purchased together.
Attribute tagging involves assigning standardized tags for product attributes, such as brand, model, size, color, or material. These naming conventions form the basis for accurate product matching. Tagging detailed attributes, such as specifications, features, and dimensions, helps match products that meet similar criteria. For example, tags like “collar” or “pockets” for apparel ensure high-fidelity product matches for hard-to-distinguish items with minor stylistic variations.
Including tags for synonyms, variants, and long-tail keywords (e.g., “denim” and “jeans”) improves the matching process by recognizing different terms used for similar products. Metadata tags categorize similar items according to SKU numbers, manufacturer details, and other identifiers.
Altogether, these capabilities provide high-quality product matches and valuable metadata for retailers to classify their products and compare their product assortment to competitors.
Customer reviews and ratings are rich sources of information, enabling brands to gauge consumer sentiment and identify shortcomings regarding product quality or service delivery. However, while informative, reviews constitute unstructured “noisy” data that is actionable only if parsed correctly.
Here’s where DataWeave’s UGC analysis capability steps in.
The final output combines the information extracted from each of these features, which looks something like this:
The algorithm also recognizes spammy reviews and distinguishes subjective reviews (i.e., those fueled by emotion) from objective ones.
Our image processing tool can interpret promotional banners and extract information regarding product highlights, discounts, and special offers. This provides insights into pricing strategies and promotional tactics used by other online stores.
For example, if a competitor offers a 20% discount on a popular product, you can match or exceed this discount to attract more customers.
The banner reader identifies successful promotional trends and patterns from competitors, such as the timing of discounts, frequently promoted product categories or brands, and the duration of sales events. Ecommerce stores can use this information to optimize their promotion strategies, ensuring they launch compelling and timely offers.
While these generalized AI tools are highly useful in various industries, we’ve created other category—and attribute-specific capabilities for specialty goods (e.g., those requiring certifications or approval by federal agencies) and food items. These use cases help our customers adhere to compliance requirements.
This detector lets retailers match items based on official certification marks. These marks represent compliance with industry standards, safety regulations, and quality benchmarks.
Example:
By detecting these certification marks, the system can accurately match products with their certified counterparts. By identifying which competitor products are certified, retailers can identify products that may benefit from certification.
Product attributes alone are insufficient for comparing food items. Differences in nutrition content can influence product category (e.g., “health food” versus regular food items), price point, and consumer choice. DataWeave’s nutrition fact table reader scans nutrition information on packaging, capturing details such as calorie count, macronutrient distribution (proteins, fats, carbohydrates), vitamins, and minerals.
The solution ensures items with similar nutritional profiles are correctly identified and grouped based on specific dietary requirements or preferences. This helps with price comparisons and enables eCommerce stores to maintain a reliable database of product information and build trust among health-conscious consumers.
Moving forward, breakthroughs in generative AI and LLMs have fueled substantial innovation, which has enabled us to introduce powerful new capabilities for our customers.
These include:
In our ongoing series, we will dive deep into each of these capabilities, exploring how DataWeave leverages cutting-edge AI technologies like Generative AI and LLMs to solve complex challenges for retailers and brands.
In the meantime, talk to us to learn more!
Thank you for Subscribing - Team DataWeave
For accounts configured with Google ID, use Google login on top. For accounts using SSO Services, use the button marked "Single Sign-on".