
How to Classify, Match Products With Machine Learning

Billions of products are sold online across numerous stores. Identifying and matching particular products, for purposes such as price comparison, becomes a challenge because there is no obvious global unique identifier to rely on. This is where AI and machine learning come in.

So Many Products and No Way to Match Them Across Stores

There are many situations where accurately identifying a product match is essential. Stores might want to compare competitor prices for the exact same products. Customers use price comparison tools to get the best deals. A store like Amazon that allows different sellers to offer the same products wants to be sure that they are the same products before listing the sellers in a single, unique product page.

How do we approach this?

The Sea of Confusion

Product titles and descriptions do not follow a standardized format. Each store, and even different sellers within one store (say, eBay), might use a different title and description for the same product. Attribute listings differ in format, and images of the same product can look very different.

Of course, there are standardized unique identifiers such as UPC, MPN and GTIN. However, not every store that sells a product mentions them on the product page. The attributes themselves might be described differently, for instance 7” and 7 inch.

Images may be included, but they can differ in perspective, clarity, tone etc. The Brand Name may also be referred to in different ways – GE and General Electric.


It is impractical for a human to visit every seller's page and check that the listings refer to the same product. If the process is to be automated, how does that system make sense of it all? Well, the big guns have got it all figured out, and we’re going to let you in on the secret of how it’s done.

AI To The Rescue – Machine Learning for Product Match

In a machine learning solution for product matching, the provider first has to build a database of billions of products. This is done by collecting information through web crawls and feeds.

The system then has to come up with a universal taxonomy. This is a challenge because different retailers use different classifications for their products, and the same product might be listed in more than one category. For instance, a particular shoe model might be listed under sports shoe and under men’s walking shoe. The Product Match system first has to design a standardized taxonomy, irrespective of how a particular store classifies its products.

There are existing classification schemes such as Google's product taxonomy, GS1, and Amazon's categories. However, a product match solution might devise its own taxonomy. This universal taxonomy is designed by identifying patterns and signals from titles, breadcrumbs where available, product descriptions and attributes, and from images.
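To make the idea of a universal taxonomy concrete, here is a minimal Python sketch. It is not how any particular vendor does it: the category names, the keyword sets and the `classify` helper are all made up for illustration, and simple keyword counting stands in for the learned signals described above.

```python
import re

# Hypothetical universal taxonomy: category -> keyword signals.
# Both the category names and the keywords are invented for illustration.
UNIVERSAL_TAXONOMY = {
    "Footwear > Athletic Shoes": {"shoe", "shoes", "sports", "walking", "running", "sneaker"},
    "Electronics > Mobile Phones": {"phone", "smartphone", "unlocked", "gsm", "cdma"},
}

def tokens(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def classify(title, breadcrumb=""):
    """Return the universal category with the most keyword hits, or None."""
    seen = tokens(title) | tokens(breadcrumb)
    scores = {cat: len(seen & kws) for cat, kws in UNIVERSAL_TAXONOMY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Two stores file the same (hypothetical) shoe under different store categories,
# yet both listings land on the same node of the universal taxonomy.
a = classify("Acme Trail Runner", breadcrumb="Shoes > Sports Shoe")
b = classify("Acme Trail Runner", breadcrumb="Men > Men's Walking Shoe")
print(a, "|", b)
```

The point of the sketch is only that the store's own category becomes one signal among several, rather than the final answer.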

Once the taxonomy is in place, the next step is matching individual products. This requires precise comparisons to ensure that two listings are the same unique product, despite differences in titles, images, descriptions and so on.

First, there is a search for UPC, GTIN or other unique identifiers on the product page. Then the product titles have to be compared. Titles for the same product are rarely identical across stores, for example:

Google Pixel 2 GSM/CDMA Google Unlocked (Clearly White, 64GB, US warranty) – Amazon.com

Google Pixel 2 64GB Clearly White (Unlocked) Smartphone – eBay.com
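
The two steps just described, looking for an identifier and then comparing titles, can be sketched in a few lines of Python. This is a simplified stand-in, not a production matcher: the check-digit rule is the standard GS1 one, but the function names are our own, and plain token overlap (Jaccard similarity) is used in place of a learned title model.

```python
import re

def gtin_is_valid(code):
    """Check the GS1 check digit of an 8-, 12-, 13- or 14-digit identifier."""
    if not re.fullmatch(r"[0-9]{8}|[0-9]{12,14}", code):
        return False
    digits = [int(c) for c in code]
    body, check = digits[:-1], digits[-1]
    # Weights alternate 3, 1, 3, ... starting from the digit next to the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check

def find_gtins(page_text):
    """Return the digit runs on a page that pass the check-digit test."""
    return [m for m in re.findall(r"\b[0-9]{8,14}\b", page_text) if gtin_is_valid(m)]

def title_tokens(title):
    """Lowercase a title and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def title_similarity(a, b):
    """Jaccard overlap of the two titles' token sets, between 0 and 1."""
    ta, tb = title_tokens(a), title_tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

amazon = "Google Pixel 2 GSM/CDMA Google Unlocked (Clearly White, 64GB, US warranty)"
ebay = "Google Pixel 2 64GB Clearly White (Unlocked) Smartphone"

# The two titles share 7 of their 12 distinct tokens.
print(round(title_similarity(amazon, ebay), 2))
```

A score like this is only a weak signal on its own: the titles overlap heavily without being identical, which is why the later checks on brand, attributes and images are still needed.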


Neural networks and deep learning techniques are used to learn from both the similarities and the differences between listings, and to create word-level embeddings that give common words a shared representation. This involves teaching the system to recognize different references to the same entity (‘hp’ and Hewlett Packard, 7” and 7 inch, and so on) and to map them to one unique representation for each entity.
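
What "one unique representation for each entity" means is easiest to see with a toy example. The Python sketch below uses a hand-written alias table and a regular expression instead of learned embeddings, so it illustrates the goal rather than the technique; the table contents and function names are assumptions made for this example.

```python
import re

# Hypothetical alias table; a real system would learn these equivalences
# from data rather than list them by hand.
BRAND_ALIASES = {
    "hp": "hewlett packard",
    "hewlett-packard": "hewlett packard",
    "ge": "general electric",
}

# A number followed by an inch marker: 7", 7”, 7″, 7 in., 7 inch, 7-inch, 7 inches.
INCH_PATTERN = re.compile(
    r'(\d+(?:\.\d+)?)[\s-]*(?:inch(?:es)?\b|in\.|["”″])', re.IGNORECASE
)

def normalize_brand(name):
    """Map known brand aliases to one canonical lowercase spelling."""
    key = name.strip().lower()
    return BRAND_ALIASES.get(key, key)

def normalize_size(text):
    """Rewrite every inch notation in the text as '<number> inch'."""
    return INCH_PATTERN.sub(r"\1 inch", text)

print(normalize_brand("HP"), "|", normalize_brand("Hewlett Packard"))
print(normalize_size("7”"), "|", normalize_size("7 Inch"), "|", normalize_size("7 in."))
```

Once both listings are reduced to the same canonical strings, a plain equality test is enough to compare them.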

A product can be identified using its title, its description, its images, and its attributes or specifications list. In many cases, the product title itself will yield a lot of information, and the system has to learn to separate the product name (for instance, brand and model) from the attributes.

Samsung Galaxy Note 8 (US Version) Factory Unlocked Phone 64GB – Midnight Black (Certified Refurbished)

Samsung Galaxy Note 8 is the phone model, and the title provides additional information such as the memory size, the US version, and the Factory Unlocked and Certified Refurbished status.

All this information has to be extracted and sorted and put into the appropriate slots – Phone model, version, memory size, etc. Different techniques might be used to help the system learn to parse and classify the sets of information.
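
As a rough illustration of slot filling, the Python sketch below parses the Samsung title above with a handful of hand-written regular expressions. A trained system would learn these patterns; here the slot names, the patterns and the list of known models are all assumptions made for the example, and the colour slot is deliberately left out to keep it short.

```python
import re

# Hypothetical slot patterns, hand-written for this one title style.
SLOT_PATTERNS = {
    "memory": r"\b(\d+\s?(?:GB|TB))\b",
    "version": r"\(([^()]*\bVersion)\)",
    "lock_status": r"\b((?:Factory |Fully )?Unlocked)\b",
    "condition": r"\b((?:Certified )?Refurbished)\b",
}

def parse_title(title, known_models):
    """Pull the model name and a few attributes out of a free-text title."""
    slots = {}
    # The model is looked up in a list of known names (longest first),
    # standing in for whatever a trained system would recognise.
    for model in sorted(known_models, key=len, reverse=True):
        if model.lower() in title.lower():
            slots["model"] = model
            break
    for name, pattern in SLOT_PATTERNS.items():
        match = re.search(pattern, title, flags=re.IGNORECASE)
        if match:
            slots[name] = match.group(1)
    return slots

title = ("Samsung Galaxy Note 8 (US Version) Factory Unlocked Phone 64GB "
         "– Midnight Black (Certified Refurbished)")
slots = parse_title(title, ["Samsung Galaxy Note 8", "Samsung Galaxy S8"])
for name, value in slots.items():
    print(f"{name}: {value}")
```

With the title broken into named slots, two listings can be compared field by field instead of as raw strings.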

The next comparison might be the same product with more information: tags in the title, a description that contains memory and screen size information, and a specs table. These add more knowledge about the product, so the machine will be better able to identify an exact product match or mismatch in the following comparison.

The standard identifying signals are similar results or positive matches for unique identification numbers (UPC or MPN), Classification, Brand, Title, Attributes, and Image.

For each comparison, the system goes through many steps, checks or safety valves: a search for a unique identification number, a test for keyword similarities, brand normalization and matching (for example, HP is the same as Hewlett Packard), attribute normalization and matching (7” is the same as 7 inch, 7 in., 7 inches), image matching, and so on. There is also a check for variation in attributes:

Apple iPhone 8 Plus 5.5″, 64GB, Fully Unlocked, Gold

Apple iPhone 8 4.7″, 64GB, Fully Unlocked, Gold

For a pair of listings to be declared a match, at least 99% of the results have to be positive. Otherwise, it is a mismatch, even if the two listings are variations of what is essentially the same product (iPhone 8 Plus 5.5” and iPhone 8 4.7”).
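
Here is one way the attribute variation check could look in Python. It is a sketch under stated assumptions: we read the 99% rule as "the share of compared attributes that agree", the attributes are pulled out with simple regular expressions, and the function names are our own.

```python
import re

def extract_attributes(title):
    """Pull screen size and storage out of a title with simple regexes."""
    attrs = {}
    size = re.search(
        r'(\d+(?:\.\d+)?)[\s-]*(?:inch(?:es)?\b|in\.|["”″])', title, re.IGNORECASE
    )
    if size:
        attrs["screen_inches"] = float(size.group(1))
    storage = re.search(r"\b(\d+)\s?GB\b", title, re.IGNORECASE)
    if storage:
        attrs["storage_gb"] = int(storage.group(1))
    return attrs

def is_match(a, b, threshold=0.99):
    """Compare every attribute present in both records against the threshold."""
    shared = [key for key in a if key in b]
    if not shared:
        return False  # nothing to compare, so no evidence of a match
    positive = sum(1 for key in shared if a[key] == b[key])
    return positive / len(shared) >= threshold

plus = extract_attributes("Apple iPhone 8 Plus 5.5″, 64GB, Fully Unlocked, Gold")
base = extract_attributes("Apple iPhone 8 4.7″, 64GB, Fully Unlocked, Gold")
# A differently worded (hypothetical) listing of the 5.5 inch model.
reworded = extract_attributes("iPhone 8 Plus 64 GB, 5.5 inch display (Gold)")

print(is_match(plus, base))      # storage agrees, screen size does not
print(is_match(plus, reworded))  # both attributes agree after extraction
```

With only a handful of attributes per pair, a 99% threshold means that a single disagreement is enough to reject the match, which is exactly what separates the two iPhone variants.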

This is a complicated process, and different product match solutions may employ different techniques and training methods. But the advantage is that neural networks and machine learning systems learn over time, even from their mistakes, and so get better with each use.


