A web crawler (also known as a web spider or web robot) is a program or automated script that browses the World Wide Web methodically.
Tangiblee's preferred method for collecting product data from your website is to use our web crawler. The crawler is designed to simulate the activity of a single visitor, to prevent any disruptions or performance issues on your website.
How the crawler works:
- Our bot “scrapes” data from your website by crawling it periodically at an agreed-upon frequency, from every 15 minutes to once a day.
- Our crawler automatically parses the dimension data from each product detail page (PDP), regardless of its format (text, numbers, units, etc.) or location on the page.
- The crawler reviews the product images and selects the one with the highest resolution and the shooting angle that best showcases your product.
- Crawling runs completely separately from the JS snippet that displays the actual Tangiblee UX on your web pages. We do not crawl for any information when a page loads and executes our JS snippet.
- By default, a crawling job for a retailer’s website is created once a day. This parameter can be changed to create a job as often as every 15 minutes if more frequent crawling is needed.
- Once a crawling job is created, it is added to the crawling queue of our platform and executed in FIFO (first in, first out) order. In most cases the job starts immediately, because a worker is usually available to execute it.
- If needed, additional parameters can control the frequency of page requests while a specific crawling job is running.
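The queueing and pacing described in the list above can be pictured with a small Python sketch. This is an illustration only, not Tangiblee's implementation: the names `CrawlJob`, `CrawlQueue`, `run_job`, and `request_delay_s` are hypothetical, and the page fetcher and sleep function are passed in so the example stays self-contained.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Optional


@dataclass
class CrawlJob:
    retailer: str                 # hypothetical retailer identifier
    urls: List[str]               # pages to request during this job
    request_delay_s: float = 1.0  # assumed pause between page requests


@dataclass
class CrawlQueue:
    jobs: Deque[CrawlJob] = field(default_factory=deque)

    def enqueue(self, job: CrawlJob) -> None:
        # New jobs join the back of the queue.
        self.jobs.append(job)

    def next_job(self) -> Optional[CrawlJob]:
        # The oldest job leaves first (first in, first out).
        return self.jobs.popleft() if self.jobs else None


def run_job(job: CrawlJob,
            fetch: Callable[[str], str],
            sleep: Callable[[float], None]) -> List[str]:
    """Request each page in order, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(job.urls):
        if index > 0:
            sleep(job.request_delay_s)  # throttle the request frequency
        results.append(fetch(url))
    return results
```

In real use, `sleep` would be `time.sleep` and `fetch` would perform an HTTP request; a larger `request_delay_s` spreads the page requests of a single job over a longer period.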
What does Tangiblee need from me to start crawling my website?
Usually nothing. Unless the security protocols in place on your website restrict bot traffic, Tangiblee can begin crawling without any action from you.
What if my website security protocols specifically restrict ‘bot’ traffic?
Tangiblee will provide the IP address of our crawler for whitelisting on your website. Whitelisting this IP address gives access only to Tangiblee’s crawling bot, while your security protocols continue to apply to “black bot” traffic.
Here is our crawler information for whitelisting:
User-Agent: TangibleeBot/220.127.116.11 (http://tangiblee.com/bot)
[.good]Tangiblee will never need or request access to any of your website’s original source files. This crawler will NOT impact load time, nor will it create any vulnerabilities in your website or affect its performance.[.good]
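For teams that manage their own bot filtering, the whitelisting check could look roughly like the Python sketch below. It is a minimal illustration under stated assumptions: the function name `is_allowed_crawler` is hypothetical, and the address `203.0.113.10` is a documentation placeholder, not Tangiblee's IP address (use the address Tangiblee provides). The check requires both the whitelisted IP and the `TangibleeBot/` User-Agent prefix, because a User-Agent header alone can be set by any client.

```python
import ipaddress

# Placeholder from a documentation-only range; replace with the
# address that Tangiblee provides for whitelisting.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.10/32")]

# Product token taken from the User-Agent string shown above.
BOT_TOKEN = "TangibleeBot/"


def is_allowed_crawler(remote_ip: str, user_agent: str) -> bool:
    """Return True only if both the IP address and the User-Agent match."""
    if not user_agent.startswith(BOT_TOKEN):
        return False
    try:
        address = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # malformed address: treat as not allowed
    return any(address in network for network in ALLOWED_NETWORKS)
```

Most firewalls and CDNs offer an equivalent rule without custom code; the sketch only shows the logic such a rule would express.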
What is Tangiblee collecting when crawling my website?
- Product SKU #
- Product Title
- Product URL
- Category URL
- Product Image URL
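As a rough illustration, the fields listed above could be held in a record like the following Python sketch. The class name `ProductRecord`, the field names, and every sample value are hypothetical and chosen only for this example; they do not describe Tangiblee's actual data format.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProductRecord:
    sku: str           # Product SKU #
    title: str         # Product Title
    product_url: str   # Product URL
    category_url: str  # Category URL
    image_url: str     # Product Image URL


# Made-up sample values on the reserved example.com domain.
record = ProductRecord(
    sku="SKU-001",
    title="Example Tote Bag",
    product_url="https://example.com/products/sku-001",
    category_url="https://example.com/bags",
    image_url="https://example.com/images/sku-001.jpg",
)

print(asdict(record))
```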
What if my images have backgrounds?
Tangiblee automatically removes the background from each image through our unique process. You do not need to provide Tangiblee with images that have clear backgrounds.
How does Tangiblee know which product image to select?
When crawling your website, Tangiblee identifies the specific image that will work with our solution.
What if I don’t allow Tangiblee to crawl my website?
Crawling is Tangiblee’s preferred method; however, if you would prefer that Tangiblee not crawl your website, explore your options here.