How Accurate Are AI Chatbots at Web Search? 5 Simple Questions to Test Their Search Savvy

 by Yuveganlife, December 26, 2023
[ Also See Updated Version: 5 Simple Questions to Test Generative AI Chatbots' Web Search Accuracy, Updated Version ]

Introduction

AI-powered web search has been gathering momentum since the release of ChatGPT-3.5 in November 2022, and especially since the release of ChatGPT-4 in March 2023, which increased interest in and adoption of this exciting new technology.

With the end of 2023 approaching, this is a good time to test how far the many generative AI chatbots, built on various LLMs, have advanced, and to see which ones give more accurate answers and make web search more convenient and relevant than traditional keyword-based search.

Evaluation Goals

Our evaluation goals are to assess the chatbots' ability to:
  • accurately understand user questions
  • retrieve relevant, up-to-date, and accurate information from the web
  • provide correct and proper responses through inference and reasoning
Furthermore, we aim to measure the frequency of chatbot hallucinations, which refer to instances of providing inaccurate or fake responses. These evaluation goals will help us determine which chatbots are more effective and intelligent at providing accurate and useful information to users.

Criteria for the Test Candidates

We used the following criteria to select the AI-powered chatbot candidates for testing:
  • They had to be generative, meaning they could understand and generate responses in natural language
  • They had to have a built-in search feature
  • They had to be free and widely accessible, so that a broad range of users could test them and provide feedback
These criteria allowed us to narrow our focus to chatbots that were most relevant to our research goals, while also ensuring that our testing was inclusive and representative. We focus solely on free generative AI chatbots for our testing of everyday web search tasks because: (1) they're accessible to everyone, similar to using Google or other search engines, which have been free for a long time; and (2) as a new technology, free access allows more people to try the products, which can help accelerate their advancement and refinement.

Selected AI-Powered Chatbots

Based on the above criteria, the following 15 chatbot models from 9 websites were selected for testing:
  • Bing Chat (free ChatGPT-4)
  • Bing Copilot (free ChatGPT-4)
  • iAsk
  • Komo
  • Perplexity (ChatGPT-3.5 and Free ChatGPT-4 with Copilot: 5 queries every 4 hours)
  • Phind (Phind V9 and Free ChatGPT-4: 10 queries per day)
  • Pi
  • Poe (Claude-instant; Gemini-Pro; Llama-2-70b; GPT-3.5-Turbo-Instruct; Google PaLM)
  • You (Smart)
The links to these chatbots can be found here.

Evaluation Scope

By focusing our evaluation on 5 simple factual questions requiring inference, we can establish a controlled and reliable baseline for assessing chatbot performance and identifying areas for improvement.

Yuveganlife.com, a new website launched in the summer of 2023, is relatively unknown compared to Eat Just, which has established itself over the past 10 years. However, despite its obscurity, Yuveganlife.com is well-indexed by search engines, while Eat Just's plant-based and cell-based food products are well-known around the world.

This comparison offers a unique opportunity to evaluate the chatbots' ability to accurately search for and infer information based on newly updated web data, while also testing for potential errors in the LLM training process.

During testing, we gave the chatbots no feedback on whether an answer to any of these 5 questions was the expected one.

Multiple LLMs from the same website (such as Poe.com) were split into groups and tested using two different user accounts in two different browsers.

Limitation of Keyword-Based Search Engines

While these questions are simple enough for a well-designed chatbot to handle easily, a traditional keyword-based search engine may not be able to answer them.

For example, when we asked Google and Bing "Is Yuveganlife.com a vegan NPO?", both search engines simply returned web pages that contained some of the keywords and did not provide a clear answer to the question.

Google search did not return relevant information:
screenshot for Google search result

Bing search did not return relevant information either:
screenshot for Bing search result

Test Execution Summary

We asked each chatbot 5 simple questions:
  • Q1: Is Yuveganlife.com a vegan NPO?
  • Q2: When was yuveganlife.com established?
  • Q3: How many recipe websites and blogs are listed on yuveganlife.com?
  • Q4: Where is the yuveganlife.com based?
  • Q5: Does Eat Just manufacture Just Meat, a plant-based meat substitute made from pea protein?
A chatbot receives three points for a correct answer and zero points for a totally wrong one; partially correct answers receive one or two points.

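| Chatbot | Q1: What | Q2: When | Q3: How many | Q4: Where | Q5: Yes/No | Total Score |
|---|---|---|---|---|---|---|
| Bing Chat | 2 | 0 | 0 | 0 | 3 | 5 |
| Bing Copilot | 2 | 0 | 0 | 0 | 3 | 5 |
| iAsk | 3 | 0 / h | 0 | 0 / h | 0 / h | 3 |
| Komo | 3 | 3 | 2 | 3 | 0 / h | 11 |
| Perplexity (ChatGPT-3.5) | 3 | 3 | 2 | 3 | 3 | 14 |
| Perplexity (ChatGPT-4) | 3 | 0 | 2 | 0 | 3 | 8 |
| Phind (Phind V9) | 3 | 3 | 1 | 0 | 1 / h | 8 |
| Phind (ChatGPT-4) | 3 | 3 | 1 | 3 | 3 | 13 |
| Pi | 3 | 0 / h | 2 | 0 / h | 0 / h | 5 |
| Poe (Claude-instant) | 3 | 0 / h | 0 / h | 0 | 3 | 6 |
| Poe (Gemini-Pro) | 3 | 0 | 0 | 0 | 0 / h | 3 |
| Poe (Llama-2-70b) | 3 | 3 | 0 | 0 / h | 1 / h | 7 |
| Poe (GPT-3.5-Turbo-Instruct) | 0 | 3 | 0 | 0 / h | 0 / h | 3 |
| Poe (Google PaLM) | 3 | 0 | 0 | 0 / h | 3 | 6 |
| You (Smart) | 3 | 3 | 0 | 0 | 3 | 9 |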
Note: Pi and Poe do not provide web search result references. " / h" means the chatbot generated a hallucination.
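
The totals in the table can be re-derived with a few lines of Python. This is only an illustrative sketch, not part of the original test procedure: the `SCORES` dictionary transcribes the per-question scores from the table above, and the names `SCORES`, `MAX_PER_QUESTION`, `totals`, and `overall_accuracy` are assumptions introduced for this example.

```python
# Per-question scores (Q1..Q5) transcribed from the score table above.
SCORES = {
    "Bing Chat": [2, 0, 0, 0, 3],
    "Bing Copilot": [2, 0, 0, 0, 3],
    "iAsk": [3, 0, 0, 0, 0],
    "Komo": [3, 3, 2, 3, 0],
    "Perplexity (ChatGPT-3.5)": [3, 3, 2, 3, 3],
    "Perplexity (ChatGPT-4)": [3, 0, 2, 0, 3],
    "Phind (Phind V9)": [3, 3, 1, 0, 1],
    "Phind (ChatGPT-4)": [3, 3, 1, 3, 3],
    "Pi": [3, 0, 2, 0, 0],
    "Poe (Claude-instant)": [3, 0, 0, 0, 3],
    "Poe (Gemini-Pro)": [3, 0, 0, 0, 0],
    "Poe (Llama-2-70b)": [3, 3, 0, 0, 1],
    "Poe (GPT-3.5-Turbo-Instruct)": [0, 3, 0, 0, 0],
    "Poe (Google PaLM)": [3, 0, 0, 0, 3],
    "You (Smart)": [3, 3, 0, 0, 3],
}

MAX_PER_QUESTION = 3  # a fully correct answer earns three points


def totals(scores):
    """Return the total score per chatbot."""
    return {bot: sum(per_question) for bot, per_question in scores.items()}


def overall_accuracy(scores):
    """Return points earned divided by points possible, across all chatbots."""
    earned = sum(sum(per_question) for per_question in scores.values())
    possible = sum(len(per_question) for per_question in scores.values()) * MAX_PER_QUESTION
    return earned / possible


if __name__ == "__main__":
    ranked = sorted(totals(SCORES).items(), key=lambda item: item[1], reverse=True)
    for bot, total in ranked[:3]:
        print(f"{bot}: {total}/15")
    print(f"Overall accuracy: {overall_accuracy(SCORES):.2%}")
```

Summing the rows this way gives 106 of 225 possible points, which is the source of the overall accuracy figure reported in the Conclusion.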

Q1: Is Yuveganlife.com a vegan NPO?

There is no explicit statement on the Yuveganlife.com website or its social media, such as LinkedIn, that addresses this question.

Although the About page mentions "To ensure the platform's sustainable operation, ..., we may seek funding through grants, donation, or other channels.", this statement alone is not sufficient to infer that Yuveganlife.com is a non-profit organization.

The answers provided by AI chatbots were as follows:
  • Bing Chat (free ChatGPT-4): Correct, but it provided 2 search results from Yuveganlife.com's Facebook pages, which are not the most relevant. Score: 2 screenshot
  • Bing Copilot (free ChatGPT-4): Correct. Same as Bing Chat. Score: 2 screenshot
  • iAsk: Correct. Provided 1 Yuveganlife.com page as a search result; also provided 3 major Registered NPO Lookup websites. Score: 3 screenshot
  • Komo: Correct. Provided 3 Yuveganlife.com pages and 1 LinkedIn page as search results. Score: 3 screenshot
  • Perplexity (ChatGPT-3.5): Correct. Provided 4 Yuveganlife.com pages and 1 LinkedIn page as search results. Score: 3 screenshot
  • Perplexity (free ChatGPT-4): Correct. Provided 6 Yuveganlife.com pages and 2 LinkedIn pages as search results. Score: 3 screenshot
  • Phind (Phind V9): Correct. Provided 1 Yuveganlife.com page and 1 official Registered NPO Lookup website as search results. Score: 3 screenshot
  • Phind (free ChatGPT-4): Correct. Provided 1 Yuveganlife.com page and 1 Yuveganlife.com LinkedIn page as search results. Score: 3 screenshot
  • Pi: Correct. Score: 3 screenshot
  • Poe (Claude-instant): Correct. Five reasons were provided after analyzing the website. Score: 3 screenshot
  • Poe (Gemini-Pro): Correct. Score: 3 screenshot
  • Poe (Llama-2-70b): Correct. Score: 3 screenshot
  • Poe (GPT-3.5-Turbo-Instruct): Wrong. Score: 0 screenshot
  • Poe (Google PaLM): Correct. Score: 3 screenshot
  • You (Smart): Correct. Provided 2 Yuveganlife.com pages as search result. Score: 3 screenshot

Q2: When was yuveganlife.com established?

Although the website does not explicitly provide this information, the About page mentions that the online directory platform was officially launched in the summer of 2023, which provides a clue for inferring that it was established in 2023.

  • Bing Chat (free ChatGPT-4): Wrong. Provided 4 unrelated search results. Score: 0 screenshot
  • Bing Copilot (free ChatGPT-4): Wrong. Provided 1 search result from Yuveganlife.com's Facebook page. Score: 0 screenshot
  • iAsk: Wrong. Provided many Yuveganlife.com pages as search results. Generated a hallucination that the site was established in 2010. Score: 0 screenshot 1 and screenshot 2
  • Komo: Correct. Provided 4 Yuveganlife.com pages as search results. Score: 3 screenshot
  • Perplexity (ChatGPT-3.5): Correct. Provided 5 Yuveganlife.com pages and 1 LinkedIn page as search results. Score: 3 screenshot
  • Perplexity (free ChatGPT-4): Cannot answer. Provided 8 Yuveganlife.com pages which are not related to the question. Score: 0 screenshot
  • Phind (Phind V9): Correct. Provided 1 Yuveganlife.com page and 1 official Registered NPO Lookup website as search result. Score: 3 screenshot
  • Phind (free ChatGPT-4): Correct. Provided 4 Yuveganlife.com pages. Score: 3 screenshot
  • Pi: Wrong. Generated a hallucination that the launch year was 2022. Score: 0 screenshot
  • Poe (Claude-instant): Wrong. Gave five reasons after analyzing the website and a public domain name lookup, but hallucinated a domain name registration date of 2020. The reasoning process was not efficient. Score: 0 screenshot
  • Poe (Gemini-Pro): Cannot answer. Score: 0 screenshot
  • Poe (Llama-2-70b): Correct. Score: 3 screenshot
  • Poe (GPT-3.5-Turbo-Instruct): Correct. Score: 3 screenshot
  • Poe (Google PaLM): Cannot answer. Score: 0 screenshot
  • You (Smart): Correct. Provided 2 Yuveganlife.com pages as search result. Score: 3 screenshot

Q3: How many recipe websites and blogs are listed on yuveganlife.com?

Yuveganlife.com has a dedicated resource page listing vegan recipe websites and blogs, which has been updated continually over time.

There were three website update posts which mentioned the number of recipe sites/blogs included. The first update mentioned 116 sites/blogs, the second update mentioned 180+, and the third update mentioned 225+.

A candidate receives three points for providing the most recent number on the page listing recipe websites and blogs, based on a real-time search; two points for using the figure from the third update (225+); and zero points if it cannot provide an accurate answer. In the results below, answers citing an older figure (185+) were scored one point.
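
The Q3 rubric can be written as a small scoring function. This is a minimal sketch under stated assumptions: the function name `score_q3` and its parameters are hypothetical, the live count on the resource page at test time is not given in this article and so is passed in as `current_count`, and the one-point tier reflects how the 185+ answers were scored in the results list rather than an explicit rule.

```python
from typing import Optional

THIRD_UPDATE_COUNT = 225  # figure from the third website update post ("225+")
OLDER_LINKEDIN_COUNT = 185  # older figure from a LinkedIn post ("185+")


def score_q3(reported: Optional[int], current_count: int) -> int:
    """Score a chatbot's Q3 answer.

    reported: the number the chatbot gave, or None if it could not answer.
    current_count: the number on the live resource page at test time
                   (hypothetical input; not stated in the article).
    """
    if reported is None:
        return 0  # could not answer
    if reported == current_count:
        return 3  # most recent number, found by a real-time search
    if reported == THIRD_UPDATE_COUNT:
        return 2  # figure from the third update post
    if reported == OLDER_LINKEDIN_COUNT:
        return 1  # older figure, scored "Less Correct" in the results
    return 0  # any other number is treated as inaccurate
```

For example, with a purely hypothetical live count of 240, `score_q3(225, 240)` gives 2 and `score_q3(185, 240)` gives 1, which is how the Komo and Phind answers were scored.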

  • Bing Chat (free ChatGPT-4): Cannot answer. Provided 3 unrelated search results. Score: 0 screenshot
  • Bing Copilot (free ChatGPT-4): Cannot answer. Provided 3 unrelated search results. Score: 0 screenshot
  • iAsk: Cannot answer. Provided 3 unrelated search results. Score: 0 screenshot
  • Komo: Mostly Correct. Provided 3 Yuveganlife.com pages and 1 unrelated website as search result. It found the number of 225+ from the 3rd update post. Score: 2 screenshot
  • Perplexity (ChatGPT-3.5): Mostly Correct. Provided 6 Yuveganlife.com pages and 2 LinkedIn pages as search results. It found the number of 225+ from the 3rd update post. Score: 2 screenshot
  • Perplexity (free ChatGPT-4): Mostly Correct. Provided 8 Yuveganlife.com pages. It found the number of 225+ from the 3rd update post. Score: 2 screenshot
  • Phind (Phind V9): Less Correct. It found the number of 185+ from a Yuveganlife.com LinkedIn post page. Score: 1 screenshot
  • Phind (free ChatGPT-4): Less Correct. It found the number of 185+ from a Yuveganlife.com LinkedIn post page. Score: 1 screenshot
  • Pi: Mostly Correct. It found the number of 225+ from the 3rd update post. Score: 2 screenshot
  • Poe (Claude-instant): Cannot answer. Gave five reasons after analyzing the website, but without effective inference. Also hallucinated in Point 2 (under the "Vegan Resources" page, one of the categories is "Vegan Recipes Sites and Blogs") and Point 3 (upon clicking on this, it states "See all resources"). Score: 0 screenshot
  • Poe (Gemini-Pro): Cannot answer. Score: 0 screenshot
  • Poe (Llama-2-70b): Cannot answer. Score: 0 screenshot
  • Poe (GPT-3.5-Turbo-Instruct): Cannot answer. Score: 0 screenshot
  • Poe (Google PaLM): Cannot answer. Score: 0 screenshot
  • You (Smart): Cannot answer. Looked up 6 irrelevant websites. Score: 0 screenshot

Q4: Where is the yuveganlife.com based?

Although the website does not explicitly provide this information, the About page mentions that the online directory platform was "Crafted with love in Beautiful British Columbia", which provides a clue that it is based in British Columbia, Canada. Similar information can also be found on Yuveganlife's LinkedIn About page: "Primary Headquarters: Greater Vancouver, Canada".

The candidate will receive three points if it can conclude Yuveganlife is based in either "BC, Canada" or "Greater Vancouver, Canada".

  • Bing Chat (free ChatGPT-4): Cannot answer. Returned 3 of Yuveganlife.com's Facebook links as search results. Score: 0 screenshot
  • Bing Copilot (free ChatGPT-4): Cannot answer. Returned 3 of Yuveganlife.com's Facebook links as search results. Score: 0 screenshot
  • iAsk: Wrong answer. Returned 1 Yuveganlife.com web page and 3 unrelated search results as Authoritative Reference. Generated a hallucination that concluded Yuveganlife.com is based in the USA. Score: 0 screenshot
  • Komo: Correct. Returned 2 Yuveganlife.com pages and 2 of Yuveganlife's LinkedIn post pages as search results. Gave BC as the location. Score: 3 screenshot
  • Perplexity (ChatGPT-3.5): Correct. Returned 2 Yuveganlife.com pages and other pages. Found the result on About us page. Score: 3 screenshot
  • Perplexity (free ChatGPT-4): Cannot answer. Provided 1 unrelated search result. Made a wrong inference that Yuveganlife.com may be associated with NJ, USA. Also suggested that the user check Yuveganlife.com's About page for the answer. Score: 0 screenshot
  • Perplexity (free ChatGPT-4): In a follow-up test, it provided the correct answer when asked "Please check for "About us" page on yuvegnalife.com to find out Where it is based?". No score. screenshot
  • Phind (Phind V9): Cannot answer. Returned 1 unrelated Yuveganlife.com web page with an irrelevant inference. Score: 0 screenshot
  • Phind (free ChatGPT-4): Correct. Returned 5 Yuveganlife.com pages and found BC, Canada on About Us Page. Score: 3 screenshot
  • Pi: Wrong answer. Generated a hallucination that concluded Yuveganlife.com is based in Campbell, California, USA. Score: 0 screenshot
  • Poe (Claude-instant): Cannot answer. Six reasons were provided after analyzing the website without correct inference. Score: 0 screenshot
  • Poe (Gemini-Pro): Cannot answer. When asked "Tell me more", its inference logic was not effective. Score: 0 screenshot 1 and screenshot 2
  • Poe (Llama-2-70b): Wrong answer. Generated a hallucination that concluded Yuveganlife.com is based in the USA. Score: 0 screenshot
  • Poe (GPT-3.5-Turbo-Instruct): Wrong answer. Generated a hallucination that concluded Yuveganlife.com is based in the USA. Score: 0 screenshot 1 and screenshot 2
  • Poe (Google PaLM): Wrong answer. Its first answer pointed to the About page. When asked to tell more, it generated a hallucination that the About page says Yuveganlife.com is based in Anytown, CA, USA. Score: 0 screenshot
  • You (Smart): Cannot answer. Looked up 3 of Yuveganlife.com's LinkedIn pages. Score: 0 screenshot

Q5: Does Eat Just manufacture Just Meat, a plant-based meat substitute made from pea protein?

"Eat Just was founded in 2011 by Josh Tetrick and Josh Balk. In July 2017, it started selling a substitute for scrambled eggs called Just Egg that is made from mung beans. It released a frozen version in January 2020. In December 2020, the Government of Singapore approved cultivated meat created by Eat Just, branded as GOOD Meat. A restaurant in Singapore called 1880 became the first place to sell Eat Just's cultured meat." -- source from Wikipedia.

" Beyond Meat, Inc. is a Los Angeles–based producer of plant-based meat substitutes founded in 2009 by Ethan Brown. The company's initial products were launched in the United States in 2012... The burgers are made from pea protein isolates... " -- source from Wikipedia.

The revolutionary food products of both Eat Just and Beyond Meat were reported widely around the world and should be included in each LLM's training dataset.

This yes/no question asks about an imaginary product brand name, Just Meat, which blends Eat Just's cultivated meat product with Beyond Meat's plant-based product, in order to test for LLM hallucination.

The candidate will receive three points if they correctly identify that no such information exists, and zero points if they provide incorrect information based on a hallucination.

  • Bing Chat (free ChatGPT-4): Correct. Score: 3 screenshot
  • Bing Copilot (free ChatGPT-4): Correct. Score: 3 screenshot
  • iAsk: Wrong info based on a hallucination. Score: 0 screenshot
  • Komo: Wrong with hallucination. Score: 0 screenshot
  • Perplexity (ChatGPT-3.5): Correct. Score: 3 screenshot
  • Perplexity (free ChatGPT-4): Correct. Score: 3 screenshot
  • Phind (Phind V9): Mostly Wrong. Generated a hallucination that Just Meat is owned by Good Catch. Score: 1 screenshot
  • Phind (free ChatGPT-4): Correct. Score: 3 screenshot
  • Pi: Wrong with hallucination. Score: 0 screenshot
  • Poe (Claude-instant): Correct. Five reasons were provided. Score: 3 screenshot
  • Poe (Gemini-Pro): Wrong with hallucination. Score: 0 screenshot
  • Poe (Llama-2-70b): Mostly Wrong. Wrong reasoning. Generated a hallucination that mistook Just Meat for Beyond Meat. Score: 1 screenshot 1 and screenshot 2
  • Poe (GPT-3.5-Turbo-Instruct): Wrong with hallucination. Score: 0 screenshot
  • Poe (Google PaLM): Correct. Score: 3 screenshot
  • You (Smart): Correct. Score: 3 screenshot

Conclusion

Based on our test results, the free chatbot Perplexity (ChatGPT-3.5) was the most effective at providing accurate responses to our 5 test questions, demonstrating its ability to understand user queries and retrieve relevant and reliable information. It also responded quickly.

Additional observations from the test results:
  • The top 3 chatbots by score: Perplexity (ChatGPT-3.5): 14/15, Phind (free ChatGPT-4: 10 queries per day): 13/15, Komo: 11/15
  • The overall accuracy of the 15 AI chatbots at web search was 47.11% (106 of 225 possible points)
  • Out of the 15 chatbots tested, 60% (9 chatbots) were found to generate 16 instances of hallucinations out of 75 total answers, accounting for 21.3% of all the questions asked.
  • There were 5 chatbots (33% of tested bots) that generated hallucinations for two to three questions: Poe (Claude-instant), Poe (Llama-2-70b), Poe (GPT-3.5-Turbo-Instruct), iAsk, Pi
  • For Question 1 (Is Yuveganlife.com a vegan NPO), 14 of 15 chatbots gave the correct answer.
  • For Question 2 (When was Yuveganlife.com established), 8 out of 15 chatbots could not give the correct answer. 3 chatbots generated a hallucination: iAsk, Pi, and Poe (Claude-instant). 5 chatbots could not answer because they could not retrieve the relevant "About us" web page: Bing Chat, Bing Copilot, Perplexity (ChatGPT-4), Poe (Gemini-Pro), and Poe (Google PaLM).
  • For Question 3, which was designed to check for the correct count on the recipe website/blog list page, no chatbot could provide the exact number, including the bots that use the ChatGPT-4 model. However, the ChatGPT-4 model from OpenAI's ChatGPT Plus subscription plan can do the job.
  • For Question 4 (Where is the Yuveganlife.com based), only 3 of 15 chatbots could provide the correct answer. 5 chatbots generated a hallucination pointing to the USA: iAsk, Pi, Poe (Llama-2-70b), Poe (GPT-3.5-Turbo-Instruct), Poe (Google PaLM)
  • For Question 5, which was designed solely for hallucination testing, 7 out of 15 chatbots generated a hallucination: iAsk, Komo, Pi, Poe (Llama-2-70b), Poe (Gemini-Pro), Poe (GPT-3.5-Turbo-Instruct), Phind (Phind V9)
  • Why did Perplexity (ChatGPT-4) perform much worse than Perplexity (ChatGPT-3.5)? It could not answer the When and Where questions, whose answers are indicated on the About page.
  • Google's and Microsoft's chatbots could not retrieve Yuveganlife.com's web pages, even though their search engines indexed the newly launched website well.
We are confident that chatbots' web search will become significantly more popular in 2024.
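
The hallucination figures quoted above can be checked the same way. The sketch below is illustrative only: the `HALLUCINATIONS` mapping transcribes the " / h" marks from the score table, and the names `HALLUCINATIONS`, `TOTAL_CHATBOTS`, `QUESTIONS_PER_CHATBOT`, and `hallucination_summary` are assumptions introduced for this example.

```python
from collections import Counter

# Questions on which each chatbot hallucinated, transcribed from the " / h"
# marks in the score table. Chatbots with no hallucinations are omitted.
HALLUCINATIONS = {
    "iAsk": ["Q2", "Q4", "Q5"],
    "Komo": ["Q5"],
    "Phind (Phind V9)": ["Q5"],
    "Pi": ["Q2", "Q4", "Q5"],
    "Poe (Claude-instant)": ["Q2", "Q3"],
    "Poe (Gemini-Pro)": ["Q5"],
    "Poe (Llama-2-70b)": ["Q4", "Q5"],
    "Poe (GPT-3.5-Turbo-Instruct)": ["Q4", "Q5"],
    "Poe (Google PaLM)": ["Q4"],
}

TOTAL_CHATBOTS = 15
QUESTIONS_PER_CHATBOT = 5


def hallucination_summary(hallucinations):
    """Summarize hallucination counts across chatbots and questions."""
    instances = sum(len(questions) for questions in hallucinations.values())
    total_answers = TOTAL_CHATBOTS * QUESTIONS_PER_CHATBOT
    return {
        "instances": instances,
        "answer_rate": instances / total_answers,
        "chatbots_affected": len(hallucinations),
        "chatbot_share": len(hallucinations) / TOTAL_CHATBOTS,
        # Chatbots that hallucinated on two or more questions.
        "repeat_offenders": sorted(
            bot for bot, questions in hallucinations.items() if len(questions) >= 2
        ),
        # Number of hallucinating chatbots per question.
        "per_question": Counter(
            q for questions in hallucinations.values() for q in questions
        ),
    }


if __name__ == "__main__":
    summary = hallucination_summary(HALLUCINATIONS)
    print(f"{summary['instances']} hallucinations, "
          f"{summary['answer_rate']:.1%} of all answers")
    print(f"{summary['chatbots_affected']} chatbots affected "
          f"({summary['chatbot_share']:.0%})")
    print("Two or more:", ", ".join(summary["repeat_offenders"]))
    print("Per question:", dict(summary["per_question"]))
```

Counting per question rather than per chatbot makes it easy to see that Q5, the question built around a made-up product name, drew the most hallucinations.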

Disclaimer

This test study was designed and conducted independently by Yuveganlife.com, with no affiliation or involvement with any other organizations.

The test results are limited to these five test questions and do not indicate each product's accuracy on other search tasks.

About the Author

This test suite was authored by Bruce Yu, founder of Yuveganlife.com, as part of evaluating different chatbots' search capabilities, to help automate and verify the recording of information about vegan companies and NPOs in Yuveganlife.com's headless CRM system.

Bruce gained experience in evaluation of software products at Health Canada's eReview project, where he provided key technical support and advice for eReview Project Stream 1. He created technical requirements for the RFP, conducted applied COTS eCTD viewing tools setup and evaluated them against user requirements, and conducted quality verification for the interim and final candidate product. The project successfully provided technical infrastructure to replace the paper-based drug review process with an electronic review process at HC.