AINAD

Reinforcement Learning From Human Feedback Took Travel AI Tool To Near-Perfect Accuracy

By Stefan Klopp

In late 2022, just before the ChatGPT launch kicked off the current AI frenzy, our development team had the opportunity to experiment with the model. We were a travel publisher with no plans to become a tech company. But it was obvious to us that this technology could be used to plan and book trips in a much more efficient, enjoyable way.

Stefan Klopp of Matador Network

Within months we launched an AI tool that travelers could message via WhatsApp. Its accuracy was about 85%. That might not sound terrible, but when roughly one of every seven conversations includes a miscommunication or hallucination, what you have is a fun gadget, not game-changing tech.

Thanks to our travel media platform, we attracted a critical mass of users, which allowed us to improve performance through reinforcement learning from human feedback. Over the next 15 months, we increased accuracy to 98%, which has enabled us to strike partnerships with major travel brands, win awards and draw in more than a million users. Here’s how we did it.

A helping human hand

It helps when users tell the AI that an answer is wrong, which is the simplest form of human feedback. If someone asks for restaurant recommendations in the Pearl District in Portland and the AI includes a recommendation in the Hawthorne District, the user may point out the inaccuracy. But relying on direct user feedback isn’t enough.

We hired five people, most of whom speak multiple languages, to put reinforcement learning into high gear. To date they’ve monitored 1.5 million conversations between users and the AI. These agents catch subtle miscommunications. If a user asks for recommendations of the best kid-friendly resorts in Mexico, the AI might ask the user to specify a city, assuming the user wants hotel rates. But the user doesn’t know the city yet; they’re just looking for general information.

At this point the agent is able to intervene, manually taking over the conversation and getting it back on track. Then the agent flags and categorizes the issue for a backend fix, which improves the system for an entire category of questions.
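
To make the flag-and-categorize step concrete, here is a minimal Python sketch of how flagged conversations might be tallied so the most common failure category gets fixed first. The `Flag` record, its field names and the category labels are hypothetical; the article does not describe the actual tooling.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Flag:
    """One issue raised by a human agent (hypothetical record layout)."""
    conversation_id: str
    category: str  # e.g. "premature_clarification", "wrong_neighborhood"
    note: str


def top_categories(flags: Iterable[Flag], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequently flagged categories with their counts."""
    return Counter(flag.category for flag in flags).most_common(n)


flags = [
    Flag("c1", "premature_clarification", "Asked for a city before giving general info"),
    Flag("c2", "wrong_neighborhood", "Recommended Hawthorne for a Pearl District query"),
    Flag("c3", "premature_clarification", "Asked for dates on a general question"),
]
print(top_categories(flags, n=1))
```

Ranking by count is only one possible way to prioritize; a real queue might also weigh severity or the number of users affected.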

Reframing the question

Sometimes inaccuracies are a result of the way the question is asked. To improve outcomes, we needed to improve the quality of the questions. We developed a system that categorizes and reframes questions before they are fed into the large language model. This process ensures that we get the most from our extensive site indexing.

Questions about live events initially posed a challenge. A query like, “What are some events going on in Estes Park, Colorado, this weekend?” might find a page about events from two years ago that includes the phrase “this weekend,” causing a hallucination. But what is the user really asking? The timing of the question needs to be translated into a specific date, where “this weekend” becomes “Jan. 25-26, 2025.”
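
The date translation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our production code: the function names are hypothetical, “this weekend” is taken to mean the coming Saturday and Sunday (or the current pair if today is already part of the weekend), and only that one phrase is handled.

```python
import re
from datetime import date, timedelta
from typing import Tuple

# Hypothetical pattern: only the literal phrase "this weekend" is handled here.
_THIS_WEEKEND = re.compile(r"\bthis weekend\b", re.IGNORECASE)


def resolve_this_weekend(today: date) -> Tuple[date, date]:
    """Return the (Saturday, Sunday) that "this weekend" refers to on `today`."""
    # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    if today.weekday() == 6:
        saturday = today - timedelta(days=1)
    else:
        saturday = today + timedelta(days=5 - today.weekday())
    return saturday, saturday + timedelta(days=1)


def reframe(query: str, today: date) -> str:
    """Replace "this weekend" with explicit ISO dates before retrieval."""
    saturday, sunday = resolve_this_weekend(today)
    explicit = f"on {saturday.isoformat()} and {sunday.isoformat()}"
    return _THIS_WEEKEND.sub(explicit, query)


question = "What are some events going on in Estes Park, Colorado, this weekend?"
print(reframe(question, today=date(2025, 1, 22)))
# What are some events going on in Estes Park, Colorado, on 2025-01-25 and 2025-01-26?
```

With explicit dates in the query, a stale page that merely contains the words “this weekend” no longer looks like a match.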

Another challenge is combining questions across multiple messages. Someone might ask for Airbnb recommendations in Vancouver, then follow up with “close to Yaletown.” The underlying question needs to roll in new elements as they are added — “Recommend Airbnbs in the Yaletown area of Vancouver.”
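
One way to picture the rolling question is as a small record that follow-up messages update. The Python sketch below is a simplified illustration: `TripQuery`, `merge` and `standalone` are hypothetical names, and it assumes an upstream step has already extracted the topic, city and area from each message.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TripQuery:
    """Running state of what the user is asking for (hypothetical fields)."""
    topic: str
    city: Optional[str] = None
    area: Optional[str] = None


def merge(query: TripQuery, **new_slots: Optional[str]) -> TripQuery:
    """Fold newly extracted details into the query, ignoring empty values."""
    filled = {name: value for name, value in new_slots.items() if value is not None}
    return replace(query, **filled)


def standalone(query: TripQuery) -> str:
    """Render the accumulated state as one self-contained question."""
    if query.city and query.area:
        where = f" in the {query.area} area of {query.city}"
    elif query.city:
        where = f" in {query.city}"
    else:
        where = ""
    return f"Recommend {query.topic}{where}."


first = TripQuery(topic="Airbnbs", city="Vancouver")
follow_up = merge(first, area="Yaletown")  # user adds "close to Yaletown"
print(standalone(follow_up))
# Recommend Airbnbs in the Yaletown area of Vancouver.
```

Sending the rendered question, rather than the bare follow-up, gives the model the full context in a single message.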

Ping the partner

Site indexing is essential, but it only goes so far. For in-depth knowledge and real-time information, you need partners and data sources you can ping behind the scenes. Once we improved the ability to accurately identify the intent of the user, we needed a network of plugins to get the data they were seeking for flight times, hotel pricing and exchange rates.

When a user asks a question, our AI categorizes it as a particular intent, sources the appropriate data, and feeds the result into the LLM to deliver the information in coherent, consistent and conversational language. There’s a lot more going on behind the scenes than with baseline ChatGPT, but the user experience is the same and responses are noticeably richer and more accurate.

Creating a plugin for every type of intent is labor-intensive. As you work through it, it’s important to tell the user in a friendly way what your AI can’t yet do. A response from the AI saying, “I don’t yet have that capability,” provides a better user experience than a hallucination, and it’s a great way of maintaining accuracy while building out your product.
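
The flow in this section (classify the intent, fetch data from a plugin, hand the result to the LLM, and admit it when no plugin exists) can be sketched in Python as follows. Everything here is a stand-in: the keyword classifier, the plugin registry, the stub data and the `llm` callable are hypothetical, and a real system would use a far more robust classifier and live partner APIs.

```python
from typing import Callable, Dict

Plugin = Callable[[str], str]  # takes the question, returns raw data as text

FALLBACK = "I don't yet have that capability."


def classify_intent(question: str) -> str:
    """Toy keyword classifier standing in for a real intent model."""
    text = question.lower()
    if "flight" in text:
        return "flight_times"
    if "hotel" in text:
        return "hotel_pricing"
    if "exchange rate" in text:
        return "exchange_rates"
    return "unknown"


def answer(question: str, plugins: Dict[str, Plugin], llm: Callable[[str], str]) -> str:
    """Route the question to a plugin, or say plainly that none exists yet."""
    plugin = plugins.get(classify_intent(question))
    if plugin is None:
        return FALLBACK  # better than letting the model guess
    data = plugin(question)
    prompt = (
        f"Question: {question}\n"
        f"Data: {data}\n"
        "Answer conversationally, using only the data above."
    )
    return llm(prompt)


# Only one plugin is registered, so every other intent gets the fallback.
plugins = {"exchange_rates": lambda question: "1 USD = 0.95 EUR (stub value)"}
echo_llm = lambda prompt: prompt  # placeholder for a real model call

print(answer("What is the exchange rate for euros?", plugins, echo_llm))
print(answer("Find me a hotel in Lisbon", plugins, echo_llm))
```

Because the fallback is returned before the model is ever called, an unsupported intent cannot produce a hallucinated answer.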


Stefan Klopp is the chief technology officer at Matador Network, a leading travel publisher and creator of the award-winning AI travel genius GuideGeek.

Illustration: Dom Guzman
