Have you heard the estimate of how much money Amazon stands to lose if its pages were just a second slower? 1.6 billion dollars – that’s the number. It’s been echoed at numerous e-commerce conferences since 2012, yet the race to improve search speeds in travel hasn’t even begun. The problem isn’t a lack of innovation, but that interstitials have become the norm for travel websites ever since Expedia set the stage for them in the late nineties.
In the two decades since then, travelers have moved from a world of patiently pre-planned trips to one where we expect our taxis at the drop of a hat and our search engines to sift through trillions of pages in milliseconds. A seven-second wait for flight search results just didn’t seem good enough anymore.
Data from Musafir.com showed that a quarter of trips in the UAE were now being booked less than a week before travel, and over half were being booked less than three weeks ahead. Furthermore, over 75% of searches came from faceless API engines, where there was little room for gimmicks.
We kicked off a pilot program in the spring of 2017 with the aim of returning at least 50% of our flight search results in under a second – all while maintaining our conversion KPIs. We began by looking at how long we retained our cached search results. Our system had been designed to be dynamic for several years, in the sense that our cache duration increased as the lead time to travel increased, but our average cache hit ratio still hovered in the low twenties. In simpler terms, it meant that three out of four times, our users had to wait a whole seven seconds to see their flight search results.
We wanted to push the envelope and double our cache hit ratio while we targeted sub-second search, but it didn’t come as easily as we’d expected. A handful of airlines had begun surge-pricing their flights and the ensuing price volatility made it impossible to maintain our conversion KPIs. After experimenting with several options, we eventually found a formula that struck a good balance between volatility and doubling our cache hit ratio. It may not have been perfect, but it did the job.
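To make the dynamic mechanism concrete, here is a minimal sketch of a lead-time-based TTL. The tiers and the values in them are hypothetical stand-ins, not our production formula – the idea is simply that near-dated fares are volatile and need short TTLs, while far-dated fares change slowly and can be cached much longer:

```python
from datetime import date

def cache_ttl_minutes(depart_date: date, today: date) -> int:
    """Return a cache time-to-live that grows with lead time to travel.

    The tiers below are illustrative: last-minute fares are repriced
    constantly, so they get short TTLs; fares months out barely move,
    so they can safely live in cache for hours.
    """
    lead_days = (depart_date - today).days
    if lead_days <= 3:
        return 15        # last-minute fares: keep results very fresh
    if lead_days <= 14:
        return 60
    if lead_days <= 45:
        return 240
    return 720           # far-out fares: cache for half a day
```

In practice the thresholds themselves become the tuning knobs: tightening them trades cache hit ratio for less price volatility, which is exactly the balance the formula above had to strike.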
Going beyond the formula
Our second insight was that we needed to reward our best-performing customers with fresher search results. By tiering our customer segments and assigning a different cache formula to each tier, we began to offer the freshest results to customers who had maintained the best look-to-book ratios and progressively more cached results to those who had not.
If you aren’t familiar with the term, a travel website’s look-to-book ratio is a simple measure of how many searches it performs for every seat it sells. The lower it gets, the better off you are; run too high, and you’ll very likely be penalized. Now, I’ll be frank: tiering did increase price volatility for some API customers who had cached our data, but with our overall look-to-book ratio teetering on the edge of a penalty, tiering helped rein things in and let us breathe a sigh of relief.
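A minimal sketch of what tiering can look like, assuming a hypothetical `tier_ttl` helper and made-up ratio thresholds (the real segment boundaries would come from your supplier agreements):

```python
def tier_ttl(base_ttl_minutes: int, look_to_book: float) -> int:
    """Scale a route's base cache TTL by a customer's look-to-book ratio.

    Illustrative tiers: customers with low ratios (few searches per
    booking) get the freshest results, while heavy searchers are served
    progressively longer-cached results.
    """
    if look_to_book <= 100:
        return base_ttl_minutes       # top tier: freshest results
    if look_to_book <= 500:
        return base_ttl_minutes * 2
    return base_ttl_minutes * 4       # heaviest searchers: mostly cached
```

The multiplier approach keeps the dynamic lead-time formula intact – tiering just stretches each customer’s view of it.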
The third technique we employed was a proactive replenish – and it was my personal favorite for its sheer simplicity. Instead of discarding cached search results immediately after their time-to-live had expired, we set up a hook to trigger a background search if a cached search result was retrieved in the last few minutes of its time-to-live. The placement of the hooks ensured that only frequently hit searches were replenished and the window of a few minutes during which the hook operated was, of course, optimized using our dynamic cache mechanism. In effect, if we were to see a steady stream of searches for a particular route, the cache hit ratio would be virtually 100%.
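The hook can be sketched in a few lines. This is a simplified in-process version, with illustrative names and a plain thread standing in for whatever background-job machinery actually runs the replenishing search:

```python
import threading
import time

class ReplenishingCache:
    """Cache that proactively refreshes hot entries near TTL expiry.

    A sketch of the hook described above: `refresh_fn(key)` stands in
    for re-running the underlying flight search and returning a fresh
    result. Only keys that are actually read near the end of their TTL
    get replenished, so cold searches are left to expire.
    """

    def __init__(self, ttl_seconds: float, replenish_window: float, refresh_fn):
        self._store = {}                  # key -> (value, expires_at)
        self._ttl = ttl_seconds
        self._window = replenish_window   # "last few minutes" of the TTL
        self._refresh_fn = refresh_fn
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._store[key] = (value, time.time() + self._ttl)

    def _refresh(self, key):
        self.put(key, self._refresh_fn(key))  # background search result

    def get(self, key):
        with self._lock:
            entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        remaining = expires_at - time.time()
        if remaining <= 0:
            return None  # expired: caller falls back to a live search
        if remaining <= self._window:
            # Hook: a hit in the last stretch of the TTL triggers a
            # background refresh, so the next reader sees fresh data.
            threading.Thread(target=self._refresh, args=(key,), daemon=True).start()
        return value
```

With a steady stream of searches on a route, each near-expiry hit re-arms the cache before it goes stale – which is why the hit ratio for hot routes approaches 100%.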
The hidden culprits
Our target was almost in sight, but we were still running a little over a second, so we shifted our attention to how we stored our data. The culprit? The cache store itself was sluggish. It may not seem like much, but migrating from a conventional database to Redis’ purpose-built NoSQL store improved our cache IO performance by up to 60% and bought us a few dozen milliseconds. That alone wasn’t enough to bring us below a second, and it’s worth noting that over half of this improvement came from compressing our search results down from a megabyte before they were pushed to cache.
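The compression step itself is straightforward. A sketch with Python’s standard zlib – our actual codec and payload shapes may differ, but fare data is repetitive JSON and compresses well:

```python
import json
import zlib

def pack(results: dict) -> bytes:
    """Serialize and compress a search-result payload before caching.

    With a Redis client, the returned blob would then be stored with an
    expiry (e.g. the SETEX command), keeping both the payload and the
    TTL in the cache layer.
    """
    return zlib.compress(json.dumps(results).encode("utf-8"))

def unpack(blob: bytes) -> dict:
    """Reverse of pack(): decompress and deserialize a cached payload."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Smaller blobs pay off twice: less network time moving results in and out of the cache, and less memory per cached search.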
In the end, we only dropped below a second by fixing an oversight in our parallel processing. You see, we had employed parallel processing since 2009, but as we grew, we’d taken a few jobs, such as our flight pricing and re-pricing routines, for granted – and boy, did they come back to haunt us. After a week of toiling, we were finally able to deliver more than half of our searches at sub-second speed.
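The fix amounted to fanning such routines out instead of letting them run serially. A generic sketch – the executor-based approach and the `price_fn` callable are illustrative, not our exact pipeline:

```python
from concurrent.futures import ThreadPoolExecutor

def price_all(itineraries, price_fn, workers: int = 8):
    """Re-price a batch of itineraries concurrently.

    price_fn stands in for whatever calls a supplier's pricing API;
    spreading it across a thread pool keeps slow pricing calls from
    serializing the whole search response.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with itineraries
        return list(pool.map(price_fn, itineraries))
```

With IO-bound pricing calls, the response time collapses from the sum of the call latencies to roughly the slowest one.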
The last mile
Our next goal is 85% sub-second search, which we aim to reach by experimenting with discrete caching and predictive replenish – which is different from proactive replenish. You can read more about our efforts and see the results I presented at the Travel Tech Middle East 2017 conference on SlideShare.