The Next Big Thing in Intranet Search? Machine Learning – Part Two


Welcome to Part Two of our look at the future of machine learning in intranet search. Find Part One here.

In Part One, we talked about the differences between current, or ‘classical’, search and ‘future search’, and delved into what we can look forward to soon.

In this next part, we explore machine learning and what it means for intranet search.

Intranet search is often designed to do very specific things, but search queries tend to go beyond what the search engine is capable of. Oftentimes, there’s a gulf between how it’s designed to work and what people expect of it. That gap is starting to close, though, as new capabilities arrive.

Machine learning is an integral part of the future of search, but it’s important to understand what that really means.

AI is often misconstrued as a magic bullet: add it to any system and everything suddenly works better. The reality is more complicated than that.

The conceptual architecture of future search is highly modular: it relates to an application, or set of applications, that can do a lot of different, distinct things, and it carries some description of that functionality and how it fits together, like a domain or world model.

Classical search functionality exists within that specification. Individually, these modules aren’t revolutionary; so far they require little intelligence, just a lot of specification. In fact, this is what would once have been called an expert system.

How does Machine Learning fit in? There is no such thing as general-purpose ML (yet). Instead, ML algorithms are specific solutions to specific classes of problems.  

In future search, ML configures the system in real time. It decides which modules are activated for a given request by deducing goals. The primary means for this is natural language processing applied to the input, but other signals can feed in too, such as data sources, patterns of behaviour, or biometrics.
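To make that idea concrete, here is a minimal sketch in Python of routing a request to modules based on a deduced goal. The intent labels, module names, and the keyword-matching classifier are purely illustrative assumptions, not a description of any real product.

```python
# Illustrative sketch only: the intent labels, module names, and the
# keyword-matching classifier below are hypothetical, not taken from
# any real search product.

# A hypothetical mapping from deduced goals (intents) to the modules
# that should be activated to satisfy them.
INTENT_MODULES = {
    "find_document": ["keyword_search", "ranking"],
    "find_person": ["people_directory", "org_chart"],
    "create_report": ["data_retrieval", "document_builder"],
}

def classify_intent(query):
    """Stand-in for an NLP model that deduces the user's goal.
    A real system would use a trained classifier; this just keyword-matches."""
    q = query.lower()
    if "spreadsheet" in q or "report" in q:
        return "create_report"
    if "who is" in q or "email" in q:
        return "find_person"
    return "find_document"

def configure_pipeline(query):
    """Configure the system in real time: pick the modules for this request."""
    return INTENT_MODULES[classify_intent(query)]

print(configure_pipeline("quarterly results presentation"))
# -> ['keyword_search', 'ranking']
```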

We ended Part One by covering episodic memory of previous interactions, so your search queries are connected rather than treated as brand-new tasks. We continue from there and go further into how future search will learn to do better.

If future search is all about deducing goals, then the more help it gets doing that, the more capable the system will be. It will have what could be called a “world model” — a data model of things that aren’t directly relevant to search but can be used to deduce goals more accurately.  

Consider a simple request for “interesting articles”. Currently, the word “interesting” doesn’t mean anything; it’s just another keyword to be found in content. A search engine with a world model could have access to data from which it can deduce what’s interesting, based on viewing statistics or engagement.

If the world model includes data about you personally, then it could look at your viewing statistics, giving it a pretty good idea of what is interesting to you. Or it could use several definitions of “interesting” and combine them to get a result set that balances your interests with those of other people.
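As a toy illustration of how those definitions might be combined, here is a hedged Python sketch that blends a global engagement signal (total views) with a personal one (your own viewing history). All of the article names, numbers, and weights are invented for the example.

```python
# Toy illustration: rank articles by blending a global "interesting" signal
# (total views) with a personal one (how often you viewed them).
# All article names, numbers, and weights are invented.

articles = {
    "quarterly-results": {"views": 950, "viewed_by_you": 1},
    "canteen-menu": {"views": 400, "viewed_by_you": 12},
    "ml-in-search": {"views": 120, "viewed_by_you": 10},
}

max_views = max(a["views"] for a in articles.values())
max_personal = max(a["viewed_by_you"] for a in articles.values())

def interest_score(stats, personal_weight=0.6):
    """Combine two definitions of 'interesting': popular overall vs. popular with you."""
    global_score = stats["views"] / max_views
    personal_score = stats["viewed_by_you"] / max_personal
    return personal_weight * personal_score + (1 - personal_weight) * global_score

ranked = sorted(articles, key=lambda name: interest_score(articles[name]), reverse=True)
print(ranked)  # -> ['canteen-menu', 'ml-in-search', 'quarterly-results']
```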

Because the data involved isn’t directly relevant to any given search, just potentially relevant, this is a big-data problem that dovetails with other technologies: the Internet of Things (IoT), where a network of connected devices interacts in real time (for example, the current location of a tracked delivery), or data warehousing, where large amounts of data are captured and stored efficiently.

How are these examples related to search? They all involve capturing a lot of data about a lot of things without knowing beforehand what the data is for. But in the context of search, all that information is what makes better results possible.

If future search is all about deducing goals, then the more help it gets doing that, the more capable the system will be.

Prediction & Creativity

We already covered the idea that goals can be unstated. A search engine could deduce an actual goal from a stated one, or deduce secondary goals. If the search engine can track your actions in the application (made possible by the domain concept above), then, given the right machine learning and analysis capability, it can find patterns in those actions and predict your intended goal with no direct input from you. By the time you perform a search, the search engine could already have predicted what you want to do and take you there immediately, or offer a shortcut.
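As a minimal sketch of the kind of pattern-spotting this implies, the Python below learns which action tends to follow which from an invented activity log, then uses that to predict the next one. A real system would use far richer features and models.

```python
# Minimal sketch of predicting a user's next action from past patterns:
# a first-order frequency model over an invented activity log.
from collections import Counter, defaultdict

action_log = [
    "open_timesheet", "search_expenses_form", "submit_expenses",
    "open_timesheet", "search_expenses_form", "submit_expenses",
    "open_timesheet", "search_holiday_policy",
]

# Count which action tends to follow which.
transitions = defaultdict(Counter)
for current, following in zip(action_log, action_log[1:]):
    transitions[current][following] += 1

def predict_next(action):
    """Return the most frequently observed follow-up to an action, if any."""
    followers = transitions.get(action)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("open_timesheet"))  # -> search_expenses_form
```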

Where ‘classical search’ is limited to returning things that already exist somewhere, ‘future search’ could go the extra step: what if the search engine could create things on the fly in response to a request? You could say “Make a spreadsheet of everyone’s email address” and the search engine would interpret your end goal, retrieve the data, and construct the spreadsheet there and then.
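Here is a hedged sketch of what fulfilling that request might look like behind the scenes: a directory lookup feeding Python’s csv module. The fetch_all_users function and its data are placeholders for whatever people API an intranet actually exposes.

```python
# Hedged sketch of creating something on the fly: turning the request
# "Make a spreadsheet of everyone's email address" into a CSV file.
# fetch_all_users() and its data are placeholders for whatever people
# directory API an intranet actually exposes.
import csv

def fetch_all_users():
    # Placeholder data standing in for a real directory lookup.
    return [
        {"name": "Alice Example", "email": "alice@example.com"},
        {"name": "Bob Example", "email": "bob@example.com"},
    ]

def build_spreadsheet(path="emails.csv"):
    """Retrieve the data and construct the spreadsheet there and then."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "email"])
        writer.writeheader()
        writer.writerows(fetch_all_users())
    return path

print(build_spreadsheet())  # -> emails.csv
```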

Learning From Mistakes

There are many opportunities for future search to learn from its mistakes. Ultimately, only you can determine whether a goal was achieved. One option is that you could directly assert when a mistake was made (“That’s not what I want”). Or the search engine could infer success or failure from your behaviour, via click patterns or linger times.
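As a simple illustration of that second option, the sketch below turns clicks and linger (dwell) times into a small ranking boost or penalty for each result. The thresholds and boost sizes are invented, and a real system would have to cope with far noisier signals.

```python
# Hedged sketch: inferring success or failure from behaviour and nudging
# future rankings accordingly. Thresholds and boost sizes are invented.

boosts = {}  # per-document ranking boost learned from feedback

def record_feedback(doc_id, clicked, dwell_seconds):
    """Treat a click with a long linger as success, a quick bounce as failure."""
    if clicked and dwell_seconds >= 30:
        delta = 0.1    # looks like the goal was achieved
    elif clicked and dwell_seconds < 5:
        delta = -0.1   # clicked, then bounced straight back: probably a miss
    else:
        delta = 0.0    # ambiguous signal: leave the ranking alone
    boosts[doc_id] = boosts.get(doc_id, 0.0) + delta

record_feedback("expenses-policy", clicked=True, dwell_seconds=95)
record_feedback("expenses-policy-draft-2014", clicked=True, dwell_seconds=3)
print(boosts)  # -> {'expenses-policy': 0.1, 'expenses-policy-draft-2014': -0.1}
```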

But once we get into the world of machine learning, we must be aware of errors. If we aren’t careful, the search engine could suffer from a form of Dunning-Kruger effect, in which it makes mistakes but, because of some architectural limitation, can never learn from them. This can be countered with a built-in understanding of the ways it could be making mistakes. A deeper look at the possible pitfalls of ML goes beyond the scope of this post; for now, it’s important to realise that implementing ML often requires a shift in the design of the features you want to add it to, possibly even changing the user experience itself.

We must be aware of errors. If we aren’t careful, the search engine could suffer from a form of Dunning-Kruger effect.

It’s worth noting that a lot of the things mentioned here and in Part One aren’t revolutionary; they can already be seen in Google and Bing, along with social media search on Facebook and LinkedIn, though it might not be obvious to everyone. The logical next step is the intranet.

There are a lot of logistical considerations, like getting large amounts of data into a usable world model, representing a search engine’s episodic memory, or specifying functionality in an application domain. Spotting patterns of various types is doable with widely available ML technology. The most technologically difficult part of this puzzle is interpreting intention from natural language input.

Human language is difficult to interpret even for people, let alone machines, especially when the inputs are short and terse. NLP technology is a kind of keystone that everything else mentioned above rests on, and the magic ingredient that fundamentally distinguishes a modern digital assistant (like Alexa, Siri, and soon Oak) from, well, a basic intranet.

Want to see how Oak’s features will help solve your intranet woes?
Author: Marc Hall
Marc Hall has been at Oak since 2013 and is the Senior Data Engineer and Developer. He's involved in Machine Learning and applies it to future intranet development. Marc enjoys gaming, reading, and Muay Thai.