Most of us use them every day. Google, Yahoo!, Ask Jeeves, and Lycos are just a few of the search engines and portals that help us navigate the Web. They are gateways to news, research, videos, and a host of other information. But the technology behind these search and navigation tools is taming more than just the Web; it is also being used to find useful information within the enterprise.
LDAP And XML Make Enterprise Search Better
While Web search engines have improved navigation on the Internet, applying this functionality to the enterprise isn't as easy. "On the Internet we're all equal citizens," says Prabhakar Raghavan, VP and CTO of Verity (Sunnyvale, CA), an infrastructure software provider. "In the enterprise the rules are different. There are various levels of data security that need to be enforced as well as numerous data formats." For instance, just about everything that appears on the Internet is in HTML format. Most enterprises, on the other hand, contain data that is in any number of formats including ASCII (American Standard Code For Information Interchange), binary, and XML (extensible markup language).
Enterprises are addressing the first issue - security - by using LDAP (lightweight directory access protocol). "LDAP is like a telephone book that shows a list of users and the systems they are allowed to access," says James Speer, product manager for Jeeves Solutions (Emeryville, CA), a provider of natural language search and business intelligence solutions. "Search engines can use this protocol as a master control list and filter data based on user entitlement."
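The entitlement filtering Speer describes can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the `DIRECTORY` dict stands in for a real LDAP directory, and the `Document` class, group names, and `filter_by_entitlement` function are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an LDAP directory: user -> groups the user belongs to.
# A real deployment would query a directory server instead of reading a dict.
DIRECTORY = {
    "alice": {"engineering", "all-staff"},
    "bob": {"all-staff"},
}


@dataclass(frozen=True)
class Document:
    title: str
    allowed_groups: frozenset  # groups entitled to see this document


def filter_by_entitlement(user, hits):
    """Keep only the hits that the user's group memberships entitle them to see."""
    groups = DIRECTORY.get(user, set())  # unknown users belong to no groups
    return [doc for doc in hits if groups & doc.allowed_groups]


hits = [
    Document("Valve 2135 drawing", frozenset({"engineering"})),
    Document("Holiday schedule", frozenset({"all-staff"})),
]

# bob is only in "all-staff", so the engineering drawing is filtered out.
print([d.title for d in filter_by_entitlement("bob", hits)])
```

The search itself runs as usual; the filter is applied to the result list afterward, so a user never sees a hit for a document they are not entitled to open.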
One of the ways enterprises are tackling the second issue - disparate data formats - is by using XML conversion. XML can be used to tag data sets, much as the Properties function in Microsoft Word lets users record a document's author, creation date, and key words. "After the data sets are tagged using XML commands, disparate data sets can be joined through APIs [application programming interfaces]," says Ron Kolb, director of technology strategy for Autonomy (San Francisco), a content management provider. APIs enable disparate applications and data sources to be joined at the most basic programming level - the business object level - rather than by connecting directly to the database. "By connecting at the business object level, the search engine doesn't have to figure out how tables within a database are joined or deal with other business logic functions that would impair the search process," says Speer.
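As a rough sketch of what tagging a data set with XML might look like, the following uses Python's standard `xml.etree.ElementTree` module. The element names (`document`, `metadata`, `author`, `created`, `keywords`) and the sample values are assumptions made for illustration; the article does not specify a schema.

```python
import xml.etree.ElementTree as ET


def tag_document(author, created, keywords, body):
    """Wrap a piece of content in XML with author, date, and keyword tags.

    The element names are hypothetical; any agreed-upon schema would do.
    """
    doc = ET.Element("document")
    meta = ET.SubElement(doc, "metadata")
    ET.SubElement(meta, "author").text = author
    ET.SubElement(meta, "created").text = created
    kw = ET.SubElement(meta, "keywords")
    for word in keywords:
        ET.SubElement(kw, "keyword").text = word
    ET.SubElement(doc, "body").text = body
    return ET.tostring(doc, encoding="unicode")


def read_keywords(xml_text):
    """Pull the keyword tags back out, whatever format the body came from."""
    root = ET.fromstring(xml_text)
    return [k.text for k in root.findall("./metadata/keywords/keyword")]


tagged = tag_document("J. Smith", "1999-02-14", ["valve", "drawing"], "Spec text")
print(read_keywords(tagged))
```

Once every source carries the same tags, a search engine can read the metadata without knowing whether the original was a word-processing file, a drawing, or a database record.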
From Key Word To Key Concept
After enterprises modify their search engines to meet their data security and formatting needs, essentially the same search engine technology used on the Internet can be applied within the enterprise. One such technology, which came about in 1995 from Ask Jeeves and has evolved over the years, is natural language processing for enterprise search, which lets users interact with search tools in much the same way they would interact with other people. For instance, a user who is looking for information about a valve might type: "Valve 2135 created after January 1999." The search engine can dissect the request and search the corporate Web site, intranet, and extranets for all documents, drawings, videos, and other data sources that match the user's request. "Unlike earlier search engine technology that forced users to use techniques such as Boolean logic and conform to specific query languages, natural language processing is making the search process more efficient," says Speer.
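To give a feel for what "dissecting" the valve request could mean, here is a deliberately tiny sketch that turns it into a structured filter with one hand-written pattern. Real natural language processing goes far beyond a single regular expression; the pattern, the `parse_query` function, and the returned field names are all assumptions made for illustration.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Matches requests shaped like "<subject> created after|before <Month> <year>".
PATTERN = re.compile(
    r"^(?P<subject>.+?)\s+created\s+(?P<op>after|before)\s+"
    r"(?P<month>[A-Za-z]+)\s+(?P<year>\d{4})\.?$",
    re.IGNORECASE,
)


def parse_query(text):
    """Split a request into search keywords plus an optional date filter."""
    text = text.strip()
    match = PATTERN.match(text)
    month = MONTHS.get(match.group("month").lower()) if match else None
    if month is None:
        # Not in the expected shape: fall back to a plain keyword search.
        return {"keywords": text}
    return {
        "keywords": match.group("subject"),
        "date_op": match.group("op").lower(),
        "date": date(int(match.group("year")), month, 1),
    }


print(parse_query("Valve 2135 created after January 1999."))
```

The point of the sketch is the output shape: the user types a sentence, and the engine ends up with keywords and a date constraint it can apply across every data source.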
While many search engines can offer natural language processing functionality, not all use the same method of translating language and searching for answers. For instance, some vendors may use semantic parsing of language and others may use mathematical pattern matching.
The first group, semantic parsing, relies heavily on dictionary and thesaurus support, enabling it to match words with their root meaning as well as to recognize synonyms. This enables a person searching for information about a car to see results that also include "automobile." Another benefit of this technology is that it uses a technique known as tokenization to help home in on search criteria. Tokenization recognizes key words separated by blank spaces and determines which words should be grouped together as phrases. Without this feature, a person looking for running shoes at Niketown.com would get thousands of hits on "running" and thousands more on "shoes" that have nothing to do with each other.
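A minimal sketch of tokenization with phrase grouping and synonym support follows. The phrase list and synonym table are tiny hypothetical stand-ins for the dictionary and thesaurus a real engine would ship with, and the `tokenize` function is an assumption, not a vendor API.

```python
# Hypothetical phrase dictionary and thesaurus; real engines use far larger ones.
KNOWN_PHRASES = {("running", "shoes"), ("search", "engine")}
SYNONYMS = {"automobile": "car", "auto": "car"}


def tokenize(query):
    """Split on blank spaces, then group known two-word phrases into one token.

    Single words are mapped to a canonical synonym where one is known.
    """
    words = query.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if len(pair) == 2 and pair in KNOWN_PHRASES:
            tokens.append(" ".join(pair))  # keep the phrase together
            i += 2
        else:
            tokens.append(SYNONYMS.get(words[i], words[i]))
            i += 1
    return tokens


print(tokenize("red running shoes"))  # ['red', 'running shoes']
print(tokenize("Automobile"))         # ['car']
```

Because "running shoes" survives as a single token, the engine can match it as a phrase instead of returning every page that mentions either word.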
Another type of technology that can make search engines more powerful is known as Bayesian Inferencing or mathematical pattern matching. In this approach, language is rendered into a binary-formatted search string, which is then processed through an algorithm. The algorithm assigns certain values to the individual parts of the word search, eliminating frequently repeated words. Unlike some of the other search technologies, this method enables search engines to retrieve information based on word context as opposed to only the individual words within a phrase or sentence.
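One common form of Bayesian inference over text is a naive Bayes classifier, sketched below. It is offered only as an illustration of the general idea of dropping very common words and weighting the rest by probability; it is not a description of any vendor's algorithm. The stop-word list, category names, and training sentences are all made up for the example.

```python
import math
from collections import Counter

# Assumed list of frequently repeated words to drop before scoring.
STOP_WORDS = {"the", "a", "of", "and", "for", "to", "in"}


def words(text):
    return [w for w in text.lower().split() if w not in STOP_WORDS]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # category -> Counter of word frequencies
        self.doc_counts = Counter()  # category -> number of training documents
        self.vocab = set()

    def train(self, category, text):
        self.doc_counts[category] += 1
        counter = self.word_counts.setdefault(category, Counter())
        for w in words(text):
            counter[w] += 1
            self.vocab.add(w)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, counter in self.word_counts.items():
            # log P(category) + sum of log P(word | category)
            score = math.log(self.doc_counts[category] / total_docs)
            total_words = sum(counter.values())
            for w in words(text):
                score += math.log((counter[w] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


model = NaiveBayes()
model.train("aerodynamics", "airflow drag and lift of the wing")
model.train("aerodynamics", "rocket fin drag and airflow")
model.train("hr", "holiday schedule for staff")
model.train("hr", "staff benefits and payroll")

print(model.classify("lift and drag of a fin"))  # aerodynamics
```

The classifier never sees a rule that says "fin" relates to "wing"; it infers the connection from the company the words keep in the training text.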
Connecting The Enterprise Through Search Technology
Ron Kolb shares a real-world example of a British aerospace company that put search engine technology to good use. "One division of the company researched rocket fin design and another side developed airplane wings," recalls Kolb. "By integrating their data sources and using search engine technology, both divisions were able to collaborate on crossover technologies such as airflow, drag, and lift. By pooling their resources, they were able to save thousands of research hours and millions of dollars." Some sources estimate that it costs companies between $25 and $100 per document to manually categorize documents throughout their lifecycle. Search engine technology can automate this process and eliminate that expense. Some search vendors report that companies using their solutions see a return on their investment in as little as two months. Now that's a subject worth searching.