
Chapter 4
The Concerns of Artificial Intelligence

Artificial Intelligence (AI) / Knowledge Based Systems (KBS) methods possess the following general qualities: they are 1) heuristic in nature, 2) concerned primarily with satisficing, and 3) reliant upon symbolic and qualitative processing techniques. These methods are applied within a number of AI subdisciplines, including Expert Systems (ES), Image/Vision Processing, Knowledge Representation (KR) & Heuristics, Machine Learning (including Artificial Neural Networks), Natural Language Processing & Automatic Speech Recognition (NLP & ASR), and Robotics. Each of these topics is a field in its own right, with its own group of scientists, conferences, publications, books, and research centers, and often each discipline has its own specially built representation schemes, computers, and programming languages. As partial proof of this, the Encyclopedia of Artificial Intelligence [AI1] has hundreds of contributors and thousands of entries. An attempt to cover all of these areas, even on a gross basis, in this short chapter is without merit. To filter this coverage, the concerns addressed here will be those areas within the very large discipline of AI which can be most aptly applied to generic problem solving and decision making: knowledge representation and heuristics, expert systems, natural language processing, and machine learning.

Knowledge Representation & Heuristics

KR may be seen as the foundation area for all the other AI sub-disciplines. Systems built to process vision or natural language use structures and techniques developed within the scope of KR and heuristics. Minsky [AI8] has detailed the primary knowledge representation schemes as rules, frames, semantic nets, neural networks, and predicate logic.

Rules are an [IF x THEN y] deductive representation with inference mechanisms to control the flow of rule execution; two general inference mechanisms exist, forward chaining and backward chaining. Frames are more of a structural representation which makes analogical reasoning easy. Procedure-like code is attached to a frame but lies fallow until the frame is accessed, at which time it may be executed if appropriate. Semantic nets are cyclical graphs which represent relationships among elements. Neural networks are an inductive approach which "reads" example data and then establishes strengths in a network based upon the frequency of example occurrences; the strengths are constantly updated, exhibiting a "learning" mechanism as the system matures. Predicate logic uses a formal logic representation combined with a technique called resolution to perform logical inferencing. Other KR schemes growing in popularity include genetic algorithms and fuzzy logic.

Fikes and Kehler [ES1] have provided an encompassing model for demonstrating the types of knowledge needing representation in a knowledge base. Figure 4-1 below shows these sources.

Figure 4-1. Types of Knowledge

The knowledge representation schemes discussed above can be shown to be effective in representing these knowledge types. For example, behavior descriptions and beliefs are well represented by frames, by rules (using an attachment called a confidence factor), and by fuzzy logic. Objects and relationships are an ideal match for both semantic nets and frames. Heuristics and decision rules are obviously well represented by rules, procedures by procedures, and typical situations by frames.
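As a minimal illustration of how two of these schemes might be encoded, the Python sketch below represents a frame as a set of named slots with an attached procedure, and a semantic net as a collection of relationship triples. The slots, relations, and the aircraft example are invented for illustration and are not drawn from the text.

    # A frame: named slots plus an attached procedure that lies fallow
    # until the frame is accessed (hypothetical "aircraft" frame).
    def estimate_range(frame):
        return frame["fuel_capacity"] * frame["miles_per_unit"]

    aircraft_frame = {
        "is_a": "vehicle",
        "fuel_capacity": 500,
        "miles_per_unit": 6,
        "when_accessed": estimate_range,   # procedural attachment
    }

    # Accessing the frame triggers the attached procedure.
    print(aircraft_frame["when_accessed"](aircraft_frame))   # 3000

    # A semantic net: a graph of (node, relation, node) triples.
    semantic_net = {
        ("aircraft", "is_a", "vehicle"),
        ("wing", "part_of", "aircraft"),
        ("vehicle", "used_for", "transport"),
    }

    # A simple relationship query: what is part of an aircraft?
    print([a for (a, rel, b) in semantic_net
           if rel == "part_of" and b == "aircraft"])   # ['wing']

Real frame systems add inheritance through the is_a links, so that a query about an aircraft can fall back on what is known about vehicles in general.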
For obvious reasons, the foundations of KR are tied closely to the methods of thinking and reasoning outlined in Chapter 3, including deductive, inductive, analogical, formal logic, procedural, and meta reasoning. An integral part of any of these representations is the encoding of symbolic knowledge and the later extraction of, or search for, this knowledge. Therefore both heuristics and search are key concepts in knowledge engineering and in problem solving and decision making.

Heuristics and Search

Heuristics could best be described as methods which have proven generally reliable, but are not always correct. Often they are referred to as "rules of thumb". Some examples of heuristics follow:

    If it ain't broke, don't fix it.
    In chess, select moves that protect the center of the board.
    If it looks like rain, carry an umbrella.
    To open the door, try pulling or pushing.

Figure 4-2. Examples of Heuristics

Heuristics often revolve around concepts that cannot be reduced to simple numbers or to any very specific data relationships. They exist to channel a problem or a goal.

Douglas Lenat, in his doctoral work at Stanford, researched the application of heuristics to problem solving. In this process he gathered hundreds of fairly general heuristics and placed them into a program to do problem solving. His program, Eurisko, proved very effective at re-discovering heretofore "well known" discoveries across domains as diverse as set theory, war gaming, and computer programming. Dr. Lenat found some heuristics to be particularly valuable, including extreme cases, coalescence, and fortuitous accident. Extreme cases is self explanatory: begin by paying special attention to outliers. Why do these "worst" or "best" points exist? Can we capitalize upon this knowledge? Coalescence means "growing together"; using this concept, Eurisko formulated the approaches of self destruction for damaged war fighting machines, recursion for computer programs, and doubling and squaring in the math domain. Finally, fortuitous accident forces continual re-examination of results to see if somehow, while plodding along, an intermediate goal has been bypassed on the way to the top goal.

Search has been said to be a necessary, and perhaps the most important, element of most artificial intelligence methods. To that end AI researchers have formulated effective methods for search and discovery. Many of these methods originated in the decision sciences and have been "borrowed" and put to use in working with knowledge bases. There are dozens of search methods and sub-methods, but the more popular ones include:

    - depth first
    - breadth first
    - hill climbing
    - british museum
    - best first
    - branch & bound
    - A*
    - minimax, and
    - alpha beta.

The first four of these are simplistic algorithmic procedures, while the last five utilize additional information to pare down the search space at each branch of the network path. The viability of each method depends largely upon the purpose of the search. For example, depth first is effective when most of the paths being investigated do not run very deep; deep searches without success are wasteful. A minimal sketch contrasting the two simplest procedures follows.
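This sketch, in Python over a small hypothetical graph, shows the difference in exploration order: depth first pushes down one path before backing up, while breadth first fans out level by level. The graph, node names, and goal are invented for illustration.

    from collections import deque

    # A small hypothetical search space: node -> list of successors.
    graph = {
        "start": ["a", "b"],
        "a": ["c", "d"],
        "b": ["e"],
        "c": [], "d": ["goal"], "e": [],
        "goal": [],
    }

    def depth_first(start, goal):
        # Explores one path as deeply as possible before backing up.
        stack = [[start]]
        while stack:
            path = stack.pop()
            node = path[-1]
            if node == goal:
                return path
            for nxt in reversed(graph[node]):
                stack.append(path + [nxt])
        return None

    def breadth_first(start, goal):
        # Explores all paths of length n before any path of length n+1.
        queue = deque([[start]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node == goal:
                return path
            for nxt in graph[node]:
                queue.append(path + [nxt])
        return None

    print(depth_first("start", "goal"))    # ['start', 'a', 'd', 'goal']
    print(breadth_first("start", "goal"))  # ['start', 'a', 'd', 'goal']

Both calls happen to return the same path on this tiny graph; on a graph with long fruitless branches, depth first may wander deep before recovering, which is exactly the weakness noted above.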
Hill climbing is effective when there is some "good" natural measure that can be applied to the remaining paths between where we are and our goal. Using the branch and bound method, originally developed in the operations research world, an evaluation is made at each iteration of the shortest of all uncompleted paths. See Winston [AI11] or Pearl [AI9] for excellent, in-depth discussions of this subject.

Heuristics and search routines are only of value if they can be used in a practical application. The AI discipline of expert systems provides just such a playing field.

Expert Systems

The class of systems which deal with the application of human expertise to problem solving is known as expert systems. Frequently this definition is extended to include all applications where knowledge is applied to problem solving. In a strict sense, however, an expert, that is, a person with many years of knowledge and experience, must be intimately involved in the system's development; systems where a formal expert is not involved are known as knowledge based systems.

Figure 4-3. Diagram of an Expert System

Expert systems are typically built using one or a combination of the knowledge representation schemes mentioned earlier. As can be seen in the diagram above and in the table below, the key element in the operation of an expert system is the inferencing mechanism. In expert system lingo this part is known as the inference engine.


Inputs              Processing       Outputs
Q&A or Sensors      Inferencing      Expert Advice or Feedback Control

Figure 4-4. Expert System I-P-O

This inferencing mechanism controls the order in which users or sensors are queried, as well as the progress toward deriving conclusions. The table below depicts some of the inference methods used with particular knowledge representation schemes.
Knowledge Representation             Inference Mechanism                   

Rules                                Forward Chaining or                   
                                     Backward Chaining                     

Predicate Logic                      Resolution                            

Examples                             ID3 Algorithm                         

Figure 4-5. Inference Mechanisms

These methods each provide a unique approach toward finding solutions to problems. Forward chaining is termed a "data driven" approach because the system first collects substantial amounts of data and then sifts through this data, narrowing the potential causes as it goes; this approach is best when only small amounts of data are necessary. Backward chaining is termed "goal driven" because it begins by asking for your goal and then works to narrow the search space of combinations that could lead to that particular goal; this approach is better when many different combinations of data or clues are evident. (A minimal forward chaining sketch appears at the end of this section.)

Resolution is a very interesting approach which is effectively "proof by refutation". It works by first negating the logical statement you are trying to prove. This negation is then added to the list of axioms underlying the proof, and the list is then resolved using logical equivalencies. The result is either NIL, meaning the theorem is true, or FALSE, meaning it is false. This logic "programming" may be quite involved, and therefore requires sophisticated mechanisms for converting propositions into Well Formed Formulas (WFFs). For example, the WFF ∀x [brick(x) → ∃y on(x,y)] states that every brick is on something. This is a calculus which applies quantifiers, predicates, variables, and functions to propositions, which is why it is termed propositional or predicate calculus.

Since the output of an expert system is in the form of expert advice, or may be used as a control mechanism, expert systems are frequently used as support within other problem solving systems. For example, American Express uses an expert system to help decide whether "unusual" credit card transactions should be approved; before the expert system was built these decisions were made by floor managers. Other examples abound.

Currently most expert systems are built using tools called expert system shells. These tools bundle one or more of the major paradigms into an easily controlled environment. Because of their relatively low cost ($200 - $10,000 for personal computers), these tools can provide significant aid in improving productivity in problem areas. Paybacks of 10 to 1, and even 100 to 1, are commonplace.
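To make the data driven style concrete, here is a minimal forward chaining sketch in Python; the IF/THEN rules and the facts (fever, rash, and so on) are invented for illustration and do not come from any particular shell.

    # Each rule: IF all conditions are known facts THEN add the conclusion.
    rules = [
        ({"fever", "rash"}, "measles_suspected"),
        ({"measles_suspected"}, "refer_to_physician"),
        ({"sneezing"}, "allergy_suspected"),
    ]

    def forward_chain(facts):
        # Data driven: start from the facts and keep firing rules
        # until no rule adds anything new.
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for conditions, conclusion in rules:
                if conditions <= facts and conclusion not in facts:
                    facts.add(conclusion)
                    changed = True
        return facts

    print(forward_chain({"fever", "rash"}))
    # {'fever', 'rash', 'measles_suspected', 'refer_to_physician'}

A backward chainer would instead start from a goal such as refer_to_physician and work back through the rules, asking only for the facts needed to confirm or reject that goal.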
Natural Language Processing/Understanding (NLP/U)

NLP is an area of AI research and progress which directly attacks the cognitive area of language and communication. In Chapter 3, language and communication were addressed using a layered model, starting with phonetics and moving up through "real world" modeling. Much of the research works with this general model, either within a single layer or across the entire ladder. The purpose of most practical efforts in NLP is to create an environment where humans can easily interact with the computer, that is, through natural language. This may mean relatively simple tasks such as English-like queries to databases, or much more complex tasks such as interpreting the "meaning", hidden or overt, in textual messages ranging from wire service news stories to technical professional journals. The potential for helping to recognize problems should be obvious: with a capability to digest large volumes of textual material, an NLU system could alert a DM whenever unusual signs or items of interest appear, whether in internal information or in external sources. While the discussion in Chapter 3 covered basic terminology, this section outlines some of the larger scale efforts to conquer man-machine communication. Figure 4-6 below lists some of the more significant projects and products.
    - Parsing
    - Schank's Conceptual Dependency Diagrams
    - Frames & Scripts
    - The Message Understanding Conferences
    - MCC's CYC Project
    - Commercial Database Access Products

Figure 4-6. NLU Systems

Parsing is a method of decomposing, or breaking apart, English statements in an attempt to better understand relationships within the text. These techniques work from the phonological level up through the syntactic and even semantic levels. Parse trees, which are similar to sentence diagrams, are used to classify words and clauses in a sentence. Some variations on parsing include transitional grammars, which focus on inter- and intra-sentence transitions; nondeterministic methods, which can use bottom-up or top-down analysis; deterministic methods, which delay analysis until a sufficient "look ahead" is complete; and augmented transition nets (ATNs), which force sentences into pre-arranged relationships.

Conceptual Dependency Diagramming (CDD) is a method which operates primarily at the semantic level. Roger Schank, while at Yale, developed his own "calculus" of objects, relationships, and symbolic manipulation. CDDs can be viewed as extensions to ATNs. The thrust of Schank's system was to place all language within a finite, logical set of objects and relationships; in this manner the underlying "meaning" of the text could be understood. Once established, a CDD could be moved into a frame or script to derive even further "general knowledge" about a situation.

Another recent effort in building theory in the NLU arena has been the Message Understanding Conferences (MUC), a series of three conferences focused upon the interpretation and extraction of meaning from text. Sponsored by the U.S. Naval Ocean Systems Center in San Diego, each year a "contest" is held to determine which software best interprets (via a set of Q&A) a text message which is unknown, other than its general domain, before the conference. The MUC efforts can be seen as "locally" oriented: they attempt to derive meaning only from the information that is provided, with no additional external information available to help interpret the text. This contrasts with other major projects, such as the one following, whose purpose is to add information. Such systems help place the meaning into a wider context.

Perhaps the most ambitious effort to date is the CYC (short for encyclopedia and/or psychology) Project, a 10 year, $35 million project being administered at the Microelectronics and Computer Technology Corporation (MCC) in Austin, Texas. MCC is a research consortium consisting of major computer, communications, aerospace, and other high technology companies. CYC is an effort to embed 100 million axioms of knowledge into a database. Its primary purpose is to overcome "brittleness" in current KBSs, that is, to install "common sense" into the computer. As humans, if we do not understand a particular situation we "back up" and attempt to view the reasoning surrounding an event based upon our wider, more general knowledge; CYC is attempting to mimic this ability. Again the potential for assisting problem solvers and decision makers is tremendous. The ability to identify otherwise unforeseen problems, to draw upon analogies not previously seen, or to otherwise use a vast network of heuristics already programmed into CYC would be at a DM's fingertips.

On a practical, "now available" note, several software developers now market natural language front ends for database products such as Oracle and DB2. Some of these products include Intellect, Ramis, and Natural. The existing limitations of these products revolve around both the rigid structure they accept, i.e., Noun-Verb-Object, and their limited word sets for retrieval, as the toy sketch below suggests.
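The following toy Python sketch, which assumes nothing about how the actual products named above work, illustrates what such a restriction feels like in practice: only one rigid query pattern and a small word set are accepted, and the table and field names are hypothetical.

    # Hypothetical data standing in for a database table.
    employees = [
        {"name": "Jones", "dept": "sales"},
        {"name": "Smith", "dept": "audit"},
    ]

    TABLES = {"employees": employees}

    def answer(query):
        # Accept only the rigid pattern: list <table> where <field> is <value>
        words = query.lower().split()
        if (len(words) != 6 or words[0] != "list"
                or words[2] != "where" or words[4] != "is"):
            return "Sorry, I only handle: list <table> where <field> is <value>"
        table, field, value = words[1], words[3], words[5]
        if table not in TABLES:
            return "Unknown table: " + table
        return [row for row in TABLES[table] if row.get(field) == value]

    print(answer("list employees where dept is sales"))
    # [{'name': 'Jones', 'dept': 'sales'}]
    print(answer("show me everyone in sales"))
    # Sorry, I only handle: ...

The second query shows how quickly an ordinary phrasing falls outside the accepted pattern and word set.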
Texas Instruments has copyrighted an approach which is pseudo-menu driven. Using this approach, a user may select any word from a list within a series of boxes on the screen. Users have found this approach satisfying and accommodating. Advances in natural language understanding should continue, with the future looking toward the melding of NLU with speech recognition, the ability to handle non-domain-specific tasks, the incorporation of parallel computers to speed up processing, and the potential of neural networks (following) for incorporating learning into the understanding equation.

Neural Networks

Neural Networks (NN) are a growing technology with strong potential for aiding in areas of human cognitive weakness. The technology is modeled after human neural activity, hence the name. NNs show particular strengths in pattern recognition, classification, and adaptive processing schemes. They have alternatively been called Machine Learning Systems, Connectionist Networks, and Artificial Neural Systems. Figure 4-7 below is a diagram of a typical neural net.

Figure 4-7. Diagram of a Neural Network

As can be seen from this diagram, a NN consists of a set of inputs and outputs along with what is termed a hidden layer. Each of the circles in the diagram is called a neuron. The outputs consist of a response, such as identification of a specific object or selection of a class within a classification scheme. The inputs are the set of characteristics which are indicative of the objects being identified. The hidden layer acts as a filter to accumulate the input signals and turn on the correct output neuron.

As an example of a neural net application, the network outputs might consist of different types of aircraft to be identified, such as a Boeing 747, DC-10, B-52, or C-5A. Inputs would consist of characteristics such as wing shape, fuselage length, tail height, and number of engines.

    Input                      Output
    Wing shape: long/thin
    Fuselage: elongated        B-52
    Tail: high
    Engines: six

The NN is "trained" using sets of these inputs and the correct output for each input set. All input-output combinations are fed to the network repeatedly until the network "settles in", i.e., adapts to the particular task environment. At that point it will correctly identify particular outputs based upon the set of inputs fed into the net. Input sets not seen before will nonetheless trigger an output response; in the aircraft network above, the net would select the aircraft most like the new input set. This is the most interesting attribute of neural nets and what sets them apart from a technology like ES: they can identify something that is close to, but not exactly like, an object or class seen before. NNs can also provide a "proximity of fit" so that a user may be made aware of differences in previously unidentified objects. (A minimal training sketch follows below.)

Most practical applications utilize a training approach known as "supervised" learning to establish the net. However, some networks can also be created without training. The purpose of these networks, termed "unsupervised", is simply to classify: the network creator establishes x number of classes, and the network then proceeds to divide the set of inputs into x output categories based upon the collective input characteristics. Non-parametric statistical methods exist which deal with this same type of problem.
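As a minimal sketch of this supervised training cycle, assuming a single-layer perceptron rather than any of the architectures discussed next, the Python fragment below nudges its connection weights after every error until the toy aircraft examples are classified correctly; the numeric feature encodings are invented for illustration.

    # Toy training set: [wing_long_thin, fuselage_elongated, tail_high, engines/10]
    # paired with the desired output (1 = B-52, 0 = not B-52).
    training = [
        ([1.0, 1.0, 1.0, 0.8], 1),   # B-52-like profile
        ([0.0, 1.0, 0.0, 0.4], 0),   # some other aircraft
        ([1.0, 0.0, 1.0, 0.2], 0),
    ]

    weights = [0.0, 0.0, 0.0, 0.0]
    bias = 0.0
    rate = 0.1

    def predict(inputs):
        total = bias + sum(w * x for w, x in zip(weights, inputs))
        return 1 if total > 0 else 0

    # Repeated presentation of the examples: the weights are nudged after
    # every error until the net "settles in" on the training set.
    for _ in range(100):
        for inputs, target in training:
            error = target - predict(inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error

    print(predict([1.0, 1.0, 1.0, 0.8]))   # 1: the trained B-52 profile
    print(predict([0.9, 1.0, 1.0, 0.6]))   # 1: a similar but unseen profile

The last line shows the behavior noted above: an input set never seen during training still produces the response of the nearest profile the net has learned.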
There are a number of approaches toward establishing the layers, the learning mechanisms, and the other characteristics of a neural network. Some of the generalized network architectures created by researchers include the perceptron, back propagation, adaptive resonance theory, and the adaptive linear element. Figure 4-8 below details some of the elements in NN creation which are subject to fine tuning based upon the chosen architecture.

    Network Architecture
        # Layers, # Clusters/Layer
        Layer Associativity/Connections
        # Neurons/Layer
        Directions of Feed/Recall

    Neuron Architecture
        Initial Weighting
        Initial Activation Level
        Summation Function
        Transfer Function
        Learning Algorithm

    Training Method
        Learning Algorithm
        Supervised or Unsupervised
        Weight Steps, Degree Changes

    Data
        Scale Type
        Conversion/Normalization

Figure 4-8. Art of Neural Network Creation

As will be detailed in Chapter 6, NNs have shown considerable similarity in purpose to many statistical methods, to mathematical curve fitting, and to another AI technique termed Memory Based Reasoning [AI10].

Summary

The areas of knowledge representation and heuristics, expert systems, natural language processing, and machine learning have been examined. By effectively codifying knowledge through representation schemes and heuristics, the power of a DM may be substantially enhanced. This aid may take the form of judgmental refinement or amplification, but is most typically available through easier access to the expert opinion resident in expert systems. The ability of natural language processing to help solve problems and aid the decision maker remains crude for now, but efforts in working with large knowledge bases, together with very fast computers and refined techniques, promise to create an interface with the machine of monumental proportions, likely before the turn of the century. Finally, neural networks continue to advance, with potential for solving optimization-type problems on a more exacting scale while also attacking the classification problem at a new level. This technology has shown considerable promise for providing solutions to continuous scale problems such as trend prediction and to diagnosis problems of all types.