Machine Learning and ILP for Multi-Agent Systems
Daniel Kudenko & Dimitar Kazakov
Department of Computer Science, University of York, UK
ACAI-01, Prague, July 2001

Why Learning Agents?
Agent designers are not able to foresee all situations that the agent will encounter.
To display full autonomy, agents need to learn from and adapt to novel environments.
Learning is a crucial part of intelligence.

A Brief History
Machine Learning: Disembodied ML -> Single-Agent Learning -> Multiple Single-Agent Learners -> Social Multi-Agent Learners
Agents: Single-Agent System -> Multiple Single-Agent System -> Social Multi-Agent System

Outline
Principles of Machine Learning (ML)
ML for Single Agents
ML for Multi-Agent Systems
Inductive Logic Programming for Agents

What is Machine Learning?
Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. [Mitchell 97]
Example: T = "play tennis", E = "playing matches", P = "score"

Types of Learning
Inductive Learning (Supervised Learning)
Reinforcement Learning
Discovery (Unsupervised Learning)

Inductive Learning
"[An inductive learning] system aims at determining a description of a given concept from a set of concept examples provided by the teacher and from background knowledge." [Michalski et al. 98]

Inductive Learning
Examples of categories C1, C2, ..., Cn -> Inductive Learning System -> Hypothesis (procedure to classify new examples)

Inductive Learning Example
Training examples:
  Ammo: low,  Monster: near, Light: good   -> Category: shoot
  Ammo: low,  Monster: far,  Light: medium -> Category: ¬shoot
  Ammo: high, Monster: far,  Light: good   -> Category: shoot
Learned hypothesis (fragment):
  If (Ammo = high) and (Light ∈ {medium, good}) then shoot; ...

Performance Measure
Classification accuracy on an unseen test set.
Alternatively: a measure that incorporates the cost of false positives and false negatives (e.g. recall/precision).
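The inductive-learning setting above can be sketched in a few lines of code: feature-value examples, a hypothesis in the "if-then" form of the example slide, and classification accuracy as the performance measure. The data and the hand-written rule below are illustrative assumptions, not the output of an actual learner.

```python
# Feature-value training examples, as on the "Inductive Learning Example"
# slide; the data here is an illustrative assumption.
train = [
    {"ammo": "low",  "monster": "near", "light": "good",   "shoot": True},
    {"ammo": "low",  "monster": "far",  "light": "medium", "shoot": False},
    {"ammo": "high", "monster": "far",  "light": "good",   "shoot": True},
]

def hypothesis(example):
    # "If (Ammo = high) and (Light in {medium, good}) then shoot"
    return example["ammo"] == "high" and example["light"] in {"medium", "good"}

def accuracy(hyp, examples):
    # Performance measure: fraction of examples classified correctly.
    correct = sum(hyp(e) == e["shoot"] for e in examples)
    return correct / len(examples)

print(accuracy(hypothesis, train))  # 2/3 on this toy set
```

In practice the accuracy would be measured on a held-out test set rather than on the training data, as the Performance Measure slide notes.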
Where's the Knowledge?
Example (or object) language
Hypothesis (or concept) language
Learning bias
Background knowledge

Example Language
Feature-value vectors, logic programs.
Which features are used to represent examples (e.g., ammunition left)?
For agents: which features of the environment are fed to the agent (or the learning module)?
Constructive induction: automatic feature selection, construction, and generation.

Hypothesis Language
Decision trees, neural networks, logic programs, ...
Further restrictions may be imposed, e.g., depth of decision trees, form of clauses.
The choice of hypothesis language influences the choice of learning methods, and vice versa.

Learning Bias
Preference relation between legal hypotheses.
Accuracy on the training set: a hypothesis with zero error on the training data is not necessarily the best (noise!).
Occam's razor: the simpler hypothesis is the better one.

Inductive Learning
No "real" learning without language or learning bias.
IL is search through the space of hypotheses, guided by bias.
The quality of the hypothesis depends on a proper distribution of training examples.

Inductive Learning for Agents
What is the target concept (i.e., categories)? Example: do(a), ¬do(a) for a specific action a. Real-valued categories/actions can be discretized.
Where does the training data come from and what form does it take?

Batch vs. Incremental Learning
Batch learning: collect a set of training examples and compute the hypothesis.
Incremental learning: update the hypothesis with each new training example.
Incremental learning is more suited for agents.

Batch Learning for Agents
When should (re-)computation of the hypothesis take place? Example: after the experienced accuracy of the hypothesis drops below a threshold.
Which training examples should be used? Example: sequences of actions that led to success.

Eager vs. Lazy Learning
Eager learning: commit to the hypothesis computed after training.
Lazy learning: store all encountered examples and perform classification based on this database (e.g. nearest neighbour).

Active Learning
The learner decides which training data to receive (i.e. generates training examples and uses an oracle to classify them).
Closed-loop ML: the learner suggests a hypothesis and verifies it experimentally. If the hypothesis is rejected, the collected data gives rise to a new hypothesis.

Black-Box vs. White-Box
Black-box learning: the interpretation of the learning result is unclear to a user.
White-box learning: creates (symbolic) structures that are comprehensible.

Reinforcement Learning
The agent learns from environmental feedback indicating the benefit of states.
No explicit teacher required.
Learning target: an optimal policy (i.e., a state-action mapping).
Optimality measure: e.g., cumulative discounted reward.

Q Learning
Value of a state: discounted cumulative reward
  V_π(s_t) = Σ_{i≥0} γ^i r(s_{t+i}, a_{t+i})
0 ≤ γ < 1 is a discount factor (γ = 0 means that only immediate reward is considered).
r(s_{t+i}, a_{t+i}) is the reward determined by performing the actions specified by policy π.
  Q(s,a) = r(s,a) + γ V*(δ(s,a))
Optimal policy: π*(s) = argmax_a Q(s,a)

Q Learning
Initialize all Q(s,a) to 0.
In some state s choose some action a. Let s' be the resulting state.
Update Q: Q(s,a) = r + γ max_{a'} Q(s',a')

Q Learning
Guaranteed convergence towards the optimum (state-action pairs have to be visited infinitely often).
The exploration strategy can speed up convergence.
Basic Q learning does not generalize: replace the state-action table with function approximation (e.g. a neural net) in order to handle unseen states.

Pros and Cons of RL
Clearly suited to agents acting and exploring an environment. Simple.
Engineering of a suitable reward function may be tricky.
May take a long time to converge.
The learning result may not be transparent (depending on the representation of the Q function).
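The tabular Q-learning procedure above (initialise Q to 0, act, observe reward and next state, apply the update Q(s,a) = r + γ max_a' Q(s',a')) can be sketched directly. The 3-state chain environment below is an illustrative assumption chosen so that the learned policy is easy to check.

```python
import random

GAMMA = 0.9
STATES, ACTIONS = [0, 1, 2], ["left", "right"]
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}  # initialise all Q(s,a) to 0

def step(s, a):
    """Deterministic toy chain: moving right from state 1 reaches goal 2 (reward 1)."""
    s2 = min(s + 1, 2) if a == "right" else max(s - 1, 0)
    return s2, (1.0 if s2 == 2 else 0.0)

random.seed(0)
for _ in range(500):                       # episodes of random exploration
    s = 0
    while s != 2:                          # goal state is absorbing
        a = random.choice(ACTIONS)
        s2, r = step(s, a)
        # The update rule from the slides: Q(s,a) <- r + gamma * max_a' Q(s',a')
        Q[(s, a)] = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
        s = s2

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
print(policy[0], policy[1])                # right right
```

Random action choice plays the role of the exploration strategy; with enough visits to every state-action pair, the table converges and the greedy policy heads for the goal.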
Combination of IL and RL
Relational reinforcement learning [Dzeroski et al. 98]: leads to a more general Q function representation that may still be applicable even if the goals or environment change.
Explanation-based learning and RL [Dietterich and Flann 95].
More ILP and RL: see later.

Unsupervised Learning
Acquisition of "useful" or "interesting" patterns in input data.
Usefulness and interestingness are based on the agent's internal bias.
The agent does not receive any external feedback.
Discovered concepts are expected to improve agent performance on future tasks.

Learning and Verification
Need to guarantee agent safety.
Pre-deployment verification for non-learning agents. What to do with learning agents?

Learning and Verification [Gordon '00]
Verification after each self-modification step. Problem: time-consuming.
Solution 1: use property-preserving learning operators.
Solution 2: use learning operators which permit quick (partial) re-verification.

Learning and Verification
What to do if verification fails?
Repair the (multi-)agent plan.
Choose a different learning operator.

Learning in Multi-Agent Systems: Classification
Social awareness
Communication
Role learning
Distributed learning

Types of Multi-Agent Learning [Weiss & Dillenbourg 99]
Multiplied learning: no interference in the learning process by other agents (except for the exchange of training data or outputs).
Divided learning: division of the learning task on a functional level.
Interacting learning: cooperation beyond the pure exchange of data.

Social Awareness
Awareness of the existence of other agents and (eventually) knowledge about their behavior.
Not necessary to achieve near-optimal MAS behavior: rock sample collection [Steels 89].
Can it degrade performance?

Levels of Social Awareness [Vidal & Durfee 97]
0-level agent: no knowledge about the existence of other agents.
1-level agent: recognizes that other agents exist; models other agents as 0-level.
2-level agent: has some knowledge about the behavior of other agents; models other agents as 1-level agents.
k-level agent: models other agents as (k-1)-level.

Social Awareness and Q Learning
0-level agents already learn implicitly about other agents.
[Mundhe and Sen 00]: study of two Q learning agents up to level 2.
Two 1-level agents display the slowest and least effective learning (worse than two 0-level agents).

Agent Models and Q Learning
Q: S × A^n -> R, where n is the number of agents.
If the other agents' actions are not observable, an assumption about their actions is needed.
Pessimistic assumption: given an agent's action choice, the other agents will minimize its reward.
Optimistic assumption: the other agents will maximize its reward.

Agent Models and Q Learning
The pessimistic assumption leads to overly cautious behavior.
The optimistic assumption guarantees convergence towards the optimum [Lauer & Riedmiller '00].
If knowledge of the other agents' behavior is available, the Q value update can be based on a probabilistic computation [Claus and Boutilier '98]. But: no guarantee of optimality.

Q Learning and Communication [Tan 93]
Types of communication: sharing sensation; sharing or merging policies; sharing episodes.
Results: communication generally helps; extra sensory information may hurt.

Role Learning
Often useful for agents to specialize in specific roles for joint tasks.
Pre-defined roles: reduce flexibility, an optimal distribution is often not easy to define, may be expensive.
How to learn roles? [Prasad et al. 96]: learn an optimal distribution of pre-defined roles.

Q Learning of Roles
[Crites & Barto 98]: elevator domain; regular Q learning; no specialization achieved (but highly efficient behavior).
[Ono & Fukumoto 96]: hunter-prey domain; specialization achieved with the greatest mass merging strategy.
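The optimistic and pessimistic assumptions from the agent-models slides can be sketched on a one-state joint-action Q-table. The coordination game below (both agents must pick the same action to be rewarded) and its Python encoding are illustrative assumptions.

```python
# Each agent keeps Q over joint actions, Q(a_own, a_other), but the other
# agent's choice is unobservable at decision time, so an action is
# evaluated under an assumption about the other agent.
ACTIONS = ["a", "b"]
Q = {(a1, a2): 0.0 for a1 in ACTIONS for a2 in ACTIONS}

def reward(a1, a2):
    # Coordination game: reward only when the agents choose the same action.
    return 1.0 if a1 == a2 else 0.0

# Single-state game, so the Q update degenerates to Q(a1,a2) <- r(a1,a2).
for a1 in ACTIONS:
    for a2 in ACTIONS:
        Q[(a1, a2)] = reward(a1, a2)

def optimistic_value(a1):
    """Assume the other agent will act to maximize our reward."""
    return max(Q[(a1, a2)] for a2 in ACTIONS)

def pessimistic_value(a1):
    """Assume the other agent will act to minimize our reward."""
    return min(Q[(a1, a2)] for a2 in ACTIONS)

print([optimistic_value(a) for a in ACTIONS])   # [1.0, 1.0]
print([pessimistic_value(a) for a in ACTIONS])  # [0.0, 0.0]
```

Under the pessimistic assumption every action looks equally worthless, illustrating the overly cautious behavior noted on the slides; the optimistic assumption preserves the value of both coordination actions.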
Q Learning of Roles [Balch 99]
Three types of reward function: local performance-based, local shaped, global.
Global reward supports specialization. Local reward supports the emergence of homogeneous behaviors.
Some domains benefit from learning team heterogeneity (e.g., robotic soccer), others do not (e.g., multi-robot foraging).
Heterogeneity measure: social entropy.

Distributed Learning
Motivation: agents learning a global hypothesis from local observations; application of MAS techniques to (inductive) learning.
Applications: distributed data mining [Provost & Kolluri '99], robotic soccer.

Distributed Data Mining
[Provost & Hennessy 96]: individual learners see only a subset of all training examples and compute a set of local rules based on these.
Local rules are evaluated by the other learners based on their data.
Only rules with a good evaluation are carried over to the global hypothesis.

Bibliography
[Mitchell 97] T. Mitchell. Machine Learning. McGraw Hill, 1997.
[Michalski et al. 98] R.S. Michalski, I. Bratko, M. Kubat. Machine Learning and Data Mining: Methods and Applications. Wiley, 1998.
[Dietterich & Flann 95] T. Dietterich and N. Flann. Explanation-based Learning and Reinforcement Learning. In Proceedings of the Twelfth International Conference on Machine Learning, 1995.
[Dzeroski et al. 98] S. Dzeroski, L. De Raedt, and H. Blockeel. Relational Reinforcement Learning. In Proceedings of the Eighth International Conference on Inductive Logic Programming (ILP-98). Springer, 1998.
[Gordon 00] D. Gordon. Asimovian Adaptive Agents. Journal of Artificial Intelligence Research, 13, 2000.
[Weiss & Dillenbourg 99] G. Weiss and P. Dillenbourg. What is 'Multi' in Multi-Agent Learning? In P. Dillenbourg (ed.), Collaborative Learning: Cognitive and Computational Approaches. Pergamon Press, 1999.
[Vidal & Durfee 97] J.M. Vidal and E. Durfee. Agents Learning about Agents: A Framework and Analysis. In Working Notes of the AAAI-97 Workshop on Multiagent Learning, 1997.
[Mundhe & Sen 00] M.
Mundhe and S. Sen. Evaluating Concurrent Reinforcement Learners. Proceedings of the Fourth International Conference on Multiagent Systems, IEEE Press, 2000.
[Claus & Boutilier 98] C. Claus and C. Boutilier. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems. AAAI-98.
[Lauer & Riedmiller 00] M. Lauer and M. Riedmiller. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems. In Proceedings of the Seventeenth International Conference on Machine Learning, 2000.

Bibliography
[Tan 93] M. Tan. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. In Proceedings of the Tenth International Conference on Machine Learning, 1993.
[Prasad et al. 96] M.V.N. Prasad, S.E. Lander and V.R. Lesser. Learning Organizational Roles for Negotiated Search. International Journal of Human-Computer Studies, 48(1), 1996.
[Ono & Fukumoto 96] N. Ono and K. Fukumoto. A Modular Approach to Multi-Agent Reinforcement Learning. Proceedings of the First International Conference on Multi-Agent Systems, 1996.
[Crites & Barto 98] R. Crites and A. Barto. Elevator Group Control Using Multiple Reinforcement Learning Agents. Machine Learning, 1998.
[Balch 99] T. Balch. Reward and Diversity in Multi-Robot Foraging. Proceedings of the IJCAI-99 Workshop on Agents Learning About, From, and With other Agents, 1999.
[Provost & Kolluri 99] F. Provost and V. Kolluri. A Survey of Methods for Scaling Up Inductive Algorithms. Data Mining and Knowledge Discovery, 3, 1999.
[Provost & Hennessy 96] F. Provost and D. Hennessy. Scaling Up: Distributed Machine Learning with Cooperation. AAAI-96, 1996.

BREAK

Machine Learning and ILP for MAS: Part II
Integration of ML and Agents
ILP and its potential for MAS
Agent Applications of ILP
Learning, Natural Selection and Language
From Machine Learning to Learning Agents
Learning as the only goal: classic machine learning, active learning.
Learning as one of many goals: learning agent(s), closed-loop machine learning.

Integrating Machine Learning into the Agent Architecture
Time constraints on learning
Synchronisation between agents' actions
Learning and recall

Time Constraints on Learning
Machine learning alone: predictive accuracy matters, time doesn't (just a price to pay).
ML in agents:
Soft deadlines: resources must be shared with other activities (perception, planning, control).
Hard deadlines: imposed by the environment: make up your mind now! (or they'll eat you)

Eager vs. Lazy Learning under Time Pressure
Eager learning: theories are typically more compact, and faster to use; takes more time to learn, so do it when the agent is idle.
Lazy learning: knowledge acquired at (almost) no cost; may be much slower when a test example comes.

"Clear-cut" vs. Any-time Learning
Consider two types of algorithms:
Algorithms that run a prescribed number of steps: guaranteed to find a solution; worst-case complexity analysis gives an upper bound on the execution time.
Any-time algorithms: a longer run may result in a better solution; they don't know an optimal solution when they see one (example: genetic algorithms); policies: halt learning to meet hard deadlines, or when the cost outweighs the expected improvement in accuracy.

Time Constraints on Learning in Simulated Environments
Consider various cases: unlimited time for learning; an upper bound on the time for learning; learning in real time.
Gradually tightening the constraints makes integration easier.
Not limited to simulations: real-world problems have similar settings, e.g., various types of auctions.
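The any-time policy above (keep refining while time remains; halt at a hard deadline or when the expected gain no longer justifies the cost) can be sketched as a control loop. The "learner" here, which just refines a running mean over the examples, is an illustrative stand-in for a real any-time algorithm such as a genetic algorithm; the names and the stopping thresholds are assumptions.

```python
import time

def anytime_learn(examples, deadline_s, min_gain=1e-6):
    """Process examples one at a time until a hard deadline is hit or the
    per-step improvement drops below min_gain (cost outweighs benefit)."""
    best, processed = 0.0, 0
    start = time.monotonic()
    while processed < len(examples):
        if time.monotonic() - start > deadline_s:     # hard deadline policy
            break
        new = (best * processed + examples[processed]) / (processed + 1)
        if processed and abs(new - best) < min_gain:  # improvement too small
            best, processed = new, processed + 1
            break
        best, processed = new, processed + 1
    return best, processed

value, seen = anytime_learn([2.0] * 1000, deadline_s=0.5)
print(value, seen)
```

The key property is that the loop can be interrupted at any point and still return the best hypothesis found so far, which is what distinguishes any-time algorithms from "clear-cut" ones.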
Synchronisation and Time Constraints
Multi-agent Progol (Muggleton): asynchronous; real time.
The York MA Environment (Kazakov et al.): 1 move per round, immediate update; upper bound on time.
Logic-based MAS for conflict simulations (Kudenko, Alonso): 1 move per round, batch update; unlimited time.

Learning and Recall
The agent must strike a balance between:
Learning, which updates the model of the world;
Recall, which applies the existing model of the world to other tasks.

Learning and Recall (2)
Update sensory information.
Recall the current model of the world to choose and carry out an action.
Observe the action outcome.
Learn a new model of the world.

Learning and Recall (3)
Update sensory information.
Recall the current model of the world to choose and carry out an action.
Learn a new model of the world.
In theory, the two can run in parallel; in practice, they must share limited resources.

Learning and Recall (4)
Possible strategies:
Parallel learning and recall at all times.
Mutually exclusive learning and recall: after incremental, eager learning, examples are discarded, or kept if batch or lazy learning is used.
Cheap on-the-fly learning (preprocessing), with computationally expensive learning off-line: reduce raw information, change the object language; analogy with human learning and the role of sleep.

Machine Learning and ILP for MAS: Part II
Integration of ML and Agents
ILP and its potential for MAS
Agent Applications of ILP
Learning, Natural Selection and Language

Machine Learning Revisited
ML can be seen as the task of taking a set of observations represented in a given object/data language and representing (the information in) that set in another language called the concept/hypothesis language.
A side effect of this step: the ability to deal with unseen observations.

Object and Concept Language
Object language: (x, y, +/-).
Concept language: any ellipse (5 parameters).
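The object/concept language example above can be sketched concretely: the object language is labelled points (x, y, +/-), and each hypothesis in the concept language is one ellipse. To keep the toy short the ellipse below is axis-aligned (4 of the 5 parameters, no rotation); the data and parameter values are illustrative assumptions.

```python
# Object language: labelled 2-D points.
points = [((0.0, 0.0), True), ((1.0, 0.5), True),
          ((4.0, 0.0), False), ((0.0, 3.0), False)]

def ellipse(cx, cy, rx, ry):
    """One hypothesis in the concept language: an axis-aligned ellipse,
    used as a classifier (inside = positive)."""
    def inside(p):
        x, y = p
        return ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0
    return inside

h = ellipse(0.0, 0.0, 2.0, 1.0)         # one candidate hypothesis
print([h(p) for p, _ in points])        # [True, True, False, False]
```

Learning then amounts to searching the 5-parameter space of ellipses for one consistent with the labelled points, which is exactly the "search guided by bias" view of inductive learning from Part I.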
Machine Learning Biases
The concept/hypothesis language specifies the language bias, which limits the set of all concepts/hypotheses that can be expressed/considered/learned.
The preference bias allows us to decide between two hypotheses if they both classify the training data equally.
The search bias defines the order in which hypotheses will be considered. Important if one does not search the whole hypothesis space.

Preference Bias, Search Bias and Version Space
Version space: the subset of hypotheses that have zero training error (bounded by the most specific and the most general concept).

Inductive Logic Programming
Based on three pillars:
Logic programming (LP) to represent data and concepts (i.e., the object and concept language).
Background knowledge to extend the concept language.
Induction as the learning method.

LP as ILP Object Language
A subset of First Order Predicate Logic (FOPL) called logic programming.
Often limited to ground facts, i.e., propositional logic (cf. ID3 etc.). In the latter case, data can be represented as a single table.

ILP Object Language Example
Good bargain cars:
  model     mileage  price  y/n   ILP representation
  Audi V8   30,000   £4000  +     gbc(v8,30000,4000).
  Fiat Uno  90,000   £3000  -     :- gbc(uno,90000,3000).
  BMW Z3    50,000   £5000  +     gbc(z3,50000,5000).

LP as ILP Concept Language
The concept language of ILP is relations expressed as Horn clauses, e.g.:
  equal(X,X).
  greater(X,Y) :- X > Y.
Cf. the propositional logic representation: (arg1=1 & arg2=1) or (arg1=2 & arg2=2) ... Tedious for finite domains and impossible otherwise.
Most often there is one target predicate (concept) only; exceptions exist, e.g., Progol 5.

Modes in ILP
Used to distinguish between input attributes (mode +) and output attributes (mode -) of the predicate learned.
Mode # is used to describe attributes that must contain a constant in the predicate definition.
E.g., use mode car_type(+,+,#) to learn
  car_type(Doors,Roof,sports_car) :- Doors =< 2, Roof = convertible.
Modes in ILP
E.g., use mode car_type(-,-,#) to learn
  car_type(Doors,Roof,sports_car) :- (Doors = 1 ; Doors = 2), Roof = convertible.

Types in ILP
Specify the range for each argument.
User-defined types are represented as unary predicates: colour(blue). colour(red). colour(black).
Built-in types are also provided: nat/1, real/1, any/1 in Progol.
These definitions may or may not be generative: colour(X) instantiates X, nat(X) does not.

ILP Types and Modes: Example
Good bargain cars (ILP representation, Progol):
  model     mileage  price  y/n   ILP representation
  Audi V8   30,000   4000   +     gbc(v8,30000,4000).
  Fiat Uno  90,000   3000   -     :- gbc(uno,90000,3000).
  BMW Z3    50,000   5000   +     gbc(z3,50000,5000).
  modeh(1,gbc(+model,+mileage,+price))?

Positive-Only Learning
A way of dealing with domains where no negative examples are available, e.g. learning the concept of non-self-destructive actions.
The trivial definition "anything belongs to the target concept" looks all right!
Trick: generate random examples and treat them as negative. Requires generative type definitions.

Background Knowledge
Only very simple mathematical relations, such as identity and "greater than", used so far:
  equal(X,X).
  greater(X,Y) :- X > Y.
These can also easily be hard-wired into the concept language of propositional learners.
ILP's big advantage: one can extend the concept language with user-defined concepts, or background knowledge.

Background Knowledge (2)
The use of certain BK predicates may be a necessary condition for learning the right hypothesis.
Redundant or irrelevant BK slows down the learning.
Example BK:
  prod(Miles,Price,Threshold) :- Miles * Price < Threshold.
Modes:
  modeh(1,gbc(#model,+miles,+price))?
  modeb(1,prod(+miles,+price,#threshold))?
Theory:
  gbc(z3,Miles,Price) :- prod(Miles,Price,250000001).

Choice of Background Knowledge
"In an ideal world one should start from a complete model of the background knowledge of the target population. In practice, even with the most intensive anthropological studies, such a model is impossible to achieve. We do not even know what it is that we know ourselves. The best that can be achieved is a study of the directly relevant background knowledge, though it is only when a solution is identified that one can know what is or is not relevant." (The Critical Villager, Eric Dudley)

ILP Preference Bias
Typically a trade-off between generality and complexity: cover as many positive examples (and as few negative ones) as you can, with as simple a theory as possible.
Some ILP learners allow the users to specify their own preference bias.

Induction in ILP
Bottom-up (least general generalisation): map a term into a variable; drop a literal from the clause body.
Top-down (refinement operator): instantiate a variable; add a literal to the clause body.
Mixed techniques (e.g., Progol).

Example of Induction
BK: q(b). q(c).
Training examples: p(b,a). p(f,g). :- p(i,j).
Candidate hypotheses: p(X,Y).  p(b,a) :- q(b).  p(X,a).  p(X,Y) :- q(X).

Induction in Progol
For each training example:
Find the most general theory (clause) T.
Find the most specific theory (clause) ⊥.
Search the space in between in a top-down fashion:
  T = p(X,Y)
  ⊥ = p(X,a) :- q(X).
  p(X,a).
  p(X,Y) :- q(X).

Summary of ILP Basics
Symbolic.
Eager.
Knowledge-oriented (white-box) learner.
Complex, flexible hypothesis space.
Based on induction.

Learning Pure Logic Programs vs. Decision Lists
Pure logic programs: the order of clauses is irrelevant, and they must not contradict each other.
Decision lists: the concept language includes the cut predicate (!).
The use of decision lists can make for simpler (more concise) theories.

Decision List Example
  % action(Cat,ObservedAnimal,Action).
  action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal), owner(Owner,Cat), !.
  action(Cat,Animal,run)  :- dog(Animal), !.
  action(Cat,Animal,stay).

Updating Decision Lists with Exceptions
  action(Cat,caesar,run) :- !.
  action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal), owner(Owner,Cat), !.
  action(Cat,Animal,run)  :- dog(Animal), !.
  action(Cat,Animal,stay).

Updating Decision Lists with Exceptions
Could be very beneficial in agents when immediate updating of the agent's knowledge is important: just add the exception at the top of the list.
Computationally inexpensive: does not need to modify the rest of the list.
Exceptions could be compiled into rules when the agent is inactive.

Replacing Exceptions with Rules: Before
  action(Cat,caesar,run) :- !.
  action(Cat,rex,run) :- !.
  action(Cat,rusty,run) :- !.
  action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal), owner(Owner,Cat), !.
  ...

Replacing Exceptions with Rules: After
  action(Cat,Animal,run) :- dog(Animal), owner(richard,Animal), !.
  action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal), owner(Owner,Cat), !.
  ...

Eager ILP vs. Analogical Prediction
Eager learning: learn a theory, dispose of the observations.
Lazy learning: keep all observations; compare new observations with old ones to classify; no explanation provided.
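The decision-list mechanics from the slides above (first matching rule fires, like the Prolog cut, and an exception is added by simply prepending a more specific rule) can be sketched outside Prolog as well. The cat/dog domain mirrors the slides; the Python encoding with predicates as lambdas is an illustrative assumption.

```python
dogs = {"caesar", "rex", "blackie"}
owner = {"caesar": "richard", "rex": "richard", "blackie": "daniel"}
CAT_OWNER = "daniel"

# An ordered rule list: (condition, action) pairs.
rules = [
    (lambda a: a in dogs and owner.get(a) == CAT_OWNER, "stay"),
    (lambda a: a in dogs, "run"),
    (lambda a: True, "stay"),                 # default rule
]

def decide(animal, rule_list):
    for condition, action in rule_list:       # first match wins, like '!'
        if condition(animal):
            return action

print(decide("blackie", rules))               # stay (same owner as the cat)
print(decide("caesar", rules))                # run

# Adding an exception is O(1): prepend it, leaving the rest untouched.
rules_with_exception = [(lambda a: a == "blackie", "run")] + rules
print(decide("blackie", rules_with_exception))  # run
```

This shows why the update is computationally inexpensive: the exception shadows the later rules without modifying them, and the list can be recompiled into more general rules when the agent is inactive.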
Analogical Prediction (Muggleton, Bain '98)
Combines the often higher accuracy of lazy learning with the intelligible, explicit hypothesis typical of ILP.
Constructs a local theory for each new observation that is consistent with the largest number of training examples.

Analogical Prediction Example
  owner(richard,caesar).   action(Cat,caesar,run).
  owner(richard,rex).      action(Cat,rex,run).
  owner(daniel,blackie).   action(Cat,blackie,stay).
  owner(richard,rusty).    action(Cat,rusty,?).

Analogical Prediction Example
  owner(richard,caesar).   action(Cat,caesar,run).
  owner(richard,rex).      action(Cat,rex,run).
  owner(daniel,blackie).   action(Cat,blackie,stay).
  owner(richard,rusty).    action(Cat,Dog,run) :- owner(richard,Dog).

Timing Analysis of Theories Learned with ILP
The more training examples, the more accurate the theory, but how long does it take to produce an answer?
No theoretical work on the subject so far.
Experiments show nontrivial behaviour (reminiscent of the phase transitions observed in SAT).

Timing Analysis of ILP Theories: Example
Kazakov, PhD thesis:
Left: simple theory with low coverage; succeeds or quickly fails; high speed.
Middle: medium coverage, fragmentary theory, lots of backtracking; low speed.
Right: general theory with high coverage; less backtracking; high speed.

Machine Learning and ILP for MAS: Part II
Integration of ML and Agents
ILP and its potential for MAS
Agent Applications of ILP
Learning, Natural Selection and Language

Agent Applications of ILP
Relational Reinforcement Learning (Džeroski, De Raedt, Driessens):
combines reinforcement learning with ILP;
generalises over previous experience and goals (the Q-table) to produce logical decision trees;
the results can be used to address new situations.
Don't miss the next talk (~11:40-13:10)!
Agent Applications of ILP
ILP for verification and validation of MAS (Jacob, Driessens, De Raedt):
also uses FOPL decision trees;
observes agents' behaviour and represents it as a logical decision tree;
the rules in the decision tree can be compared with the designers' intentions.
Test domain: RoboCup.

Agent Applications of ILP
Reid & Ryan 2000: ILP used to help hierarchical reinforcement learning. ILP constructs high-level features that help discriminate between (state, action) transitions with non-deterministic behaviour.

Agent Applications of ILP
Matsui et al. 2000: proposed an ILP agent that avoids actions which will probably fail to achieve the goal. Application domain: RoboCup.
Alonso & Kudenko '99: ILP and EBL for conflict simulations.

The York MA Environment
Species of 2D agents competing for renewable, limited resources.
Agents have simple hard-coded behaviour based on the notion of drives.
Each agent can optionally have an ILP (Progol) mind: a separate process receiving observations and suggesting actions.
Allows the values of inherited features to be selected through natural selection.

The York MA Environment
ILP hasn't been used in experiments yet (to come soon).
A number of experiments using inheritance studied kinship-driven altruism among agents.
A start-up project sponsored by Microsoft.
Undergraduate students involved so far: Lee Mallabone, Steve Routledge, John Barton.

Machine Learning and ILP for MAS: Part II
Integration of ML and Agents
ILP and its potential for MAS
Agent Applications of ILP
Learning, Natural Selection and Language

Learning and Natural Selection
In learning, search is trivial; choosing the right bias is hard.
But the choice of learning bias is always external to the learner!
To find the best-suited bias, one could combine arbitrary choices of bias with evolution and natural selection of the fittest individuals.

Darwinian vs.
Lamarckian Evolution
Darwinian evolution: nothing learned by the individual is encoded in the genes and passed on to the offspring.
The Baldwin effect: learning abilities (good biases) are selected in evolution because they give the individual a better chance in a dynamic environment. What is passed on to the offspring is useful, but very general.

Darwinian vs. Lamarckian Evolution (2)
Lamarckian evolution: individual experience acquired in life can be inherited.
Not the case in nature; that doesn't mean we can't use it.
The inherited concepts may be too specific and not of general importance.

Learning and Language
Language uses concepts which are:
specific enough to be useful to most/all speakers of that language;
general enough to correspond to shared experience (otherwise, how would one know what the other is talking about!).
The concepts of a language serve as a learning bias which is "inherited" not in genes but through education.

Communication and Learning
Language helps one learn (in addition to inherited biases) and allows knowledge to be communicated.
Distinguish between:
Knowledge: things that one can explain by means of a language to another.
Skills: the rest; they require individual learning and cannot be communicated.
"If watching was enough to learn, the dog would have become a butcher." (Bulgarian proverb)

Communication and Learning (2)
In NLP, forgetting [examples] may be harmful (van den Bosch et al.).
"An expert is someone who does not think anymore: he knows." (Frank Lloyd Wright)
It may be difficult to communicate what one has learned because of:
limited bandwidth (for lazy learning);
the absence of appropriate concepts in the language (for black-box learning).

Communication and Learning (3)
In a society of communicating agents, less accurate white-box learning may be better than more accurate but expensive learning that cannot be communicated, since the reduced performance could be outweighed by the much lower cost of learning.
Our Current Research
Inductive bias selection (Shane Greenaway)
Role learning (Spiros Kapetanakis)
Inductive learning for games (Alex Champandard)
Machine learning of natural language in MAS (Mark Bartlett)

The End