## Session 1A – Innovative Applications

1A_1
While three deployed applications of game theory for security have recently been reported at AAMAS, we as a community remain in the early stages of these deployments; there is a continuing need to understand the core principles for innovative security applications of game theory. Towards that end, this paper presents PROTECT, a game-theoretic system deployed by the United States Coast Guard (USCG) in the port of Boston for scheduling their patrols. USCG has termed the deployment of PROTECT in Boston a success, and efforts are underway to test it in the port of New York, with the potential for nationwide deployment.

PROTECT is premised on an attacker-defender Stackelberg game model and offers five key innovations. First, this system is a departure from the assumption of perfect adversary rationality noted in previous work, relying instead on a quantal response (QR) model of the adversary's behavior --- to the best of our knowledge, this is the first real-world deployment of the QR model. Second, to improve PROTECT's efficiency, we generate a compact representation of the defender's strategy space, exploiting equivalence and dominance. Third, we show how to practically model a real maritime patrolling problem as a Stackelberg game. Fourth, our experimental results illustrate that PROTECT's QR model more robustly handles real-world uncertainties than a perfect rationality model. Finally, in evaluating PROTECT, this paper for the first time provides real-world data: (i) comparison of human-generated vs PROTECT security schedules, and (ii) results from an Adversarial Perspective Team's (human mock attackers) analysis.
PROTECT: A Deployed Game Theoretic System to Protect the Ports of the United States
Eric Shieh, Bo Anhas 3 papers, Rong Yanghas 5 papers, Milind Tambehas 13 papers, Craig Baldwin, Joseph DiRenzo, Ben Maule, Garrett Meyer

1A_2
This paper describes an innovative multiagent system called
SAVES with the goal of conserving energy in commercial buildings.
We specifically focus on an application to be deployed in an existing
university building that provides several key novelties: (i)
jointly performed with the university facility management team,
SAVES is based on actual occupant preferences and schedules, actual
energy consumption and loss data, real sensors and hand-held
devices, etc.; (ii) it addresses novel scenarios that require negotiations
with groups of building occupants to conserve energy; (iii) it
focuses on a non-residential building, where human occupants do
not have a direct financial incentive in saving energy and thus requires
a different mechanism to effectively motivate occupants; and
(iv) SAVES uses a novel algorithm for generating optimal MDP
policies that explicitly consider multiple criteria optimization (energy
and personal comfort) as well as uncertainty over occupant
preferences when negotiating energy reduction - this combination
of challenges has not been considered in previous MDP algorithms.

In a validated simulation testbed, we show that SAVES substantially
reduces the overall energy consumption compared to the existing
control method while achieving comparable average satisfaction
levels for occupants. As a real-world test, we provide results of
a trial study where SAVES is shown to lead occupants to conserve
energy in real buildings.
SAVES: A Sustainable Multiagent Application to Conserve Building Energy Considering Occupants
Jun-young Kwakhas 3 papers, Pradeep Varakanthamhas 6 papers, Rajiv Maheswaranhas 7 papers, Milind Tambehas 13 papers, Farrokh Jazizadehhas 2 papers, Geoffrey Kavulyahas 2 papers, Laura Kleinhas 2 papers, Burcin Becerik-Gerberhas 2 papers, Timothy Hayeshas 2 papers, Wendy Woodhas 2 papers

1A_4
Contemporary maritime piracy presents a significant threat to the global shipping industry, with annual costs estimated at up to US\$12bn. To address the threat, commanders and policymakers need new data-driven decision-support tools that will allow them to plan and execute counter-piracy activities most effectively. So far, however, the provision of such tools has been very limited. To fill this gap, we have employed the multi-agent approach and developed a novel suite of computational tools and techniques for operational management of counter-piracy operations. A comprehensive agent-based simulation enables the stakeholders to assess the efficiency of a range of piracy counter-measures, including recommended transit corridors, escorted convoys, group transit schemes, route randomization and navy patrol deployments. Decision-theoretic and game-theoretic optimization techniques further assist in discovering counter-measure configurations that yield the best trade-off between transportation security and cost. We demonstrate our approach on two case studies based on the problems and solutions currently explored by the maritime security community. Our work is the first integrated application of agent-based techniques to high-seas maritime security and opens a wide range of directions for follow-up research and development.
Agents vs. Pirates: Multi-agent Simulation and Optimization to Fight Maritime Piracy
Michal Jakobhas 2 papers, Ondřej Vanĕkhas 2 papers, Ondřej Hrstka, Michal Pĕchoučekhas 6 papers

1A_5
Nearly 20% of total energy consumption in the United States is accounted for in heating, ventilation, and air conditioning (HVAC) systems. Smart sensing and adaptive energy management agents can greatly decrease the energy usage of HVAC systems in many building applications, for example by enabling the operator to shut off HVAC to unoccupied rooms. We implement a multi-modal sensor agent that is non-intrusive and low-cost, combining information such as motion detection, CO2 reading, sound level, ambient light, and door state sensing. We show that in our live testbed at the USC campus, these sensor agents can be used to accurately estimate the number of occupants in each room using machine learning techniques, and that these techniques can also be applied to predict future occupancy by creating agent models of the occupants. These predictions will be used by control agents to enable the HVAC system increase its efficiency by continuously adapting to occupancy forecasts of each room.
Improving Building Energy Efficiency with a Network of Sensing, Learning and Prediction Agents
Sunil Mamidi, Yu-Han Changhas 3 papers, Rajiv Maheswaranhas 7 papers

## Session 1B – Teamwork I

1B_2
We consider coalition formation problems for agents with an underlying \emph{synergistic graph}, where edges between agents represent some vital synergistic link, such as communication, trust, or physical constraints. A coalition is infeasible if its members do not form a connected subgraph, meaning parts of the coalition are isolated from others. Current state-of-the-art coalition formation algorithms are not designed for problems over synergistic graphs. They assume that \emph{all} coalitions are feasible and so involve redundant computation when this is not the case.

Accordingly, we propose algorithms, namely D-SlyCE and DyCE, to enumerate all feasible coalitions in a distributed fashion and find the optimal feasible coalition structure respectively. When evaluated on a variety of synergistic graphs, D-SlyCE is up to 660 times faster while DyCE is up to 7x$10^4$ times faster than the state-of-the-art algorithms. For particular classes of graphs, D-SlyCE is the first to enumerate valid coalition values for up to 50 agents and DyCE is the first algorithm to find the optimal coalition structure for up to 30 agents in minutes as opposed to months for previous algorithms.
On Coalition Formation with Sparse Synergies
Thomas Voice, Sarvapali Ramchurnhas 2 papers, Nick Jenningshas 11 papers

1B_3
In a wide range of emerging applications, from disaster management to intelligent sensor networks, teams of software agents can be deployed to effectively solve complex distributed problems. To achieve this, agents typically need to communicate locally sensed information to each other. However, in many settings, there are heavy constraints on the communication infrastructure, making it infeasible for every agent to broadcast all relevant information to everyone else. To address this challenge, we investigate how agents can make good local decisions about what information to send to a set of communication channels with limited bandwidths such that the overall system utility is maximised. Specifically, to solve this problem efficiently in large-scale systems with hundreds or thousands of agents, we develop a novel decentralised algorithm. This combines multi-agent learning techniques with fast decision-theoretic reasoning mechanisms that predict the impact a single agent has on the entire system. We show empirically that our algorithm consistently achieves 85% of a hypothetical centralised optimal strategy with full information, and that it significantly outperforms a number of baseline benchmarks (by up to 600%).

Decentralised Channel Allocation and Information Sharing for Teams of Cooperative Agents
Sebastian Steinhas 2 papers, Simon Williamsonhas 2 papers, Nick Jenningshas 11 papers

1B_4
In many real-life networks, such as urban structures, protein
interactions and social networks, one of the key issues is to measure
the centrality of nodes, i.e. to determine which nodes and edges are
more central to the functioning of the entire network than others. In
this paper we focus on \emph{betweenness centrality} - a metric
based on which the centrality of a node is measured involving the number
of shortest paths that pass through that node. This metric has been
shown to be well suited for many, often complex, networks. In itsstandard form, the betweenness centrality, just like other
centrality metrics, evaluates nodes based on their individual
contributions to the functioning of the network. For instance, the
importance of an intersection in a road network can be computed as the
difference between the full capacity of this network and its capacity
when the intersection is completely shut down. However, as recently
argued in the literature, such an approach is inadequate for many
real-life applications, as, for example, multiple nodes can fail
simultaneously. Thus, what would be desirable is to refine the existing
centrality metrics such that they take into account not only the
functioning of nodes as individual entities but also as members of
groups of nodes. One recently-proposed way of doing this is based on the
\emph{Shapley Value} - a solution concept in cooperative game theory
that measures in a fair way the contributions of players to all the
coalitions that they could possibly participate in. Although this approach has been used to extend various centrality metrics, such an extension to
betweenness centrality is yet to be developed. The main challenge
when developing such a refinement is to tackle the computational
complexity; the Shapely Value generally requires an exponential number of operations, making its use limited to
a small number of player (or nodes in our context). Against this
background, our main contribution in this paper is to refine the
betweenness centrality metric based on the Shapley Value: we develop an
algorithm for computing this new metric, and show that it has the same
complexity as the best known algorithm due to Brandes to compute the
standard betweenness centrality (i.e., polynomial in the size of the network). Finally, we
show that our results can be extended to another important centrality
metric called stress centrality.
A New Approach to Betweenness Centrality Based on the Shapley Value
Piotr Szczepański, Tomasz Michalak, Talal Rahwan

1B_5
Many multi-agent applications may involve a notion of spatial coherence. For instance, simulations of virtual agents often need to model a coherent group or crowd. Alternatively, robots may prefer to stay within a pre-specified communication range. This paper proposes an extension of a decentralized, reactive collision avoidance framework, which defines obstacles in the velocity space, known as Velocity Obstacles (VOs), for coherent groups of agents. The extension, referred to in this work as a Loss of Communication Obstacle (LOCO), aims to maintain proximity among agents by imposing constraints in the velocity space and restricting the set of feasible controls. If the introduction of LOCOs results in a problem that is too restrictive, then the proximity constraints are relaxed in order to maintain collision avoidance. If agents break their proximity constraints, a method is applied to reconnect them. The approach is fast and integrates nicely with the Velocity Obstacle framework. It yields improved coherence for groups of robots connected through an input constraint graph that are moving with constant velocity. Simulated environments involving a single team moving among static obstacles, as well as multiple teams operating in the same environment, are considered in the experiments and evaluated for collisions, computational cost and proximity constraint maintenance. The experiments show that improved coherence is achieved while maintaining collision avoidance, at a small computational cost and path quality degradation.
Maintaining Team Coherence under the Velocity Obstacle Framework
Andrew Kimmel, Andrew Dobson, Kostas Bekris

## Session 1C – Learning I

1C_1
Recent advances in reinforcement learning have yielded several PAC-MDP algorithms that, using the principle of optimism in the face of uncertainty, are guaranteed to act near-optimally with high probability on all but a polynomial number of samples. Unfortunately, many of these algorithms, such as R-MAX, perform poorly in practice because their initial exploration in each state, before the associated model parameters have been learned with confidence, is random. Others, such as \emph{Model-Based Interval Estimation} (MBIE) have weaker sample complexity bounds and require careful parameter tuning. This paper proposes a new PAC-MDP algorithm called \emph{V-MAX} designed to address these problems. By restricting its optimism to future visits, V-MAX can exploit its experience early in learning and thus obtain more cumulative reward than R-MAX. Furthermore, doing so does not compromise the quality of exploration, as we prove bounds on the sample complexity of V-MAX that are identical to those of R-MAX. Finally, we present empirical results in two domains demonstrating that V-MAX can substantially outperform R-MAX and match or outperform MBIE while being easier to tune, as its performance is invariant to conservative choices of its primary parameter.
V-MAX: Tempered Optimism for Better PAC Reinforcement Learning
Karun Rao, Shimon Whiteson

1C_2
Although Reinforcement Learning (RL) has been successfully deployed in a variety of tasks, learning speed remains a fundamental problem for applying RL in complex environments. Transfer learning aims to ameliorate this shortcoming by speeding up learning through the adaptation of previously learned behaviors in similar tasks. Transfer techniques often use an inter-task mapping, which
determines how a pair of tasks are related. Instead of relying on a hand-coded inter-task mapping, this paper proposes a novel transfer learning method capable of autonomously creating an inter-task mapping by using a novel combination of sparse coding, sparse projection learning and sparse Gaussian processes. We also propose two new transfer algorithms (\emph{TrLSPI} and \emph{TrFQI}) based on least squares policy iteration and fitted-Q-iteration. Experiments not only show successful transfer of information between similar tasks, inverted pendulum to cart pole, but also between two very different domains: mountain car to cart pole. This paper empirically shows that the learned inter-task mapping can be successfully used to (1) improve the performance of a learned policy on a fixed number of samples, (2) reduce the learning times needed by the algorithms to converge to a policy on a fixed number of samples, and (3) converge faster to a near-optimal policy given a large number of samples.
Reinforcement Learning Transfer via Sparse Coding
Haitham Bou Ammar, Karl Tuylshas 5 papers, Matthew Taylorhas 2 papers, Kurt Driessen, Gerhard Weisshas 2 papers

## Session 1D – Social Choice I

1D_3
We study the problem of computing possible and necessary winners for partially specified weighted and unweighted tournaments. This problem arises naturally in elections with incompletely specified votes, partially completed sports competitions, and more generally in any scenario where the outcome of some pairwise comparisons is not yet fully known. We specifically consider a number of well-known solution concepts---including the uncovered set, Borda, ranked pairs, and maximin---and show that for most of them possible and necessary winners can be identified in polynomial time. These positive algorithmic results stand in sharp contrast to earlier results concerning possible and necessary winners given partially specified preference profiles.
Possible and Necessary Winners of Partial Tournaments
Haris Azizhas 3 papers, Markus Brill, Felix Fischer, Paul Harrensteinhas 2 papers, Jérôme Langhas 3 papers, Hans Georg Seedig

1D_4
This paper considers the communication complexity of approximating common voting rules. Both upper and lower bounds are presented. For $n$ voters and $m$ alternatives. It is shown that for all $\epsilon \in (0,1)$, the communication complexity of obtaining a $1 - \epsilon$ approximation to Borda is $O(\log(\frac{1}{\epsilon}) nm)$. A lower bound of $\Omega(nm)$ is provided for small values of $\epsilon$. The communication complexity of computing the true Borda winner is $\Omega(nm\log(m))$. Thus, in the case of Borda, one can obtain arbitrarily good approximations with less communication overhead than is required to compute the true Borda winner.

For other voting rules, no such $1 \pm \epsilon$ approximation scheme exists. In particular, it is shown that the communication complexity of computing any constant factor approximation, $\rho$, to Bucklin is $\Omega(\frac{nm}{\rho^2})$. Conitzer and Sandholm show that the communication complexity of computing the true Bucklin winner is $O(nm)$. However, we show that for all $\delta \in (0,1)$, the communication complexity of computing a $m^{\delta}$ approximate winner in Bucklin elections is $O(nm^{1-\delta}\log(m))$. For $\delta \in (\frac{1}{2}, 1)$ a lower bound of $\Omega( nm^{1-2\delta} )$ is also provided.

Similar lower bounds are presented on the communication complexity of computing approximate winners in Copeland elections.
Communication Complexity of Approximating Voting Rules
Travis Servicehas 2 papers, Julie Adamshas 2 papers

## Session 1E – Game Theory I

1E_2
We introduce a measure for the level of stability against coalitional deviations, called \emph{stability scores}, which generalizes widely used notions of stability in non-cooperative games.
We use the proposed measure to compare various Nash equilibria in congestion games, and to quantify the effect of game parameters on coalitional stability.
For our main results, we apply stability scores to analyze and compare the Generalized Second Price (GSP) and Vickrey-Clarke-Groves (VCG) ad auctions.
We show that while a central result of the ad auction literatures is that the GSP and VCG auctions implement the same outcome in one of the equilibria of GSP, the GSP outcome is far more stable. Finally, a modified version of VCG is introduced, which is group strategy-proof, and thereby achieves the highest possible stability score.
Stablity Scores: Measuring Coalitional Stability
Michal Feldmanhas 2 papers, Reshef Meir, Moshe Tennenholtzhas 2 papers

1E_3
In many real-world settings, the structure of the environment constrains the formation of coalitions among agents. Therefore, examining the stability of formed coalition structures in such settings is of natural interest. We address this by considering core-stability within various models of cooperative games with structure. First, we focus on characteristic function games defined on graphs that determine feasible coalitions. In particular, a coalition $S$ can emerge only if $S$ is a connected set in the graph. We study the (now modified) core, in which it suffices to check only feasible deviations. Specifically, we investigate core non-emptiness as well as the complexity of computing stable configurations.
We then move on to the more general class of (graph-restricted) partition function games, where the value of a coalition depends on which other coalitions are present, and provide the first stability results in this domain.
Finally, we propose a ``Bayesian' extension of partition function games, in which information regarding the success of a deviation is provided in the form of a probability distribution describing the possible reactions of non-deviating agents, and provide the first core-stability results in this model also.
Coalitional Stability in Structured Environments
Georgios Chalkiadakishas 4 papers, Vangelis Markakis, Nick Jenningshas 11 papers

1E_5
A Coalition Structure Generation (CSG) problem involves partitioning a set of agents into coalitions so that the social surplus is maximized. Recently, Ohta et al. developed an efficient algorithm for solving CSG, assuming that a characteristic function is represented as a set of rules, such as marginal contribution networks (MC-nets).

In this paper, we extend the formalization of CSG in Ohta et al. so that it can handle negative value rules. Here, we assume that a characteristic function is represented by either MC-nets (without externalities) or embedded MC-nets (with externalities). Allowing negative value rules is important since it can reduce the efforts for describing a characteristic function. In particular, in many realistic situations, it is natural to assume that a coalition has negative externalities to other coalitions.

To handle negative value rules, we examine the following three algorithms: (i) a full transformation algorithm, (ii) a partial transformation algorithm, and (iii) a direct encoding algorithm.
We show that the full transformation algorithm is not scalable in MC-nets (the worst-case representation size is $\Omega(n^2)$, where n is the number of agents), and does not seem to be tractable in embedded MC-nets (representation size would be $\Omega(2^n)$). In contrast, by using the partial transformation or direct encoding algorithms, an exponential blow-up never occurs even for embedded MC-nets. For embedded MC-nets, the direct encoding algorithm creates less rules than the partial transformation algorithm.

Experimental evaluations show that the direct encoding algorithm is scalable, i.e., an off-the-shelf optimization package (CPLEX) can solve problem instances with 100 agents and rules within 10 seconds.
Handling Negative Value Rules in MC-net-based Coalition Structure Generation
Suguru Uedahas 3 papers, Takato Hasegawa, Naoyuki Hashimoto, Naoki Ohta, Atsushi Iwasakihas 4 papers, Makoto Yokoohas 4 papers

## Session 1F – Planning

1F_2
Multiagent planning under uncertainty has seen important progress in recent years. Two techniques, in particular, have substantially advanced efficiency and scalability of planning. Multiagent heuristic search gains traction by pruning large portions of the joint policy space deemed suboptimal by heuristic bounds. Alternatively, influence-based abstraction reformulates the search space of joint policies into a smaller space of influences, which represent the probabilistic effects that agents' policies may exert on one another. These techniques have been used independently, but never together, to solve larger problems (for Dec-POMDPs and subclasses) than previously possible. In this paper, we take the logical albeit nontrivial next step of combining multiagent A* search and influence-based abstraction into a single algorithm. The mathematical foundation that we provide, such as partially-specified influence evaluation and admissible heuristic definition, enables an initial investigation into whether the two techniques bring complementary gains. Our empirical results indicate that A* can provide significant computational savings on top of those already afforded by influence-space search, thereby bringing a significant contribution to the field of multiagent planning under uncertainty.
Heuristic Search of Multiagent Influence Space
Stefan Witwickihas 2 papers, Frans Oliehoekhas 2 papers, Leslie Kaelbling

1F_3
Plan generation is important in a number of agent applications, but such applications generally require elaborate domain models that include not only the definitions of the actions that an agent can perform in a given domain, but also information about the most effective ways to generate plans for the agent in that domain. Such models typically take a large amount of human effort to create.

To alleviate this problem, we have developed a hierarchical goal-based planning formalism and a planning algorithm, GDP (Goal-Decomposition Planner), that combines some aspects of both HTN planning and domain-independent planning. For example, it allows the planning agent to use domain-independent heuristic functions to guide the application of both methods and actions.

This paper describes the formalism, planning algorithm, correctness theorems, and the results of a large experimental study. The experiments show that our planning algorithm works as well as the well-known SHOP2 HTN planner, using domain models only about half the size of SHOP2's.
A Hierarchical Goal-Based Formalism and Algorithm for Single-Agent Planning
Vikas Shivashankar, Ugur Kuterhas 3 papers, Dana Nauhas 2 papers, Ron Alford

1F_5
n this paper, we investigate real-time path planning in static terrain, as needed in video games. We introduce the game time model, where time is partitioned into uniform time intervals, an agent can execute one movement during each time interval, and search and movements are done in parallel. The objective is to move the agent from its start location to its goal location in as few time intervals as possible. For known terrain, we show experimentally that Time-Bounded A* (TBA*), an existing real-time search algorithm for undirected terrain, needs fewer time intervals than two state-of-the-art real-time search algorithms and about the same number of time intervals as A*. TBA*, however, cannot be used when the terrain is not known initially. For initially partially or completely unknown terrain, we thus propose a new search algorithm. Our Time-Bounded Adaptive A* (TBAA*) extends TBA* to on-line path planning with the freespace assumption by combining it with Adaptive A*. We prove that TBAA* either moves the agent from its start location to its goal location or detects that this is impossible - an important property since many existing realtime search algorithms are not able to detect efficiently that no path exists. Furthermore, TBAA* can eventually move the agent on a cost-minimal path from its start location to its goal location if it resets the agent into its start location whenever it reaches its goal location. We then show experimentally in initially partially or completely unknown terrain that TBAA* needs fewer time intervals than several state-of-the-art complete and real-time search algorithms and about the same number of time intervals as the best compared complete search algorithm, even though it has the advantage over complete search algorithms that the agent starts to move right away.
Time Bounded Adaptive A*
Carlos Hernández, Jorge Baier, Tansel Uras, Sven Koenig

## Session 2A – Virtual Agents

2A_1
Research in the behavioral sciences suggests that emotion can serve important social functions and that, more than a simple manifestation of internal experience, emotion displays communicate one's beliefs, desires and intentions. In a recent study we have shown that, when engaged in the iterated prisoner's dilemma with agents that display emotion, people infer, from the emotion displays, how the agent is appraising the ongoing interaction (e.g., is the situation favorable to the agent? Does it blame me for the current state-of-affairs?). From these appraisals people, then, infer whether the agent is likely to cooperate in the future.

In this paper we propose a Bayesian model that captures this social function of emotion. The model supports probabilistic predictions, from emotion displays, about how the counterpart is appraising the interaction which, in turn, lead to predictions about the counterpart's intentions. The model's parameters were learned using data from the empirical study. Our evaluation indicated that considering emotion displays improved the model's ability to predict the counterpart's intentions, in particular, how likely it was to cooperate in a social dilemma. Using data from another empirical study where people made inferences about the counterpart's likelihood of cooperation in the absence of emotion displays, we also showed that the model could, from information about appraisals alone, make appropriate inferences about the counterpart's intentions. Overall, the paper suggests that appraisals are valuable for computational models of emotion interpretation. The relevance of these results for the design of multiagent systems where agents, human or not, can convey or recognize emotion is discussed.
Bayesian Model of the Social Effects of Emotion in Decision-Making in Multiagent Systems
Celso de Melo, Peter Carnevale, Stephen Read, Dimitrios Antos, Jonathan Gratchhas 2 papers

2A_4
This paper presents the intelligent virtual animals that inhabit Omosa, a virtual learning environment to help secondary school students learn how to conduct scientific inquiry and gain concepts from biology. Omosa supports multiple agents, including animals, plants, and human hunters, which live in groups of varying sizes and in a predator-prey relationship with other agent types (species). In this paper we present our generic agent architecture and the algorithms that drive all animals. We concentrate on two of our animals to present how different parameter values affect their movements and inter/intra-group interactions. Two evaluations studies are included: one to demonstrate the effect of different components of our architecture; another to provide domain expert validation of the animal behavior.
Evaluating the Models & Behaviour of 3D Intelligent Virtual Animals in a Predator-Prey Relationship
Deborah Richardshas 2 papers, Michael J. Jacobsonhas 2 papers, John Portehas 2 papers, Charlotte Taylorhas 2 papers, Meredith Taylorhas 2 papers, Anne Newsteadhas 2 papers, Iwan Kelaiahhas 2 papers, Nader Hannahas 2 papers

## Session 2B – Distributed Problem Solving

2B_1
Distributed constraint optimization problems (DCOPs) are well-suited for modeling multi-agent coordination problems where the primary interactions are between local subsets of agents. However, one limitation of DCOPs is the assumption that the constraint rewards are without uncertainty. Researchers have thus extended DCOPs to Stochastic DCOPs (SDCOPs), where rewards are sampled from known probability distribution reward functions, and introduced algorithms to find solutions with the largest expected reward.
Unfortunately, such a solution might be very \emph{risky}, that is, very likely to result in a poor reward. Thus, in this paper, we make three contributions: (1) we propose a stricter objective for SDCOPs, namely to find a solution with the most stochastically dominating probability distribution reward function; (2) we introduce an algorithm to find such solutions; and (3) we show that stochastically dominating solutions can indeed be less risky than expected reward maximizing solutions.
Stochastic Dominance in Stochastic DCOPs for Risk Sensitive Applications
Duc Thien Nguyen, William Yeohhas 3 papers, Hoong Chuin Lauhas 2 papers

2B_4
Distribution network operators face a number of challenges; capacity constrained networks, and balancing electricity demand with generation from intermittent renewable resources. Thus, there is an increasing need for scalable approaches to facilitate optimal dispatch in the distribution network. To this end, we cast the optimal dispatch problem as a decentralised agent-based coordination problem and formalise it as a DCOP. We show how this can be decomposed as a factor graph and solved in a decentralised manner using algorithms based on the generalised distributive law; in particular, the max-sum algorithm. We go on to show that max-sum applied na\"{\i}vely in this setting performs a large number of redundant computations. To address this issue, we present a novel decentralised message passing algorithm using dynamic programming that outperforms max-sum by pruning the search space. We empirically evaluate our algorithm using real data, showing that it outperforms (in terms of computational time and total size of messages sent) both a centralised approach, which uses IBM's ILOG CPLEX 12.2, and max-sum, for large networks.
Optimal Decentralised Dispatch of Embedded Generation in the Smart Grid
Sam Millerhas 2 papers, Sarvapali Ramchurnhas 2 papers, Alex Rogershas 9 papers

2B_5
Real life coordination problems are characterised by stochasticity and a lack of \emph{a priori} knowledge about the interactions between agents. However, decentralised constraint optimisation problems (DCOPs), a widely accepted framework for modelling decentralised coordination problems, assumes perfect knowledge, thus limiting its practical applicability. To address this shortcoming, we introduce the MAB-DCOP, in which the interactions between agents are modelled by multi-armed bandits (MABs). Unlike canonical DCOPs, a MAB-DCOP is not a single shot optimisation problem. Rather, it is a sequential one in which agents need to coordinate in order to strike a balance between acquiring knowledge about the \emph{a priori} unknown and stochastic interactions (exploration), and taking the currently believed optimal joint action (exploitation), so as to maximise the cumulative global utility over a finite time horizon.

We propose \textsc{Heist}, the first asymptotically optimal algorithm for coordination under stochasticity and lack of prior knowledge. \textsc{Heist} solves MAB-DCOPs in a decentralised fashion using a generalised distributive law (GDL) message passing phase to find the joint action with the highest upper confidence bound (UCB) on global utility. We demonstrate that \textsc{Heist} outperforms other state of the art techniques from the MAB and DCOP literature by up to 1.5 orders of magnitude on MAB-DCOPs in experimental settings.
DCOPs and Bandits: Exploration and Exploitation in Decentralised Coordination
Ruben Stranders, Long Tran-Thanh, Francesco Maria Delle Favehas 2 papers, Alex Rogershas 9 papers, Nick Jenningshas 11 papers

## Session 2C – Learning II

2C_1
Solving complex but structured problems in a decentralized manner via multiagent collaboration has received much attention in recent years. This is natural, as on one hand, multiagent systems usually possess a structure that determines the allowable interactions among the agents; and on the other hand, the single most pressing need in a cooperative multiagent system is to coordinate the local policies of autonomous agents with restricted capabilities to serve a system-wide goal. The presence of uncertainty makes this even more challenging, as the agents face the additional need to learn the unknown environment parameters while forming (and following) local policies in an online fashion. In this paper, we provide the first Bayesian reinforcement learning (BRL) approach for distributed coordination and learning in a cooperative multiagent system by devising two solutions to this type of problem. More specifically, we show how the Value of Perfect Information (VPI) can be used to perform efficient decentralised exploration in both model-based and model-free BRL, and in the latter case, provide a closed form solution for VPI, correcting a decade old result by Dearden, Friedman and Russell. To evaluate these solutions, we present experimental results comparing their relative merits, and demonstrate empirically that both solutions outperform an existing multiagent learning method, representative of the state-of-the-art.
Decentralized Bayesian Reinforcement Learning for Online Agent Collaboration
Luke Teacy, Georgios Chalkiadakishas 4 papers, Alessandro Farinellihas 2 papers, Alex Rogershas 9 papers, Nick Jenningshas 11 papers, Sally McClean, Gerard Parr

## Session 2D – Social Choice II

2D_2
In multiagent systems, social choice functions can help aggregate the distinct preferences that agents have over alternatives, enabling them to settle on a single choice. Despite the basic manipulability of all reasonable voting systems, it would still be desirable to find ways to reach a \emph{stable} result, i.e., a situation where no agent would wish to change its vote. One possibility is an iterative process in which, after everyone initially votes, participants may change their votes, one voter at a time. This technique, explored in previous work, converges to a Nash equilibrium when Plurality voting is used, along with a tie-breaking rule that chooses a winner according to a linear order of preferences over candidates.

In this paper, we both consider limitations of the iterative voting method, as well as expanding upon it. We demonstrate the significance of tie-breaking rules, showing that when using a general tie-breaking rule, no scoring rule (nor Maximin) need iteratively converge. However, using a restricted tie-breaking rule (such as the linear order rule used in previous work) does not by itself \emph{ensure} convergence. We demonstrate that many scoring rules (such as Borda) need not converge, regardless of the tie-breaking rule. On a more encouraging note, we prove that Iterative Veto does converge - but that voting rules "between" Plurality and Veto, $k$-approval rules, do not.
Convergence of Iterative Voting
Omer Lev, Jeffrey Rosenscheinhas 2 papers

2D_3
Complexity of voting manipulation is a prominent research topic in computational social choice. In this paper, we study the complexity of {\em optimal} manipulation, i.e., finding a manipulative vote that achieves the manipulator's goal yet deviates as little as possible from her true ranking. We study this problem for three natural notions of closeness, namely, swap distance, footrule distance, and maximum displacement distance, and a variety of voting rules, such as scoring rules, Bucklin, Copeland, and Maximin. For all three distances, we obtain poly-time algorithms for all scoring rules and Bucklin and hardness results
for Copeland and Maximin.
Optimal Manipulation of Voting Rules
Svetlana Obraztsova, Edith Elkindhas 3 papers

2D_5
We develop a formal model of opinion polls in elections and study how they influence the voting behaviour of the participating agents, and thereby election outcomes. This approach is particularly relevant to the study of collective decision making by means of voting in multiagent systems, where it is reasonable to assume that we can precisely model the amount of information available to agents and where agents can be expected to follow relatively simple rules when adjusting their behaviour in response to polls. We analyse two settings, one where a single agent strategises in view of a single poll, and one where multiple agents repeatedly update their voting intentions in view of a sequence of polls. In the single-poll setting we vary the amount of information a poll provides and examine, for different voting rules, when an agent starts and stops having an incentive to manipulate the election. In the repeated-poll setting, using both analytical and experimental methods, we study how the properties of different voting rules are affected under different sets of assumptions on how agents will respond to polling information. Together, our results clarify under which circumstances sharing information via opinion polls can improve the quality of election outcomes and under which circumstances it may have negative effects, due to the increased opportunities for manipulation it provides.

Voter Response to Iterated Poll Information
Annemieke Reijngoud, Ulle Endriss

## Session 2E – Game Theory II

2E_3
In Stackelberg games, a "leader" player first chooses a mixed strategy to commit to, then a "follower" player responds based on the observed leader strategy. Notable strides have been made in scaling up the algorithms for such games, but the problem of finding optimal leader strategies spanning multiple rounds of the game, with a Bayesian prior over unknown follower preferences, has been left unaddressed. Towards remedying this shortcoming we propose a first-of-a-kind tractable method to compute an optimal plan of leader actions in a repeated game against an unknown follower, assuming that the follower plays myopic best-response in every round. Our approach combines Monte Carlo Tree Search, dealing with leader exploration/exploitation tradeoffs, with a novel technique for the identification and pruning of dominated leader strategies. The method provably finds asymptotically optimal solutions and scales up to real world security games spanning double-digit number of rounds.

Playing Repeated Stackelberg Games with Unknown Opponents
Janusz Mareckihas 2 papers, Gerry Tesauro, Richard Segal

2E_4
When a zero-sum game is played once, a risk-neutral player will want to maximize his expected outcome in that single play. However, if that single play instead only determines how much one player must pay to the other, and the same game must be played again, until either player runs out of money, optimal play may differ. Optimal play may require using different strategies depending on how much money has been won or lost. Computing these strategies is rarely feasible, as the state space is often large. This can be addressed by playing the same strategy in all situations, though this will in general sacrifice optimality. Purely maximizing expectation for each round in this way can be arbitrarily bad. We therefore propose a new solution concept that has guaranteed performance bounds, and we provide an efficient algorithm for computing it. The solution concept is closely related to the Aumann-Serrano index of riskiness, that is used to evaluate different gambles against each other. The primary difference is that instead of being offered fixed gambles, the game is adversarial.
Repeated zero-sum games with budget
Troels Sørensen

2E_5
Recently, there has been considerable progress towards algorithms for approximating Nash equilibrium strategies in extensive games. One such algorithm, Counterfactual Regret Minimization (CFR), has proven to be effective in two-player, zero-sum poker domains. While the basic algorithm is iterative and performs a full game traversal on each iteration, sampling based approaches are possible. For instance, chance-sampled CFR considers just a single chance outcome per traversal, resulting in faster but less precise iterations. While more iterations are required, chance-sampled CFR requires less time overall to converge. In this work, we present new sampling techniques that consider sets of chance outcomes during each traversal to produce slower, more accurate iterations. By sampling only the public chance outcomes seen by all players, we take advantage of the imperfect information structure of the game to (i) avoid recomputation of strategy probabilities, and (ii) achieve an algorithmic speed improvement, performing O($n^2$) work at terminal nodes in O($n$) time. We demonstrate that this new CFR update converges more quickly than chance-sampled CFR in the large domains of poker and Bluff.
Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization
Michael Johanson, Nolan Bard, Marc Lanctot, Richard Gibson, Michael Bowling

## Session 2F – Knowledge Representation & Reasoning

2F_1
Memory enables past experiences to be remembered and acquired as useful knowledge to support decision making, especially when perception and computational resources are limited. This paper presents a neuropsychological-inspired dual memory model for agents, consisting of an episodic memory that records the agent's experience in real time and a semantic memory that captures factual knowledge through a parallel consolidation process. In addition, the model incorporates a natural forgetting mechanism that prevents memory overloading by removing transient memory traces. Our experimental study based on a real-time first-person-shooter video
game has indicated that the memory consolidation and forgetting processes are not only able to extract valuable knowledge and regulate the memory capacity, but they can mutually improve the effectiveness of learning the knowledge for the given task in hand. Interestingly, a moderate level of forgetting may even improve the task performance rather than disadvantaging it. We suggest that the interplay between rapid memory formation, consolidation, and forgetting processes points to a practical and effective approach for learning agents to acquire and maintain useful knowledge from
experiences in a scalable manner.
Memory Formation, Consolidation, and Forgetting in Learning Agents
Budhitama Subagdjahas 2 papers, Wenwen Wang, Ah-Hwee Tanhas 2 papers, Yuan-Sin Tan, Loo-Nin Teow

2F_3
In this paper we provide a neural-symbolic framework to model, reason about and learn norms in multi-agent systems. To this purpose, we define a fragment of Input/Output (I/O) logic that can be embedded into a neural network. We extend d'Avila Garcez et al. Connectionist Inductive Learning and Logic Programming System (CILP) to translate an I/O logic program into a Neural Network (NN) that can be trained further with examples: we call this new system Normative-CILP (N-CILP). We then present a new algorithm to handle priorities between rules in order to cope with normative issues like Contrary to Duty (CTD), Priorities, Exceptions and Permissions. We illustrate the applicability of the framework on a case study based on RoboCup rules: within this working example, we compare the learning capacity of a network built with N-CILP with a non symbolic neural network, we explore how the initial knowledge impacts on the overall performance, and we test the NN capacity of learning norms, generalizing new Contrary to Duty rules from examples.
Learning and Reasoning about Norms using Neural-Symbolic Systems
Guido Boella, Silvano Colombo Tosatto, Artur d'Avila Garcez, Valerio Genovese, Perotti Alan, Leendert van der Torrehas 3 papers

2F_5
Policy iteration algorithms for partially observable Markov decision processes (POMDP) offer the benefits of quick convergence and the ability to operate directly on the solution, which usually takes the form of a finite state controller. However, the controller tends to grow quickly in size across iterations due to which its evaluation and improvement become costly. Bounded policy iteration provides a way of keeping the controller size fixed while improving it monotonically until convergence, although it is susceptible to getting trapped in local optima. Despite these limitations, policy iteration algorithms are viable alternatives to value iteration.

In this paper, we generalize the bounded policy iteration technique to problems involving multiple agents. Specifically, we show how we may perform policy iteration in settings formalized by the interactive POMDP framework. Although policy iteration has been extended to decentralized POMDPs, the context there is strictly cooperative. Its generalization here makes it useful in non-cooperative settings as well. As interactive POMDPs involve modeling others, we ascribe nested controllers to predict others' actions, with the benefit that the controllers compactly represent the model space. We evaluate our approach on multiple problem domains, and demonstrate its properties and scalability.
Generalized and Bounded Policy Iteration for Finitely-Nested Interactive POMDPs: Scaling Up
Ekhlas Sonuhas 2 papers, Prashant Doshihas 3 papers

## Session 3A – Robotics I

3A_2
A central problem in environmental sensing and monitoring is to classify/label the hotspots in a large-scale environmental field. This paper presents a novel \emph{decentralized active robotic exploration} (DARE) strategy for probabilistic classification/labeling of hotspots in a \emph{Gaussian process} (GP)-based field. In contrast to existing state-of-the-art exploration strategies for learning environmental field maps, the time needed to solve the DARE strategy is independent of the map resolution and the number of robots, thus making it practical for in situ, real-time active sampling. Its exploration behavior exhibits an interesting formal trade-off between that of boundary tracking until the hotspot region boundary can be accurately predicted and wide-area coverage to find new boundaries in sparsely sampled areas to be tracked. We provide a theoretical guarantee on the active exploration performance of the DARE strategy: under reasonable conditional independence assumption, we prove that it can optimally achieve two formal cost-minimizing exploration objectives based on the misclassification and entropy criteria. Importantly, this result implies that the uncertainty of labeling the hotspots in a GP-based field is greatest at or close to the hotspot region boundaries. Empirical evaluation on real-world plankton density and temperature field data shows that, subject to limited observations, DARE strategy can achieve more superior classification of hotspots and time efficiency than state-of-the-art active exploration strategies.
Decentralized Active Robotic Exploration and Mapping for Probabilistic Field Classification in Environmental Sensing
Kian Hsiang Lowhas 3 papers, Jie Chen, John Dolanhas 2 papers, Steve Chien, David Thompson

3A_5
This paper presents the architecture and key components of a simulated humanoid robot soccer team, UT Austin Villa, which was designed to compete in the RoboCup 3D simulation competition. These key components include (1) an omnidirectional walk engine and associated walk parameter optimization framework, (2) an inverse kinematics based kicking architecture, and (3) a dynamic role assignment and positioning system. UT Austin Villa won the RoboCup 2011 3D simulation competition in convincing fashion by winning all 24 games it played. During the course of the competition the team scored 136 goals while conceding none. We analyze the effect of each component in isolation and show through extensive experiments that the complete team significantly outperforms all the other teams from the competition.
UT Austin Villa 2011: A Champion Agent in the RoboCup 3D Soccer Simulation Competition
Patrick MacAlpinehas 2 papers, Daniel Urieli, Samuel Barretthas 2 papers, Shivaram Kalyanakrishnan, Francisco Barrera, Adrian Lopez-Mobilia, Nicolae Ştiurcă, Victor Vu, Peter Stonehas 5 papers

## Session 3C – Human-agent Interaction

3C_4
As computational agents are increasingly used beyond research labs, their success will depend on their ability to learn new skills and adapt to their dynamic, complex environments. If human users - without programming skills - can transfer their task knowledge to agents, learning can accelerate dramatically, reducing costly trials. The \textsc{tamer} framework guides the design of agents whose behavior can be shaped through signals of approval and disapproval, a natural form of human feedback. More recently, \textsc{tamer+rl} was introduced to enable human feedback to augment a traditional reinforcement learning (RL) agent that learns from a Markov decision process's (MDP) reward signal. We address limitations of prior work on \textsc{tamer} and \textsc{tamer+rl}, contributing in two critical directions. First, the four successful techniques for combining human reward with RL from prior \textsc{tamer+rl} work are tested on a second task, and these techniques' sensitivities to parameter changes are analyzed. Together, these examinations yield more general and prescriptive conclusions to guide others who wish to incorporate human knowledge into an RL algorithm. Second, \textsc{tamer+rl} has thus far been limited to a \emph{sequential} setting, in which training occurs before learning from MDP reward. In this paper, we introduce a novel algorithm that shares the same spirit as \textsc{tamer+rl} but learns \emph{simultaneously} from both reward sources, enabling the human feedback to come at any time during the reinforcement learning process. We call this algorithm simultaneous \textsc{tamer+rl}. To enable simultaneous learning, we introduce a new technique that appropriately determines the magnitude of the human model's influence on the RL algorithm throughout time and state-action space.
Reinforcement Learning from Simultaneous Human and MDP Reward
W. Bradley Knox, Peter Stonehas 5 papers

## Session 3D – Economies & Markets I

3D_2
Many agent-based models of financial markets have been able to reproduce certain stylized facts that are observed in actual empirical time series data by using "zero-intelligence" agents whose behaviour is largely random in order to ascertain whether certain phenomena arise from market micro-structure as opposed to strategic behaviour. Although these models have been highly successful, it is not surprising that they are unable to explain \emph{every} stylized fact, and indeed it seems plausible that although some phenomena arise purely from market micro-structure, other phenomena arise from the behaviour of the participating agents, as suggested by more complex agent-based models which use agents endowed with various forms of strategic behaviour. Given that both zero-intelligence and strategic models are each able to explain various phenomena, an interesting question is whether there are hybrid, "zero-intelligence plus" models containing a minimal amount of strategic behaviour that are simultaneously able to explain all of the stylized facts. We conjecture that as we gradually increase the level of strategic behaviour in a zero-intelligence model of a financial market we will obtain an increasingly good fit with the stylized facts of empirical financial time-series data. We test this hypothesis by systematically evaluating several different experimental treatments in which we incrementally add minimalist levels of strategic behaviour to our model, and test the resulting time series of returns for the following statistical features: fat tails, volatility clustering, persistence and non-Gaussianity. Surprisingly, the resulting "zero-intelligence plus" models do mph{not} introduce more realism to the time series, thus supporting other research which conjectures that some phenomena in the financial markets are indeed the result of more sophisticated learning, interaction and adaptation.
Can a Zero-Intelligence Plus Model Explain the Stylized Facts of Financial Time Series Data?
Imon Palit, Steve Phelps, Wing Lon Ng

3D_4
We introduce a novel online mechanism that schedules the allocation of an expiring and continuously-produced resource to self-interested agents with private preferences. A key application of our mechanism is the charging of pure electric vehicles, where owners arrive dynamically over time, and each owner requires a minimum amount of charge by its departure to complete its next trip. To truthfully elicit the agents' preferences in this setting, we introduce the new concept of pre-commitment: Whenever an agent is selected, our mechanism pre-commits to charging the vehicle by its reported departure time, but maintains flexibility about \emph{when} the charging takes place and at \emph{what rate}. Furthermore, to make effective allocation decisions we use a model-based approach by modifying Consensus, a well-known online optimisation algorithm. We show that our pre-commitment mechanism with modified Consensus incentivises truthful reporting. Furthermore, through simulations based on real-world data, we show empirically that the average utility achieved by our mechanism is 93% or more of the offline optimal.
A Model-Based Online Mechanism with Pre-Commitment and its Application to Electric Vehicle Charging
Sebastian Steinhas 2 papers, Enrico Gerdinghas 3 papers, Valentin Robuhas 2 papers, Nick Jenningshas 11 papers

3D_5
A principal seeks production of a good within a limited time-frame with a hard deadline, after which any good procured has no value. There is inherent uncertainty in the production process, which in light of the deadline may warrant simultaneous production of multiple goods by multiple producers despite there being no marginal value for extra goods beyond the \emph{maximum quality} good produced. This motivates a \emph{crowdsourcing} model of procurement. We address efficient execution of such procurement from a social planner's perspective, taking account of and optimally balancing the value to the principal with the costs to producers (modeled as effort expenditure) while, crucially, contending with self-interest on the part of all players. A solution to this problem involves both an algorithmic aspect that determines an optimal effort level for each producer given the principal's value, and also an incentive mechanism that equilibrium implementation of the socially optimal policy despite the principal privately observing his value, producers privately observing their skill levels and effort expenditure, and all acting selfishly to maximize their own individual welfare. In contrast to popular "winner take all" contests, the efficient mechanism we propose involves a payment to every producer that expends non-zero effort in the efficient policy.
Efficient Crowdsourcing Contests
Ruggiero Cavallo, Shaili Jain

## Session 3E – Game Theory III

3E_1
To step beyond the first-generation deployments of attacker-defender security games -- for LAX Police, US FAMS and others -- it is critical that we relax the assumption of perfect rationality of the human adversary. Indeed, this assumption is a well-accepted limitation of classical game theory and modeling human adversaries' bounded rationality is critical. To this end, quantal response (QR) has provided very promising results to model human bounded rationality. However, in computing optimal defender strategies in real-world security games against a QR model of attackers, we face difficulties including (1) solving a nonlinear non-convex optimization problem efficiently for massive real-world security games; and (2) addressing constraints on assigning security resources, which adds to the complexity of computing the optimal defender strategy.

This paper presents two new algorithms to address these difficulties: \textsc{GOSAQ} can compute the globally optimal defender strategy against a QR model of attackers when there are no resource constraints and gives an efficient heuristic otherwise; \textsc{PASAQ} in turn provides an efficient approximation of the optimal defender strategy with or without resource constraints. These two novel algorithms are based on three key ideas: (i) use of a binary search method to solve the fractional optimization problem efficiently, (ii) construction of a convex optimization problem through a non-linear transformation, (iii) building a piecewise linear approximation of the non-linear terms in the problem. Additional contributions of this paper include proofs of approximation bounds, detailed experimental results showing the advantages of extsc{GOSAQ} and \textsc{PASAQ} in solution quality over the benchmark algorithm (\textsc{BRQR}) and the efficiency of \textsc{PASAQ}. Given these results, \textsc{PASAQ} is at the heart of the PROTECT system, which is deployed for the US Coast Guard in the port of Boston, and is now headed to other ports.
Computing Optimal Strategy against Quantal Response in Security Games
Rong Yanghas 5 papers, Fernando Ordóñezhas 2 papers, Milind Tambehas 13 papers

3E_2
Given their existing and potential real-world security applications, Bayesian Stackelberg games have received significant research interest. In these games, the defender acts as a leader, and the many different follower types model the uncertainty over discrete attacker types. Unfortunately since solving such games is an NP-hard problem, scale-up has remained a difficult challenge.

This paper scales up Bayesian Stackelberg games, providing a novel unified approach to handle uncertainty not only over discrete follower types but also other key continuously distributed real world uncertainty, due to the leader's execution error, the follower's observation error, and continuous payoff uncertainty. To that end, this paper provides contributions in two parts. First, we present a new algorithm for Bayesian Stackelberg games, called \textsc{hunter}, to scale up the number of types. \textsc{hunter} combines the following five key features: i) efficient pruning via a best-first search of the leader's strategy space; ii) a novel linear program for computing tight upper bounds for this search; iii) using Bender's decomposition for solving the upper bound linear program efficiently; iv) efficient inheritance of Bender's cuts from parent to child; v) an efficient heuristic branching rule. Our experiments show that \textsc{hunter} provides orders of magnitude speedups over the best existing methods to handle discrete follower types. In the second part, we show \textsc{hunter}'s efficiency for Bayesian Stackelberg games can be exploited to also handle the continuous uncertainty using sample average approximation. We experimentally show that our \textsc{hunter}based approach also outperforms latest robust solution methods under continuously distributed uncertainty.
A Unified Method for Handling Discrete and Continuous Uncertainty in Bayesian Stackelberg Games
Zhengyu Yinhas 4 papers, Milind Tambehas 13 papers

3E_3
The burgeoning area of security games has focused on real-world domains where security agencies protect critical infrastructure from a diverse set of adaptive adversaries. There are security domains where the payoffs for preventing the different types of adversaries may take different forms (seized money, reduced crime, saved lives, etc) which are not readily comparable. Thus, it can be difficult to know how to weigh the different payoffs when deciding on a security strategy. To address the challenges of these domains, we propose a fundamentally different solution concept, multi-objective security games (MOSG), which combines security games and multi-objective optimization. Instead of a single optimal solution, MOSGs have a set of Pareto optimal (non-dominated) solutions referred to as the Pareto frontier. The Pareto frontier can be generated by solving a sequence of constrained single-objective optimization problems (CSOP), where one objective is selected to be maximized while lower bounds are specified for the other objectives. Our contributions include: (i) an algorithm, Iterative $\epsilon$-Constraints, for generating the sequence of CSOPs; (ii) an exact approach for solving an MILP formulation of a CSOP (which also applies to multi-objective optimization in more general Stackelberg games); (iii) heuristics that achieve speedup by exploiting the structure of security games to further constrain a CSOP; (iv) an approximate approach for solving an algorithmic formulation of a CSOP, increasing the scalability of our approach with quality guarantees. Additional contributions of this paper include proofs on the level of approximation and detailed experimental evaluation of the proposed approaches.
Multi-Objective Optimization for Security Games
Matthew Brown, Bo Anhas 3 papers, Christopher Kiekintveld, Fernando Ordóñezhas 2 papers, Milind Tambehas 13 papers

3E_4
There has been significant recent interest in computing effective strategies for playing large imperfect-information games. Much prior work involves computing an approximate equilibrium strategy in a smaller abstract game, then playing this strategy in the full game (with the hope that it also well approximates an equilibrium in the full game). In this paper, we present a family of modifications to this approach that work by constructing non-equilibrium strategies in the abstract game, which are then played in the full game. Our new procedures, called \emph{purification} and \emph{thresholding}, modify the action probabilities of an abstract equilibrium by preferring the higher-probability actions. Using a variety of domains, we show that these approaches lead to significantly stronger play than the standard equilibrium approach. As one example, our program that uses purification came in first place in the two-player no-limit Texas Hold'em total bankroll division of the 2010 Annual Computer Poker Competition. Surprisingly, we also show that purification significantly improves performance (against the full equilibrium strategy) in random 4 x 4 matrix games using random 3 x 3 abstractions. We present several additional results (both theoretical and empirical). Overall, one can view these approaches as ways of achieving robustness against overfitting one's strategy to one's lossy abstraction. Perhaps surprisingly, the performance gains do not necessarily come at the expense of worst-case exploitability.
Strategy Purification and Thresholding: Effective Non-Equilibrium Approaches for Playing Large Games
Sam Ganzfried, Tuomas Sandholmhas 4 papers, Kevin Waugh

## Session 3F – Agent-based Software Development

3F_1
In Belief Desire Intention (BDI) agent systems it is usual for goals to have a number of plans that are possible ways of achieving the goal, applicable in different situations, usually captured by a \emph{context condition}. In Agent Oriented Software Engineering it has been suggested that a designer should be conscious of whether a goal has \emph{complete coverage}, that is, is there some plan that is applicable for every situation. Similarly a designer should be conscious of \emph{overlap}, that is, for a given goal, are there situations where more than one plan could be applicable for achieving that goal. In this paper we further develop these notions in two ways, and then describe how they can be used both in agent reasoning and agent system development. Firstly, we replace the boolean value for basic coverage and overlap with numerical measures, and explain how these may be calculated. Secondly, we describe a measure that combines these basic measures, with the characteristics of the coverage/overlap in the goal-plan tree below a given goal. We then describe how these domain independent mesures can be used for both plan selection and intention selection, as well as for guidance in agent system design and development.
Measuring Plan Coverage and Overlap for Agent Reasoning
John Thangarajahhas 4 papers, Sebastian Sardinahas 2 papers, Lin Padghamhas 4 papers

3F_2
Normative organisations provide a means to coordinate the activities of individual agents in multiagent settings. The coordination is realized at run time by creating obligations and prohibitions (norms) for individual agents. If an agent cannot meet an obligation or violates a prohibition, the organisation imposes a sanction on the agent. In this paper, we consider \emph{norm-aware} agents that deliberate on their goals, norms and sanctions before deciding which plan to select and execute. A norm-aware agent is able to violate norms (accepting the resulting sanctions) if it is in the agent's overall interests to do so, e.g., if meeting an obligation would result in an important goal of the agent becoming unachievable. Programming norm-aware agents in conventional BDI-based agent programming languages is difficult, as they lack support for deliberating about goals, norms and sanctions and deadlines. We present the norm-aware agent programming language N-2APL. N-2APL is based on 2APL and provides support for beliefs, goals, plans, norms, sanctions and deadlines. We give the syntax and semantics of N-2APL, and show that N-2APL agents are rational in the sense of committing to a set of plans that will achieve the agent's most important goals and obligations by their deadlines while respecting its most important prohibitions.
Programming Norm-Aware Agents
Natasha Alechinahas 3 papers, Mehdi Dastanihas 4 papers, Brian Loganhas 2 papers

## Session 4A – Robotics II

4A_1
In this paper, we propose a novel top-down design method for the development ofcollective behaviors of swarm robotics systems called \emph{property-driven design}. Swarm robotics systems are usually designed and developed using a \emph{code-and-fix} approach, that is, the developer devises, tests and modifies the individual robot behaviors until a desired collective behavior is obtained. The code-and-fix approach can be very time consuming and relies completely on the ingenuity and expertise of the designer. The idea of property-driven design is that a swarm robotics system can be described by specifying formally a set of desired properties. In an iterative process similar to test-driven development, the developer produces a model of the system that satisfies the desired properties. Subsequently, the system is implemented in simulation and using real robots. Property-driven design helps to minimize the risk of developing a system that does not satisfy the required properties, and to promote the reuse of hardware-independent models. In this paper, we start by giving a general description of the method. We then present a possible way to apply it by using Discrete Time Markov Chains (DTMC) and Probabilistic Computation Tree Logic* (PCTL*). Finally, we conclude by presenting the application of the proposed method to the design and development of a swarm robotics system performing aggregation.
Property-driven design for swarm robotics
Manuele Brambilla, Carlo Pinciroli, Mauro Birattari, Marco Dorigohas 2 papers

4A_4
When a mixture of particles with different attributes undergoes vibration, a segregation pattern is often observed. For example, in muesli cereal packs, the largest particles - the Brazil nuts - tend to end up at the top. For this reason, the phenomenon is known as the Brazil nut effect. In previous research, an algorithm inspired by this effect was designed to produce segregation patterns in swarms of simulated agents that move on a horizontal plane.

In this paper, we adapt this algorithm for implementation on robots with directional vision. We use the e-puck robot as a platform to test our implementation. In a swarm of e-pucks, different robots mimic disks of different sizes (larger than their physical dimensions). The motion of every robot is governed by a combination of three components: (i) attraction towards a point, which emulates the effect of a gravitational pull, (ii) random motion, which emulates the effect of vibration, and (iii) repulsion from nearby robots, which emulates the effect of collisions between disks. The algorithm does not require robots to discriminate between other robots; yet, it is capable of forming annular structures where the robots in each annulus represent disks of identical size.

We report on a set of experiments performed with a group of 20 physical e-pucks. The results obtained in 100 trials of 20 minutes each show that the percentage of incorrectly-ordered pairs of disks from different groups decreases as the size ratio of disks in different groups is increased. In our experiments, this percentage was, on average, below 0.5% for size ratios from 3.0 to 5.0. Moreover, for these size ratios, all segregation errors observed were due to mechanical failures that caused robots to stop moving.
Segregation in Swarms of e-puck Robots Based On the Brazil Nut Effect
Jianing Chenhas 2 papers, Melvin Gauci, Michael J. Price, Roderich Groß

4A_5
Modern model-driven engineering and Agent-Oriented Software Engineering (AOSE) methods are rarely utilized in developing robotic software. In this paper, we show how a Model-Driven AOSE methodology can be used for specifying the behavior of multi-robot teams. Specifically, the Agent Systems Engineering Methodology (ASEME) was used for developing the software that realizes the behavior of a physical robot team competing in the Standard Platform League of the RoboCup competition (the robot soccer world cup). The team consists of four humanoid robots, which play soccer autonomously in real time utilizing the on-board sensing, processing, and actuating capabilities, while communicating and coordinating with each other in order to achieve their common goal of winning the game. Our work focuses on the challenges of coordinating the base functionalities (object recognition, localization, motion skills) within each robot (intra-agent control) and coordinating the activities of the robots towards a desired team behavior (inter-agent control). We discuss the difficulties we faced and present the solutions we gave to a number of practical issues, which, in our view, are inherent in applying any AOSE methodology to robotics. We demonstrate the added value of using an AOSE methodology in the development of robotic systems, as ASEME allowed for a platform-independent team behavior specification, automated a large part of the code generation process, and reduced the total development time.
Model-Driven Behavior Specification for Robotic Teams
Alexandros Paraschos, Nikolaos Spanoudakis, Michail Lagoudakis

## Session 4B – Agent Societies

4B_2
We propose a novel method for assessing the reputation of agents in multiagent systems that is capable of exploiting the structure and semantics of rich agent interaction protocols and agent communication languages. Our method is based on using so-called \emph{conversation models}, i.e. succinct, qualitative models of agents' behaviours derived from the application of data mining techniques on protocol execution data in a way that takes advantage of the semantics of inter-agent communication available in many multiagent systems. Contrary to existing systems, which only allow for querying agents regarding their assessment of others' reputation in an \emph{outcome-based} way (often limited to distinguishing between "successful" and "unsuccessful" interactions), our method allows for contextualised queries regarding the structure of past interactions, the values of content variables, and the behaviour of agents across different protocols. Moreover, this is achieved while preserving maximum privacy for the reputation querying agent and the witnesses queried, and without requiring a common definition of reputation, trust or reliability among the agents exchanging reputation information. A case study shows that, even with relatively simple reputation measures, our qualitative method outperforms quantitative approaches, proving that we can meaningfully exploit the additional information afforded by rich interaction protocols and agent communication semantics.
A qualitative reputation system for multiagent systems with protocol-based communication
Emilio Serranohas 2 papers, Michael Rovatsos, Juan Botia

## Session 4C – Argumentation & Negotiation

4C_1
An argumentation framework can be seen as expressing, in an abstract way, the conflicting information of an underlying logical knowledge base. This conflicting information often allows for the presence of more than one possible reasonable position (extension/labelling) which one can take. A relevant question, therefore, is how much these positions differ from each other. In the current paper, we will examine the issue of how to define meaningful measures of distance between the (complete) labellings of a given argumentation framework. We provide concrete distance measures based on argument-wise label difference, as well as based on the notion of critical sets, and examine their properties.
Quantifying Disagreement in Argument-based Reasoning
Richard Booth, Martin Caminada, Mikolaj Mikołaj, Iyad Rahwanhas 2 papers

4C_4
Agents in open multi-agent systems must deal with the difficult problem of selecting interaction partners in the face of uncertainty about their behaviour. This is especially problematic if they have to interact with an agent they have not interacted with before. In this case they can turn to their peers for information about this potential partner. However, in scenarios where agents may be evaluated according to many different criteria for many different purposes, their peers' evaluations may be mismatched with regards to their own expectations. In this paper we present a novel method, using an argumentation framework, that allows agents to discuss and adapt their trust model. This allows agents to provide, and receive, personalized trust evaluations, better suited to the agent in need, as is shown in a prototypical experiment.
Personalizing Communication about Trust
Andrew Koster, Jordi Sabater-Mirhas 2 papers, Marco Schorlemmer

## Session 4D – Economies & Markets II

4D_2
We consider a setting in which a worker and a manager may each have
information about the likely completion time of a task, and the worker
also affects the completion time by choosing a level of effort. The
task itself may further be composed of a set of subtasks, and the
worker can also decide how many of these subtasks to split out into
an explicit prediction task. In addition, a worker can learn about the likely completion time of a task as work on subtasks completes. We characterize a family of scoring rules for the worker and manager that provide three properties: information is truthfully reported, best effort is exerted by the worker in completing tasks as quickly as possible; and collusion is not possible. We also study the factors influencing when a worker will split a task into subtasks, each forming a separate prediction target.
Predicting Your Own Effort
David F. Bacon, Yiling Chenhas 2 papers, Ian Kash, David Parkeshas 2 papers, Malvika Rao, Manu Sridharan

4D_4
Kidney exchange, where needy patients swap incompatible donors with each other, offers a lifesaving alternative to waiting for an organ from the deceased-donor waiting list. Recently, \emph{chains} -- sequences of transplants initiated by an altruistic kidney donor -- have shown marked success in practice, yet remain poorly understood. We provide a theoretical analysis of the efficacy of chains in the most widely used kidney exchange model, proving that long chains do not help beyond chains of length of 3 in the large. This completely contradicts our real-world results gathered from the budding nationwide kidney exchange in the United States; there, solution quality improves by increasing the chain length cap to 13 or beyond. We analyze reasons for this gulf between theory and practice, motivated by our experiences running the only nationwide kidney exchange. We augment the standard kidney exchange model to include a variety of real-world features. Experiments in the static setting support the theory and help determine how large is really "in the large". Experiments in the dynamic setting cannot be conducted in the large due to computational limitations, but with up to 460 candidates, a chain cap of 4 was best (in fact, better than 5).
Optimizing Kidney Exchange with Transplant Chains: Theory and Reality
John Dickerson, Ariel Procaccia, Tuomas Sandholmhas 4 papers

4D_5
We consider the age-old problem of allocating items among different
agents in a way that is efficient and fair. Two papers, by
Dolev et al. and Ghodsi et al., have recently studied this problem in
the context of computer systems. Both papers had similar models
for agent preferences, but advocated different notions of fairness.
We formalize both fairness notions in economic terms, extending
them to apply to a larger family of utilities. Noting that in settings
with such utilities efficiency is easily achieved in multiple ways, we
study notions of fairness as criteria for choosing between different
efficient allocations. Our technical results are algorithms for finding
fair allocations corresponding to two fairness notions: Regarding
the notion suggested by Ghodsi et al., we present a polynomialtime
algorithm that computes an allocation for a general class of
fairness notions, in which their notion is included. For the other,
suggested by Dolev et al., we show that a competitive market equilibrium
achieves the desired notion of fairness, thereby obtaining a
polynomial-time algorithm that computes such a fair allocation and
solving the main open problem raised by Dolev et al.
Fair Allocation Without Trade
Avital Gutman, Noam Nisan

## Session 4E – Game Theory IV

4E_2
We consider multi-player games, and the guarantees that a master player that plays on behalf of a set of players can offer them, without making any assumptions on the rationality of the other players. Our model consists of an $(n+1)$-player game, with $m$ strategies per player, in which a \emph{master} player $M$ forms a coalition with nontransferable utilities among $n$ players, and the remaining player is called the {\em independent} player. Existentially, it is shown that every game admits a \emph{product-minimax-safe} strategy for $M$ -- a strategy that guarantees for every player in $M$'s coalition an expected value of at least her \emph{product minimax value} (which is at least as high as her minimax value and is often higher). Algorithmically, for any given vector of values for the players, one can decide in polytime whether it can be ensured by $M$, and if so, compute a mixed strategy that guarantees it. In symmetric games, a product minimax strategy for $M$ can be computed efficiently, even without being given the safety vector. We also consider the performance guarantees that $M$ can offer his players in repeated settings. Our main result here is the extension of the oblivious setting of Feldman, Kalai and Tennenholtz, showing that in every symmetric game, a master player who never observes a single payoff can guarantee for each of its players a {\em similar} performance to that of the independent player, even if the latter gets to choose the payoff matrix after the fact.
Mastering multi-player games
Yossi Azar, Uriel Feige, Michal Feldmanhas 2 papers, Moshe Tennenholtzhas 2 papers

4E_4
We analytically study the role played by the network topology in sustaining cooperation in a society of myopic agents in an evolutionary setting. In our model, each agent plays the Prisoner's Dilemma (PD) game with its neighbours, as specified by a network. Cooperation is the incumbent strategy, whereas defectors are the mutants. Starting with a population of cooperators, some agents are switched to defection. The agents then play the PD game with their neighbours and compute their fitness. After this, an evolutionary rule, or
imitation dynamic is used to update the agent strategy. A defector switches back to cooperation if it has a cooperator neighbour with higher fitness. The network is said to sustain cooperation if almost all defectors switch to cooperation. Earlier work on the sustenance of cooperation has largely consisted of simulation studies, and we seek to complement this body of work by providing analytical insight for the same.

We find that in order to sustain cooperation, a network should satisfy some properties such as small average diameter, densification, and irregularity. Real-world networks have been empirically shown to exhibit these properties, and are thus candidates for the sustenance of cooperation. We also analyze some specific graphs to determine whether or not they sustain cooperation. In particular, we find that scale-free graphs belonging to a certain family sustain cooperation, whereas Erdos-Renyi random graphs do not. To the best of our knowledge, ours is the first analytical attempt to determine which networks sustain cooperation in a population of myopic agents in an evolutionary setting.
Sustaining Cooperation on Networks: An Analytical Study based on Evolutionary Game Theory
Raghunandan Ananthasayanam, Subramanian Chandrasekarapuram

## Session 4F – Logics for Agency

4F_1
We consider semantic structures and logics that differentiate between being uncertain about a proposition, being unaware of a proposition, becoming aware of a proposition and getting to know the truth value of a proposition. In this paper we give a unified setting to model all this variety of static and dynamic aspects of awareness and knowledge, without any constraints on the modal properties of knowledge (or belief -- such as introspection) or on the interaction between awareness and knowledge (such as awareness introspection). Our primitive epistemic operator is called \emph{speculative knowledge}. This is different from the better known \emph{implicit knowledge}, now definable, which plays a more restricted role. Some dynamic semantic primitives that are elegantly definable in our setting are the actions of 'becoming aware of a propositional variable', 'implicit knowledge', 'adressing a novel issue in an announcement', and also more complex ways in which an agent can become aware of a novel issue by way of increasing the complexity of the epistemic model.
Action models for knowledge and awareness
Hans van Ditmarschhas 2 papers, Tim French, Fernando R. Velázquez-Quesada

4F_4
The last decade has been witness to a rapid growth of interest in logics intended to support reasoning about the interactions between knowledge and action. Typically, logics combining dynamic and epistemic components contain ontic actions (which change the state of the world, e.g., switching a light on) or epistemic actions (which affect the information possessed by agents, e.g., making an announcement). We introduce a new logic for reasoning about the interaction between knowledge and action, in which each agent in a system is assumed to perceive some subset of the overall set of Boolean variables in the system; these variables give rise to epistemic indistinguishability relations, in that two states are considered indistinguishable to an agent if all the variables visible to that agent have the same value in both states. In the dynamic component of the logic, we introduce actions $r(p, i)$ and $c(p, i)$: the effect of $r(p, i)$ is to reveal variable $p$ to agent $i$; the effect of $c(p, i)$ is to conceal $p$ from $i$. By using these dynamic operators, we can represent and reason about how the knowledge of agents changes when parts of their environment are concealed from them, or by revealing parts of their environment to them. Our main technical result is a sound and complete axiomatisation for our logic.
A Logic of Revelation and Concealment
Wiebe van der Hoek, Petar Iliev, Michael Wooldridgehas 2 papers

## Session 5A – Robotics III

## Session 5B – Teamwork II

5B_1
The growing use of autonomous agents in practice may require agents to cooperate as a team in situations where they have limited prior knowledge about one another, cannot communicate directly, or do not share the same world models. These situations raise the need to design \emph{ad hoc} team members, i.e., agents that will be able to cooperate without coordination in order to reach an optimal team behavior. This paper considers the problem of leading $N$-agent teams by an agent toward their optimal joint utility, where the agents compute their next actions based only on their most recent observations of their teammates' actions. We show that compared to previous results in two-agent teams, in larger teams the agent might not be able to lead the team to the action with maximal joint utility, thus its optimal strategy is to lead the team to the best possible \emph{reachable} cycle of joint actions. We describe a graphical model of the problem and a polynomial time algorithm for solving it. We then consider other variations of the problem, including leading teams of agents where they base their actions on longer history of past observations, leading a team by more than one ad hoc agent, and leading a teammate while the ad hoc agent is uncertain of its behavior.
Leading Ad Hoc Agents in Joint Action Settings with Multiple Teammates
Noa Agmonhas 2 papers, Peter Stonehas 5 papers

5B_2
This paper is concerned with evaluating different multiagent learning (MAL) algorithms in problems where individual agents may be heterogenous, in the sense of utilising different learning strategies, without the opportunity for prior agreements or information regarding coordination. Such a situation arises in \emph{ad hoc team} problems, a model of many practical multiagent systems applications. Prior work in multiagent learning has often been focussed on homogeneous groups of agents, meaning that all agents were identical and a priori aware of this fact. Also, those algorithms that are specifically designed for ad hoc team problems are typically evaluated in teams of agents with fixed behaviours, as opposed to agents which are adapting their behaviours. In this work, we empirically evaluate five MAL algorithms, representing major approaches to multiagent learning but originally developed with the homogeneous setting in mind, to understand their behaviour in a set of ad hoc team problems. All teams consist of agents which are continuously adapting their behaviours. The algorithms are evaluated with respect to a comprehensive characterisation of repeated matrix games, using performance criteria that include considerations such as attainment of equilibrium, social welfare and fairness. Our main conclusion is that there is no clear winner. However, the comparative evaluation also highlights the relative strengths of different algorithms with respect to the type of performance criteria, e.g., social welfare vs. attainment of equilibrium.
Comparative Evaluation of MAL Algorithms in a Diverse Set of Ad Hoc Team Problems
Stefano Albrecht, Subramanian Ramamoorthyhas 2 papers

## Session 5C – Emergence

5C_2
In this paper we present an approach for improving the accuracy of shared opinions in a large decentralised team. Specifically, our solution optimises the opinion sharing process in order to help the majority of agents to form the correct opinion about a state of a common subject of interest, given only few agents with noisy sensors in the large team. We build on existing research that has examined models of this opinion sharing problem and shown the existence of optimal parameters where incorrect opinions are filtered out during the sharing process. In order to exploit this collective behaviour in complex networks, we present a new decentralised algorithm that allows each agent to gradually regulate the importance of its neighbours' opinions (their social influence). This leads the system to the optimised state in which agents are most likely to filter incorrect opinions, and form a correct opinion regarding the subject of interest. Crucially, our algorithm is the first that does not introduce additional communication over the opinion sharing itself. Using it 80-90% of the agents form the correct opinion, in contrast to 60-75% with the existing message-passing algorithm DACOR proposed for this setting. Moreover, our solution is adaptive to the network topology and scales to thousands of agents. Finally, the use of our algorithm allows agents to significantly improve their accuracy even when deployed by only half of the team.
Efficient Opinion Sharing in Large Decentralised Teams
Oleksandr Pryymak, Alex Rogershas 9 papers, Nick Jenningshas 11 papers

5C_3
In recent years, social networking sites and social media have become a very important part of peoples' lives, driving everything from family relationships to revolutions. In this work, we study the different patterns of interaction behavior seen in an online social network. We investigate the difference in the relative time people allocate to their friends versus that which their friends allocate to them, and propose a measure for this difference in time allocation. The distribution of this measure is used to identify classes of social agents through agglomerative hierarchical clustering. These classes are then characterized in terms of two important structural attributes: Degree distributions and clustering coefficients.

We demonstrate our approach on two large social networks obtained from Facebook. For each network we have the list of all social interactions that took place over six months. The total number of people in the two networks is 939,453, and 841,456 with 1.4 million and 8.4 million interactions, respectively. Our results show that, based on the interaction behavior, there are four main classes of agents in both networks, and that they are consistent across the two networks. Furthermore, each class is characterized by a specific profile of degree distributions and clustering coefficients, which are also consistent across both networks.

We speculate that agents corresponding to the four classes play different roles in the social network. To test this, we developed an opinion propagation model where opinions are represented as m-bit strings communicated from agent to agents. An agent receiving an opinion then selectively modifies its own opinions depending on the social and informational value it places upon communications from the sending agent, its overall agreement with the sending agent, and its own propensities. Opinions are injected into the system by agents of specific classes and their spread is tracked by propagating tags. The resulting data is used to analyze the influence of agents from each class in the viral spread of ideas under various conditions. The analysis also shows what behavioral factors at the agent level have the most significant impact on the spreading of ideas.
Agents of Influence in Social Networks
Amer Ghanem, Srinivasa Vedanarayanan, Ali Minai

5C_4
Agents make commitments towards others in order to influence others in a certain way, often by dismissing more profitable options. Most commitments depend on some incentive that is necessary to ensure that the action is in the agent's interest and thus, may be carried out to avoid eventual penalties. The capacity for using commitment strategies effectively is so important that natural selection may have shaped specialized capacities to make this possible. Evolutionary explanations for commitment, particularly its role in the evolution of cooperation, have been actively sought for and discussed in several fields, including Psychology and Philosophy. In this paper, using the tools of evolutionary game theory, we provide a new model showing that individuals tend to engage in commitments, which leads to the emergence of cooperation even without assuming repeated interactions. The model is characterized by two key parameters: the punishment cost of failing commitment imposed on either side of a commitment, and the cost of managing the commitment deal. Our analytical results and extensive computer simulations show that cooperation can emerge if the punishment cost is large enough compared to the management cost.
The Emergence of Commitments and Cooperation
The Anh Han , Luís Moniz Pereira , Francisco C. Santos

## Session 5D – Auction & Mechanism Design

5D_1
Revenue maximization in multi-item settings is notoriously elusive. This paper studies a class of two-item auctions which we call a \emph{mixed-bundling auction with reserve prices (MBARP)}. It calls VCG on an enlarged set of agents by adding the seller -- who has reserve valuations for each bundle of items -- and a fake agent who receives nothing nor has valuations for any item or bundle, but has a valuation for pure bundling allocations, i.e., allocations where the two items are allocated to a single agent. This is a strict subclass of several known classes of auctions, including the affine maximizer auction (AMA), $\lambda$-aution, and the virtual valuations combinatorial auction (VVCA). As we show, a striking feature of MBARP is that its revenue can be represented in a simple closed form as a function of the parameters. Thus, we can solve first-order conditions on the parameters and obtain the optimal MBARP. The optimal MBARP yields significantly higher revenue than prior auctions for which the revenue-maximizing parameters could be solved for in closed form: separate Myerson auctions, pure-bundling Myerson auction, VCG, and mixed-bundling auction without reserve prices. Its revenue even exceeds that obtained via simulation within broader classes: VVCA and AMA.
Mixed-bundling auctions with reserve prices
Pingzhong Tang, Tuomas Sandholmhas 4 papers

5D_3
Many important problems in multiagent systems involve the allocation of multiple resources among the agents. For resource allocation problems, the well-known VCG mechanism satisfies a list of desired properties, including efficiency, strategy-proofness, individual rationality, and the non-deficit property. However, VCG is generally not budget-balanced. Under VCG, agents pay the VCG payments, which reduces social welfare. To offset the loss of social welfare due to the VCG payments, VCG redistribution mechanisms were introduced. These mechanisms aim to redistribute as much VCG payments back to the agents as possible, while maintaining the aforementioned desired properties of the VCG mechanism.

We continue the search for worst-case optimal VCG redistribution mechanisms -- mechanisms that maximize the fraction of total VCG payment redistributed in the worst case. Previously, a worst-case optimal VCG redistribution mechanism (denoted by WCO) was characterized for multi-unit auctions with nonincreasing marginal values. Later, WCO was generalized to settings involving heterogeneous items, resulting in the HETERO mechanism. This \emph{conjectured} that HETERO is feasible and worst-case optimal for heterogeneous-item auctions with unit demand. In this paper, we propose a more natural way to generalize the WCO mechanism. We prove that our generalized mechanism, though represented differently, actually coincides with HETERO. Based on this new representation of HETERO, we prove that HETERO is indeed feasible and worst-case optimal in heterogeneous-item auctions with unit demand. Finally, we conjecture that HETERO remains feasible and worst-case optimal in the even more general setting of combinatorial auctions with gross substitutes.
Worst-Case Optimal Redistribution of VCG Payments in Heterogeneous-Item Auctions with Unit Demand
Mingyu Guo

5D_4
In real electronic markets, each bidder arrives and departs over time. Thus, such a mechanism that must make decisions dynamically without knowledge of the future is called an \emph{online mechanism}. In an online mechanism, it is very unlikely that the mechanism designer knows the number of bidders beforehand or can verify the identity of all of them. Thus, a bidder can easily submit multiple bids (\emph{false-name bids}) using different identifiers (e.g., different e-mail addresses). In this paper, we formalize false-name manipulations in online mechanisms and identify a simple property called (value, time, identifier)-monotonicity that characterizes the allocation rules of false-name-proof online auction mechanisms. To the best of our knowledge, this is the first work on false-name-proof online mechanisms. Furthermore, we develop a new false-name-proof online auction mechanism for $k$ identical items. When $k$=1, this mechanism corresponds to the optimal stopping rule of the secretary problem where the number of candidates is unknown. We show that the competitive ratio of this mechanism for efficiency is 4 and independent from $k$ by assuming that only the distribution of bidders' arrival times is known and that the bidders are impatient.
False-name-proofness in Online Mechanisms
Taiki Todo, Takayuki Mouri, Atsushi Iwasakihas 4 papers, Makoto Yokoohas 4 papers

## Session 5E – Game & Agent Theories

5E_2
Boolean games are a compact and expressive class of games, based on propositional logic. However, Boolean games are computationally complex: checking for the existence of pure Nash equilibria in Boolean games is $\Sigma^p_2$-complete, and it is co-NP-complete to check whether given a given outcome for a Boolean game is a pure Nash equilibrium. In this paper, we consider two possible avenues to tractability in Boolean games. First, we consider the development of alternative solution concepts for Boolean games. We introduce the notion of $k$-bounded Nash equilibrium, meaning that no agent can benefit from deviation by altering fewer than $k$ variables. After motivating and discussing this notion of equilibrium, we give a logical characterisation of a class of Boolean games for which $k$-bounded equilibria correspond to Nash equilibria. That is, we precisely characterise a class of Boolean games for which all Nash equilibria are in fact $k$-bounded Nash equilibria. Second, we consider classes of Boolean games for which computational problems related to Nash equilibria are easier than in the general setting. We first identify some restrictions on games that make checking for beneficial deviations by individual players computationally tractable, and then show that certain types of \emph{socially desirable} equilibria can be hard to compute even when the standard decision problems for Boolean games are easy. We conclude with a discussion of related work and possible future work.
Towards Tractable Boolean Games
Paul Dunne, Michael Wooldridgehas 2 papers

5E_4
In many multiagent domains, no single observation event is sufficient to determine that the behavior of individuals is suspicious. Instead, suspiciousness must be inferred from a combination of multiple events, where events refer to the individual's interactions with other individuals. Hence, a detection system must employ a detector that combines evidence from multiple events, in contrast to most previous work, which focuses on the detection of a single, clearly suspicious event. This paper proposes a two-step detection system, where it first detects trigger events from multiagent interactions, and then combines the evidence to provide a degree of suspicion. The paper provides three key contributions: (i) proposes a novel detector that generalizes a utility-based plan recognition with arbitrary utility functions, (ii) specifies conditions that any reasonable detector should satisfy, and (iii) analyzes three detectors and compares them with the proposed approach. The results on a simulated airport domain and a dangerous-driver domain show that our new algorithm outperforms other approaches in several settings.
Detection of Suspicious Behavior from a Sparse Set of Multiagent Interactions
Boštjan Kalužahas 2 papers, Gal Kaminkahas 3 papers, Milind Tambehas 13 papers

## Session 5F – Logic and Verification

5F_1
Emotions is a cognitive mechanism that directs an agent's thoughts and attentions to what is relevant, important, and significant. Such a mechanism is crucial for the design of resource-bounded agents that must operate in highly-dynamic, semi-predictable environments and which need mechanisms for allocating their computational resources efficiently. The aim of this work is to propose a logical analysis of emotions and their influences on an agent's behaviour. We focus on four emotion types (viz, hope, fear, joy, and distress) and provide their logical characterizations in a model logic framework. As the intensity of emotion is essential for its influence on an agent's behaviour, the logic is devised to represent and reason about graded beliefs, graded goals and intentions. The belief strength and the goal strength determine the intensity of emotions. Emotions trigger different types of coping strategy which are aimed at dealing with emotions either by forming or revising an intention to act in the world, or by changing the agent's interpretation of the situation (by changing beliefs or goals).
A logic of emotions: from appraisal to coping
Mehdi Dastanihas 4 papers, Emiliano Lorini