1 Optimization 8-Queens Problem Solution by Local Search
Gradient Descent and Related Methods: Hill-climbing & Optimization
Holger Schultheis
Nov 4th, 2014

Parameter Estimation as Optimization
• Recall from last session:
  – Free parameters of models are parameters whose values cannot be fixed a priori
• Parameter estimation tries to determine "good" values for these parameters
  – What is "good" is context dependent
• Finding good values is essentially an optimization problem

Optimization
• an AI search technique
• goal: find an optimal solution to a problem
• the path to the solution does not matter
• e.g., in integrated-circuit design, vehicle routing, network optimization, layout problems, …

8-Queens Problem
• place 8 queens on a chessboard such that no queen attacks any other
• example: 8-queens problem …

Solution by Local Search
• use the current state, move to neighboring states
• no multiple-path search
• no systematic search
• advantages:
  – constant amount of memory
  – reasonable solutions in large/infinite/continuous state spaces

Objective Function
• states are evaluated using an objective function

State Space Landscape
[Figure: state-space landscape]

Hill-Climbing Search
• continually moves in the direction of increasing value
• terminates when no neighbor has a higher value
• no look-ahead beyond immediate neighbors
• algorithm: [figure not transcribed]

In the 8-Queens Example...
• all queens on the board, one per column
• the successor function returns all possible successor states
  – moving one queen to another square in her column
  – 8 × 7 = 56 successors
• cost function: number of pairs of queens attacking each other
• global minimum: 0

Cost Function & Local Minimum
[Figure, left: this state has h = 17; the best successors have h = 12; typically one of them is selected at random]
[Figure, right: a local minimum with h = 1; every successor has a higher cost]

Hill Climbing
• "greedy" local search
  – grabs a good neighbor without thinking ahead
  – often makes rapid progress towards a solution
  – e.g., just 5 steps here (from the h = 17 state to the local minimum with h = 1)

Hill-Climbing Problems
• local maxima
  – gets stuck in a sub-optimal solution
• ridges
  – sequence of local maxima
• plateaux
  – flat local maximum or shoulder
  – however...

Performance in 8-Queens Problem
• starting from a random configuration, hill climbing gets stuck in 86% of cases
• takes 3-4 steps on average to get stuck or to succeed (about 4 when it succeeds, 3 when it gets stuck)
• good performance for a state space of approx. 17 million (8^8) states
• improvement: allowing sideways moves on plateaux raises the success rate from 14% to 94%

Hill-Climbing: Variants
• stochastic hill-climbing
  – chooses randomly among potential successors
  – sometimes better than steepest ascent
• first-choice hill-climbing
  – generates successors randomly and picks the first one that is better than the current state
  – good when there are many successors
• random-restart hill-climbing
  – restarts from a randomly generated initial state when a run fails
  – roughly 7 iterations with the 8-queens problem (each run succeeds with p ≈ 0.14, so about 1/p ≈ 7 runs are expected)

Hill-Climbing: Conclusion & Beyond
• hill-climbing depends on the shape of the state-space landscape

An Example
• planning of 3 new airports in a country
• distances from each city in the country to its nearest airport should be minimal
• state space defined by the coordinates of the airports
• so far: discrete environments
• however: most real-world environments are continuous
  – infinitely many states in the state space

An Example, cont'd
• moving in state space = moving airports on the map
• objective function f(x1, y1, x2, y2, x3, y3)
  – easy to compute for a particular state
  – hard to describe in general
• (x1, y1), (x2, y2), (x3, y3)
  – 6 variables
  – 6-dimensional space
• this is a mathematical optimization problem!
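  – 6-dimensional vector of variables: $\mathbf{x} = (x_1, y_1, x_2, y_2, x_3, y_3)$

Mathematical Optimization
• formulation and solution of a constrained optimization problem: [formulation not transcribed]

Objective Function, 2D Examples
[Figure: example objective functions of two variables]

Contour Representations
[Figure: contour representations, with constraints]

Optimization Techniques
• how to deal with complex higher-dimensional objective functions?
• solution: use the gradient of the state-space landscape
• what does that mean?
• compare to the 1st derivative in the 1D case

Gradient Vector of f(x)
• For a function f(x) there is at any point x a vector of first-order partial derivatives (the gradient vector):
  $\mathbf{g}(\mathbf{x}) = \nabla f(\mathbf{x}) = \left[\frac{\partial f}{\partial x_1}(\mathbf{x}), \frac{\partial f}{\partial x_2}(\mathbf{x}), \ldots, \frac{\partial f}{\partial x_n}(\mathbf{x})\right]^T$
  where $\partial f / \partial x_1$ is the 1st partial derivative w.r.t. $x_1$

Gradient Vector of f(x)
• The gradient of the objective function is a vector that gives the magnitude and direction of the steepest slope
• ... or, visualized: $\nabla f(\mathbf{x})$, with $\nabla$ the 1st-derivative operator applied to the objective function $f$

Gradient Vector, depicted
• The gradient vector g(x) is perpendicular to the contours and points in the direction of maximum increase of f(x).

Unconstrained Minimization
• Consider the unconstrained problem: minimize $f(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^n$
• Questions:
  – What are the conditions for a minimum?
  – Is the minimum unique?
  – Are there any relative minima?
• different types of minima...

Types of minima, single variable
[Figure not transcribed]

Types of minima, two variables
[Figure not transcribed]

General Method
• for computing local minima: solve the system of equations $\mathbf{g}(\mathbf{x}) = \mathbf{0}$ (necessary condition)
• solution by Newton's method (also called the Newton-Raphson method)

Newton Method Illustrated
• iterative computation of successively better approximations of the roots (zeros) of a function; in 1D, a root of $F$ is approximated by $x_{i+1} = x_i - F(x_i)/F'(x_i)$

Optimization by Newton
• We are looking for the roots of the gradient, i.e., solutions of $\mathbf{g}(\mathbf{x}) = \mathbf{0}$
• ⇒ We need the second derivative (Newton's method applied to $\mathbf{g}$ requires the derivative of $\mathbf{g}$)

Hessian Matrix of f(x)
• named after the German mathematician Ludwig Otto Hesse (1811-1874)
• square matrix of 2nd-order partial derivatives of a function
• This is given as the Hessian matrix:
  $\mathbf{H}(\mathbf{x}) = \left[\frac{\partial^2 f}{\partial x_j \, \partial x_k}(\mathbf{x})\right]_{j,k = 1, \ldots, n}$

Solution by Newton's Method
• for i = 0, 1, 2, …: $\mathbf{x}_{i+1} = \mathbf{x}_i - \mathbf{H}^{-1}(\mathbf{x}_i)\,\mathbf{g}(\mathbf{x}_i)$
• hopefully, $\mathbf{x}_i \to \mathbf{x}^*$

Problems with Newton's Method
• The method is not always convergent, even if x0 is close to x*
• The method requires the computation of the Hessian matrix at each iteration
• Thus, in practice the simple basic Newton method is not recommended...

Better: Line Search Descent Algorithm
• Line search descent methods
• Use an initial estimate x0 of the optimum point
• Generate a sequence of better estimates by successively searching directly along a direction of descent: $\mathbf{x}_{i+1} = \mathbf{x}_i + \lambda_{i+1}\,\mathbf{u}_{i+1}$, where the step length $\lambda_{i+1}$ minimizes $f(\mathbf{x}_i + \lambda\,\mathbf{u}_{i+1})$ over $\lambda$
• Terminate if there is no further progress or if the necessary condition is satisfied sufficiently accurately

Descent Condition & Illustration
• Descent condition: $\mathbf{g}(\mathbf{x}_i)^T\,\mathbf{u}_{i+1} < 0$, where $\mathbf{g}(\mathbf{x}_i)^T\,\mathbf{u}_{i+1}$ is the directional derivative of $f$ at $\mathbf{x}_i$ in the direction $\mathbf{u}_{i+1}$ ($\mathbf{u}_{i+1}$ denotes a descent direction, i.e., the directional derivative is negative)
• Sequence of line search descent directions and steps: [figure not transcribed]

Even Better:
• Method of steepest descent: line search in the direction of steepest descent
• Steepest descent direction: $\mathbf{u}_{i+1} = -\mathbf{g}(\mathbf{x}_i) / \lVert \mathbf{g}(\mathbf{x}_i) \rVert$
• Successive steepest descent directions are orthogonal: $\mathbf{u}_{i+1}^T\,\mathbf{u}_i = 0$ (this holds for exact line searches, since the minimum along $\mathbf{u}_i$ satisfies $\mathbf{g}(\mathbf{x}_i)^T\,\mathbf{u}_i = 0$)

Convergence Criteria
• Stop if one or a combination of the following criteria is fulfilled:
  – $\lVert \mathbf{x}_i - \mathbf{x}_{i-1} \rVert < \varepsilon_x$
  – $\lvert f(\mathbf{x}_i) - f(\mathbf{x}_{i-1}) \rvert < \varepsilon_f$
  – $\lVert \mathbf{g}(\mathbf{x}_i) \rVert < \varepsilon_g$

Empirical Gradient Descent
• What if the (partial) derivatives of the objective function are unknown?
• Empirical gradient: evaluate the change in the objective function for small changes in each coordinate
• Empirical gradient descent: hill climbing in a discretized version of the state space

To Conclude...
• optimization in AI: ad hoc hill-climbing & improvements
• objective function & contour representations
• in general: mathematical optimization
• gradient descent by line search
• there is more...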
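For the hill-climbing part of the session, here is a minimal Python sketch of steepest-descent hill climbing with random restarts on the 8-queens formulation from the slides: one queen per column, 56 successors per state, and cost h = number of attacking pairs. It is an illustration, not the lecture's code; the function names (`cost`, `successors`, `hill_climb`, `random_restart`) and the `max_runs` limit are hypothetical choices. Because the slides treat 8-queens as minimizing a cost, the sketch moves downhill on h, which is the same as climbing uphill on -h. Ties among the best successors are broken at random, as the slides suggest.

```python
import random

N = 8  # board size (8-queens)


def cost(state):
    """h = number of pairs of queens attacking each other.

    state[c] is the row of the queen in column c (one queen per column),
    so only row and diagonal attacks are possible.
    """
    h = 0
    for c1 in range(len(state)):
        for c2 in range(c1 + 1, len(state)):
            same_row = state[c1] == state[c2]
            same_diagonal = abs(state[c1] - state[c2]) == c2 - c1
            if same_row or same_diagonal:
                h += 1
    return h


def successors(state):
    """All states reached by moving one queen to another square in its column
    (8 * 7 = 56 of them on an 8x8 board)."""
    result = []
    for col in range(len(state)):
        for row in range(len(state)):
            if row != state[col]:
                nxt = list(state)
                nxt[col] = row
                result.append(nxt)
    return result


def hill_climb(state, rng):
    """Steepest-descent hill climbing on h.

    Moves to a best successor (ties broken at random) while it strictly
    lowers h; stops at a solution (h = 0) or at a local minimum.
    Returns (final_state, number_of_moves).
    """
    current, moves = list(state), 0
    while True:
        h = cost(current)
        if h == 0:
            return current, moves
        scored = [(cost(s), s) for s in successors(current)]
        best_h = min(score for score, _ in scored)
        if best_h >= h:  # no neighbor is better: stuck
            return current, moves
        current = rng.choice([s for score, s in scored if score == best_h])
        moves += 1


def random_restart(rng, max_runs=1000):
    """Restart hill climbing from random initial states until a run reaches
    h = 0. Returns (solution, number_of_runs)."""
    for run in range(1, max_runs + 1):
        start = [rng.randrange(N) for _ in range(N)]
        final, _ = hill_climb(start, rng)
        if cost(final) == 0:
            return final, run
    raise RuntimeError("no solution found within max_runs")
```

Usage: `solution, runs = random_restart(random.Random(0))` returns a conflict-free board together with the number of runs it needed; seeding the generator makes the run reproducible. With the roughly 14% per-run success rate quoted in the slides, only a handful of runs should be needed on average.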
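The gradient-based methods summarized above (empirical gradient, steepest descent by line search, and the basic Newton iteration) can be sketched in plain Python as follows. This sketch rests on several assumptions and is not the lecture's own code: the helper names, the tolerances, the central-difference step `h`, and the Armijo constant `armijo_c` are hypothetical choices. The line search here is backtracking (halving the step until an Armijo sufficient-decrease test passes), which is an inexact substitute for minimizing f along the line. As a result, the orthogonality of successive steepest descent directions, which assumes exact line searches, holds only approximately. The Newton routine solves H d = g instead of forming the inverse, and, like the basic method on the slides, it has no safeguard against divergence.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


def empirical_gradient(f, x, h=1e-6):
    """Estimate the gradient of f at x by central differences, i.e. the
    change in f for small changes in each coordinate."""
    g = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def steepest_descent(f, x0, grad=None, armijo_c=0.25, eps_x=1e-10,
                     eps_f=1e-14, eps_g=1e-6, max_iter=10_000):
    """Line search descent along u = -g / ||g||, with a backtracking line
    search. Stops as soon as one of the three convergence criteria holds."""
    if grad is None:
        def grad(x):  # derivatives unknown: fall back to the empirical gradient
            return empirical_gradient(f, x)
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        g = grad(x)
        g_norm = norm(g)
        if g_norm < eps_g:                      # ||g(x_i)|| < eps_g
            break
        u = [-gi / g_norm for gi in g]          # steepest descent direction
        slope = -g_norm                         # g^T u < 0 (descent condition)
        lam = 1.0
        trial = [xi + lam * ui for xi, ui in zip(x, u)]
        # Halve the step until f decreases sufficiently (Armijo test);
        # the lower bound on lam guarantees that the loop terminates.
        while f(trial) > fx + armijo_c * lam * slope and lam > 1e-16:
            lam *= 0.5
            trial = [xi + lam * ui for xi, ui in zip(x, u)]
        f_new = f(trial)
        x = trial
        # ||x_{i+1} - x_i|| = lam because ||u|| = 1
        if lam < eps_x or abs(fx - f_new) < eps_f:
            break
        fx = f_new
    return x


def solve(A, b):
    """Solve A d = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    d = [0.0] * n
    for r in range(n - 1, -1, -1):
        d[r] = (M[r][n] - sum(M[r][k] * d[k] for k in range(r + 1, n))) / M[r][r]
    return d


def newton(grad, hess, x0, eps_g=1e-10, max_iter=50):
    """Basic Newton iteration x_{i+1} = x_i - H(x_i)^{-1} g(x_i).

    No safeguards: it may diverge, and it fails if the Hessian is singular."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if norm(g) < eps_g:
            break
        d = solve(hess(x), g)
        x = [xi - di for xi, di in zip(x, d)]
    return x
```

For example, take the hypothetical test function `f = lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2`. Here `steepest_descent(f, [0.0, 0.0])` approaches the minimum at (1, -2) iteratively, using only function values. In contrast, `newton` with the exact gradient `[2 * (x[0] - 1), 20 * (x[1] + 2)]` and Hessian `[[2, 0], [0, 20]]` reaches the minimum in a single step, because f is quadratic.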