Gradient Descent and Related Methods:
Hill-climbing & Optimization
Holger Schultheis
Nov 4th, 2014
Parameter Estimation as Optimization
•  Recall from last session:
–  Free parameters of models are parameters
whose values cannot be fixed a priori
•  Parameter estimation tries to determine
“good” values for these parameters
–  What is “good” is context dependent
•  Finding good values is essentially an
optimization problem
Optimization
•  an AI search technique
•  goal: find optimal solution to a problem
•  path to solution does not matter
•  e.g. in integrated-circuit design, vehicle
routing, network optimization, layout
problems, …
•  example: 8-queens problem …
8-Queens Problem
•  place 8 queens on a
chessboard such that
no queen attacks any
other
Solution by Local Search
•  use current state, move to neighboring states
•  no multiple-path search
•  no systematic search
–  constant amount of memory
–  reasonable solutions in large/infinite/continuous
state spaces
•  states are evaluated using an objective
function
Objective Function
State Space Landscape
Hill-Climbing Search
•  continually moves in direction of increasing value
•  terminates when no neighbor has higher value
•  no look-ahead beyond immediate neighbors
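The three properties above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the function names and the toy landscape (integer states with a single peak at x = 3) are assumptions made for the example.

```python
def hill_climb(start, neighbors, value):
    """Steepest-ascent hill climbing: move to the best neighbor until none is better."""
    current = start
    while True:
        # Look only at the immediate neighbors of the current state.
        best = max(neighbors(current), key=value, default=current)
        if value(best) <= value(current):
            return current  # no neighbor has a higher value
        current = best


# Toy landscape (hypothetical): integer states, single peak at x = 3.
def value(x):
    return -(x - 3) ** 2


def neighbors(x):
    return [x - 1, x + 1]


print(hill_climb(-10, neighbors, value))  # climbs step by step to the peak at 3
```

On a landscape with several peaks the same loop stops at whichever peak is reached first from the start state.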
In the 8-Queens Example...
•  all queens on board, one per column
•  successor function returns all possible
successor states
–  moving one queen to another square in her column
–  8 * 7 = 56 successors
•  cost function: number of pairs of queens
attacking each other
•  global minimum: 0
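A possible encoding of this formulation, as a sketch: the state is assumed to be a tuple in which entry c holds the row of the queen in column c. The names `cost` and `successors` are chosen for this example only.

```python
from itertools import combinations


def cost(state):
    """Number of pairs of queens attacking each other.

    state[c] is the row of the queen in column c (one queen per column).
    """
    return sum(
        1
        for (c1, r1), (c2, r2) in combinations(enumerate(state), 2)
        if r1 == r2 or abs(r1 - r2) == c2 - c1  # same row or same diagonal
    )


def successors(state):
    """All states reachable by moving one queen to another square in her column."""
    n = len(state)
    return [
        state[:c] + (r,) + state[c + 1:]
        for c in range(n)
        for r in range(n)
        if r != state[c]
    ]


start = (0,) * 8                     # all queens in the same row
solution = (0, 4, 7, 5, 2, 6, 1, 3)  # a known solution

print(len(successors(start)))  # 56 = 8 * 7
print(cost(start))             # 28: every pair of queens attacks each other
print(cost(solution))          # 0: the global minimum
```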
Cost Function & Local Minimum
Hill Climbing
•  "greedy" local search
–  grabs good neighbor without thinking ahead
–  often rapid progress towards a solution
–  e.g. just 5 steps from a state with h=17 to a local minimum with h=1
•  state with h=17: best successors have h=12
–  typically random selection among the best successors
•  local minimum with h=1
–  every successor has higher cost
Hill-Climbing Problems
•  local maxima
–  gets stuck in sub-optimal solution
•  ridges
–  sequence of local maxima
•  plateaux
–  flat local maximum or shoulder
–  however...
Performance in 8-Queens Problem
•  hill climbing gets stuck in 86% of cases when
starting from a random configuration
•  takes 3-4 steps on average: about 4 steps when it
succeeds, about 3 when it gets stuck
•  good performance for state space of
approx. 17 million states
•  improvement: using sideways moves on
plateaux raises success from 14% to 94%
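The effect of sideways moves can be checked empirically. The sketch below is one way to do this and makes several assumptions: the state encoding (row of the queen per column), the limit of 100 consecutive sideways moves, the number of trials, and the seed are all choices for this example. The measured rates vary with the random seed.

```python
import random
from itertools import combinations


def cost(state):
    """Number of attacking queen pairs; state[c] is the row of the queen in column c."""
    return sum(
        1
        for (c1, r1), (c2, r2) in combinations(enumerate(state), 2)
        if r1 == r2 or abs(r1 - r2) == c2 - c1
    )


def successors(state):
    """Move one queen to another square in her column."""
    n = len(state)
    return [state[:c] + (r,) + state[c + 1:]
            for c in range(n) for r in range(n) if r != state[c]]


def hill_climb(state, rng, max_sideways=0):
    """Steepest descent on cost with at most max_sideways consecutive sideways moves."""
    current, sideways = cost(state), 0
    while current > 0:
        scored = [(cost(s), s) for s in successors(state)]
        best = min(h for h, _ in scored)
        if best > current:
            break                  # strict local minimum
        if best == current:
            if sideways >= max_sideways:
                break              # plateau, no sideways budget left
            sideways += 1
        else:
            sideways = 0
        # Random selection among the best successors.
        state = rng.choice([s for h, s in scored if h == best])
        current = best
    return state


def success_rate(trials, max_sideways, seed=0):
    """Fraction of random start states from which a solution (cost 0) is reached."""
    rng = random.Random(seed)
    solved = 0
    for _ in range(trials):
        start = tuple(rng.randrange(8) for _ in range(8))
        solved += cost(hill_climb(start, rng, max_sideways)) == 0
    return solved / trials


print(success_rate(100, max_sideways=0))    # plain hill climbing
print(success_rate(100, max_sideways=100))  # with sideways moves
```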
Hill-Climbing: Variants
•  stochastic hill-climbing
–  chooses randomly among the uphill moves
–  sometimes better than steepest ascent
•  first-choice hill-climbing
–  generates successors randomly and picks the first
one that is better than the current state
–  good for many successors
•  random-restart hill-climbing
–  restarts from a randomly generated initial state when
the search gets stuck
–  needs roughly 7 iterations on average with the 8-queens problem
Hill-Climbing: Conclusion & Beyond
•  hill-climbing depends on the shape of the
state-space landscape
•  so far: discrete environments
•  however: most real-world environments
are continuous
–  infinitely many states in state space
An Example
•  planning of 3 new airports in a country
•  distances from each city in the country to
its nearest airport should be minimal
•  state space defined by coordinates of
the airports
An Example, cont'd
•  state: coordinates (x1, y1), (x2, y2), (x3, y3) of the airports
–  6 variables
–  6-dimensional space
•  moving in state space = moving airports
on the map
•  objective function f(x1, y1, x2, y2, x3, y3)
–  easy to compute for particular state
–  hard to describe in general
•  this is a mathematical optimization problem
–  6-dimensional vector of variables: x
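To see why the objective function is easy to compute for a particular state, consider the following sketch. It assumes that "distance" is measured as the sum of squared Euclidean distances, which is one common choice and not stated on the slide. The city coordinates are made up for illustration.

```python
def airport_cost(x, cities):
    """Sum of squared distances from each city to its nearest airport.

    x is the 6-dimensional state (x1, y1, x2, y2, x3, y3);
    cities is a list of (cx, cy) coordinates.
    """
    airports = [(x[i], x[i + 1]) for i in range(0, len(x), 2)]
    return sum(
        min((cx - ax) ** 2 + (cy - ay) ** 2 for ax, ay in airports)
        for cx, cy in cities
    )


# Hypothetical city coordinates, for illustration only.
cities = [(0, 0), (1, 0), (5, 5), (6, 5), (10, 0)]

print(airport_cost((0, 0, 5, 5, 10, 0), cities))      # 2: cities (1, 0) and (6, 5) are 1 away
print(airport_cost((0.5, 0, 5.5, 5, 10, 0), cities))  # 1.0: four cities are 0.5 away
```

The function is hard to describe in general because the nearest airport of a city changes as the airports move, so the `min` selects different terms in different regions of the state space.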
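Mathematical Optimization
•  formulation and solution of a constrained
optimization problem: minimize the objective function f(x)
over x = [x1, ..., xn] with constraints (in general,
inequality and equality constraints on x)
Objective Function, 2D Examples
Contour Representations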
Optimization Techniques
•  how to deal with complex higher-dimensional objective functions?
•  solution: use of gradient of the landscape
of state spaces
•  what does that mean?
•  compare to 1st derivative in the 1D case
Gradient Vector of f(x)
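•  For a function f(x) there is at any point x a
vector of first order partial derivatives
(gradient vector):
$\nabla f(\mathbf{x}) = \left[ \frac{\partial f}{\partial x_1}(\mathbf{x}), \frac{\partial f}{\partial x_2}(\mathbf{x}), \dots, \frac{\partial f}{\partial x_n}(\mathbf{x}) \right]^T = g(\mathbf{x})$
–  the first component is the 1st partial derivative
w.r.t. x1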
Gradient Vector of f(x)
•  The gradient of the objective function is a
vector that gives the magnitude and
direction of the steepest slope
•  in ∇f(x), ∇ is the 1st derivative operator and f is the
objective function
•  ... or visualized:
Gradient Vector, depicted
•  The gradient vector g(x) is perpendicular
to the contours and in the direction of
maximum increase of f(x).
Unconstrained Minimization
•  Considering the unconstrained problem: minimize f(x) over all x in R^n
•  Questions:
–  What are the conditions for a minimum?
–  Is the minimum unique?
–  Are there any relative minima?
•  different types of minima...
Types of minima, single variable
Types of minima, two variables
General Method
•  for computing local minima:
solve the system of equations ∇f(x) = 0
(necessary condition)
•  solution by Newton's method
(also called Newton-Raphson method)
Newton Method Illustrated
•  Iterative computation of successively
better approximations of roots (zeroes)
of a function
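For a function of one variable the iteration is x_{i+1} = x_i - f(x_i) / f'(x_i). A minimal sketch, with the tolerance, the iteration limit, and the example function x^2 - 2 chosen only for illustration:

```python
def newton_root(f, df, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson: x_{i+1} = x_i - f(x_i) / f'(x_i)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break  # successive approximations no longer change
    return x


# Example: root of f(x) = x^2 - 2, i.e. sqrt(2), starting from x0 = 1.
root = newton_root(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)
```

The sketch does not guard against f'(x) = 0, which is one of the ways the basic method can fail.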
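Optimization by Newton
•  We are looking for the roots
of the gradient
•  => We need the second derivative
•  This is given as the Hessian Matrix.
Hessian Matrix of f(x)
•  Named after the German mathematician Ludwig Otto Hesse
•  Square matrix of 2nd order partial derivatives of a
function:
$H(\mathbf{x}) = \left[ \frac{\partial^2 f}{\partial x_j \partial x_k}(\mathbf{x}) \right], \quad j, k = 1, \dots, n$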
Solution by Newton's Method
•  $\mathbf{x}_{i+1} = \mathbf{x}_i - H^{-1}(\mathbf{x}_i) \nabla f(\mathbf{x}_i)$ for i=0,1,2,...
•  Hopefully $\mathbf{x}_i \to \mathbf{x}^*$
Problems with Newton's Method
•  The method is not
always convergent,
even if x0 is close
to x*
•  The method requires
the computation of
the Hessian matrix
at each iteration
•  Thus, in practice the simple basic Newton
method is not recommended...
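The basic Newton iteration for minimization can be sketched for two variables as follows. The objective f(x1, x2) = exp(x1) - x1 + (x2 - x1)^2, with minimizer (0, 0), is a made-up example, and the 2x2 linear system is solved by hand so that no external library is needed. Note that gradient and Hessian are both evaluated at every iteration, which is the cost mentioned above.

```python
import math


def grad(x):
    """Gradient of the example f(x1, x2) = exp(x1) - x1 + (x2 - x1)^2."""
    x1, x2 = x
    return (math.exp(x1) - 1 - 2 * (x2 - x1), 2 * (x2 - x1))


def hessian(x):
    """Hessian of the same f: matrix of second-order partial derivatives."""
    x1, _ = x
    return ((math.exp(x1) + 2, -2.0), (-2.0, 2.0))


def newton_minimize(x0, tol=1e-10, max_iter=50):
    """Basic Newton iteration x_{i+1} = x_i - H(x_i)^{-1} grad f(x_i)."""
    x = x0
    for _ in range(max_iter):
        g1, g2 = grad(x)
        if math.hypot(g1, g2) < tol:
            break  # necessary condition grad f(x) = 0 holds approximately
        (a, b), (c, d) = hessian(x)
        det = a * d - b * c
        # Solve H * step = g for the 2x2 case (Cramer's rule).
        s1 = (g1 * d - b * g2) / det
        s2 = (a * g2 - c * g1) / det
        x = (x[0] - s1, x[1] - s2)
    return x


x_star = newton_minimize((1.0, -1.0))
print(x_star)  # close to the minimizer (0, 0)
```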
Line Search Descent Algorithm
•  Line search descent methods
•  Use an initial estimate x0 of the optimum point
•  Generate sequences of better estimates
by successively searching directly in a
direction of descent
•  Terminate if no further progress or if the
necessary condition is sufficiently
accurately satisfied
Descent Condition & Illustration
•  Descent Condition: $\nabla f(\mathbf{x}_i)^T \mathbf{u}_{i+1} < 0$
–  the left-hand side is the directional derivative
of f(xi) in the direction ui+1
–  ui+1 denotes a descent direction, i.e. the directional derivative is negative
•  Illustration: sequence of line search
descent directions
and steps
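Even Better:
•  Method of steepest descent:
line search in the direction of steepest
descent
•  Steepest descent direction:
$\mathbf{u}_{i+1} = -\nabla f(\mathbf{x}_i) / \|\nabla f(\mathbf{x}_i)\|$
•  Successive steepest descent
directions are orthogonal (with exact line search):
$\mathbf{u}_{i+1}^T \mathbf{u}_i = 0$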
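A sketch of steepest descent with a numerical line search follows. Several parts are assumptions for this example: golden-section search is used as the one-dimensional minimizer, the step length is limited to `max_step`, the gradient norm serves as the stopping test, and the quadratic objective is made up. Because the line search is only approximately exact, successive directions are only approximately orthogonal.

```python
import math


def golden_section(phi, lo, hi, tol=1e-10):
    """Minimize a unimodal one-dimensional function phi on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if phi(c) < phi(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


def steepest_descent(f, grad, x0, eps=1e-6, max_step=100.0, max_iter=1000):
    """Line search along the normalized negative gradient until it is (almost) zero."""
    x = list(x0)
    directions = []
    for _ in range(max_iter):
        g = grad(x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm < eps:
            break
        u = [-gi / norm for gi in g]  # steepest descent direction
        lam = golden_section(
            lambda t: f([xi + t * ui for xi, ui in zip(x, u)]), 0.0, max_step
        )
        x = [xi + lam * ui for xi, ui in zip(x, u)]
        directions.append(u)
    return x, directions


# Hypothetical objective: an elongated bowl with minimum at (0, 0).
f = lambda x: x[0] ** 2 + 10 * x[1] ** 2
grad = lambda x: [2 * x[0], 20 * x[1]]

x_min, dirs = steepest_descent(f, grad, [10.0, 1.0])
dot = sum(a * b for a, b in zip(dirs[0], dirs[1]))
print(x_min)  # close to [0, 0]
print(dot)    # close to 0: the first two directions are nearly orthogonal
```

On elongated contours like these the orthogonal directions produce the typical zigzag path, which is why many iterations are needed even for a simple quadratic.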
Convergence Criteria
•  Stop, if one or a combination of the
following criteria is fulfilled:
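Commonly used criteria of this kind are given below as a hedged sketch. The tolerances ε1, ε2, ε3 are user-chosen small positive numbers and are notation introduced here. The first and third test express "no further progress", the second expresses that the necessary condition ∇f(x) = 0 is satisfied sufficiently accurately.

```latex
\|\mathbf{x}_{i+1} - \mathbf{x}_i\| < \varepsilon_1, \qquad
\|\nabla f(\mathbf{x}_{i+1})\| < \varepsilon_2, \qquad
|f(\mathbf{x}_{i+1}) - f(\mathbf{x}_i)| < \varepsilon_3
```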
Empirical Gradient Descent
•  What if the (partial) derivatives of the
objective function are unknown?
•  Empirical gradient: Evaluating the change
in the objective function for small changes
in each coordinate
•  Empirical Gradient Descent: hill climbing in
a discretized version of the state space.
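One way to estimate the gradient without derivatives is by central differences. In the sketch below the perturbation size `delta`, the fixed step size, the number of iterations, and the objective function are all assumptions chosen for illustration.

```python
def empirical_gradient(f, x, delta=1e-6):
    """Estimate the gradient of f at x from small changes in each coordinate."""
    grad = []
    for k in range(len(x)):
        up = list(x)
        up[k] += delta
        down = list(x)
        down[k] -= delta
        grad.append((f(up) - f(down)) / (2 * delta))
    return grad


def empirical_gradient_descent(f, x0, step=0.1, iters=200):
    """Fixed-step descent along the negative empirical gradient."""
    x = list(x0)
    for _ in range(iters):
        g = empirical_gradient(f, x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical objective with minimum at (1, -2); only function values are used.
f = lambda x: (x[0] - 1) ** 2 + 2 * (x[1] + 2) ** 2

x_min = empirical_gradient_descent(f, [0.0, 0.0])
print(x_min)  # close to [1, -2]
```

Each gradient estimate costs two function evaluations per coordinate, so the approach becomes expensive in high-dimensional state spaces.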
To Conclude...
•  optimization in AI:
ad hoc hill-climbing & improvements
•  objective function & contour
•  in general: mathematical optimization
•  gradient descent by line search
•  there is more...