Sample Problem Set - SOLUTIONS

EGGN 512
Computer Vision
The exam will be closed book, but handwritten notes are allowed. The problems below are
representative of exam problems (although there may be more problems than would appear on
the actual exam). Some of the problems below are drawn from previous exams.
For reference, here are some equations (these will be provided on the exam):

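The rotation matrix for XYZ fixed angles is:
$$
R_{XYZ}(\theta_X, \theta_Y, \theta_Z) = R_Z(\theta_Z)\, R_Y(\theta_Y)\, R_X(\theta_X)
= \begin{bmatrix} c_z & -s_z & 0 \\ s_z & c_z & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} c_y & 0 & s_y \\ 0 & 1 & 0 \\ -s_y & 0 & c_y \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & c_x & -s_x \\ 0 & s_x & c_x \end{bmatrix}
$$
where $c_x = \cos(\theta_X)$, $s_y = \sin(\theta_Y)$, etc.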

The matrix for a rotation about the axis $\hat{k}$ by an angle $\theta$ is
$$
R_k(\theta) = \begin{bmatrix}
k_x k_x v + c & k_x k_y v - k_z s & k_x k_z v + k_y s \\
k_x k_y v + k_z s & k_y k_y v + c & k_y k_z v - k_x s \\
k_x k_z v - k_y s & k_y k_z v + k_x s & k_z k_z v + c
\end{bmatrix}
$$
where $c = \cos\theta$, $s = \sin\theta$, $v = 1 - \cos\theta$, and $\hat{k} = (k_x, k_y, k_z)^T$.
1. Describe what kind of image the following Matlab code will generate, and draw a
sketch.
I=zeros(128,128);
for i=1:128
    for j=1:128
        if (i-64)*(i-64) + (j-64)*(j-64) < 40*40
            I(i,j)=j;
        else
            I(i,j)=i;
        end
    end
end
Solution:
The background intensity increases from dark at the top of the 128x128 image to light at the bottom.
In the middle is a filled circle of radius 40 pixels, centered at (64,64), whose intensity increases from left to right.
2. Write a Matlab program that will generate an image of a checkerboard, consisting of
black and white squares in an 8x8 pattern. Each square is 10x10 pixels.
Solution:
There are many possible ways to do this. Here is one:
N = 10;                 % Size of each square, in pixels
I = false(8*N,8*N);
for i=1:2*N:8*N
    I(:, i:i+N-1) = true;
end
I = xor(I, I');
imshow(I, []);
3. A digital camera is modeled as a pinhole camera with focal length f. It has 512x512
sensor elements (pixels), where the center pixel (x,y) = (256,256) corresponds to the
optical axis. A point P has 3-D coordinates (1m, 2m, 8m) in camera coordinates, and
projects to pixel (x,y) = (356,456). Find the pixel projection of point Q, if Q has 3-D
coordinates (-3m, -1m, 16m) in camera coordinates.
Solution:
We can solve for focal length since we know from point P, (356-256) = f (1 m/ 8 m). So f = 800
pixels.
For Q:
(x-x0) = f X/Z = (800) (-3)/(16) = -150, or x = -150+256 = 106
(y-y0) = f Y/Z = (800) (-1)/(16) = -50, or y = -50+256 = 206
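The arithmetic above can be checked numerically. The following is a minimal MATLAB sketch; the variable names are illustrative and not part of the original problem. It also estimates f from both image coordinates of P, which should agree.

```matlab
% Pinhole projection sketch for Problem 3 (variable names are illustrative).
c0 = [256; 256];            % principal point (x0, y0), in pixels
P  = [1; 2; 8];             % point P in camera coordinates, meters
pP = [356; 456];            % observed pixel location of P

% Focal length in pixels, estimated from each image coordinate of P
f_est = (pP - c0) .* P(3) ./ P(1:2);   % both entries should agree
f = f_est(1);

Q  = [-3; -1; 16];          % point Q in camera coordinates, meters
pQ = c0 + f * Q(1:2) / Q(3) % projected pixel location of Q
```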
4. Correlate the horizontal Sobel filter with the image below. You may assume that the
image is padded with zeros beyond the visible boundaries of the image.
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 1 1 1 0 0
0 0 1 1 1 1 0 0
0 0 1 1 1 1 0 0
0 0 1 1 1 1 0 0
0 0 1 1 1 1 0 0
Solution:
The horizontal Sobel filter is
-1  0 +1
-2  0 +2
-1  0 +1
Correlation is defined as
$$
g(x, y) = \sum_{s=-m/2}^{m/2} \sum_{t=-n/2}^{n/2} h(s, t)\, f(x + s, y + t) = h \otimes f
$$
We will get
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  1  1  0  0 -1 -1  0
 0  3  3  0  0 -3 -3  0
 0  4  4  0  0 -4 -4  0
 0  4  4  0  0 -4 -4  0
 0  4  4  0  0 -4 -4  0
 0  3  3  0  0 -3 -3  0
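The result can be reproduced with a short MATLAB sketch. It assumes the core function filter2, which computes a correlation (not a convolution), pads with zeros, and returns an output the same size as the input. The variable names are illustrative.

```matlab
% Check of Problem 4 using filter2 (correlation, zero padding, same size).
f = zeros(8, 8);
f(4:8, 3:6) = 1;                 % the block of ones from the problem image
h = [-1 0 1; -2 0 2; -1 0 1];    % horizontal Sobel filter
g = filter2(h, f)                % 8x8 result of correlating h with f
```

Using conv2(f, h, 'same') instead would flip the filter and negate every entry of the result.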
5. A square pixel camera looks down at a workbench. Points P1 and P2 have (x,y) image
coordinates (100,100) and (100,200), respectively. The locations of points P1 and P2 on
the X,Y plane of the workbench are (500mm, 300mm) and (1000mm, 300mm),
respectively. What is the X,Y location of point P3 on the workbench if its image
coordinates are (250,400)?
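Solution:
We will treat this as a rotation, followed by a scaling, followed by a translation.
The segment from P1 to P2 points along the +y axis in the image (90 degrees) and along the +X axis on the workbench (0 degrees), so the rotation angle is $\theta = \theta_w - \theta_i = 0^\circ - 90^\circ = -90^\circ$.
Next determine the scale factor: The distance between P1 and P2 is 100 pixels in the image, and
500 mm on the workbench. So the scale s = 500 mm/100 pixels = 5 mm/pixel.
$$
\begin{bmatrix} x_w \\ y_w \\ 1 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & x_0 \\ 0 & 1 & y_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}
$$
$$
x_w = x_i\, s \cos\theta - y_i\, s \sin\theta + x_0
$$
$$
y_w = x_i\, s \sin\theta + y_i\, s \cos\theta + y_0
$$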
Using the coordinates of P1:
500 mm = (100 pix)(5 mm/pix) 0 – (100 pix)(5 mm/pix) (-1) + x0 = 500 mm + x0
300 mm = (100 pix)(5 mm/pix) (-1) + (100 pix)(5 mm/pix) (0) + y0 = -500 mm + y0
or
x0 = 0, y0 = 800 mm
For P3:
xw = (250 pix)(5 mm/pix) 0 – (400 pix)(5 mm/pix) (-1) + 0 = 2000 mm
yw = (250 pix)(5 mm/pix) (-1) + (400 pix)(5 mm/pix) 0 + 800 mm = -450 mm
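The same steps (rotation, scale, translation) can be sketched in MATLAB as a numerical check. The variable names are illustrative, and the angle is computed from the two point pairs rather than read off by inspection.

```matlab
% Similarity-transform sketch for Problem 5 (names are illustrative).
p1_img = [100; 100];   p2_img = [100; 200];     % image points, pixels
p1_w   = [500; 300];   p2_w   = [1000; 300];    % workbench points, mm

d_img = p2_img - p1_img;
d_w   = p2_w - p1_w;

theta = atan2(d_w(2), d_w(1)) - atan2(d_img(2), d_img(1));  % rotation, radians
s     = norm(d_w) / norm(d_img);                            % scale, mm per pixel
R     = [cos(theta) -sin(theta); sin(theta) cos(theta)];
t     = p1_w - s * R * p1_img;                              % translation (x0, y0), mm

p3_img = [250; 400];
p3_w   = s * R * p3_img + t      % workbench location of P3, mm
```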
6. A sphere of radius = 1 meter is projected onto an image. The camera has a focal length
of 10 mm, and each pixel corresponds to 0.01 mm on the image plane. The sphere
projects to a circle of radius = 20 pixels on the image plane. How far away is the sphere
from the camera (i.e., what is its Z coordinate)?
Solution:
The sphere has a projected radius of 20 pixels or 0.2 mm. By similar triangles, 0.2 mm/10 mm
= 1 meter / Z, or Z = 50 meters
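The similar-triangles calculation can be written out as a small MATLAB sketch, with illustrative variable names, to keep the unit conversions explicit.

```matlab
% Similar-triangles sketch for Problem 6 (names are illustrative).
f_mm     = 10;      % focal length, mm
pixel_mm = 0.01;    % pixel size on the image plane, mm
r_pixels = 20;      % projected radius, pixels
R_m      = 1;       % sphere radius, meters

r_mm = r_pixels * pixel_mm;    % projected radius on the image plane, mm
Z_m  = R_m * f_mm / r_mm       % distance to the sphere, meters
```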
7. A camera is mounted on the end effector of a robot arm as shown below. The end
effector coordinate system {E} has its Z axis pointing up, its Y axis pointing to the right
in the figure, and the X axis pointing out of the page. The origin of the camera {C} is
located at a position of (X,Y,Z) = (0, 10 cm, 5 cm) with respect to {E}. The X axis of the
camera is aligned with the X axis of the end effector. The Z axis of {C} is tilted down at
an angle of 45 degrees with respect to the Y axis of {E}. Write the 4x4 transformation
matrix relating the pose of the camera with respect to the end effector, ${}^E_C T$.
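[Figure: side view showing the y and z axes of the camera frame {C} and the y and z axes of the end effector frame {E}.]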
Solution:
The columns of the rotation matrix ${}^E_C R$ are the unit vectors of {C} with respect to {E}.
The X axis of the camera is $[1, 0, 0]^T$ with respect to {E}.
The Y axis of the camera is $[0, -0.707, -0.707]^T$ with respect to {E}.
The Z axis of the camera is $[0, 0.707, -0.707]^T$ with respect to {E}.
So
$$
{}^E_C T = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & -0.707 & 0.707 & 10 \\
0 & -0.707 & -0.707 & 5 \\
0 & 0 & 0 & 1
\end{bmatrix}
$$
It is also possible to do this one by plugging into the formula for the rotation matrix for XYZ
angles … we have a single rotation of -135 degrees about the x axis.
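Both routes to the rotation can be compared numerically. The MATLAB sketch below builds the pose from the camera axes and then checks it against a single rotation of -135 degrees about X. The variable names are illustrative, and the Y axis is obtained from the right-hand rule (Y = Z x X).

```matlab
% Sketch for Problem 7: build the camera pose from its axes, then compare
% with a single rotation about X (names are illustrative).
c45 = cosd(45);
x_c = [1; 0; 0];            % camera X axis expressed in {E}
z_c = [0; c45; -c45];       % camera Z axis: 45 degrees below the Y axis of {E}
y_c = cross(z_c, x_c);      % right-handed frame: Y = Z x X
t   = [0; 10; 5];           % origin of {C} in {E}, cm

T = [x_c y_c z_c t; 0 0 0 1]

% Same rotation from the XYZ fixed-angle formula with thetaX = -135 degrees
thetaX = -135;
Rx = [1 0 0; 0 cosd(thetaX) -sind(thetaX); 0 sind(thetaX) cosd(thetaX)];
max_diff = max(max(abs(T(1:3,1:3) - Rx)))   % should be near zero
```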
8. Image A is modified by the affine transform below. Sketch image “B”.
$$
\begin{bmatrix} x_B \\ y_B \\ 1 \end{bmatrix} =
\begin{bmatrix} 1 & -0.1 & 10 \\ 0.2 & 1 & 20 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_A \\ y_A \\ 1 \end{bmatrix}
$$
Image A (the x,y coordinates of the corners of the square are given):
(10,10), (10,50), (50,10), (50,50)
Solution:
The corners of the square map as follows:
xB = xA – 0.1 yA + 10
yB = 0.2 xA + yA + 20
So
(10,10) -> (10-1+10, 2+10+20) = (19, 32)
(10,50) -> (10-5+10, 2+50+20) = (15, 72)
(50,10) -> (50-1+10, 10+10+20) = (59, 40)
(50,50) -> (50-5+10, 10+50+20) = (55, 80)
Image B (the x,y coordinates of the corners of the resulting parallelogram):
(19,32), (15,72), (59,40), (55,80)
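The corner mapping can be checked by applying the matrix to all four corners at once. This is a minimal MATLAB sketch with illustrative variable names; each column of cornersA is one corner in homogeneous coordinates.

```matlab
% Sketch for Problem 8: map the corners of the square through the affine matrix.
A = [1 -0.1 10; 0.2 1 20; 0 0 1];
cornersA = [10 10 50 50;      % x coordinates
            10 50 10 50;      % y coordinates
             1  1  1  1];     % homogeneous 1s
cornersB = A * cornersA;
cornersB(1:2, :)              % each column is an (xB, yB) corner of image B
```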
9. The perspective projection matrix for a camera is given below. This matrix models
both the intrinsic parameters of the camera and the extrinsic parameters (i.e., its pose
in the world).
$$
M = \begin{bmatrix} 0 & 100 & 500 & 1500 \\ 500 & 100 & 0 & 0 \\ 0 & 1 & 0 & 10 \end{bmatrix}
$$
A point is located at (X=1, Y=0, Z=1) in world coordinates. What image point does it
project to?
Solution:
p = MP = [2000; 500; 10]. We divide by the third element to get the x,y image coordinates:
x = 2000/10 = 200
y = 500/10 = 50
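A minimal MATLAB sketch of the same projection, with illustrative variable names: multiply by the homogeneous world point, then divide by the third element.

```matlab
% Sketch for Problem 9: project a world point with the 3x4 matrix M.
M = [0 100 500 1500; 500 100 0 0; 0 1 0 10];
P = [1; 0; 1; 1];        % homogeneous world point (X, Y, Z, 1)
p = M * P;               % homogeneous image point
xy = p(1:2) / p(3)       % image coordinates (x, y)
```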
10. A pinhole camera (with focal length = 1) observes a flat wall. The camera is oriented
with its optical axis 45 degrees from the normal to the wall. The distance to the wall
(along the optical axis) is 10 meters. A top-down view is shown below.
[Figure: top-down view of the camera and the wall, with d = 10 m and θ = 45°.]
The camera points directly at the origin of the wall plane. The X axis of the wall points
to the right along the wall, and the Y axis points down.
The mapping of points from the wall plane to the image plane can be described by a
projective transform, or homography, such that
$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} X_{Wall} \\ Y_{Wall} \\ 1 \end{bmatrix},
\quad x_{img} = x_1 / x_3, \quad y_{img} = x_2 / x_3
$$
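Which of the following matrices is the correct projective transform?
$$
\text{(a)} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad
\text{(b)} \begin{bmatrix} 0.707 & 0 & 0 \\ 0 & 1 & 0 \\ 0.707 & 0 & 10 \end{bmatrix}
\quad
\text{(c)} \begin{bmatrix} 0.707 & 0 & 10 \\ 0 & 0.707 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad
\text{(d)} \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 0 & 10 \end{bmatrix}
$$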
Solution:
First transform the wall points to the camera’s coordinate system,
$$
{}^{Cam}P = {}^{Cam}_{Wall}H \; {}^{Wall}P, \quad \text{where} \quad
{}^{C}_{W}H = \begin{bmatrix} {}^{C}_{W}R & {}^{C}t_{Worg} \\ 0 & 1 \end{bmatrix}.
$$
Now, ${}^{C}_{W}R$ is the rotation matrix that describes the orientation of frame {W} with respect to
frame {C} (see slide 6 of Lecture 3). The two frames differ by a rotation about the Y axis.
[Figure: the axes Wx, Wy, Wz of frame {W} and the axes Cx, Cy, Cz of frame {C}.]
The rotation is -45 degrees (see slide 9 of Lecture 4): Start with the {W} frame aligned with the
{C} frame. Point your right thumb in the direction of Wy. Then rotate the {W} frame until you
get to the desired orientation. You will have to rotate -45 degrees; i.e., $\theta_Y = -45^\circ$.
[Figure: top-down view of the Wx, Wz and Cx, Cz axes. The arrow shows the direction of rotation for a positive angle +θY.]
The resulting rotation matrix is
$$
{}^{C}_{W}R = \begin{bmatrix} \cos\theta_Y & 0 & \sin\theta_Y \\ 0 & 1 & 0 \\ -\sin\theta_Y & 0 & \cos\theta_Y \end{bmatrix}
= \begin{bmatrix} 0.707 & 0 & -0.707 \\ 0 & 1 & 0 \\ 0.707 & 0 & 0.707 \end{bmatrix}
$$
This makes sense because the columns of the rotation matrix are the unit vectors of the {W}
frame, expressed in the representation of the {C} frame. So the first column is the X axis of
{W}. As you can see in the figure, it points in the +X and +Z direction in the {C} frame. The
third column is the Z axis of {W}, and it indeed points in the –X and +Z direction in the {C}
frame.
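The full transformation matrix is a rotation about the Y axis by -45 degrees, plus a translation of
10 m along the camera's Z axis (the optical axis).
$$
{}^{Cam}_{Wall}H = \begin{bmatrix}
0.707 & 0 & -0.707 & 0 \\
0 & 1 & 0 & 0 \\
0.707 & 0 & 0.707 & 10 \\
0 & 0 & 0 & 1
\end{bmatrix}
$$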
So
XCam = (0.707) XWall – (0.707) ZWall
YCam = YWall
ZCam = (0.707) XWall + (0.707) ZWall + 10
But the wall is a plane, so all its Z values are zero:
XCam = (0.707) XWall
YCam = YWall
ZCam = (0.707) XWall + 10
Under image perspective projection,
ximg = f XCam / ZCam = (0.707) XWall / ( (0.707) XWall + 10 )
yimg = f YCam / ZCam = YWall / ( (0.707) XWall + 10 )
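We can accomplish this with the matrix
$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} 0.707 & 0 & 0 \\ 0 & 1 & 0 \\ 0.707 & 0 & 10 \end{bmatrix}
\begin{bmatrix} X_{Wall} \\ Y_{Wall} \\ 1 \end{bmatrix},
\quad x_{img} = x_1 / x_3, \quad y_{img} = x_2 / x_3
$$
which is choice (b).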
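The derivation can be sketched numerically in MATLAB. The sketch assumes f = 1 and the pose derived above; the variable names and the sample wall point (2, 1) are illustrative only. Because every wall point has Z = 0, the third column of the rotation matrix drops out and the homography is formed from the first two rotation columns and the translation.

```matlab
% Sketch for Problem 10: homography from a plane (Z_wall = 0) to the image,
% assuming f = 1 and the pose derived above (names are illustrative).
thetaY = -45;                                   % degrees
R = [ cosd(thetaY) 0 sind(thetaY);
      0            1 0;
     -sind(thetaY) 0 cosd(thetaY)];             % orientation of {W} in {C}
t = [0; 0; 10];                                 % wall origin in {C}, meters

% With Z_wall = 0, the third column of R drops out of the projection
H = [R(:,1) R(:,2) t]

% Example: project the (hypothetical) wall point (X, Y) = (2, 1)
x = H * [2; 1; 1];
x_img = x(1) / x(3);
y_img = x(2) / x(3);
```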
11. Write MATLAB code to shrink the size (i.e., total number of pixels) of an image I by a
factor of four by subsampling. Assume that the dimensions of I are 512x512. (Note: for this
problem, you are not allowed to use the MATLAB function “imresize”.)
Solution:
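I2 = zeros(256,256);
for i=2:2:512                      % even indices only, so i/2 and j/2 are integers
    for j=2:2:512
        I2(i/2, j/2) = I(i,j);     % keep every second pixel in each direction
    end
end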
or
for i=1:256
    for j=1:256
        I2(i, j) = I(2*i,2*j);
    end
end
or
I2 = I(1:2:end, :);
I3 = I2(:, 1:2:end);
12. Image A is mapped to image B using an affine transformation. A square in image A is
mapped to a parallelogram in image B. In image A, the square has dimensions LxL, with
its top left corner at location (L,L). In image B, the parallelogram has
the dimensions shown. Its top left corner is also at location (L,L). Give the affine
transformation that maps A to B.
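[Figure: image A shows a square with sides of length L; image B shows a parallelogram with base L and height L, whose bottom edge is shifted to the right by L/2.]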
Solution:
We want to find the elements $a_{ij}$ of the matrix such that
$$
\begin{bmatrix} x_B \\ y_B \\ 1 \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_A \\ y_A \\ 1 \end{bmatrix}
$$
The corners of the square in image A are
(L,L) (2L,L)
(L,2L) (2L,2L)
The corresponding points in image B are:
(L,L) (2L,L)
(3L/2,2L) (5L/2,2L)
Obviously yB = yA, so a21 = 0, a22 = 1, ty = 0.
To find the rest of the values, write the equations for xB = a11 xA + a12 yA + tx
Writing down this equation at each tiepoint yields
L = a11 (L) + a12 (L) + tx
2L = a11 (2L) + a12 (L) + tx
3L/2 = a11 (L) + a12 (2L) + tx
Solving for these yields a11 = 1, a12 = 1/2, tx = -L/2
The affine transformation is
$$
\begin{bmatrix} x_B \\ y_B \\ 1 \end{bmatrix} =
\begin{bmatrix} 1 & 1/2 & -L/2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_A \\ y_A \\ 1 \end{bmatrix}
$$