Mine too, at first. But it is actually not far from simple algebra: one just needs to set up a system of equations and solve it. I'm sure many around here can solve two equations with two unknowns. Finding the camera position basically becomes solving maybe 8 equations with 8 unknowns. Sure it's a lot more work, but the principle is the same :)
I show the camera a pattern of points, then I define some points on the pattern as lying at specific locations in 3D space, e.g. the first point on the pattern could be at location (0, 0, 0), the second at (1, 0, 0), and so on. Objects are projected into the camera frame (the image) with the equation:
\begin{equation} x' = P X \end{equation}
Where $x'$ is the point in the image, $X$ is a 3D location (these I define), and $P$ is the camera matrix (see also: projection), which encodes both the camera's 3D position and its orientation. I "just" need enough point correspondences to solve for $P$, because I know $X$ and $x'$.
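To make the projection equation concrete, here's a small sketch in Python/NumPy. The camera matrix is built as $P = K[R|t]$ from made-up intrinsics and pose; all the numbers are just illustration, not from any real calibration.

```python
import numpy as np

# Hypothetical intrinsics K: focal length 800 px, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Hypothetical extrinsics: no rotation, camera shifted 5 units along Z.
R = np.eye(3)
t = np.array([[0.0], [0.0], [5.0]])

# The 3x4 camera matrix P = K [R | t].
P = K @ np.hstack([R, t])

# A pattern point in homogeneous coordinates, e.g. X = (1, 0, 0).
X = np.array([1.0, 0.0, 0.0, 1.0])

# Project with x' = P X, then divide by the last component (dehomogenize).
x = P @ X
x = x[:2] / x[2]
print(x)  # -> [480. 240.], the pixel coordinates of the projected point
```

Note that $x'$ comes out in homogeneous coordinates; the division by the last component is what turns it into actual pixel coordinates.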
That sounds easy enough, but of course there is noise in the image, so the measured coordinates won't be perfect. The points are also quantized to the screen resolution, which adds further noise, so in practice the equations are solved using singular value decomposition together with some smart way of excluding outliers (points with so much noise that they would only degrade the estimate of $P$).
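The SVD-based solve can be sketched like this (the standard direct linear transform, which I believe is roughly what such solvers do internally). The data here is synthetic and noise-free to keep it short; with real noisy measurements you'd use more points, normalize the coordinates, and reject outliers (e.g. with RANSAC).

```python
import numpy as np

# Ground-truth camera, used only to generate synthetic correspondences.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P_true = K @ np.hstack([np.eye(3), np.array([[0.5], [0.2], [5.0]])])

# Six known 3D pattern points in homogeneous coordinates
# (P has 11 degrees of freedom; each point contributes 2 equations).
X = np.array([[0, 0, 0, 1], [1, 0, 0, 1], [0, 1, 0, 1],
              [1, 1, 0, 1], [0, 0, 1, 1], [1, 0, 1, 1]], float)

# Their (ideal, noise-free) image projections.
x = (P_true @ X.T).T
x = x[:, :2] / x[:, 2:3]

# Each correspondence yields two linear equations in the 12 entries of P.
A = []
for Xi, (u, v) in zip(X, x):
    A.append(np.concatenate([Xi, np.zeros(4), -u * Xi]))
    A.append(np.concatenate([np.zeros(4), Xi, -v * Xi]))
A = np.asarray(A)

# The solution is the right singular vector belonging to the smallest
# singular value of A (the null vector of the homogeneous system).
_, _, Vt = np.linalg.svd(A)
P_est = Vt[-1].reshape(3, 4)

# P is only defined up to scale, so check it by reprojecting the points.
x_re = (P_est @ X.T).T
x_re = x_re[:, :2] / x_re[:, 2:3]
print(np.max(np.abs(x_re - x)))  # reprojection error, ~0 on clean data
```

The recovered $P$ is only determined up to an overall scale factor, which is fine, since scaling $P$ doesn't change where points land after the homogeneous division.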
This was just a little reading for those who are bored :) Btw, the reason we were stuck was not estimating the camera position itself, because that was actually just a function call in the OpenCV library; we couldn't make sense of the coordinates it returned. Another problem we had was actually finding the 3D points in the camera image. Suffice it to say, it took far longer than it should have.
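On the confusing coordinates: pose estimators like OpenCV's solvePnP return a rotation and translation that map world points into the camera frame ($X_{cam} = RX_{world} + t$), so $t$ is not the camera's position. The camera center in world coordinates is $C = -R^\top t$, which is easy to misread. A small sketch with made-up numbers:

```python
import numpy as np

# Hypothetical pose: camera rotated 90 degrees about the Y axis,
# with the translation t from the world-to-camera mapping.
theta = np.pi / 2
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
t = np.array([0.0, 0.0, 5.0])

# Camera center in world coordinates: C = -R^T t, NOT t itself.
C = -R.T @ t
print(C)

# Sanity check: the camera center maps to the camera-frame origin.
print(np.allclose(R @ C + t, 0))  # True
```

Here $t = (0, 0, 5)$ but the camera actually sits at $(5, 0, 0)$ in world coordinates, which is exactly the kind of mismatch that made the output hard to interpret.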