In industrial assembly tasks, the in-hand pose of a grasped object must be known with high precision for subsequent manipulation tasks such as insertion. This problem, in-hand pose estimation, has traditionally been addressed using visual recognition or tactile sensing. On the one hand, visual recognition can provide efficient pose estimates, but it tends to suffer from low precision due to noise, occlusions, and calibration errors. On the other hand, tactile fingertip sensors can provide precise complementary information, but their low durability significantly limits their use in real-world applications. To get the best of both worlds, we propose an efficient method for in-hand pose estimation that uses off-the-shelf cameras and robot wrist force sensors and requires no precise camera calibration. The key idea is to use visual and contact information adaptively so as to maximally reduce the uncertainty about the in-hand object pose within a Bayesian state estimation framework. Because most of the uncertainty can be resolved from visual observations, our approach reduces the number of physical interactions with the environment while maintaining high pose estimation accuracy. Our experimental evaluation demonstrates that our approach estimates object poses with sub-millimeter precision using an off-the-shelf camera and a force-torque sensor.
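To make the adaptive-fusion idea concrete, the following is a minimal sketch, not the paper's actual algorithm: it maintains a Gaussian belief over a simplified 3-DoF in-hand pose, performs a Bayesian (Kalman) update with a cheap but noisy visual measurement, and triggers precise contact measurements only while the posterior remains too uncertain. All values, noise models, and the stopping threshold are illustrative assumptions.

```python
import numpy as np

def gaussian_update(mean, cov, z, R):
    """Bayesian (Kalman) update of a Gaussian belief with measurement z ~ N(pose, R),
    assuming an identity measurement model."""
    K = cov @ np.linalg.inv(cov + R)            # Kalman gain
    mean = mean + K @ (z - mean)
    cov = (np.eye(len(mean)) - K) @ cov
    return mean, cov

# Prior belief over the in-hand pose after grasping (values are illustrative).
mean = np.zeros(3)                              # [x (mm), y (mm), theta (rad)]
cov = np.diag([4.0, 4.0, 0.02])

# 1) Visual update: efficient but imprecise (large measurement noise R_vis).
z_vis = np.array([1.2, -0.8, 0.05])             # hypothetical camera-based pose estimate
R_vis = np.diag([1.0, 1.0, 0.01])
mean, cov = gaussian_update(mean, cov, z_vis, R_vis)

# 2) Adaptive contact probing: interact with the environment only while the
#    pose is still too uncertain, so most uncertainty is resolved visually
#    and few physical interactions are needed.
R_contact = np.diag([0.01, 0.01, 1e-4])         # force sensing: precise but costly
while np.sqrt(np.max(np.diag(cov))) > 0.1:      # stop near ~0.1 mm/rad posterior std
    # Stand-in for a real force-torque contact measurement.
    z_contact = np.random.multivariate_normal(mean, R_contact)
    mean, cov = gaussian_update(mean, cov, z_contact, R_contact)

print("estimated pose:", mean, "posterior std:", np.sqrt(np.diag(cov)))
```

In this sketch a single contact probe already drives the posterior below the threshold, illustrating why resolving most uncertainty visually keeps the number of physical interactions small.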