Leveraging Multi-Modal Saliency and Fusion for Gaze Target Detection