Accurate fault detection of bearings is critical in condition-based maintenance for improving system reliability and reducing operational costs. This paper introduces a deep transfer learning-based approach to bearing fault diagnosis that fuses heterogeneous information from multiple sources. Convolutional neural networks (CNNs) are first designed to extract critical features by mapping extremely high-dimensional signals, such as vibration signals and images, to a much lower-dimensional latent space. By partially retaining the resulting CNN architectures and parameters, the knowledge gained from multiple heterogeneous sources can be transferred and fused to improve the robustness and accuracy of bearing fault diagnosis. Building on this prior knowledge, a deep transfer learning (DTL) architecture is designed to incorporate the heterogeneous data and is trained to detect bearing faults. To further improve diagnostic performance, a performance-driven optimization approach is developed that successively designs the architectures of the deep transfer networks to maximize validation accuracy. Experimental data from the Case Western Reserve University (CWRU) bearing dataset are used to demonstrate the performance of the proposed approach.
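The transfer-and-fuse idea can be illustrated with a minimal, dependency-free sketch. It rests on several assumptions that do not come from the paper. The convolution kernels in `VIB_KERNELS` and `IMG_KERNELS` are hand-set stand-ins for layers retained from source-trained CNNs, not learned weights. A single logistic-regression head stands in for the trainable part of the DTL network. The helpers `vib_branch`, `img_branch`, `fuse`, `train_head`, `predict`, and `toy_sample` are hypothetical names, and the data is synthetic rather than CWRU data. Each frozen branch applies convolution, ReLU, and global max pooling to produce a low-dimensional latent vector. The branch outputs are concatenated (feature-level fusion), and only the head is trained.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical kernels standing in for layers retained from CNNs pre-trained on
# each source. They are hand-set (not learned) and stored as tuples; the
# head-training loop never touches them, so they stay "frozen".
VIB_KERNELS: Tuple[Tuple[float, ...], ...] = ((1.0, -1.0), (1 / 3, 1 / 3, 1 / 3))
IMG_KERNELS = (((0.25, 0.25), (0.25, 0.25)),)


def vib_branch(signal: Sequence[float]) -> List[float]:
    """Frozen 1D conv -> ReLU -> global max pool: one latent feature per kernel."""
    feats = []
    for k in VIB_KERNELS:
        responses = [sum(k[j] * signal[i + j] for j in range(len(k)))
                     for i in range(len(signal) - len(k) + 1)]
        feats.append(max(0.0, max(responses)))
    return feats


def img_branch(image: Sequence[Sequence[float]]) -> List[float]:
    """Frozen 2D conv -> ReLU -> global max pool: one latent feature per kernel."""
    feats = []
    for k in IMG_KERNELS:
        kh, kw = len(k), len(k[0])
        responses = [sum(k[a][b] * image[r + a][c + b]
                         for a in range(kh) for b in range(kw))
                     for r in range(len(image) - kh + 1)
                     for c in range(len(image[0]) - kw + 1)]
        feats.append(max(0.0, max(responses)))
    return feats


def fuse(signal, image) -> List[float]:
    """Feature-level fusion: concatenate the latent vectors of both sources."""
    return vib_branch(signal) + img_branch(image)


def train_head(X, y, lr=0.5, epochs=500):
    """Full-batch gradient descent on a logistic head; encoders are not updated."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(sum(a * c for a, c in zip(w, xi)) + b)))
            for j, xij in enumerate(xi):
                gw[j] += (p - yi) * xij / len(X)
            gb += (p - yi) / len(X)
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x) -> int:
    """1 = faulty, 0 = healthy."""
    return int(sum(a * c for a, c in zip(w, x)) + b > 0.0)


def toy_sample(shift: int, faulty: bool):
    """Synthetic stand-ins (NOT CWRU data): a small sine vibration, plus periodic
    impulses and a bright 2x2 patch in a 6x6 'image' when the bearing is faulty."""
    signal = [0.1 * math.sin(2 * math.pi * (i + shift) / 16)
              + (1.0 if faulty and (i + shift) % 16 == 5 else 0.0)
              for i in range(64)]
    image = [[1.0 if faulty and 2 <= r <= 3 and 2 <= c <= 3 else 0.1
              for c in range(6)] for r in range(6)]
    return signal, image


# Build fused features for a tiny training set and train only the head.
train = [(toy_sample(s, f), int(f)) for s in range(4) for f in (False, True)]
X = [fuse(sig, img) for (sig, img), _ in train]
y = [label for _, label in train]
w, b = train_head(X, y)
```

Because only `w` and `b` change in `train_head`, more frozen branches can be added, or the head can be swapped out, without altering the retained encoders. The performance-driven optimization described above, which successively redesigns the transfer-network architecture to maximize validation accuracy, is not modeled in this sketch.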