
Graduate student: Daniel Stanley Tan (陳文柏)
Thesis title: Analysis-by-synthesis for solving visual tasks under data constraints (基於合成分析之小樣本視覺深度學習技術)
Advisor: Kai-Lung Hua (花凱龍)
Committee members: Yung-Yu Chuang (莊永裕), Jun-Cheng Chen (陳駿丞), Yong-Sheng Chen (陳永昇), Shun-Feng Su (蘇順豐)
Degree: Doctoral (Ph.D.)
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2020
Graduation academic year: 109 (ROC calendar)
Language: English
Number of pages: 146
Keywords: defect detection, anomaly detection, memory auto-encoder, image-to-image translation, image synthesis, generative adversarial networks
Access statistics: 354 views, 8 downloads
In the past decade, the computer vision community witnessed a deep learning revolution that fueled several breakthroughs in the field, solving visual tasks that were previously intractable. However, these breakthroughs were mostly the product of very deep networks with millions of parameters and the large amounts of annotated data required to train them. This requirement makes it difficult to adopt such models in domains where various constraints hinder the collection of large and diverse annotated datasets. In this thesis, I present novel solutions to the problems of anomaly detection and semantic image manipulation under three different data-constraint settings. (1) In the setting where the data have limited coverage, I propose paMAE, an anomaly detection model that incorporates an external memory and a deep spatial perceptual distance that is more robust to shifts and small inaccuracies, which can stem from the lack of data coverage; this makes it better suited to detecting anomalies, especially on highly textured surfaces. (2) In the setting where the data are noisy, I propose TrustMAE, an anomaly detection model that uses a novel trust-region memory updating scheme to keep noise from polluting the model. (3) In the setting where the data are not available at all times, I propose IncrementalGAN, an image-to-image translation model that can incrementally learn new domains while retaining old domains, without requiring any data from the old domains. The proposed methods follow the framework of analysis-by-synthesis: synthesizing and reconstructing images provides "annotation-free" supervision that lets the model learn meaningful representations for the target visual task. Despite the data constraints, the proposed methods perform competitively with existing approaches and achieve state-of-the-art performance on select benchmark tasks.
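The anomaly detection methods above share one analysis-by-synthesis recipe: train a model on normal data to synthesize (reconstruct) its input, then use the synthesis error as an annotation-free anomaly score. Below is a minimal, illustrative PyTorch sketch of a memory-augmented auto-encoder used this way, in the spirit of paMAE and TrustMAE. The layer sizes, cosine-similarity memory addressing, and the plain L1 scoring distance are assumptions made only to keep the sketch self-contained; the thesis instead uses a learned deep spatial perceptual distance and, for TrustMAE, trust-region memory updates.

# Minimal sketch (not the thesis implementation) of memory-based
# analysis-by-synthesis for anomaly scoring.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MemoryAutoEncoder(nn.Module):
    def __init__(self, n_slots=100, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 4, stride=2, padding=1), nn.ReLU(),
        )
        # External memory: each slot stores a prototypical "normal" feature vector.
        self.memory = nn.Parameter(torch.randn(n_slots, feat_dim))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)                              # (B, C, H, W)
        b, c, h, w = z.shape
        q = z.permute(0, 2, 3, 1).reshape(-1, c)         # one query per spatial location
        # Soft-address the memory with cosine similarity, then rebuild the feature
        # map only from memory slots, so anomalous patterns cannot be copied through.
        attn = F.softmax(F.normalize(q, dim=1) @ F.normalize(self.memory, dim=1).t(), dim=1)
        z_hat = (attn @ self.memory).reshape(b, h, w, c).permute(0, 3, 1, 2)
        return self.decoder(z_hat)


def anomaly_map(model, x):
    """Per-pixel anomaly score: error of the memory-based synthesis.
    Plain L1 is used here only to keep the sketch self-contained; the thesis
    uses a deep spatial perceptual distance instead."""
    with torch.no_grad():
        recon = model(x)
    return (x - recon).abs().mean(dim=1, keepdim=True)   # (B, 1, H, W)


if __name__ == "__main__":
    model = MemoryAutoEncoder()
    images = torch.rand(2, 3, 64, 64)                    # stand-in for defect-free images
    print(anomaly_map(model, images).shape)              # torch.Size([2, 1, 64, 64])

In this sketch only defect-free images would be used for training, so at test time a defective region cannot be reconstructed well from the memory of normal patterns and therefore receives a high anomaly score.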


Table of contents:
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Perceptual Attention Memory Auto-Encoder
    2.1 Related Work
    2.2 Perceptual Attention Memory Auto-Encoder (paMAE)
        2.2.1 Reconstructing from Memory
        2.2.2 Perceptual Attention
        2.2.3 Multi-Thresholding for Localization
    2.3 Unsupervised to Semi-supervised Extension
    2.4 Experimental Results
        2.4.1 Experimental Setup
        2.4.2 Comparison with Baselines
        2.4.3 Evaluating Perceptual Attention
        2.4.4 Evaluating Multi-Thresholding
        2.4.5 Evaluating Semi-supervised Setting
        2.4.6 Results on Brain MRI
    2.5 Conclusion
3 TrustMAE: A Noise-Resilient Defect Classification Framework using Memory Augmented Auto-Encoders with Trust Regions
    3.1 Introduction
    3.2 Related Works
    3.3 Trust Memory Auto-Encoder (TrustMAE)
        3.3.1 Memory-Augmented Auto-Encoder
        3.3.2 Trust Region Memory Updates
        3.3.3 Spatial Perceptual Distance
    3.4 Experimental Results
        3.4.1 Implementation Details
        3.4.2 Varying Noise Levels
        3.4.3 Comparison with Baselines
        3.4.4 Ablation Study
        3.4.5 Memory Access Patterns
        3.4.6 Effect of Trust Threshold δ2
        3.4.7 Effect of Down-Sampling Layers
    3.5 Conclusion
4 IncrementalGAN: Incremental Learning of Multi-Domain Image-to-Image Translations
    4.1 Introduction
    4.2 Related Works
        4.2.1 Image-to-Image Translation
        4.2.2 Incremental Learning
    4.3 Proposed Method
        4.3.1 Multi-domain Image-to-Image Translation
        4.3.2 Incrementally Learning New Domains
    4.4 Experiments
        4.4.1 Datasets
        4.4.2 Training Details
        4.4.3 Comparison against Baselines
        4.4.4 Ablation Study
        4.4.5 Label Matching vs. Logit Matching
        4.4.6 Domain Embedding Size
        4.4.7 Amount of Generated Pseudo-real Data
        4.4.8 Order of Domains
        4.4.9 Varying Thresholds
        4.4.10 Comparing Training Time
        4.4.11 User Study
        4.4.12 Fréchet Inception Distance (FID)
    4.5 Conclusion
5 Conclusions
References
Appendix
    1.1 Additional Implementation Details
    1.2 Baselines Implementation Details
    1.3 Additional Results

