
Author: Jilyan Bianca Sy Dy
Thesis title: MCGAN: Mask Controlled Generative Adversarial Network for Image Retargeting
Advisor: Kai-Lung Hua (花凱龍)
Oral defense committee: Kai-Lung Hua (花凱龍), Neng-Hao Yu (余能豪), Yan-Fu Kuo (郭彥甫), Yung-Yao Chen (陳永耀)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2021
Graduating academic year: 109
Language: English
Pages: 35
Keywords: Image Retargeting, Conditional GAN, Controllable GAN

A Generative Adversarial Network (GAN) can be trained to learn the internal distribution of a single image. Once trained, it can generate new images of varying aspect ratios while maintaining the image's internal distribution and completeness, even when particular objects are added or removed. However, due to their design, existing models cannot understand an image's semantics and are incapable of distinguishing different objects. This lack of semantic understanding tends to lead to the generation of unnatural objects (e.g., a person with two heads). Since the model's design is not equipped for learning semantics, we choose to address this problem through user intervention. Our method allows the user to generate an image at a desired aspect ratio while masking objects they want to preserve. Masking an object prevents it from being distorted, and also allows the user to remove, relocate, or replicate that object from the input image.


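To make the mask-based control concrete, here is a minimal sketch of the kind of overlay compositing the abstract describes. It is an illustration only, not the thesis's actual overlay function (Section 3.3): the function name `overlay`, its arguments, and the alpha-compositing formula are assumptions. It shows the core idea that pixels under the user's mask are copied verbatim from the input object, so the generator cannot distort them, while the chosen paste location gives relocation, a second call gives replication, and skipping the call gives removal.

```python
import numpy as np

def overlay(generated: np.ndarray, obj: np.ndarray, mask: np.ndarray,
            top: int, left: int) -> np.ndarray:
    """Composite a user-masked object onto a generated (retargeted) canvas.

    generated: (H, W, 3) generator output at the target aspect ratio.
    obj:       (h, w, 3) object crop taken from the input image.
    mask:      (h, w) binary mask, 1 where the object must be preserved.
    top, left: user-chosen location of the object on the canvas.
    """
    out = generated.copy()
    h, w = mask.shape
    region = out[top:top + h, left:left + w]
    m = mask[..., None].astype(generated.dtype)  # broadcast over RGB
    # Masked pixels come verbatim from the object crop; the rest keep
    # the generator's synthesis, so the object cannot be distorted.
    out[top:top + h, left:left + w] = m * obj + (1.0 - m) * region
    return out

# Example: retarget a 256x256 image to 256x384 (wider) while pinning a
# 64x64 object at a user-chosen spot. Random arrays stand in for the
# GAN output and the object crop.
rng = np.random.default_rng(0)
generated = rng.random((256, 384, 3))             # placeholder GAN output
obj = rng.random((64, 64, 3))                     # object crop from input
mask = np.zeros((64, 64)); mask[8:56, 8:56] = 1   # user-drawn binary mask
result = overlay(generated, obj, mask, top=96, left=160)
print(result.shape)  # (256, 384, 3)
```

In the thesis's multi-scale architecture (Sections 3.2 and 4.2.2) this compositing is presumably applied at each generator scale; the single-scale version above is only meant to convey why masked objects survive retargeting intact.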

Recommendation Letter . . . . . i
Approval Letter . . . . . ii
Abstract . . . . . iii
Acknowledgements . . . . . iv
Contents . . . . . v
List of Figures . . . . . vii
List of Tables . . . . . xiii
1 Introduction . . . . . 1
2 Related Literature . . . . . 5
3 Method . . . . . 8
  3.1 Overview . . . . . 8
  3.2 Multi-scale Architecture . . . . . 8
  3.3 Overlay Function . . . . . 12
  3.4 Training . . . . . 13
    3.4.1 Adversarial Loss . . . . . 14
    3.4.2 Reconstruction Loss . . . . . 14
    3.4.3 De-association Loss . . . . . 15
4 Results . . . . . 18
  4.1 Implementation Details . . . . . 18
  4.2 Experiments . . . . . 18
    4.2.1 Ablation . . . . . 19
    4.2.2 Overlay Function Scales . . . . . 20
    4.2.3 Reconstruction Loss . . . . . 22
    4.2.4 User-Defined Mask . . . . . 22
    4.2.5 Setting Object Location . . . . . 24
    4.2.6 Object Replication . . . . . 25
    4.2.7 Object Removal . . . . . 26
    4.2.8 Comparison . . . . . 28
    4.2.9 User Study . . . . . 29
5 Conclusions . . . . . 33
References . . . . . 34


Full text release date: 2026/01/31 (campus network)
Full text release date: 2026/01/31 (off-campus network)
Full text release date: 2026/01/31 (National Central Library: Taiwan NDLTD system)