Image-to-Image Translation with Conditional Adversarial Networks (2017)


쏴아리의 딥러닝 스터디


Image Generation


말해보시개 2021. 4. 18. 15:49


Abstract

1. Solving image-to-image translation problems with a conditional GAN

This work investigates conditional adversarial networks as a general-purpose solution to image-to-image translation problems.
These networks learn the mapping from input image to output image, and they also learn a loss function to train that mapping.
This makes it possible to apply the same generic approach to problems that would traditionally require very different loss formulations.

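2. Verifying the wide applicability of pix2pix

The paper demonstrates that the approach works well on many tasks, including synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images.
The pix2pix software was released alongside the paper. Since then it has attracted wide attention, and many researchers have run their own experiments with it, further confirming its broad applicability.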

Introduction

This work defines automatic image-to-image translation as the task of translating one possible representation of a scene into another.

Figure 1: Image-to-image translation tasks

This work explores GANs in the conditional setting and applies them to image-to-image translation. A conditional GAN (cGAN) is a conditional generative model. It takes the input image as its condition and generates a corresponding output image, which makes it well suited to image-to-image translation.

Related work

Structured losses for image modeling

Image-to-image translation problems are often formulated as per-pixel classification or regression.
These formulations treat the output space as "unstructured": given the input image, each output pixel is considered conditionally independent of all the others.
Conditional GANs instead learn a structured loss. Structured losses penalize the joint configuration of the output.

Conditional GANs

Prior work has applied GANs to image-to-image mappings, but only unconditionally. These methods relied on other terms, such as L2 regression, to force the output to be conditioned on the input.
This work differs from prior work in several architectural choices for the generator and discriminator:
- Unlike prior work, the generator is based on a "U-Net" and the discriminator on a "PatchGAN".
- PatchGAN penalizes structure only at the scale of image patches.

Method

GAN and Conditional GAN

A GAN is a generative model that learns a mapping from a random noise vector z to an output image y, G : z → y.
In contrast, a conditional GAN learns a mapping from an observed image x and a random noise vector z to y, G : {x, z} → y.
- The generator G is trained to produce outputs (fake images) that the discriminator cannot distinguish from real images.
- The discriminator D, conversely, is trained to detect the generator's fake images as well as possible.
- This training procedure is illustrated in Figure 2.

Figure 2: Training a conditional GAN to map edges -> photo

The discriminator D learns to classify between fake (synthesized by the generator) and real {edge, photo} tuples.
The generator G learns to fool the discriminator.
Unlike in an unconditional GAN, both the generator and the discriminator observe the input edge map.
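The sketch below shows what "the discriminator observes the input" can look like in practice. It makes two assumptions. First, images are stored channel-first as nested Python lists. Second, conditioning is done by stacking x and y along the channel axis. That is the usual choice in public pix2pix code, but this summary does not state it. The helper name `discriminator_input` is hypothetical. The unconditional variant discussed in 3.1 simply drops x.

```python
from typing import List

Channel = List[List[float]]  # one H x W plane
Image = List[Channel]        # C planes, channel-first


def discriminator_input(x: Image, y: Image, conditional: bool = True) -> Image:
    """Build what D sees (hypothetical helper).

    conditional=True : input x and output y stacked along the channel axis.
    conditional=False: the unconditional variant, where D only sees y.
    """
    if not conditional:
        return y
    if len(x[0]) != len(y[0]) or len(x[0][0]) != len(y[0][0]):
        raise ValueError("x and y must have the same height and width")
    return x + y  # list concatenation == channel concatenation


edge_map = [[[0.0] * 4 for _ in range(4)]]                  # 1-channel edge map
photo = [[[0.5] * 4 for _ in range(4)] for _ in range(3)]   # 3-channel photo
print(len(discriminator_input(edge_map, photo)))                     # 4
print(len(discriminator_input(edge_map, photo, conditional=False)))  # 3
```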

3.1 Objective

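The objective of a conditional GAN can be expressed as:

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))] \tag{1}$$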

G tries to minimize this objective, while an adversarial D tries to maximize it:

$$G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D)$$
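The min-max becomes more concrete if you estimate the objective on a minibatch from the discriminator's output probabilities. This is a sketch with hypothetical names. `d_real` stands in for D(x, y) on real pairs, and `d_fake` stands in for D(x, G(x, z)) on generated pairs.

```python
import math
from typing import Sequence

EPS = 1e-12  # keeps log() finite when D saturates at exactly 0 or 1


def cgan_objective(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Monte-Carlo estimate of L_cGAN(G, D) from D's probabilities.

    d_real[i] = D(x_i, y_i): probability D assigns to a real pair being real.
    d_fake[i] = D(x_i, G(x_i, z_i)): probability D assigns to a fake pair being real.
    D tries to increase this value; G tries to decrease it.
    """
    real_term = sum(math.log(max(p, EPS)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


print(round(cgan_objective([0.5, 0.5], [0.5, 0.5]), 3))  # -1.386 (D fully confused)
print(round(cgan_objective([0.99], [0.01]), 3))          # -0.02  (D separates them)
```

A confused discriminator gives 2·log 0.5 ≈ -1.386. A discriminator that separates real from fake pushes the value toward 0, which is the direction D maximizes in.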

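To test the importance of conditioning the discriminator, the paper also compares against an unconditional variant.

In this objective, the discriminator does not observe x:

$$\mathcal{L}_{GAN}(G, D) = \mathbb{E}_{y}[\log D(y)] + \mathbb{E}_{x,z}[\log(1 - D(G(x, z)))] \tag{2}$$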

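Previous approaches found it beneficial to mix the GAN objective with a more traditional loss, such as L2 distance.

The discriminator's job remains unchanged.
The generator, however, must not only fool the discriminator but also stay close to the ground-truth output in an L2 sense.
This work uses L1 distance rather than L2, because L1 encourages less blurring:

$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}[\lVert y - G(x, z) \rVert_1] \tag{3}$$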
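A single ambiguous pixel shows why L1 blurs less than L2. The toy below is an illustration, not an experiment from the paper. It assumes the pixel is black in 60% of plausible outputs and white in 40%, and grid-searches the constant prediction that minimizes each loss.

```python
from typing import List


def best_constant(targets: List[float], loss: str, steps: int = 1000) -> float:
    """Grid-search the single prediction in [0, 1] with the lowest mean loss."""
    def cost(c: float) -> float:
        if loss == "l1":
            return sum(abs(t - c) for t in targets) / len(targets)
        return sum((t - c) ** 2 for t in targets) / len(targets)  # "l2"

    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=cost)


# The pixel is black (0.0) in 60% of plausible outputs and white (1.0) in 40%.
targets = [0.0] * 6 + [1.0] * 4
print(best_constant(targets, "l2"))  # 0.4: the mean, a grey value matching neither outcome
print(best_constant(targets, "l1"))  # 0.0: the median, one of the real modes
```

L2 settles on the mean, which is a blurred average. L1 settles on the median. When the two modes are exactly 50/50, the L1 cost is flat between them and a grey prediction is just as good, which is one reason both losses still blur.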

The final objective is:

$$G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G) \tag{4}$$
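In practice the final objective becomes two losses, one per network. The sketch follows two details of the paper's training procedure:
- G is trained to maximize log D(x, G(x, z)) rather than to minimize log(1 - D(x, G(x, z))). This is the non-saturating form.
- The D objective is divided by 2.

Images are flattened to lists of floats, and all function names are hypothetical. λ defaults to 100, the value used in the experiments.

```python
import math
from typing import Sequence

EPS = 1e-12  # numerical floor inside log()


def l1_distance(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """L_L1: mean absolute error between ground truth y and output G(x, z)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def generator_loss(d_fake: Sequence[float], y: Sequence[float],
                   y_hat: Sequence[float], lam: float = 100.0) -> float:
    """Non-saturating adversarial term -log D(x, G(x, z)) plus lambda * L1."""
    adv = -sum(math.log(max(p, EPS)) for p in d_fake) / len(d_fake)
    return adv + lam * l1_distance(y, y_hat)


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Negated L_cGAN (so it can be minimised), halved to slow D down."""
    real = -sum(math.log(max(p, EPS)) for p in d_real) / len(d_real)
    fake = -sum(math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)
    return 0.5 * (real + fake)


print(round(discriminator_loss([0.5], [0.5]), 4))                 # 0.6931 (= log 2)
print(round(generator_loss([1.0], [0.0, 1.0], [0.1, 1.0]), 4))    # 5.0 (L1 term only)
```

Setting `lam=0.0` gives the "cGAN alone" setting from the experiments. Dropping the adversarial term gives "L1 alone".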

3.2 Network architectures

3.2.1 Generator with skips

To give the generator a way to bypass the information bottleneck, skip connections are added in the shape of a "U-Net".
A skip connection is added between each layer i and layer n − i.
- n: the total number of layers
Each skip connection simply concatenates all channels at layer i with those at layer n − i.
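A small helper makes the indexing rule concrete. This sketch assumes layers are numbered 1..n and uses made-up channel widths. It lists which layers are linked, and counts how many channels reach each decoder layer once the skip is concatenated onto the output of the layer before it.

```python
from typing import Dict, List, Tuple


def skip_pairs(n: int) -> List[Tuple[int, int]]:
    """Layer pairs (i, n - i) joined by a skip connection, layers numbered 1..n.

    Only i < n - i is kept, so each pair appears once and a middle
    (bottleneck) layer, if any, stays unpaired.
    """
    return [(i, n - i) for i in range(1, n) if i < n - i]


def decoder_input_channels(widths: Dict[int, int], n: int) -> Dict[int, int]:
    """Channels entering layer n - i: output of layer n - i - 1 plus skip from layer i.

    widths[k] is the (hypothetical) number of output channels of layer k.
    """
    return {j: widths[j - 1] + widths[i] for i, j in skip_pairs(n)}


widths = {1: 64, 2: 128, 3: 256, 4: 128, 5: 64}   # hypothetical widths, n = 6
print(skip_pairs(6))                      # [(1, 5), (2, 4)]
print(decoder_input_channels(widths, 6))  # {5: 192, 4: 384}
```

The doubled widths on the decoder side are the cost of the skips: each decoder layer receives low-level detail directly from its mirrored encoder layer instead of only through the bottleneck.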

3.2.2 Markovian discriminator (PatchGAN)

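The L2 and L1 losses are known to produce blurry results on image generation problems.
These losses fail to encourage high-frequency crispness, but they capture low frequencies well. Because L1 already enforces low-frequency correctness, the discriminator only needs to model high-frequency structure, and it can do this by looking at local image patches.
This work therefore uses a PatchGAN as the discriminator architecture.
- PatchGAN penalizes structure only at the scale of patches.
- The discriminator classifies each N × N patch of the image as real or fake. It runs convolutionally across the image and averages all responses to produce its final output.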
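The PatchGAN output is a grid of per-patch decisions rather than a single number. This sketch assumes D has already produced an N × N map of probabilities. Each patch gets its own binary cross-entropy, and the losses are averaged over the grid. Averaging per-patch losses is how common implementations realize "averaging all responses". The function name is hypothetical.

```python
import math
from typing import List

EPS = 1e-12  # clamp so log() never sees exactly 0


def patchgan_loss(patch_probs: List[List[float]], target_real: bool) -> float:
    """Mean binary cross-entropy over an N x N grid of patch decisions.

    patch_probs[r][c] is D's probability that the patch at (r, c) is real.
    """
    losses = []
    for row in patch_probs:
        for p in row:
            p = min(max(p, EPS), 1.0 - EPS)
            losses.append(-math.log(p) if target_real else -math.log(1.0 - p))
    return sum(losses) / len(losses)


print(round(patchgan_loss([[0.5, 0.5], [0.5, 0.5]], target_real=True), 4))  # 0.6931
```

A single unconvincing patch (for example 0.1 in one corner) raises the loss even if every other patch looks real. This is how the patch-level critic pushes for local sharpness everywhere.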

Figure 3: Two choices for the architecture of the generator

Left network: encoder-decoder
Right network: U-Net
- An encoder-decoder with skip connections added between mirrored layers of the encoder and decoder stacks.

3.3 Optimization and inference

Training: minibatch SGD with the Adam solver.
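For reference, here is the Adam update on a single scalar parameter. The default hyperparameters (learning rate 0.0002, β1 = 0.5, β2 = 0.999) are the ones reported in the paper. In the paper, one such gradient step on D alternates with one step on G. The toy quadratic and the larger step size in the usage line are only there to show the optimizer converging. They are not part of pix2pix.

```python
import math
from typing import Callable


def adam_minimize(grad: Callable[[float], float], theta: float, steps: int,
                  lr: float = 2e-4, beta1: float = 0.5, beta2: float = 0.999,
                  eps: float = 1e-8) -> float:
    """Plain Adam on one scalar parameter; defaults follow the pix2pix settings."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g        # first-moment (momentum) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias corrections for the
        v_hat = v / (1 - beta2 ** t)           # zero-initialised moments
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy check with a larger step size: minimise (theta - 3)^2, gradient 2 * (theta - 3).
print(round(adam_minimize(lambda th: 2 * (th - 3), 0.0, steps=1000, lr=0.1), 3))  # 3.0
```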

Experiments

4.1 Evaluation metrics

Amazon Mechanical Turk (AMT)

This is a human study. Participants are shown a real image and a fake image, and they choose the one they believe is real.
It tests whether the algorithm can fool the participants.
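The AMT result can be summarized as a fooling rate: the share of real-vs-fake trials in which the participant picked the generated image as the real one. This is a trivial sketch with hypothetical trial data. A rate of 50% would mean participants cannot tell the two apart.

```python
from typing import List


def fooling_rate(picked_fake_as_real: List[bool]) -> float:
    """Fraction of trials in which the participant chose the generated image."""
    return sum(picked_fake_as_real) / len(picked_fake_as_real)


trials = [True, False, False, True, False]   # hypothetical responses
print(fooling_rate(trials))                  # 0.4
```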

FCN-Score

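The idea is that if the generated images are realistic, a classifier trained on real images should also classify the synthesized images correctly.
The paper uses the FCN-8s architecture for semantic segmentation, trained on the Cityscapes dataset. Synthesized photos are then scored by the classification accuracy against the labels they were synthesized from.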
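The FCN-score tables report per-pixel accuracy, per-class accuracy, and class IoU. This sketch computes those three numbers from a confusion matrix. It assumes that the segmentation network's prediction on a synthesized photo and the source label map are both given as flattened lists of class ids. The FCN-8s model itself is not shown.

```python
from typing import Dict, List


def fcn_scores(pred: List[int], truth: List[int], num_classes: int) -> Dict[str, float]:
    """Per-pixel accuracy, mean per-class accuracy and mean class IoU."""
    # confusion[t][p] counts pixels of true class t predicted as class p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        confusion[t][p] += 1

    correct = sum(confusion[c][c] for c in range(num_classes))
    class_acc, class_iou = [], []
    for c in range(num_classes):
        gt = sum(confusion[c])                                    # true pixels of c
        predicted = sum(confusion[t][c] for t in range(num_classes))
        if gt == 0:
            continue  # skip classes absent from the ground truth
        class_acc.append(confusion[c][c] / gt)
        class_iou.append(confusion[c][c] / (gt + predicted - confusion[c][c]))

    return {
        "per_pixel_acc": correct / len(truth),
        "per_class_acc": sum(class_acc) / len(class_acc),
        "class_iou": sum(class_iou) / len(class_iou),
    }


scores = fcn_scores(pred=[0, 1, 1, 1], truth=[0, 0, 1, 1], num_classes=2)
print({k: round(v, 3) for k, v in scores.items()})
# {'per_pixel_acc': 0.75, 'per_class_acc': 0.75, 'class_iou': 0.583}
```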

4.2 Analysis of the objective function

Figure 4: Different losses induce different quality of results

Figure 4 shows the qualitative effects of the different losses on the labels → photo translation problem.
Each column shows results from a model trained with a different loss.
L1 alone gives reasonable but blurry results.
cGAN alone (setting λ = 0 in Eq. 4) gives much sharper results but introduces visual artifacts on certain applications.
L1 + cGAN (λ = 100) reduces these artifacts.

Table 1: FCN-scores for different losses, evaluated on Cityscapes labels<->photos

FCN-scores are measured on the Cityscapes labels → photo task.
GAN: the conditioning is removed from the discriminator.
L1 + cGAN achieved the best scores.

4.3 Analysis of the generator architecture

Figure 5: Adding skip connections to an encoder-decoder to create a "U-Net" results in much higher quality results

Under both the L1 and the L1 + cGAN objectives, the U-Net produced higher-quality results than the plain encoder-decoder.

Table 2: FCN-scores for different generator architectures

The U-Net generator trained with the L1 + cGAN objective performed best.

4.4 From PixelGANs to PatchGANs to ImageGANs

Table 3: FCN-scores for different receptive field sizes of the discriminator

The 70 × 70 PatchGAN performed best.

Figure 6: Patch size variations

The 70 × 70 PatchGAN produced the sharpest outputs.
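The patch sizes in Table 3 are receptive fields: the width of the input region that one output unit of D can see. The sketch computes them with the recursion r_in = (r_out - 1)·s + k, where k is the kernel size and s the stride. The layer stacks are assumptions modelled on the public pix2pix discriminators:
- 4 × 4 kernels, stride 2 except for the last two convolutions, which use stride 1.
- 1 × 1 kernels for the PixelGAN.

With these stacks the recursion reproduces the 1, 16, 70 and 286 sizes.

```python
from typing import List, Tuple

Layer = Tuple[int, int]  # (kernel size, stride)


def receptive_field(layers: List[Layer]) -> int:
    """Width of the input region seen by one output unit of a conv stack."""
    r = 1
    for k, s in reversed(layers):
        r = (r - 1) * s + k
    return r


# Hypothetical stacks modelled on the public pix2pix discriminators.
PIXEL_GAN = [(1, 1), (1, 1), (1, 1)]
PATCH_16 = [(4, 2), (4, 1), (4, 1)]
PATCH_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
IMAGE_286 = [(4, 2)] * 5 + [(4, 1), (4, 1)]

for name, stack in [("PixelGAN", PIXEL_GAN), ("16x16", PATCH_16),
                    ("70x70", PATCH_70), ("286x286", IMAGE_286)]:
    print(name, receptive_field(stack))
# PixelGAN 1
# 16x16 16
# 70x70 70
# 286x286 286
```

For 256 × 256 inputs, the 286 × 286 discriminator sees more than the whole image, which is why the paper calls it an ImageGAN.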

4.5 Perceptual validation

This section validates the perceptual realism of the results on the map ↔ aerial photograph and grayscale → color tasks.

4.6 Semantic Segmentation

Figure 10: Applying a conditional GAN to semantic segmentation

The cGAN produced the sharpest outputs, which at a glance look closest to the ground truth, although they contain many small hallucinated objects.

Table 6: Performance of photo → labels on Cityscapes

For semantic segmentation, simple L1 regression performed better than the cGAN.
The authors suggest that for vision problems such as semantic segmentation, where the goal is less ambiguous than in graphics tasks, reconstruction losses like L1 may be sufficient.

4.7 Community-driven Research

This section shows results from the Twitter community. After the pix2pix codebase was released, computer vision and graphics practitioners successfully applied it to a wide variety of image-to-image translation tasks.

Conclusion

The cGAN achieved strong results, but better solutions to these problems may still exist: in Table 6, simple L1 regression scored better than the cGAN.
