Noise2Void - Learning Denoising from Single Noisy Images(2019)

Ssoari's Deep Learning Study (쏴아리의 딥러닝 스터디)

Image Generation

말해보시개 2021. 7. 11. 18:47

Hello, this is Ssoari.

Today I'd like to write about Noise2Void, one of the unsupervised image denoising methods.

Abstract

The field of image denoising is dominated by deep learning methods trained on pairs of noisy input images and clean target images.

More recently, NOISE2NOISE (N2N) showed that a network can be trained without clean targets, using only independent pairs of noisy images.

This work proposes NOISE2VOID (N2V), a training scheme that goes one step beyond N2N.

N2V stands out because it needs neither noisy image pairs nor clean target images: it can be trained on the noisy images themselves.
For example, training targets (clean or noisy) are often hard to obtain for biomedical image data, and N2V can be applied to exactly these problems.

1. Introduction

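Denoising generally rests on the assumption that the pixel values of the signal $s$ are not statistically independent of each other.

Given this, it should be possible to predict an unobserved pixel by looking at its surrounding image context.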

Most denoising approaches require training pairs $(x^j, s^j)$:

$x^j$: noisy input images
$s^j$: respective clean target images (ground truth)

However, when ground truth images cannot be acquired, these methods cannot be trained at all. NOISE2NOISE (N2N) was recently published to address this problem.

Instead of mapping noisy inputs to clean ground truth images, N2N tries to learn a mapping between pairs of independently degraded versions of the same training image, $(s+n, s+n')$.
N2N has the advantage that it can make the same predictions as conventional networks trained with ground truth images.
However, N2N has the drawback that two images of the same content $s$, with independent noises $n$ and $n'$, must be captured.

N2V (NOISE2VOID) proposes a novel training scheme to overcome this drawback.

Unlike N2N or traditional supervised methods, N2V can be trained without noisy image pairs or clean target images.
N2V rests on the following two statistical assumptions:
- signal $s$ is not pixel-wise independent
- noise $n$ is conditionally pixel-wise independent given the signal $s$
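To make these two assumptions concrete, here is a toy 1D simulation in plain Python. It is only a sketch: the smooth sine signal and the noise level are hypothetical choices, not from the paper, and `pearson` / `neighbor_correlation` are illustrative helper names. Neighboring signal pixels come out strongly correlated, while neighboring noise pixels are close to uncorrelated.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def neighbor_correlation(values):
    """Correlation between each pixel and its right-hand neighbor in a 1D toy image."""
    return pearson(values[:-1], values[1:])


rng = random.Random(0)
length = 10_000
# Hypothetical smooth signal: neighboring pixels depend strongly on each other.
signal = [math.sin(2 * math.pi * k / 200) for k in range(length)]
# Zero-mean Gaussian noise, drawn independently for every pixel.
noise = [rng.gauss(0.0, 0.5) for _ in range(length)]

print(f"signal neighbor correlation: {neighbor_correlation(signal):.3f}")  # close to 1
print(f"noise neighbor correlation:  {neighbor_correlation(noise):.3f}")   # close to 0
```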

The contributions of this work are:

- NOISE2VOID, a method for training denoising CNNs using only single, noisy images.
- A comparison of N2V-trained denoising results with existing CNN training schemes and non-trained methods.

2. Related Work

Discriminative Deep Learning Methods

Prior work has studied deep learning models that are trained offline and extract information from ground-truth-annotated training sets.

Discriminative deep learning models treat denoising as a regression task: a CNN is trained to minimize the loss between its predictions and the clean ground truth data.
Work in this direction includes a very deep CNN architecture based on residual learning and a very deep encoder-decoder architecture, among others.

Internal Statistics Methods

Internal statistics methods do not need to be trained on ground truth data.

Instead, they are applied directly to the test image and extract all required information from it.
Since N2V can be trained directly on the test image, it can also be seen as belonging to the internal statistics category.

Prior methods in this category include non-local means and BM3D.

Generative Models

Denoising models based on generative adversarial networks have been trained on unpaired training samples consisting of noisy and clean images.

The GAN generator learns to generate noise, producing pairs of clean and noisy images that are then used as training data in a traditional supervised setup.
Unlike N2V, these approaches require clean images during training.

3. Methods

Image Formation

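The image generation process is modeled by the joint distribution

$$p(s, n) = p(s)\, p(n \mid s) \tag{1}$$

where the observed image is $x = s + n$ ($x$: image, $s$: signal, $n$: noise).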

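$p(s)$ is assumed to be an arbitrary distribution satisfying

$$p(s_i \mid s_j) \neq p(s_i) \tag{2}$$

for two pixels $i$ and $j$ that lie within a certain radius of each other.

That is, the signal pixels $s_i$ are not statistically independent.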

For the noise $n$, we assume a conditional distribution of the form

$$p(n \mid s) = \prod_i p(n_i \mid s_i) \tag{3}$$

That is, the noise pixel values $n_i$ are conditionally independent given the signal.

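We also assume the noise is zero-mean,

$$\mathbb{E}[n_i] = 0 \tag{4}$$

which means that the expected value of the image at pixel $i$ is the signal:

$$\mathbb{E}[x_i] = s_i \tag{5}$$

In other words, if we acquired multiple images of the same signal that differ only in their noise, averaging them would approach the true signal as the number of images grows.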
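A quick way to see the zero-mean argument is to simulate repeated acquisitions. This is a minimal sketch, assuming additive Gaussian noise with a hypothetical standard deviation of 0.3; `acquire` and `average_acquisitions` are illustrative names. The gap between the average and the true signal shrinks as more acquisitions are averaged.

```python
import random


def acquire(signal, sigma, rng):
    """One noisy acquisition x = s + n with zero-mean, pixel-wise independent noise."""
    return [s + rng.gauss(0.0, sigma) for s in signal]


def average_acquisitions(signal, sigma, count, seed=0):
    """Average `count` independent noisy acquisitions of the same signal."""
    rng = random.Random(seed)
    total = [0.0] * len(signal)
    for _ in range(count):
        for i, value in enumerate(acquire(signal, sigma, rng)):
            total[i] += value
    return [t / count for t in total]


signal = [0.2, 0.8, 0.5, 0.1]  # hypothetical clean pixel values s_i
for count in (1, 10, 10_000):
    average = average_acquisitions(signal, sigma=0.3, count=count)
    worst = max(abs(a - s) for a, s in zip(average, signal))
    # The typical error shrinks roughly like sigma / sqrt(count).
    print(f"{count:>6} images -> max |average - signal| = {worst:.4f}")
```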

Traditional Supervised Training

We train a fully convolutional network (FCN) that takes an image $x$ as input and predicts the signal $s$.

Each pixel prediction $\hat{s}_i$ in the CNN output depends on a receptive field $x_{RF(i)}$ of input pixels.
The receptive field of a pixel is usually a square patch around that pixel.

From this point of view, the CNN can be seen as a function that takes the patch $x_{RF(i)}$ as input and outputs the prediction $\hat{s}_i$ for the single pixel $i$ at the center of the patch.

Denoising a whole image can then be done by extracting overlapping patches and feeding them to the network one by one.
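The patch view of the network can be sketched in plain Python. This is only an illustration: `extract_patch` pulls out the square patch $x_{RF(i)}$ around a pixel (edge replication at the border is an assumption here), and `mean_filter` is a hypothetical stand-in for a trained CNN. A real FCN computes all of these per-pixel predictions in one pass rather than looping over patches.

```python
def extract_patch(image, row, col, radius):
    """Return the (2*radius+1) x (2*radius+1) patch centered on (row, col).

    Out-of-bounds coordinates are clamped to the border (edge replication),
    an assumed padding choice for this sketch.
    """
    height, width = len(image), len(image[0])
    patch = []
    for dr in range(-radius, radius + 1):
        r = min(max(row + dr, 0), height - 1)
        patch.append([image[r][min(max(col + dc, 0), width - 1)]
                      for dc in range(-radius, radius + 1)])
    return patch


def apply_patchwise(image, predict, radius):
    """Denoise by predicting each output pixel from its own overlapping patch."""
    return [[predict(extract_patch(image, r, c, radius))
             for c in range(len(image[0]))]
            for r in range(len(image))]


def mean_filter(patch):
    """Hypothetical stand-in for the trained CNN: mean of the patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


noisy = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
print(extract_patch(noisy, 0, 0, 1))
# [[1.0, 1.0, 2.0], [1.0, 1.0, 2.0], [4.0, 4.0, 5.0]]
print(apply_patchwise(noisy, mean_filter, 1)[1][1])  # 5.0
```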

That is, the CNN is defined as the function

$$f(x_{RF(i)}; \theta) = \hat{s}_i \tag{6}$$

$\theta$: vector of CNN parameters we would like to train
$\hat{s}_i$: pixel prediction

In traditional supervised training, the training pairs are $(x^j, s^j)$:

$x^j$: noisy input image
$s^j$: clean ground truth target

From the patch-based CNN perspective, the training data can be viewed as pairs $(x^j_{RF(i)}, s^j_i)$:

$x^j_{RF(i)}$: patch around pixel $i$, extracted from training input image $x^j$
$s^j_i$: corresponding target pixel value, extracted from the ground truth image $s^j$ at the same position

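Using these training pairs, the parameters $\theta$ are adjusted to minimize the pixel-wise loss

$$\arg\min_\theta \sum_j \sum_i L\left(f(x^j_{RF(i)}; \theta), s^j_i\right) \tag{7}$$

With the standard MSE loss, the per-pixel loss is

$$L(\hat{s}^j_i, s^j_i) = \left(\hat{s}^j_i - s^j_i\right)^2 \tag{8}$$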
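For the MSE case, the quantity being minimized for one fixed parameter setting can be computed directly. In this sketch `predict` stands in for $f(\cdot; \theta)$ at a single $\theta$, and the two training pairs plus the `center_pixel` and `patch_mean` predictors are hypothetical examples used to compare two candidate functions; training would search for the parameters with the smallest sum.

```python
def mse_loss(prediction, target):
    """Standard per-pixel MSE loss: (s_hat - s)^2."""
    return (prediction - target) ** 2


def empirical_risk(predict, training_pairs):
    """Sum of the loss over all patch-based pairs (x^j_RF(i), s^j_i).

    `predict` plays the role of f(. ; theta) for one fixed theta.
    """
    return sum(mse_loss(predict(patch), target) for patch, target in training_pairs)


def center_pixel(patch):
    """Candidate predictor that copies the noisy center value."""
    mid = len(patch) // 2
    return patch[mid][mid]


def patch_mean(patch):
    """Candidate predictor that averages the whole patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


# Hypothetical pairs: 3x3 noisy patches with their clean center targets.
pairs = [
    ([[1.0, 1.0, 1.0], [1.0, 4.0, 1.0], [1.0, 1.0, 1.0]], 1.0),  # noise spike at the center
    ([[2.0, 2.0, 2.0], [2.0, 2.0, 2.0], [2.0, 2.0, 2.0]], 2.0),
]
print(empirical_risk(center_pixel, pairs))           # 9.0
print(round(empirical_risk(patch_mean, pairs), 4))   # 0.1111
```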

Noise2Noise Training

The training data of N2N are noisy image pairs $(x^j, x'^j)$; no clean ground truth is available.

From the patch-based perspective, the training pairs are $(x^j_{RF(i)}, x'^j_i)$:

$x^j_{RF(i)}$: noisy input patch extracted from $x^j$
$x'^j_i$: noisy target, taken from $x'^j$ at position $i$

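As in traditional training (Eq. 7), the parameters are adjusted to minimize the loss,

but the noisy target $x'^j_i$ is used in place of the ground truth signal $s^j_i$:

$$\arg\min_\theta \sum_j \sum_i L\left(f(x^j_{RF(i)}; \theta), x'^j_i\right) \tag{9}$$

Even though the network learns a mapping from noisy inputs to noisy targets, training still converges to the correct solution.

This is because the MSE loss is minimized by predicting the expected value of the target, and the expected value of a noisy image equals the clean signal (Eq. 5).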
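Why noisy targets still work can be illustrated with a deliberately simplified setup: fit a single scalar prediction by gradient descent on the MSE loss, once against clean targets and once against noisy targets $s_i + n'_i$. This is a sketch under assumed values (signal 0.7, Gaussian noise, learning rate, step count), not the N2N training code. Both fits settle near the true signal, because the MSE minimizer is the mean of the targets.

```python
import random


def fit_constant(targets, lr=0.1, steps=200):
    """Minimize mean((s_hat - t)^2) over a single scalar s_hat by gradient descent."""
    s_hat = 0.0
    for _ in range(steps):
        grad = sum(2.0 * (s_hat - t) for t in targets) / len(targets)
        s_hat -= lr * grad
    return s_hat


rng = random.Random(1)
true_signal = 0.7                                  # hypothetical clean value s_i
clean_targets = [true_signal] * 5000               # supervised targets s_i
noisy_targets = [true_signal + rng.gauss(0.0, 0.3) for _ in range(5000)]  # targets s_i + n'_i

print(f"clean targets -> {fit_constant(clean_targets):.3f}")  # 0.700
print(f"noisy targets -> {fit_constant(noisy_targets):.3f}")  # close to 0.700
```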

Noise2Void Training

N2V extracts both the input and the target from a single noisy training image $x^j$.

If we simply used a patch as input and its center pixel as the target, the network would just learn the identity, mapping the center of the input patch directly to the output (Figure 2 a).

Now assume a network architecture with a special receptive field.

This receptive field $\tilde{x}_{RF(i)}$ has a blind spot at its center (Figure 2 b).
The CNN prediction $\hat{s}_i$ is affected by all input pixels in the square neighborhood except the input pixel $x_i$ at its very location.
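A blind-spot receptive field can be mimicked by collecting the neighborhood while skipping the center. In this hypothetical sketch, `blind_spot_predict` averages the visible neighbors in place of a real blind-spot CNN, and out-of-image pixels are simply skipped (an assumed border rule). Whatever value $x_i$ takes, the prediction does not change.

```python
def blind_spot_neighbors(image, row, col, radius):
    """Pixels of the square neighborhood around (row, col), excluding the center.

    Pixels outside the image are skipped (an assumed border handling for this sketch).
    """
    height, width = len(image), len(image[0])
    values = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            if (r, c) == (row, col):
                continue  # the blind spot: x_i itself is never read
            if 0 <= r < height and 0 <= c < width:
                values.append(image[r][c])
    return values


def blind_spot_predict(image, row, col, radius=1):
    """Hypothetical blind-spot predictor: mean of the visible neighbors."""
    neighbors = blind_spot_neighbors(image, row, col, radius)
    return sum(neighbors) / len(neighbors)


image = [[1.0, 1.0, 1.0],
         [1.0, 9.0, 1.0],
         [1.0, 1.0, 1.0]]
print(blind_spot_predict(image, 1, 1))  # 1.0: the bright center is never seen
image[1][1] = -50.0
print(blind_spot_predict(image, 1, 1))  # still 1.0: the output cannot depend on x_i
```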

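Because a blind-spot network has less information available for its prediction, its accuracy can be expected to be slightly lower than that of a normal network.

The advantage of the blind-spot architecture is that it cannot learn the identity.

Since the noise is assumed to be independent given the signal (Eq. 3), the neighboring pixels carry no information about the noise value $n_i$.
Since the signal is assumed to have statistical dependencies (Eq. 2), the network can still look at the surroundings and estimate the signal value $s_i$.
As a result, a blind-spot network can take both its input patch and its target value from the same noisy training image.

It can therefore be trained by minimizing the empirical risk

$$\arg\min_\theta \sum_j \sum_i L\left(f(\tilde{x}^j_{RF(i)}; \theta), x^j_i\right) \tag{10}$$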
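The effect of the blind spot on the N2V objective, where input and target come from the same noisy image, can be checked on a toy 1D signal. This is a sketch with hand-set predictors rather than trained networks, and the flat signal, noise level, and radius are assumptions. A predictor that sees $x_i$ reaches zero loss simply by copying it, while the blind-spot predictor keeps a nonzero loss but lands much closer to the clean signal.

```python
import random


def n2v_risk(predict, noisy):
    """Empirical N2V risk: sum over pixels of (prediction_i - x_i)^2.

    Input and target both come from the same noisy image `noisy`.
    """
    return sum((predict(noisy, i) - noisy[i]) ** 2 for i in range(len(noisy)))


def identity(noisy, i):
    """A receptive field that includes x_i can simply copy it."""
    return noisy[i]


def blind_spot_mean(noisy, i, radius=2):
    """Hand-set blind-spot predictor: mean of the neighbors within `radius`, center excluded."""
    neighbors = [noisy[k] for k in range(i - radius, i + radius + 1)
                 if k != i and 0 <= k < len(noisy)]
    return sum(neighbors) / len(neighbors)


def error_to_clean(predict, noisy, clean):
    """Mean squared error against the (normally unknown) clean signal."""
    return sum((predict(noisy, i) - clean[i]) ** 2 for i in range(len(clean))) / len(clean)


rng = random.Random(2)
clean = [1.0] * 2000                               # hypothetical flat signal s
noisy = [s + rng.gauss(0.0, 0.5) for s in clean]   # x = s + n

print(n2v_risk(identity, noisy))                   # 0.0: the trivial identity solution
print(n2v_risk(blind_spot_mean, noisy) > 0)        # True
print(error_to_clean(identity, noisy, clean), error_to_clean(blind_spot_mean, noisy, clean))
```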

Implementation Details

Applying this training scheme as-is can be very inefficient:

an entire patch has to be processed to compute the gradient for a single output pixel.

To address this issue, the following approximation technique is used:

- From a noisy training image $x^j$, randomly extract patches of 64x64 pixels (larger than the network's receptive field).
- Within each patch, randomly select N pixels using stratified sampling to avoid clustering.
- Mask these pixels and use their original noisy input values as the targets at their positions (Figure 3).
- In the Keras pipeline, set the loss to zero at every pixel except the selected ones.

This way, the rest of the predicted image is ignored, and the gradients for all selected pixels are computed simultaneously.
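The approximation can be sketched roughly as follows. This is not the authors' Keras code: the grid-based `stratified_coordinates`, N = 64, the neighborhood radius, and masking by copying a random neighbor's value into each selected pixel are all assumptions made for illustration (the post only says the pixels are masked). `masked_mse` shows the key idea that only the selected positions contribute to the loss.

```python
import random


def stratified_coordinates(size, n_points, rng):
    """Draw about n_points positions from a size x size patch, one per grid cell.

    Sampling one pixel per cell of a regular grid keeps the selection spread out
    (an assumed form of stratified sampling).
    """
    step = max(1, int(size / n_points ** 0.5))
    coords = []
    for top in range(0, size, step):
        for left in range(0, size, step):
            row = rng.randrange(top, min(top + step, size))
            col = rng.randrange(left, min(left + step, size))
            coords.append((row, col))
    return coords


def mask_patch(patch, coords, rng, radius=2):
    """Copy `patch`, replacing each selected pixel with a random neighbor's value."""
    size = len(patch)
    masked = [row[:] for row in patch]
    for row, col in coords:
        neighbors = [(row + dr, col + dc)
                     for dr in range(-radius, radius + 1)
                     for dc in range(-radius, radius + 1)
                     if (dr, dc) != (0, 0)
                     and 0 <= row + dr < size and 0 <= col + dc < size]
        src_row, src_col = rng.choice(neighbors)
        masked[row][col] = patch[src_row][src_col]  # read from the unmasked patch
    return masked


def masked_mse(prediction, target, coords):
    """MSE over the selected positions only; every other pixel contributes zero loss."""
    return sum((prediction[r][c] - target[r][c]) ** 2 for r, c in coords) / len(coords)


rng = random.Random(3)
size = 64                                                   # patch size from the post
noisy = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]  # hypothetical patch
coords = stratified_coordinates(size, n_points=64, rng=rng)  # N = 64 is an assumption
net_input = mask_patch(noisy, coords, rng)                  # network input with blind spots
target = noisy                                              # targets: original noisy values
prediction = [[0.0] * size for _ in range(size)]            # placeholder for the CNN output
print(len(coords), round(masked_mse(prediction, target, coords), 3))
```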

4. Experiments

NOISE2VOID was evaluated on natural images, simulated biological image data, and acquired microscopy images, and compared with traditional supervised training, NOISE2NOISE, and training-free denoising methods (BM3D, non-local means, mean and median filters).

As mentioned earlier, N2V uses less information for its predictions, so it did not outperform the other trained methods (traditional supervised training and N2N).

Performance over Various Noise Levels

On the BSD68 dataset, the PSNR of N2V and various baseline models was measured at different noise levels.
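Since the comparison is reported in PSNR, here is the standard definition, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, as a small helper. The 8-bit data range of 255 is an assumption about the image scale, and the 2x2 images are made up for the example.

```python
import math


def psnr(clean, estimate, data_range=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    squared = [(a - b) ** 2
               for row_a, row_b in zip(clean, estimate)
               for a, b in zip(row_a, row_b)]
    mse = sum(squared) / len(squared)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


clean = [[100.0, 120.0], [140.0, 160.0]]
noisy = [[110.0, 110.0], [150.0, 150.0]]  # every pixel off by 10, so MSE = 100
print(round(psnr(clean, noisy), 2))  # 28.13
```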

5. Conclusion

We introduced NOISE2VOID, a method for training denoising CNNs using only single noisy images.

N2V was shown to be applicable to a variety of image modalities, including photography, fluorescence microscopy, and cryo-transmission electron microscopy.
