<aside> 📌 Generative Adversarial Networks
</aside>
<aside> ⬇️ Goals for this lecture
Generator
: a function of the latent variable
$G_{\theta}$
Discriminator
:
$D_{\phi}$ ($=\mathbf{1}_{\mathcal{X}_{\text{data}}}(\mathbf{x})$)
That is, this is a binary classification problem → use the binary cross-entropy loss function
$$
\ell(\mathbf{x},\phi):=-\mathbf{1}_{\mathcal{X}_{\text{data}}}(\mathbf{x})\log D_{\phi}(\mathbf{x})-(1-\mathbf{1}_{\mathcal{X}_{\text{data}}}(\mathbf{x}))\log (1-D_{\phi}(\mathbf{x})) $$
$$
\ell(\mathbf{x},\phi)=\begin{cases} -\log D_{\phi}(\mathbf{x}) & :\mathbf{x}\in\mathcal{X}_{\text{data}} \\ -\log (1-D_{\phi}(\mathbf{x})) & :\mathbf{x}\notin\mathcal{X}_{\text{data}} \end{cases} $$
We can minimize the expectation of $\ell(\mathbf{x},\phi)$ over $\mathbf{x}\in\mathcal{X}_{\text{data}}\cup\widehat{\mathcal{X}}_{\text{data}}$ with respect to $\phi$!
→ this yields the optimal discriminator $D_{\ast}$
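The piecewise BCE loss above can be sketched in a few lines of Python (a minimal illustration; the scalar `d_x` stands in for the network output $D_{\phi}(\mathbf{x})$, and the example values are hypothetical):

```python
import math

def bce_loss(d_x: float, is_real: bool) -> float:
    """Binary cross-entropy loss ell(x, phi) for the discriminator.

    d_x     : discriminator output D_phi(x), assumed to lie in (0, 1)
    is_real : the indicator 1_{X_data}(x), i.e. whether x is a real sample
    """
    if is_real:
        return -math.log(d_x)        # -log D_phi(x)
    return -math.log(1.0 - d_x)      # -log(1 - D_phi(x))

# A confident, correct discriminator incurs a small loss;
# a confident but wrong one incurs a large loss.
print(bce_loss(0.9, True))   # real sample scored as real: small loss
print(bce_loss(0.9, False))  # fake sample scored as real: large loss
```

Note that an undecided discriminator, $D_{\phi}(\mathbf{x})=\frac{1}{2}$, incurs loss $\log 2$ on every sample regardless of its label.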
This works because we can collect real data $\mathcal{D}$ from the data distribution, and we can run the generator $G_{\theta}$, whose outputs follow the model distribution $P_{G}$, by drawing latent variables $\mathbf{z}\sim P_{\mathcal{Z}}$.
How to compare the two distributions $P_{\text{data}}$ and $P_G$: compute the density ratio $r=\frac{P_{\text{data}}}{P_{G}}$
We have no direct access to $P_{\text{data}}$ → we need a way to estimate the ratio from samples alone!
$$
\begin{aligned} \frac{P_{\text{data}}(\mathbf{x})}{P_{G}(\mathbf{x})}&=\frac{\mathbb{P}(X=\mathbf{x}|\text{real})}{\mathbb{P}(X=\mathbf{x}|\text{generated})} \\ &=\frac{\mathbb{P}(\text{real}|X=\mathbf{x})\mathbb{P}(X=\mathbf{x})}{\mathbb{P}(\text{real})}\Big/ \frac{\mathbb{P}(\text{generated}|X=\mathbf{x})\mathbb{P}(X=\mathbf{x})}{\mathbb{P}(\text{generated})}\\ &=\frac{\mathbb{P}(\text{real}|X=\mathbf{x})}{\mathbb{P}(\text{generated}|X=\mathbf{x})}\approx\frac{D_{\phi}(\mathbf{x})}{1-D_{\phi}(\mathbf{x})} \end{aligned} $$
where the last equality uses equal priors $\mathbb{P}(\text{real})=\mathbb{P}(\text{generated})=\frac{1}{2}$.
$$
D_{\phi}(\mathbf{x})\approx\frac{P_{\text{data}}(\mathbf{x})}{P_{\text{data}}(\mathbf{x})+P_{G}(\mathbf{x})} $$
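The density-ratio identity can be verified numerically when both densities are known in closed form. A minimal 1-D sketch with hypothetical Gaussians $P_{\text{data}}=\mathcal{N}(0,1)$ and $P_{G}=\mathcal{N}(1,1)$ standing in for the true densities:

```python
import math

def normal_pdf(x: float, mu: float) -> float:
    # Unit-variance Gaussian density (toy stand-in for the true densities)
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

p_data = lambda x: normal_pdf(x, 0.0)   # hypothetical P_data
p_g    = lambda x: normal_pdf(x, 1.0)   # hypothetical P_G

def d_star(x: float) -> float:
    # Optimal discriminator: P_data / (P_data + P_G)
    return p_data(x) / (p_data(x) + p_g(x))

def ratio_from_d(x: float) -> float:
    # Recover the density ratio from the discriminator alone: D / (1 - D)
    return d_star(x) / (1.0 - d_star(x))

x = -0.3
print(ratio_from_d(x), p_data(x) / p_g(x))  # the two agree
print(d_star(0.5))  # 0.5 at the symmetry point, where p_data = p_g
```

So a well-trained classifier is, implicitly, a density-ratio estimator: it never needs the densities themselves, only samples from each side.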
Discriminator
We can design a learning objective for training the network $D_{\phi}$!
Decompose the loss function $\ell$ into a real-data part and a $G_{\theta}$ part:
$$
\ell_{\text{data}}(\mathbf{x},\phi)=-\log D_{\phi}(\mathbf{x}),\quad \ell_{G}(\hat{\mathbf{x}},\phi)=-\log(1-D_{\phi}(\hat{\mathbf{x}})) $$
loss function $L(\phi, \theta)$
$$ \begin{aligned} L(\phi,\theta)&:=\frac{1}{2}\mathbb{E}_{\mathbf{x}\sim P_{\text{data}},\,\hat{\mathbf{x}}\sim P_{G_{\theta}}}[\ell_{\text{data}}(\mathbf{x},\phi)+\ell_{G}(\hat{\mathbf{x}},\phi)]\\ &=\frac{1}{2}\mathbb{E}_{\mathbf{x}\sim P_{\text{data}}}[\ell_{\text{data}}(\mathbf{x},\phi)]+\frac{1}{2}\mathbb{E}_{\hat{\mathbf{x}}\sim P_{G_{\theta}}}[\ell_{G}(\hat{\mathbf{x}},\phi)]\\ &=\frac{1}{2}\int_{\mathcal{X}_{\text{data}}} \ell_{\text{data}}(\mathbf{x},\phi)P_{\text{data}}(\mathbf{x})\,\text{d}\mathbf{x}+\frac{1}{2}\int_{\widehat{\mathcal{X}}_{\text{data}}}\ell_{G}(\hat{\mathbf{x}},\phi)P_{G_{\theta}}(\hat{\mathbf{x}})\,\text{d}\hat{\mathbf{x}} \end{aligned} $$
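The two expectations can be estimated by Monte Carlo. A toy sketch under assumed distributions (real samples from $\mathcal{N}(0,1)$, generated samples from $\mathcal{N}(2,1)$, and a fixed, hand-picked logistic discriminator — all hypothetical choices for illustration):

```python
import math
import random

random.seed(0)

def l_data(d: float) -> float: return -math.log(d)        # -log D(x)
def l_g(d: float) -> float:    return -math.log(1.0 - d)  # -log(1 - D(x))

# Hypothetical fixed discriminator: a logistic curve centered between
# the real mode (0) and the generated mode (2), scoring real samples high.
def d_phi(x: float) -> float:
    return 1.0 / (1.0 + math.exp(1.5 * (x - 1.0)))

n = 50_000
real = [random.gauss(0.0, 1.0) for _ in range(n)]  # x ~ P_data
fake = [random.gauss(2.0, 1.0) for _ in range(n)]  # x_hat ~ P_G

# L(phi, theta) = 1/2 E_data[l_data] + 1/2 E_G[l_G]
L = 0.5 * sum(l_data(d_phi(x)) for x in real) / n \
    + 0.5 * sum(l_g(d_phi(x)) for x in fake) / n
print(L)

# A "blind" discriminator D = 1/2 attains exactly log 2, the constant
# that reappears in the Jensen-Shannon identity at the end.
L_blind = 0.5 * l_data(0.5) + 0.5 * l_g(0.5)
print(L_blind, math.log(2.0))
```

Because the two sample distributions are well separated here, the informed discriminator achieves $L < \log 2$, while the blind one cannot.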
Now, if the generator $G_{\theta}$ produces realistic data, i.e., $\mathcal{X}_{\text{data}}\approx\widehat{\mathcal{X}}_{\text{data}}$, the two integrals can be merged:
$$
\begin{aligned} \frac{1}{2}&\int_{\mathcal{X}_{\text{data}}}\left[\ell_{\text{data}}(\mathbf{x},\phi)P_{\text{data}}(\mathbf{x})+\ell_{G}(\mathbf{x},\phi)P_{G_{\theta}}(\mathbf{x})\right]\text{d}\mathbf{x} \\ &=-\frac{1}{2}\int_{\mathcal{X}_{\text{data}}}\left[P_{\text{data}}(\mathbf{x})\log D_{\phi}(\mathbf{x})+P_{G_{\theta}}(\mathbf{x})\log(1-D_{\phi}(\mathbf{x}))\right]\text{d}\mathbf{x} \end{aligned} $$
The optimal discriminator minimizing the objective function above:
For $a,b>0$ and $z\in (0,1)$,
$$
\begin{aligned} \frac{\text{d}}{\text{d} z}(a\log z+b\log(1-z))&=\frac{a}{z}-\frac{b}{1-z}=0\quad \Rightarrow \quad z=\frac{a}{a+b} \end{aligned} $$
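The closed-form minimizer can be checked against a brute-force grid search (a quick numerical sanity check; `a` and `b` are arbitrary positive constants chosen for illustration):

```python
import math

a, b = 3.0, 1.0

# f(z) = -(a log z + b log(1 - z)) is the pointwise objective; its minimizer
# over (0, 1) should match the closed form z* = a / (a + b).
f = lambda z: -(a * math.log(z) + b * math.log(1.0 - z))
grid = [i / 10_000 for i in range(1, 10_000)]
z_grid = min(grid, key=f)
print(z_grid, a / (a + b))  # 0.75 0.75
```

Since $f$ is strictly convex on $(0,1)$ (sum of two convex terms), the stationary point $z=\frac{a}{a+b}$ is the unique global minimizer.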
the minimizer of the above objective function:
$$
D_{\ast}(\mathbf{x})=D_{\phi_{\ast}}(\mathbf{x})=\frac{P_{\text{data}}(\mathbf{x})}{P_{\text{data}}(\mathbf{x})+P_{G_{\theta}}(\mathbf{x})} $$
plug $\phi_{\ast}(\theta)$ into $\phi$ of the above objective function:
$$ \begin{aligned} L(\phi_{\ast}(\theta),\theta)&=\log 2-\frac{1}{2}\mathbb{KL}\left(P_{\text{data}}\Big\|\frac{P_{\text{data}}+P_{G_{\theta}}}{2}\right)-\frac{1}{2}\mathbb{KL}\left( P_{G_{\theta}}\Big\|\frac{P_{\text{data}}+P_{G_{\theta}}}{2}\right)\\ &=\log2 - \mathbb{JS}(P_{\text{data}}\|P_{G_{\theta}}) \end{aligned} $$
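The identity can be verified numerically for two hypothetical 1-D Gaussians, integrating with a simple midpoint rule (the tails outside $[-12, 13]$ carry negligible mass):

```python
import math

def npdf(x: float, mu: float) -> float:
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

p = lambda x: npdf(x, 0.0)  # hypothetical P_data = N(0, 1)
q = lambda x: npdf(x, 1.0)  # hypothetical P_G    = N(1, 1)

def integrate(f, lo=-12.0, hi=13.0, n=200_000):
    # Composite midpoint rule; both densities vanish outside [lo, hi]
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

# Left-hand side: L(phi*, theta) with the optimal D* = p / (p + q)
d_star = lambda x: p(x) / (p(x) + q(x))
lhs = -0.5 * integrate(lambda x: p(x) * math.log(d_star(x))
                       + q(x) * math.log(1.0 - d_star(x)))

# Right-hand side: log 2 - JS(p || q), with JS built from the two
# KL terms against the mixture m = (p + q) / 2
m = lambda x: 0.5 * (p(x) + q(x))
kl = lambda f, g: integrate(lambda x: f(x) * math.log(f(x) / g(x)))
jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
print(lhs, math.log(2.0) - jsd)  # the two sides agree
```

When $P_{\text{data}}=P_{G_{\theta}}$ the JS divergence vanishes and $L(\phi_{\ast}(\theta),\theta)=\log 2$, so minimizing over $\theta$ at the optimal discriminator drives $P_{G_{\theta}}$ toward $P_{\text{data}}$.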