<aside> Generative Adversarial Networks
</aside>
<aside> Learning goals for this lesson
Generator
: the latent variable function $G_{\theta}$
Discriminator
: $D_{\phi}$ ($=\mathbf{1}_{\mathcal{X}_{\text{data}}}(\mathbf{x})$ in the ideal case)
That is, training the discriminator is a binary classification problem → binary cross-entropy loss function
$$
\ell(\mathbf{x},\phi):=-\mathbf{1}_{\mathcal{X}_{\text{data}}}(\mathbf{x})\log D_{\phi}(\mathbf{x})-(1-\mathbf{1}_{\mathcal{X}_{\text{data}}}(\mathbf{x}))\log (1-D_{\phi}(\mathbf{x})) $$
$$
\ell(\mathbf{x},\phi)=\begin{cases} -\log D_{\phi}(\mathbf{x}) & :\mathbf{x}\in\mathcal{X}_{\text{data}} \\ -\log (1-D_{\phi}(\mathbf{x})) & :\mathbf{x}\notin\mathcal{X}_{\text{data}} \end{cases} $$
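A minimal NumPy sketch of this case-wise loss (the helper name `bce_loss` and the example probabilities are illustrative, not from the source):

```python
import numpy as np

def bce_loss(d_x: float, is_real: bool) -> float:
    """Binary cross-entropy loss for a single sample.

    d_x     : discriminator output D_phi(x), a probability in (0, 1)
    is_real : True if x is in X_data, False if x is generated
    """
    if is_real:   # x in X_data      ->  -log D_phi(x)
        return -np.log(d_x)
    else:         # x not in X_data  ->  -log(1 - D_phi(x))
        return -np.log(1.0 - d_x)

# A confident, correct discriminator incurs low loss in both cases:
print(bce_loss(0.99, is_real=True))    # ~0.01
print(bce_loss(0.01, is_real=False))   # ~0.01
# A fooled discriminator incurs high loss:
print(bce_loss(0.01, is_real=True))    # ~4.6
```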
We can minimize the expectation of $\ell(\mathbf{x},\phi)$ over $\mathbf{x}\in\mathcal{X}_{\text{data}}\cup\widehat{\mathcal{X}}_{\text{data}}$ w.r.t. $\phi$!
→ this yields the optimal discriminator $D_{\ast}$
This is possible because we can collect real data $\mathcal{D}$ from the data distribution, and by sampling the latent variable $\mathbf{z}\sim P_{\mathcal{Z}}$ we can run the generator $G_{\theta}$ to produce samples following the model distribution $P_{G}$.
How to compare the two distributions $P_{\text{data}}$ and $P_G$: compute the density ratio $r=\frac{P_{\text{data}}}{P_{G}}$
But $P_{\text{data}}$ itself is not accessible → we need a way to compute the ratio from samples!
$$
\begin{aligned} \frac{P_{\text{data}}(\mathbf{x})}{P_{G}(\mathbf{x})}&=\frac{\mathbb{P}(X=\mathbf{x}|\text{real})}{\mathbb{P}(X=\mathbf{x}|\text{generated})} \\ &=\frac{\mathbb{P}(\text{real}|X=\mathbf{x})\mathbb{P}(X=\mathbf{x})}{\mathbb{P}(\text{real})}\Big/ \frac{\mathbb{P}(\text{generated}|X=\mathbf{x})\mathbb{P}(X=\mathbf{x})}{\mathbb{P}(\text{generated})}\\ &=\frac{\mathbb{P}(\text{real}|X=\mathbf{x})}{\mathbb{P}(\text{generated}|X=\mathbf{x})}\approx\frac{D_{\phi}(\mathbf{x})}{1-D_{\phi}(\mathbf{x})} \end{aligned} $$
(The last equality uses $\mathbb{P}(\text{real})=\mathbb{P}(\text{generated})=\frac{1}{2}$, i.e., real and generated samples are presented to the discriminator equally often.) Solving for $D_{\phi}$:
$$
D_{\phi}(\mathbf{x})\approx\frac{P_{\text{data}}(\mathbf{x})}{P_{\text{data}}(\mathbf{x})+P_{G}(\mathbf{x})} $$
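This identity is exactly the classifier-based density-ratio trick: train a probabilistic classifier to separate real from generated samples, then read off $D_{\phi}(\mathbf{x})/(1-D_{\phi}(\mathbf{x}))$. A hedged sketch with two known 1D Gaussians standing in for $P_{\text{data}}$ and $P_G$ (the particular distributions and the use of scikit-learn's LogisticRegression are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins: P_data = N(0, 1), P_G = N(1, 1); equal sample counts,
# so that P(real) = P(generated) = 1/2 as the derivation assumes.
x_real = rng.normal(0.0, 1.0, size=5000)
x_gen  = rng.normal(1.0, 1.0, size=5000)

X = np.concatenate([x_real, x_gen]).reshape(-1, 1)
y = np.concatenate([np.ones(5000), np.zeros(5000)])   # 1 = real

clf = LogisticRegression().fit(X, y)

x = np.array([[0.5]])
d = clf.predict_proba(x)[0, 1]          # ~ D_phi(x) = P(real | x)
est_ratio  = d / (1.0 - d)              # classifier-based ratio estimate
true_ratio = norm.pdf(0.5, 0, 1) / norm.pdf(0.5, 1, 1)
print(est_ratio, true_ratio)            # should be close (~1.0 here)
```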
We can now design a learning objective for training the discriminator network $D_{\phi}$!
Decompose the loss function $\ell$ into a real-data part and a $G_\theta$ part:
$$
\ell_{\text{data}}(\mathbf{x},\phi)=-\log D_{\phi}(\mathbf{x}),\quad \ell_{G}(\hat{\mathbf{x}},\phi)=-\log(1-D_{\phi}(\hat{\mathbf{x}})) $$
loss function $L(\phi, \theta)$
$$ \begin{aligned} L(\phi,\theta)&:=\mathbb{E}_{\mathbf{x}\sim P_{\text{data}},\,\hat{\mathbf{x}}\sim P_{G_{\theta}}}[\ell_{\text{data}}(\mathbf{x},\phi)+\ell_{G}(\hat{\mathbf{x}},\phi)]\\ &=\frac{1}{2}\mathbb{E}_{\mathbf{x}\sim P_{\text{data}}}[\ell_{\text{data}}(\mathbf{x},\phi)]+\frac{1}{2}\mathbb{E}_{\hat{\mathbf{x}}\sim P_{G_{\theta}}}[\ell_{G}(\hat{\mathbf{x}},\phi)]\\ &=\frac{1}{2}\int_{\mathcal{X}_{\text{data}}} \ell_{\text{data}}(\mathbf{x},\phi)P_{\text{data}}(\mathbf{x})\,\text{d}\mathbf{x}+\frac{1}{2}\int_{\widehat{\mathcal{X}}_{\text{data}}}\ell_{G}(\hat{\mathbf{x}},\phi)P_{G_{\theta}}(\hat{\mathbf{x}})\,\text{d}\hat{\mathbf{x}} \end{aligned} $$
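In practice the two expectations are estimated with minibatch Monte Carlo averages. A minimal PyTorch sketch of one discriminator update (the toy network sizes, optimizer, and data placeholders are illustrative assumptions, not from the source):

```python
import torch

# Illustrative toy networks; real architectures depend on the data.
D = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(),
                        torch.nn.Linear(64, 1), torch.nn.Sigmoid())
G = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(),
                        torch.nn.Linear(64, 2))
opt_D = torch.optim.Adam(D.parameters(), lr=1e-4)

x_real = torch.randn(128, 2)    # placeholder for a real-data batch
z = torch.randn(128, 8)         # z ~ P_Z
x_fake = G(z).detach()          # detach: this step updates phi only, not theta

# L(phi, theta) = 1/2 E[l_data] + 1/2 E[l_G], as minibatch averages
loss_data = -torch.log(D(x_real)).mean()        # l_data = -log D(x)
loss_gen  = -torch.log(1 - D(x_fake)).mean()    # l_G = -log(1 - D(x_hat))
loss_D = 0.5 * (loss_data + loss_gen)

opt_D.zero_grad()
loss_D.backward()
opt_D.step()
```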
Now, if the generator $G_\theta$ produces realistic data, i.e., $\mathcal{X}_{\text{data}}\approx\widehat{\mathcal{X}}_{\text{data}}$, the two integrals can be merged:
$$
\begin{aligned} \frac{1}{2}&\int_{\mathcal{X}_{\text{data}}}\left[\ell_{\text{data}}(\mathbf{x},\phi)P_{\text{data}}(\mathbf{x})+\ell_{G}(\mathbf{x},\phi)P_{G_{\theta}}(\mathbf{x})\right]\text{d}\mathbf{x} \\ &=-\frac{1}{2}\int_{\mathcal{X}_{\text{data}}}P_{\text{data}}(\mathbf{x})\log D_{\phi}(\mathbf{x})+P_{G_{\theta}}(\mathbf{x})\log(1-D_{\phi}(\mathbf{x}))\,\text{d}\mathbf{x} \end{aligned} $$
The optimal discriminator minimizing the above objective function:
For $a,b>0$ and $z\in (0,1)$,
$$
\begin{aligned} \frac{\text{d}}{\text{d} z}(a\log z+b\log(1-z))&=\frac{a}{z}-\frac{b}{1-z}=0\quad \Rightarrow \quad z=\frac{a}{a+b} \end{aligned} $$
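A quick numerical check of this calculus fact (the values of $a$, $b$ and the grid resolution are arbitrary):

```python
import numpy as np

a, b = 3.0, 1.0
z = np.linspace(1e-6, 1 - 1e-6, 100_001)
f = a * np.log(z) + b * np.log(1 - z)   # concave in z, so the critical point is the max

print(z[np.argmax(f)])   # ~0.75, numerical maximizer of a*log z + b*log(1-z)
print(a / (a + b))       # 0.75, the closed-form z = a/(a+b)
```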
the minimizer of the above objective function:
$$
D_{\ast}(\mathbf{x})=D_{\phi_{\ast}}(\mathbf{x})=\frac{P_{\text{data}}(\mathbf{x})}{P_{\text{data}}(\mathbf{x})+P_{G_{\theta}}(\mathbf{x})} $$
plug $\phi_{\ast}(\theta)$ into $\phi$ of the above objective function:
$$ \begin{aligned} L(\phi_{\ast}(\theta),\theta)&=\log 2-\frac{1}{2}\mathbb{KL}\left(P_{\text{data}}\Big\|\frac{P_{\text{data}}+P_{G_{\theta}}}{2}\right)-\frac{1}{2}\mathbb{KL}\left( P_{G_{\theta}}\Big\|\frac{P_{\text{data}}+P_{G_{\theta}}}{2}\right)\\ &=\log2 - \mathbb{JS}(P_{\text{data}}\|P_{G_{\theta}}) \end{aligned} $$
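The closed form $L(\phi_{\ast}(\theta),\theta)=\log 2-\mathbb{JS}(P_{\text{data}}\|P_{G_{\theta}})$ can be verified numerically on a pair of discrete distributions (the particular distributions are arbitrary stand-ins):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])    # stand-in for P_data
q = np.array([0.2, 0.2, 0.6])    # stand-in for P_G
m = 0.5 * (p + q)                # mixture (P_data + P_G) / 2

kl = lambda u, v: np.sum(u * np.log(u / v))
js = 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Objective value at the optimal discriminator D* = p / (p + q)
d_star = p / (p + q)
L = -0.5 * np.sum(p * np.log(d_star)) - 0.5 * np.sum(q * np.log(1 - d_star))

print(L, np.log(2) - js)   # identical up to floating-point error
```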