<aside> 📌 Probabilistic Generative Models
</aside>
<aside> ⬇️ Learning goals for this section
Generative modeling =
1️⃣ estimating the data distribution
2️⃣ performing sampling in situations where we do not know the data distribution (or density) exactly
Example: the Gaussian mixture model (GMM)
<aside> ⭐ Gaussian mixture model
$$
\widehat{P}(\mathbf{x})=\sum_{k=1}^{K}\pi_{k}\mathcal{N}(\mathbf{x}|\mu_{k},\Sigma_{k}),\quad \sum_{k=1}^{K}\pi_{k}=1 $$
A classical (traditional) density estimation method
$K$ = number of mixture components
$\pi_k$ = weight assigned to the $k$-th Gaussian component
👍 flexible, and can approximate densities accurately </aside>
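As a quick illustration, here is a minimal sketch (not from the lecture) of evaluating the mixture density $\widehat{P}(\mathbf{x})$ above; the parameters `pi`, `mus`, `Sigmas` below are placeholder values chosen only for the example.

```python
# Minimal sketch: evaluate P_hat(x) = sum_k pi_k * N(x | mu_k, Sigma_k).
# The parameter values here are illustrative placeholders, not from the notes.
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, pi, mus, Sigmas):
    """Return the GMM density at a single point x."""
    return sum(
        pi_k * multivariate_normal.pdf(x, mean=mu_k, cov=Sigma_k)
        for pi_k, mu_k, Sigma_k in zip(pi, mus, Sigmas)
    )

# Example with K = 2 components in 2-D.
pi = np.array([0.3, 0.7])                      # mixture weights, sum to 1
mus = [np.zeros(2), np.array([3.0, 3.0])]      # component means
Sigmas = [np.eye(2), 0.5 * np.eye(2)]          # component covariances
print(gmm_density(np.array([1.0, 1.0]), pi, mus, Sigmas))
```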
$z_{k}\in\{0, 1\}$: binary random variables indicating, for each data point, which mixture component is selected (exactly one $z_{k}$ equals 1, so $\sum_{k}z_{k}=1$)
$$
\widehat{P}(\mathbf{x}|\mathbf{z})=\prod_{k=1}^{K}\mathcal{N}(\mathbf{x}|\mu_{k},\Sigma_{k})^{z_{k}},\quad \widehat{P}(\mathbf{z})=\prod_{k=1}^{K}\pi_{k}^{z_{k}} $$
For a given dataset $\mathcal{D}=\{\mathbf{x}_{1},\ldots,\mathbf{x}_{N}\}$, the log-likelihood function =
$$
\ln \widehat{P}(\mathcal{D})=\sum_{n=1}^{N}\ln\left(\sum_{k=1}^{K}\pi_{k}\mathcal{N}(\mathbf{x}_{n}|\mu_{k},\Sigma_{k})\right) $$
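A sketch of evaluating this log-likelihood in code; using `logsumexp` for numerical stability is an implementation choice added here, not part of the notes.

```python
# Sketch: ln P_hat(D) = sum_n ln sum_k pi_k * N(x_n | mu_k, Sigma_k),
# computed in log space via log-sum-exp for stability (added assumption).
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pi, mus, Sigmas):
    """X: (N, d) data matrix. Returns the scalar log-likelihood."""
    # log_pdfs[n, k] = ln N(x_n | mu_k, Sigma_k)
    log_pdfs = np.stack(
        [multivariate_normal.logpdf(X, mean=m, cov=S) for m, S in zip(mus, Sigmas)],
        axis=1,
    )
    return logsumexp(log_pdfs + np.log(pi), axis=1).sum()
```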
The MLE problem can be solved with the Expectation-Maximization (EM) algorithm.
Set the posterior probabilities (or responsibilities)
$$
\gamma(z_{nk}):=\frac{\widehat{P}(\mathbf{x}_{n},z_{k}=1)}{\widehat{P}(\mathbf{x}_{n})}=\frac{\pi_{k}\mathcal{N}(\mathbf{x}_{n}|\mu_{k},\Sigma_{k})}{\sum_{j}\pi_{j}\mathcal{N}(\mathbf{x}_{n}|\mu_{j},\Sigma_{j})},\quad 1\leq n\leq N,\ 1\leq k \leq K
$$
The probability that the $k$-th mixture component is selected given the data point $\mathbf{x}_{n}$ → a kind of "explanatory power" of the $k$-th component → the probability that $\mathbf{x}_{n}$ falls into a particular component $k$
= (probability that the $n$-th data point falls into the $k$-th component) / (total density)
Computing it requires all of $\pi_k$, $\mu_k$, $\Sigma_k$ → we will find their formulas via MLE!
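A sketch of the E-step implied by this formula: given the current parameters, compute the $N\times K$ matrix of responsibilities $\gamma(z_{nk})$ (function name and array shapes here are my own assumptions).

```python
# Sketch of the E-step: gamma[n, k] = responsibility of component k for x_n.
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pi, mus, Sigmas):
    """Return gamma of shape (N, K) from the responsibility formula above."""
    # Unnormalized terms: pi_k * N(x_n | mu_k, Sigma_k)
    weighted = np.stack(
        [pi_k * multivariate_normal.pdf(X, mean=m, cov=S)
         for pi_k, m, S in zip(pi, mus, Sigmas)],
        axis=1,
    )
    # Normalize over k: the denominator sum_j pi_j * N(x_n | mu_j, Sigma_j).
    return weighted / weighted.sum(axis=1, keepdims=True)
```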
Differentiating $\ln\widehat{P}(\mathcal{D})$
w.r.t. $\mu_k$ and setting it to zero:
$$
N_{k}=\sum_{n=1}^{N}\gamma(z_{nk}),\quad \mu_{k}=\frac{1}{N_{k}}\sum_{n=1}^{N}\gamma(z_{nk})\mathbf{x}_{n} $$
w.r.t. $\Sigma_k$ (and w.r.t. $\pi_k$, under the constraint $\sum_{k}\pi_{k}=1$) and setting it to zero:
$$
\Sigma_{k}=\frac{1}{N_{k}}\sum_{n=1}^{N}\gamma(z_{nk})(\mathbf{x}_{n}-\mu_{k})(\mathbf{x}_{n}-\mu_{k})^{\top},\quad \pi_{k}=\frac{N_{k}}{N} $$
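And a sketch of the corresponding M-step, implementing the $N_k$, $\mu_k$, $\Sigma_k$, $\pi_k$ updates above from the data `X` and the responsibility matrix `gamma`; again, names and shapes are assumptions for illustration.

```python
# Sketch of the M-step updates derived above.
import numpy as np

def m_step(X, gamma):
    """X: (N, d) data, gamma: (N, K) responsibilities. Returns (pi, mus, Sigmas)."""
    N = X.shape[0]
    N_k = gamma.sum(axis=0)                         # N_k = sum_n gamma_nk, shape (K,)
    mus = (gamma.T @ X) / N_k[:, None]              # mu_k = (1/N_k) sum_n gamma_nk x_n
    Sigmas = []
    for k in range(gamma.shape[1]):
        diff = X - mus[k]                           # (N, d) centered data
        # Sigma_k = (1/N_k) sum_n gamma_nk (x_n - mu_k)(x_n - mu_k)^T
        Sigmas.append((gamma[:, k, None] * diff).T @ diff / N_k[k])
    pi = N_k / N                                    # pi_k = N_k / N
    return pi, mus, Sigmas
```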
[Algorithm] (Expectation-Maximization for Gaussian Mixture Models)
<aside> ⌨️ Repeat until convergence:
(E-step) For $\mathbf{x}_{n}\in \mathcal{D}:$
For $k\in\{1,\ldots,K\}:$
compute $\gamma(z_{nk})$ with the parameters $\mu_{k},\Sigma_{k},\pi_{k}$.
(M-step) For $k\in\{1,\ldots,K\}:$
update $N_{k},\mu_{k},\Sigma_{k},\pi_{k}$.
Return $\{\mu_{k},\Sigma_{k},\pi_{k}:k=1,\ldots,K\}$.
</aside>
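Putting the pieces together, a minimal EM loop mirroring the algorithm in the box; the random initialization and the fixed iteration count are simplifications not specified in the notes, and `e_step` / `m_step` refer to the sketches above.

```python
# Minimal EM loop for a GMM, reusing the e_step / m_step sketches above.
import numpy as np

def fit_gmm_em(X, K, n_iter=100, seed=0):
    """X: (N, d) data, K: number of components. Returns (pi, mus, Sigmas)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Naive initialization (assumption): uniform weights, random data points
    # as means, identity covariances.
    pi = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)]
    Sigmas = [np.eye(d) for _ in range(K)]
    for _ in range(n_iter):
        gamma = e_step(X, pi, mus, Sigmas)   # E-step: responsibilities
        pi, mus, Sigmas = m_step(X, gamma)   # M-step: parameter updates
    return pi, mus, Sigmas
```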
Problems
➡️ GMM and EM are best avoided for high-dimensional data such as images (each full covariance $\Sigma_k$ alone has $O(d^2)$ parameters, and a handful of Gaussians cannot capture such complex distributions)
Lesson
: what if we make better use of $z$? What if we widen the range of what it can actually represent?
→ the beginning of latent modeling!
Mixture model approach
How can we understand the latent variables $\mathbf{z}\in\mathcal{Z}_{\text{latent}}$?
$\widehat{P}(\mathbf{z}|\mathbf{x})$: the posterior distribution
Representation
Conclusion
: by assigning a probability distribution to the latent space, we can estimate the discrepancy between the two spaces (the latent space and the data space)!