[DL] About torch.nn.Linear

torch.nn.Linear is a core PyTorch Module that applies a linear transformation (strictly, an affine one when a bias is used).

It is also known by the following names.

Fully Connected Layer (FC Layer)
Dense Layer

Constructor

torch.nn.Linear(
    in_features, 
    out_features, 
    bias=True,
)

It is similar to Dense in TensorFlow (Keras), but it has no built-in activation or similar options.

Parameter	Description
`in_features`	Size of the last dimension of the input tensor (int)
`out_features`	Size of the last dimension of the output tensor (int)
`bias`	Whether to use a bias term (default: `True`)
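As a small illustration of the `bias` argument, the sketch below builds one layer with the default setting and one with `bias=False`, then compares their parameter counts. The variable names are made up for this example.

```python
import torch
import torch.nn as nn

fc_bias = nn.Linear(16, 8)                  # bias=True is the default
fc_no_bias = nn.Linear(16, 8, bias=False)   # no bias term is created

print(fc_bias.bias.shape)   # torch.Size([8])
print(fc_no_bias.bias)      # None

# Count the trainable parameters of each layer
n_with = sum(p.numel() for p in fc_bias.parameters())
n_without = sum(p.numel() for p in fc_no_bias.parameters())
print(n_with, n_without)    # 136 128  (16*8 + 8 vs. 16*8)
```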

Mathematical definition

The linear module performs the following operation:

$$\mathbf{y} = \mathbf{x} \mathbf{W}^\top + \mathbf{b}$$

$\mathbf{x}$: input tensor, shape = (..., in_features)
$\mathbf{W}$: weight tensor, shape = (out_features, in_features)
$\mathbf{b}$: bias tensor, shape = (out_features,)
$\mathbf{y}$: output tensor, shape = (..., out_features)

Here, ... stands for an arbitrary batch structure:
it can be any number of leading dimensions, including none.
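The formula can be checked directly against the module. The sketch below computes $\mathbf{x}\mathbf{W}^\top + \mathbf{b}$ by hand from `fc.weight` and `fc.bias` and compares it with the module output. The tolerance is an arbitrary choice to absorb float32 rounding.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(16, 8)
x = torch.randn(4, 16)

y_module = fc(x)
# (4, 16) @ (16, 8) + (8,)  ->  (4, 8); the bias is broadcast over the batch
y_manual = x @ fc.weight.T + fc.bias

print(y_manual.shape)                                  # torch.Size([4, 8])
print(torch.allclose(y_module, y_manual, atol=1e-5))   # expected: True
```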

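The input tensor only needs its last dimension to equal in_features.

In many cases the input to nn.Linear is a 2D tensor of shape (batch, in_features),
but any tensor whose last dimension equals in_features can be used, from a single 1D vector up to higher-dimensional tensors.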
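According to the PyTorch documentation, the input shape is (*, in_features), where * may be any number of dimensions, including none. The sketch below tries a few ranks, and also a mismatched last dimension, which is expected to fail with a RuntimeError.

```python
import torch
import torch.nn as nn

fc = nn.Linear(16, 8)

print(fc(torch.randn(16)).shape)            # torch.Size([8])          (1D input)
print(fc(torch.randn(32, 16)).shape)        # torch.Size([32, 8])      (2D input)
print(fc(torch.randn(2, 3, 4, 16)).shape)   # torch.Size([2, 3, 4, 8]) (4D input)

# The last dimension must equal in_features (16 here)
try:
    fc(torch.randn(32, 10))
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True
print(mismatch_raised)
```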

Example:

import torch
import torch.nn as nn

x = torch.randn(32, 10, 16)  # batch 32, sequence length 10, feature 16
fc = nn.Linear(16, 8)
y = fc(x)

print(y.shape)  # torch.Size([32, 10, 8])

In the example above, the linear transform is applied only along the last dimension (from 16 to 8).

The remaining leading dimensions (32, 10) are preserved.
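One way to see why the leading dimensions survive: the same result is obtained by flattening them into a single batch axis, applying the layer to the resulting 2D tensor, and reshaping back. This is only a sketch for intuition, not a description of how PyTorch implements the operation internally.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 10, 16)
fc = nn.Linear(16, 8)

y = fc(x)                                # (32, 10, 8)

# Treat every (batch, position) pair as an independent row of features
y_flat = fc(x.reshape(-1, 16))           # (320, 8)
y_restored = y_flat.reshape(32, 10, 8)   # back to (32, 10, 8)

print(y_flat.shape)                               # torch.Size([320, 8])
print(torch.allclose(y, y_restored, atol=1e-5))   # expected: True
```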

Parameters (trainable)

weight: shape = (out_features, in_features)
bias: shape = (out_features,) (optional)

Continuing from the example code above, the shapes can be checked as follows.

print(fc.weight.shape)  # torch.Size([8, 16])
print(fc.bias.shape)    # torch.Size([8])

Both are nn.Parameter objects.
They are trained by being handed to an optimizer (torch.optim) through the parameters() method of the module that owns them, or of a parent module that holds that module as an attribute.
Because nn.Linear is a subclass of nn.Module, nn.Module.parameters() returns an iterator over its trainable parameters, as shown next.

Inspecting the parameters of a Module:

import torch.nn as nn

model = nn.Linear(10, 5)

for param in model.parameters():
    print(param.shape)

To get each parameter's name together with its value, use nn.Module.named_parameters().

It returns an iterator (a generator) that yields (name, parameter) tuples one at a time.

for name, param in model.named_parameters():
    print(name, param.shape)
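The sketch below shows the parent-module case: a small, hypothetical `TinyNet` holds two nn.Linear layers as attributes, its parameters() call collects the children's parameters, and they are passed to an optimizer for a single update step. The class name, layer sizes, loss, and learning rate are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical parent module
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 1)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

torch.manual_seed(0)
model = TinyNet()

# Child parameters are reported with the attribute name as a prefix
names = [name for name, _ in model.named_parameters()]
print(names)  # ['fc1.weight', 'fc1.bias', 'fc2.weight', 'fc2.bias']

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 10)
target = torch.randn(8, 1)
before = model.fc2.bias.detach().clone()

# One training step: forward, backward, parameter update
loss = nn.functional.mse_loss(model(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

changed = not torch.equal(before, model.fc2.bias.detach())
print(changed)  # True when the step has updated the bias
```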

Combining with an Activation Function

Linear performs only the linear transform, so an activation must be added explicitly to introduce non-linearity.

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 16)
fc = nn.Linear(16, 8)
out = F.relu(fc(x))  # Linear + ReLU

Alternatively, the same structure can be expressed with nn.Sequential:

model = nn.Sequential(
    nn.Linear(16, 8),
    nn.ReLU()
)
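Why the activation matters can be checked numerically: two Linear layers stacked without an activation are equivalent to a single Linear layer with weight $\mathbf{W}_2\mathbf{W}_1$ and bias $\mathbf{W}_2\mathbf{b}_1 + \mathbf{b}_2$. The sketch below builds that merged layer by hand. The layer sizes are arbitrary, and float64 is used only to keep rounding error small.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc1 = nn.Linear(16, 8).double()
fc2 = nn.Linear(8, 4).double()
x = torch.randn(5, 16, dtype=torch.float64)

stacked = fc2(fc1(x))   # two layers, no activation in between

# A single layer that reproduces the stacked pair
merged = nn.Linear(16, 4).double()
with torch.no_grad():
    merged.weight.copy_(fc2.weight @ fc1.weight)          # (4, 8) @ (8, 16) -> (4, 16)
    merged.bias.copy_(fc2.weight @ fc1.bias + fc2.bias)   # (4, 8) @ (8,) + (4,) -> (4,)

print(torch.allclose(stacked, merged(x)))   # expected: True

# With ReLU in between, the composition is no longer a single linear map
with_relu = fc2(torch.relu(fc1(x)))
print(torch.allclose(stacked, with_relu))   # expected: False
```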

Typical use cases

MLP (Multi-Layer Perceptron)
Feed-Forward Network in Transformer models
Classifier after flattening in a CNN
Time-series / text processing (post-processing RNN outputs)
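Two of the use cases above can be sketched in a few lines: a position-wise feed-forward block in the style of a Transformer FFN, and a classifier head applied to a flattened CNN feature map. All sizes (`d_model`, `d_ff`, the feature-map shape, the 10 classes) are hypothetical values chosen for illustration, and real models typically add dropout, normalization, and so on.

```python
import torch
import torch.nn as nn

# Feed-forward block: expand the feature dimension, apply a non-linearity, project back
d_model, d_ff = 64, 256   # hypothetical sizes
ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),
    nn.ReLU(),
    nn.Linear(d_ff, d_model),
)
tokens = torch.randn(2, 10, d_model)   # (batch, seq_len, d_model)
ffn_out = ffn(tokens)
print(ffn_out.shape)                   # torch.Size([2, 10, 64])

# Classifier head: flatten a (N, C, H, W) feature map, then map to class scores
classifier = nn.Sequential(
    nn.Flatten(),                      # (N, 8, 4, 4) -> (N, 128)
    nn.Linear(8 * 4 * 4, 10),
)
feature_map = torch.randn(2, 8, 4, 4)
logits = classifier(feature_map)
print(logits.shape)                    # torch.Size([2, 10])
```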

Summary

Item	Description
Core function	Applies a linear transform along the last dimension of the input tensor
input tensor	A tensor whose last dimension has size `in_features` (1D or higher)
output tensor	Same shape as the input tensor except for the last dimension, whose size changes from `in_features` to `out_features`
parameters (trainable)	`weight`, `bias` (optional)
activation	Not built in. It must be added separately (`ReLU`, `Sigmoid`, etc.)

Further reading

PyTorch official documentation: nn.Linear

Linear — PyTorch 2.6 documentation

PyTorch tutorial: building an MLP

Welcome to PyTorch Tutorials — PyTorch Tutorials 2.6.0+cu124 documentation