[DL] Tensor: Indexing <Simple, Slicing, Fancy, Boolean Mask>

Tensors in numpy, pytorch, and tensorflow use the indexing and slicing of Python's list or tuple almost unchanged.

Related post (2023.07.12): [Python] list (sequence type) : summary


The difference is that for multi-dimensional tensors, indexing and slicing for each axis can be written inside a single pair of square brackets, separated by commas.
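A minimal sketch of this difference, comparing a nested Python list with the equivalent numpy array (the names `lst` and `arr` are illustrative only):

```python
import numpy as np

lst = [[0, 1, 2], [3, 4, 5]]   # nested Python list
arr = np.array(lst)           # equivalent 2D ndarray

# A list needs one bracket per axis.
print(lst[1][2])   # 5

# An ndarray accepts all axes in one bracket, separated by commas.
print(arr[1, 2])   # 5

# Comma-separated indices on a plain list raise TypeError (the index is a tuple).
try:
    lst[1, 2]
except TypeError as e:
    print('TypeError:', e)

# Per-axis slicing also works in one bracket: column 1 of every row.
print(arr[:, 1])   # [1 4]
```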

1. Simple Indexing and Slicing

1-1. numpy ndarray

Supports the most basic form of indexing.
Both arr[x,y,z] and arr[x][y][z] can be used.

import numpy as np

a = np.arange(0,12).reshape(3,4)   # [[0 1 2 3] [4 5 6 7] [8 9 10 11]]

print(a)
print('==================')
print(f'a[0] is "{a[0]}"')       # first row: [0 1 2 3]
print(f'a[0,2] is "{a[0,2]}"')   # 2
print(f'a[0][2] is "{a[0][2]}"') # 2
print('------------------')
print(f'a[1,2:] is "{a[1,2:]}"')         # slicing: [6 7]
print(f'a[1,::2] is "{a[1,::2]}"')       # step 2: [4 6]
print(f'a[1,::-2] is "{a[1,::-2]}"')     # negative step: [7 5]
print(f'a[1,::-1] is "{a[1,::-1]}"')     # reversed row: [7 6 5 4]
print(f'a[1,3:0:-1] is "{a[1,3:0:-1]}"') # [7 6 5] (stop index 0 is excluded)
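With integer indices, arr[x,y] and arr[x][y] give the same element, but they are not interchangeable once slices are involved: arr[x][y] applies the second index to the result of the first. A small sketch with the same `a` as above:

```python
import numpy as np

a = np.arange(0, 12).reshape(3, 4)

col = a[:, 1]     # one bracket: all rows, column 1
row = a[:][1]     # a[:] is the whole array, so [1] then picks row 1

print(col)        # [1 5 9]
print(row)        # [4 5 6 7]
```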

1-2. pytorch tensor

Slicing with a negative step is not supported (unlike numpy).

import numpy as np
import torch

a = np.arange(0,12).reshape(3,4)
a_torch = torch.tensor(a)

print(a_torch)
print(f'a_torch[0] is "{a_torch[0]}"')       # first row
print(f'a_torch[0,2] is "{a_torch[0,2]}"')   # 2
print(f'a_torch[0][2] is "{a_torch[0][2]}"') # 2

print(f'a_torch[1,2:] is "{a_torch[1,2:]}"')   # tensor([6, 7])
print(f'a_torch[1,::2] is "{a_torch[1,::2]}"') # tensor([4, 6])
# The following raise ValueError (negative step):
# print(f'a_torch[1,::-2] is "{a_torch[1,::-2]}"')
# print(f'a_torch[1,::-1] is "{a_torch[1,::-1]}"')
# print(f'a_torch[1,3:0:-1] is "{a_torch[1,3:0:-1]}"')

The three commented-out statements above fail because they use a negative step (PyTorch raises a ValueError).
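If a reversed slice is needed in PyTorch, one common workaround is torch.flip, which reverses a tensor along the given dims; an ordinary positive-step slice can then be applied. The sketch below reproduces the three negative-step results from 1-1 under that assumption. Note that torch.flip copies the data, whereas numpy's reversed slice is a view.

```python
import numpy as np
import torch

a_torch = torch.tensor(np.arange(0, 12).reshape(3, 4))
row = a_torch[1]                        # tensor([4, 5, 6, 7])

rev = torch.flip(row, dims=[0])         # like row[::-1]   -> [7, 6, 5, 4]
rev2 = torch.flip(row, dims=[0])[::2]   # like row[::-2]   -> [7, 5]
part = torch.flip(row[1:4], dims=[0])   # like row[3:0:-1] -> [7, 6, 5]

print(rev, rev2, part)
```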

1-3. tensorflow constant (tensor)

Apart from the fact that tensor instances are immutable, indexing and slicing behave like numpy, including negative steps.

import numpy as np
import tensorflow as tf

a = np.arange(0,12).reshape(3,4)
a_tf = tf.constant(a)

print(a_tf)
print(f'a_tf[0] is "{a_tf[0]}"')       # first row
print(f'a_tf[0,2] is "{a_tf[0,2]}"')   # 2
print(f'a_tf[0][2] is "{a_tf[0][2]}"') # 2

print(f'a_tf[1,2:] is "{a_tf[1,2:]}"')         # [6 7]
print(f'a_tf[1,::2] is "{a_tf[1,::2]}"')       # [4 6]
print(f'a_tf[1,::-2] is "{a_tf[1,::-2]}"')     # negative steps work in tf: [7 5]
print(f'a_tf[1,::-1] is "{a_tf[1,::-1]}"')     # [7 6 5 4]
print(f'a_tf[1,3:0:-1] is "{a_tf[1,3:0:-1]}"') # [7 6 5]
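Because a tf.Tensor is immutable, item assignment such as a_tf[0, 0] = 100 raises a TypeError. A sketch of two common alternatives: tf.tensor_scatter_nd_update, which returns a new tensor with the given positions replaced, and tf.Variable, whose slices can be updated with .assign. The value 100 is arbitrary.

```python
import numpy as np
import tensorflow as tf

a_tf = tf.constant(np.arange(0, 12).reshape(3, 4))

# In-place item assignment is not allowed on an immutable tf.Tensor.
try:
    a_tf[0, 0] = 100
except TypeError as e:
    print('TypeError:', e)

# Option 1: build a new tensor; a_tf itself stays unchanged.
b_tf = tf.tensor_scatter_nd_update(
    a_tf, indices=[[0, 0]], updates=tf.constant([100], dtype=a_tf.dtype))
print(b_tf[0])   # first row now starts with 100

# Option 2: copy into a mutable tf.Variable and assign to a slice.
v = tf.Variable(a_tf)
v[0, 0].assign(100)
print(v[0])
```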

2. Fancy Indexing

Unlike a simple scalar index or slicing,

fancy indexing

places a tensor (or list) of indices inside the square brackets
to select multiple elements at once.

Related post: [NumPy] Fancy Indexing (https://dsaint31.tistory.com/374)


In fancy indexing,

the index array is a sequence-type instance (list, ndarray, tensor, etc.) of integers, each of which is a row or column index (more precisely, an index along a given axis).
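A small numpy sketch of this: each integer in the index array picks one position along the axis, and the shape of the result follows the shape of the index array (the names `v` and `idx` are illustrative only).

```python
import numpy as np

v = np.array([10., 20., 30., 40., 50.])

# A list of ints works as an index array.
print(v[[3, 4, 1]])           # [40. 50. 20.]

# A 2D index array gives a 2D result of the same shape.
idx = np.array([[0, 1], [4, 4]])
print(v[idx].shape)           # (2, 2)
print(v[idx].tolist())        # [[10.0, 20.0], [50.0, 50.0]]

# Repeated indices are allowed; negative indices count from the end.
print(v[[0, 0, -1]])          # [10. 10. 50.]
```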

TensorFlow does not support fancy indexing directly;
tf.gather and tf.gather_nd
provide similar functionality instead.

The following example shows fancy indexing on a 1D tensor.

import numpy as np
import torch
import tensorflow as tf

x = np.array([10.,20.,30.,40.,50.])
x_torch = torch.tensor([10.,20.,30.,40.,50.])
x_tf = tf.constant([10.,20.,30.,40.,50.])

f_indices = [3, 4, 1]   # index array: positions 3, 4, 1

print('original:')
print(x)
print('----------')
print('numpy:')
print(x[f_indices])     # [40. 50. 20.]
print('----------')
print('torch:')
print(x_torch[f_indices])
print('----------')
print('tensorflow:')
print(tf.gather(x_tf, f_indices))  # gather along axis 0
# gather_nd also works if each index is written as a length-1 coordinate
print(tf.gather_nd(x_tf, [[i] for i in f_indices]))

In the 1D case, tf.gather is enough;
tf.gather_nd gives the same result when each index is given as a length-1 coordinate.
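tf.gather is not limited to 1D tensors: its `axis` argument selects whole slices along one axis, which corresponds to numpy's x[[0, 2]] (rows) or x[:, [1, 3]] (columns). A sketch assuming a 5x5 tensor like the one in the next example:

```python
import numpy as np
import tensorflow as tf

x = np.arange(5*5).reshape(5, 5) * 10
x_tf = tf.constant(x)

rows = tf.gather(x_tf, [0, 2], axis=0)   # like x[[0, 2]]    -> shape (2, 5)
cols = tf.gather(x_tf, [1, 3], axis=1)   # like x[:, [1, 3]] -> shape (5, 2)

print(rows.shape, cols.shape)
print(np.array_equal(cols.numpy(), x[:, [1, 3]]))  # True
```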

The following example shows fancy indexing on a 2D tensor.

import numpy as np
import torch
import tensorflow as tf

x = np.arange(5*5).reshape(5,5) * 10
x_torch = torch.arange(5*5).view(5,5) * 10
x_tf = tf.constant(x)

indices_0 = [0, 1, 2]   # indices along axis 0 (rows)
indices_1 = [0, 1, 2]   # indices along axis 1 (columns)

print('original:')
print(x)
print('----------')
print('numpy:')
b = x[indices_0, indices_1]   # elements (0,0), (1,1), (2,2)
print('b.shape =',b.shape)    # (3,)
print(b)                      # values 0, 60, 120
print('----------')
print('torch:')
c = x_torch[indices_0, indices_1]
print('c.shape =',c.shape)
print(c)
print('----------')
print('tensorflow:')
d = tf.gather_nd(x_tf, list(zip(indices_0, indices_1)))  # coordinates (0,0), (1,1), (2,2)
print('d.shape =',d.shape)
print(d)

In the 2D case, tf.gather_nd takes each (row, column) pair as one coordinate,
so zip(indices_0, indices_1) builds the list of coordinates.
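Note that the index arrays are paired element-wise: x[[0, 1, 2], [0, 1, 2]] picks the three elements (0,0), (1,1), (2,2), not a 3x3 block. If the sub-block formed by those rows and columns is wanted, numpy's np.ix_ builds index arrays that broadcast into a grid. A sketch with the same `x` as above:

```python
import numpy as np

x = np.arange(5*5).reshape(5, 5) * 10
rows = [0, 1, 2]
cols = [0, 1, 2]

paired = x[rows, cols]          # element-wise pairs -> shape (3,)
block = x[np.ix_(rows, cols)]   # rows x cols grid   -> shape (3, 3)

# np.ix_ is equivalent to reshaping the index arrays so that they broadcast.
block2 = x[np.array(rows)[:, None], np.array(cols)[None, :]]

print(paired.tolist())              # [0, 60, 120]
print(block.tolist())               # [[0, 10, 20], [50, 60, 70], [100, 110, 120]]
print(np.array_equal(block, block2))  # True
```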

The following example shows fancy indexing on a 3D tensor.

The index array for each axis is set to match the elements to be accessed.

import numpy as np
import torch
import tensorflow as tf

x = np.arange(5*5*5).reshape(5,5,5) * 10
x_torch = torch.arange(5*5*5).view(5,5,5) * 10
x_tf = tf.constant(x)

indices_0 = [0, 1] # axis 0
indices_1 = [1, 2] # axis 1
indices_2 = [2, 0] # axis 2 -> elements (0,1,2) and (1,2,0)

print('original:')
print(x)
print('----------')
print('numpy:')
b = x[indices_0, indices_1, indices_2]
print('b.shape=',b.shape)   # (2,)
print(b)                    # values 70, 350
print('----------')
print('torch:')
c = x_torch[indices_0, indices_1, indices_2]
print('c.shape=',c.shape)
print(c)
print('----------')
print('tensorflow')
d = tf.gather_nd(x_tf, list(zip(indices_0, indices_1, indices_2))) # coordinates over several axes: tf.gather_nd
print('d.shape=',d.shape)
print(d)

In the 3D case, each (i, j, k) triple is one coordinate for tf.gather_nd,
so zip over the three index lists builds the list of coordinates.
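Fancy indices can also be mixed with slices and scalars in one bracket (combined indexing). A numpy sketch with the same 3D `x`: when the fancy indices sit on adjacent axes, their result dimension stays in place; when a slice separates them (axes 0 and 2 below), numpy moves the broadcast index dimension to the front.

```python
import numpy as np

x = np.arange(5*5*5).reshape(5, 5, 5) * 10

# Slice on axis 0, fancy index on axis 1, scalar on axis 2.
sub = x[:2, [1, 3], 0]
print(sub.shape)        # (2, 2)
print(sub.tolist())     # [[50, 150], [300, 400]]

# Fancy indices on axes 0 and 2 are paired; the slice keeps all of axis 1.
mixed = x[[0, 1], :, [2, 0]]
print(mixed.shape)      # (2, 5): mixed[k, j] == x[[0, 1][k], j, [2, 0][k]]
```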

3. Boolean Mask

Specific elements can be selected with a boolean mask tensor that has the same shape as the target tensor.

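A boolean mask can be obtained by applying a comparison operator to a tensor instance (in the example below, b is the boolean mask tensor).
In practice, the condition (an expression built with relational, i.e. comparison, operators on the tensor) is most often written directly inside the square brackets where the index would go.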

The example below shows this on a numpy ndarray instance.

import numpy as np

x = np.arange(3*3*3).reshape(3,3,3) * 10

print('original:')
print(x)
print('----------')
print('boolean mask:')
b = x <= 270/2        # boolean mask with the same shape as x
print(b.shape)        # (3, 3, 3)
print(b)
print('----------')
print('x <= 135')
print(x[b])           # 1D array of the selected elements
print('----------')
print(x[x<=270/2])    # same result, condition written inside the brackets
print('----------')
print('x <= 135 | x>= 200')
b1 = b | (x >= 200)   # element-wise OR of two masks
print('----------')
print('boolean mask')
print(b1)
print('----------')
print(x[b1])
print('----------')
print(x[ (x<=270/2) | (x>=200)])

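Boolean masks are mainly used to pick out the elements that satisfy a condition.
Applying a comparison operator to a tensor yields a boolean mask with the same shape as the target tensor.
Rather than assigning the mask to a variable first, the condition is usually written directly inside the square brackets.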
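Two details matter when writing the condition inside the brackets: masks are combined element-wise with `&`, `|`, `~` rather than Python's `and`/`or`/`not`, and because `&`/`|` bind more tightly than comparison operators, each comparison needs its own parentheses. The selected result is a 1D array regardless of the original shape. A numpy sketch with the same `x`:

```python
import numpy as np

x = np.arange(3*3*3).reshape(3, 3, 3) * 10

sel = x[(x <= 135) | (x >= 200)]   # parentheses around each comparison
print(sel.ndim, sel.size)          # 1 21

# `or` asks for the truth value of a whole array, which is ambiguous.
try:
    x[(x <= 135) or (x >= 200)]
except ValueError as e:
    print('ValueError:', e)

# Without parentheses, x <= 135 | x >= 200 would parse as the chained
# comparison x <= (135 | x) >= 200, which fails the same way.
```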

The examples below perform the same selection as above (elements that are 135 or less, or 200 or more) on torch and tensorflow tensor instances.

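import numpy as np
import torch
import tensorflow as tf

x = np.arange(3*3*3).reshape(3,3,3) * 10   # same x as above

x_torch = torch.arange(3*3*3).view(3,3,3) * 10
print(x_torch[ (x_torch<=270/2) | (x_torch>=200)])

print('--------------')

x_tf = tf.constant(x)
# cast the float threshold 135.0 to x_tf's integer dtype before comparing
print(x_tf[ (x_tf<= tf.cast(270/2, x_tf.dtype)) | (x_tf>=200)])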
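A boolean mask can also appear on the left-hand side of an assignment to modify only the selected elements. This works in place for numpy (and PyTorch) tensors; since a TensorFlow tensor is immutable, tf.where(condition, x, y) is the usual way to build a new tensor instead. A numpy sketch (the replacement value -1 is arbitrary):

```python
import numpy as np

x = np.arange(3*3*3).reshape(3, 3, 3) * 10
y = x.copy()                      # keep x unchanged for comparison

y[(y <= 135) | (y >= 200)] = -1   # in-place update of the masked elements
print((y == -1).sum())            # 21

# Non-destructive alternative: np.where(cond, a, b) takes a where cond is True.
z = np.where((x <= 135) | (x >= 200), -1, x)
print(np.array_equal(y, z))       # True
```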

3-1. Getting the indices of the elements that satisfy a condition: np.where

The examples above show how to access the values that satisfy a condition.

To get the indices where those values are located instead, use np.where.
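A minimal sketch of np.where with a single condition argument: it returns a tuple with one index array per axis, which can be fed straight back as a fancy index. np.argwhere returns the same positions as one (N, ndim) array. For comparison, tf.where(cond) also returns the (N, ndim) layout, which is what tf.gather_nd expects, and PyTorch offers torch.nonzero(cond) and torch.nonzero(cond, as_tuple=True) for the two layouts.

```python
import numpy as np

x = np.arange(3*3*3).reshape(3, 3, 3) * 10
cond = x >= 200

idx = np.where(cond)        # tuple of 3 index arrays, one per axis
print(len(idx), idx[0])     # 3 [2 2 2 2 2 2 2]
print(x[idx])               # [200 210 220 230 240 250 260]

coords = np.argwhere(cond)  # one row per matching element
print(coords.shape)         # (7, 3)
print(coords[0])            # [2 0 2]
```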

Related post (2024.03.19): [ML] where: numpy 의 idx찾기 (finding indices with numpy where)


The ipynb notebook that runs the example code above (it also covers combined indexing):

https://gist.github.com/dsaint31x/eb7a1fcc729ba3f349d7259002318976

