[DL] Tensor: Changing dtype (casting) and shape

The following classes abstract a tensor:

numpy.ndarray: NumPy's ndarray (created with numpy.array)
torch.Tensor (created with torch.tensor)
tensorflow.Tensor (created with tensorflow.constant), or tensorflow.Variable

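Unlike Python's sequence types, these generally have the following characteristics.

Their data is allocated contiguously in memory, much like an array in C or C++.
Their elements are homogeneous (they consist of unboxed objects of the same size, i.e. of the same type).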
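
As a small illustration of the two properties above, the sketch below compares a Python list with an ndarray. The variable names are made up for this example.

```python
import numpy as np

# A Python list may hold objects of different types (boxed, heterogeneous).
py_list = [1, 2.5, "three"]
list_types = {type(x).__name__ for x in py_list}

# An ndarray stores every element with one dtype and one fixed byte size.
arr = np.array([1, 2, 3], dtype=np.int32)

print(f"{sorted(list_types) = }")
print(f"{arr.dtype = }")
print(f"{arr.itemsize = }")              # bytes per element
print(f"{arr.nbytes = }")                # itemsize * number of elements
print(f"{arr.flags['C_CONTIGUOUS'] = }") # laid out as one contiguous block
# sorted(list_types) = ['float', 'int', 'str']
# arr.dtype = dtype('int32')
# arr.itemsize = 4
# arr.nbytes = 12
# arr.flags['C_CONTIGUOUS'] = True
```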

Changing dtype

dtype stands for the data type of an element, i.e. the type of the elements stored in a tensor.

For the dtypes supported by NumPy, TensorFlow, Torch, and others, see the following URL.
https://dsaint31.tistory.com/456

[Programming] Primitive Data Type : C, C++, NumPy, Torch


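The dtype can be changed as follows.

Note that casting to a different dtype creates a new tensor instance with the desired dtype, based on the original tensor instance
(the underlying memory block is not shared). When the requested dtype equals the current one, PyTorch may return the original tensor instead, as the a_torch.float() case in the PyTorch example shows.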

NumPy: astype
- ndarray.astype(desired_dtype): use the astype method of a NumPy ndarray instance.
- np.uint8(src_array): cast by calling the function in the np module that is named after the target dtype (here, casting to np.uint8).
PyTorch:
- torch.Tensor.type(desired_dtype): use the type method of a torch tensor instance.
- tensor.to(desired_dtype): use the to method of a torch tensor instance.
  This method is mainly used to move a tensor to another device, such as the CPU or a GPU.
- Shortcut methods such as tensor.float() and tensor.int() are also supported.

2025.03.14 - [Python] - [PyTorch] Changing dtype with shortcut methods


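TensorFlow:
- tensorflow.dtypes.cast(src_tensor, desired_dtype): use the cast function of the tensorflow dtypes module.
- TensorFlow performs complex memory management internally for memory optimization and efficiency.
- Official APIs for clearly checking whether memory is shared, as in PyTorch or NumPy, are limited.
- For practical purposes, assume that a new tensor is created.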
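
Casting converts the stored values, not just the dtype label. The following is a minimal NumPy sketch (array names are illustrative) showing that a float-to-integer cast discards the fractional part. It only uses values that fit in the target type; out-of-range values are not covered here.

```python
import numpy as np

src = np.array([0.9, 1.5, 2.99])      # float64 by default
as_u8 = src.astype(np.uint8)          # fractional part is discarded
as_f32 = src.astype(np.float32)       # narrower float type

print(f"{as_u8 = }")
print(f"{as_f32.dtype = }")
print(f"{src.dtype = }")              # the source array keeps its dtype
# as_u8 = array([0, 1, 2], dtype=uint8)
# as_f32.dtype = dtype('float32')
# src.dtype = dtype('float64')
```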

NumPy

import numpy as np

a = np.ones((3,3))
b = np.uint8(a)
c = a.astype('float32')
print(f"{c = }")

print(f"{a.dtype = }\n{b.dtype = }\n{c.dtype = }")
print(f"{np.may_share_memory(a,b) = }")
print(f"{np.may_share_memory(a,c) = }")
print(f"{np.may_share_memory(b,c) = }")
# c = array([[1., 1., 1.],
#        [1., 1., 1.],
#        [1., 1., 1.]], dtype=float32)
# a.dtype = dtype('float64')
# b.dtype = dtype('uint8')
# c.dtype = dtype('float32')
# np.may_share_memory(a,b) = False
# np.may_share_memory(a,c) = False
# np.may_share_memory(b,c) = False


c[0,0] = 1000
print(f"{a = }")
print("---------")
print(f"{c = }")
# a = array([[1., 1., 1.],
#        [1., 1., 1.],
#        [1., 1., 1.]])
# ---------
# c = array([[1000.,    1.,    1.],
#        [   1.,    1.,    1.],
#        [   1.,    1.,    1.]], dtype=float32)
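
As a side note to the example above, astype copies by default even when the dtype does not change. A small sketch, assuming the copy=False keyword of ndarray.astype is acceptable for the use case: it lets NumPy return the input array itself when no conversion is needed, while a real dtype change still allocates new memory.

```python
import numpy as np

a = np.ones((3, 3))                        # float64

default = a.astype(np.float64)             # same dtype, but copied by default
same = a.astype(np.float64, copy=False)    # same dtype, copy skipped
diff = a.astype(np.float32, copy=False)    # conversion still needs new memory

print(f"{default is a = }")
print(f"{same is a = }")
print(f"{np.shares_memory(a, diff) = }")
# default is a = False
# same is a = True
# np.shares_memory(a, diff) = False
```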

PyTorch

import torch

def share_memory(a,b):
  return a.untyped_storage().data_ptr() == b.untyped_storage().data_ptr()

a_torch = torch.rand(3,4)
b_torch = a_torch.to(torch.uint8)
c_torch = a_torch.type(torch.float64)
print(f"{a_torch.dtype = }\n{b_torch.dtype = }\n{c_torch.dtype = }")
# a_torch.dtype = torch.float32
# b_torch.dtype = torch.uint8
# c_torch.dtype = torch.float64

print("-----------")
print(f"{share_memory(a_torch,b_torch) = }")
print(f"{share_memory(a_torch,c_torch) = }")
print(f"{share_memory(b_torch,c_torch) = }")
# share_memory(a_torch,b_torch) = False
# share_memory(a_torch,c_torch) = False
# share_memory(b_torch,c_torch) = False

print("-----------")
b_torch[0,1] = 9
c_torch[0,0] = 1000
print(f"{a_torch = }")
print(f"{b_torch = }")
print(f"{c_torch = }")
# a_torch = tensor([[0.6682, 0.3203, 0.2321, 0.4308],
#         [0.3427, 0.4043, 0.5303, 0.2094],
#         [0.6293, 0.5890, 0.7244, 0.0167]])
# b_torch = tensor([[0, 9, 0, 0],
#         [0, 0, 0, 0],
#         [0, 0, 0, 0]], dtype=torch.uint8)
# c_torch = tensor([[1.0000e+03, 3.2030e-01, 2.3209e-01, 4.3078e-01],
#         [3.4268e-01, 4.0429e-01, 5.3032e-01, 2.0943e-01],
#         [6.2928e-01, 5.8904e-01, 7.2442e-01, 1.6732e-02]], dtype=torch.float64)

print(f"{a_torch = }")
c_torch[0,0] = 777.
print(f"{a_torch = }")
print(f"{c_torch = }")
# a_torch = tensor([[0.6682, 0.3203, 0.2321, 0.4308],
#         [0.3427, 0.4043, 0.5303, 0.2094],
#         [0.6293, 0.5890, 0.7244, 0.0167]])
# a_torch = tensor([[0.6682, 0.3203, 0.2321, 0.4308],
#         [0.3427, 0.4043, 0.5303, 0.2094],
#         [0.6293, 0.5890, 0.7244, 0.0167]])
# c_torch = tensor([[7.7700e+02, 3.2030e-01, 2.3209e-01, 4.3078e-01],
#         [3.4268e-01, 4.0429e-01, 5.3032e-01, 2.0943e-01],
#         [6.2928e-01, 5.8904e-01, 7.2442e-01, 1.6732e-02]], dtype=torch.float64)


d_torch = a_torch.to(torch.float64)
print(f"{share_memory(a_torch,d_torch) = }")
# share_memory(a_torch,d_torch) = False

e_torch = a_torch.float()
print(f"{a_torch.dtype = }")
print(f"{e_torch.dtype = }")
print(f"{share_memory(a_torch,e_torch) = }")
# a_torch.dtype = torch.float32
# e_torch.dtype = torch.float32
# share_memory(a_torch,e_torch) = True

f_torch = a_torch.int()
print(f"{a_torch.dtype = }")
print(f"{f_torch.dtype = }")
print(f"{share_memory(a_torch,f_torch) = }")
# a_torch.dtype = torch.float32
# f_torch.dtype = torch.int32
# share_memory(a_torch,f_torch) = False
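
The True result for a_torch.float() above comes from the dtype already being float32. A hedged sketch of the same behavior with Tensor.to, assuming the default dtype has not been changed from float32; the copy=True keyword forces a new tensor when one is needed.

```python
import torch

a = torch.rand(3, 4)                       # float32 by default

same = a.to(torch.float32)                 # dtype already matches
forced = a.to(torch.float32, copy=True)    # explicitly request a copy

print(f"{same is a = }")
print(f"{forced.data_ptr() == a.data_ptr() = }")
# same is a = True
# forced.data_ptr() == a.data_ptr() = False
```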

TensorFlow

import tensorflow as tf

a_tf = tf.random.uniform(shape=(3,4))
c_tf = tf.dtypes.cast(a_tf, tf.float64)
print(f"{a_tf.dtype = }\n{c_tf.dtype = }")
# a_tf.dtype = tf.float32
# c_tf.dtype = tf.float64

# # not working
# print(f"{tf.experimental.numpy.shares_memory(a_tf,c_tf) = }")

Changing shape

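shape is an instance of a sequence type that gives the size of each axis of a tensor.
In other words, it describes the size and form of the tensor.

A tensor can be changed to any shape that matches its total number of elements,
using the following methods.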
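
A minimal NumPy sketch of the element-count rule: a target shape is accepted only when the product of its axes equals the number of elements. The use of -1 (letting NumPy infer one axis) is an addition not covered in the text above.

```python
import numpy as np

a = np.arange(12)

b = a.reshape(3, 4)        # 3 * 4 == 12, allowed
c = a.reshape(2, -1)       # -1: NumPy infers the remaining axis (6)

try:
    a.reshape(5, 3)        # 5 * 3 != 12
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(f"{b.shape = }")
print(f"{c.shape = }")
print(f"{mismatch_raised = }")
# b.shape = (3, 4)
# c.shape = (2, 6)
# mismatch_raised = True
```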

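NumPy:
- numpy.reshape(src_ndarray, desired_shape): use the reshape function of the numpy module.
- numpy.ndarray.reshape(desired_shape): use the reshape method of a NumPy ndarray instance.
PyTorch:
- torch.reshape(src_tensor, desired_shape): use the reshape function of the torch module.
- torch.Tensor.reshape(desired_shape): use the reshape method of a tensor instance.
- Note:
  - Unlike the `view` method, which plays the same role, reshape can also be applied to a non-contiguous tensor instance.
    - Reshaping a transposed tensor instance is one such case.
  - In that case, however, the result generally does not share data with the original (a copy is made).
  - In general, reshaping a contiguous tensor instance changes only the shape, and both tensors share the data; changing one also changes the other.
TensorFlow:
- tensorflow.reshape(src_tensor, desired_shape): use the reshape function of the tensorflow module.
- Note: unlike the other libraries, reshape is not provided as a method of the tensor instance.
- TensorFlow performs complex memory management internally for memory optimization and efficiency.
- Official APIs for clearly checking whether memory is shared, as in PyTorch or NumPy, are limited.
- For practical purposes, assume that a new tensor is created.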
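
The PyTorch note on view versus reshape can be sketched as follows. This is a small illustration with made-up tensor names: view fails on a transposed (non-contiguous) tensor, while reshape succeeds by copying, so the result no longer tracks the original.

```python
import torch

t = torch.arange(6).reshape(2, 3)
tt = t.t()                      # transposed view of t: non-contiguous

try:
    tt.view(6)                  # view needs compatible strides
    view_failed = False
except RuntimeError:
    view_failed = True

flat = tt.reshape(6)            # works, but has to copy the data
flat[0] = 100                   # does not affect t

print(f"{tt.is_contiguous() = }")
print(f"{view_failed = }")
print(f"{flat.data_ptr() == t.data_ptr() = }")
print(f"{t[0, 0].item() = }")
# tt.is_contiguous() = False
# view_failed = True
# flat.data_ptr() == t.data_ptr() = False
# t[0, 0].item() = 0
```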

NumPy

import numpy as np

a = np.arange(0,10,1) # arange(start, stop, step); stop is exclusive
b = a.reshape((2,5))
print(f"{a.shape = }\n{b.shape = }")
print(f"{np.may_share_memory(a,b) = }")
# a.shape = (10,)
# b.shape = (2, 5)
# np.may_share_memory(a,b) = True

c = np.reshape(a,(5,2))
print(f"{c.shape = }")
print(f"{np.may_share_memory(a,c) = }")
# c.shape = (5, 2)
# np.may_share_memory(a,c) = True

c[0,0] = 1000
print(f"{a = }")
print(f"{b = }")
print(f"{c = }")
# a = array([1000,    1,    2,    3,    4,    5,    6,    7,    8,    9])
# b = array([[1000,    1,    2,    3,    4],
#        [   5,    6,    7,    8,    9]])
# c = array([[1000,    1],
#        [   2,    3],
#        [   4,    5],
#        [   6,    7],
#        [   8,    9]])
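
The sharing shown above holds because the source array is contiguous. As a hedged counterexample (array names are illustrative), reshaping a transposed array to 1D cannot be expressed as a view, so NumPy returns a copy:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

flat_view = a.reshape(-1)       # contiguous source: a view of a
flat_copy = a.T.reshape(-1)     # transposed source: NumPy has to copy

print(f"{np.shares_memory(a, flat_view) = }")
print(f"{np.shares_memory(a, flat_copy) = }")
print(f"{flat_copy.tolist() = }")
# np.shares_memory(a, flat_view) = True
# np.shares_memory(a, flat_copy) = False
# flat_copy.tolist() = [0, 3, 1, 4, 2, 5]
```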

PyTorch

import torch

def share_memory(a,b):
  return a.untyped_storage().data_ptr() == b.untyped_storage().data_ptr()

a_torch = torch.arange(0,10,1)
b_torch = a_torch.reshape((2,5))
print(f"{a_torch.shape = }\n{b_torch.shape = }")
print(f"{share_memory(a_torch,b_torch) = }")
# a_torch.shape = torch.Size([10])
# b_torch.shape = torch.Size([2, 5])
# share_memory(a_torch,b_torch) = True


c_torch = torch.reshape(a_torch,(5,2))
print(f"{c_torch.shape = }")
print(f"{share_memory(a_torch,c_torch) = }")
# c_torch.shape = torch.Size([5, 2])
# share_memory(a_torch,c_torch) = True

# --------------------
c_torch[0,0] = 1000

print(f"{a_torch = }")
print(f"{b_torch = }")
print(f"{c_torch = }")
# a_torch = tensor([1000,    1,    2,    3,    4,    5,    6,    7,    8,    9])
# b_torch = tensor([[1000,    1,    2,    3,    4],
#         [   5,    6,    7,    8,    9]])
# c_torch = tensor([[1000,    1],
#         [   2,    3],
#         [   4,    5],
#         [   6,    7],
#         [   8,    9]])

TensorFlow

import tensorflow as tf

a_tf = tf.range(0,10,1)
b_tf = tf.reshape(a_tf,(2,5))
# b_tf = a_tf.reshape((2,5)) # not working

print(f"{a_tf.shape = }\n{b_tf.shape = }")
# a_tf.shape = TensorShape([10])
# b_tf.shape = TensorShape([2, 5]) 

# # not working! Checking for shared memory is difficult in TensorFlow.
# print(f"{tf.experimental.numpy.shares_memory(a_tf,b_tf) = }")

c_tf = tf.reshape(a_tf,(5,2))
print(f"{c_tf.shape = }")
# c_tf.shape = TensorShape([5, 2])

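# Define the position to change and the new value
indices = tf.constant([[0, 0]]) # the position to change: (0, 0)
updates = tf.constant([999]) # the value to put at that position

# Apply the update (tf.Tensor is immutable, so a new tensor is returned)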
d_tf = tf.tensor_scatter_nd_update(c_tf, indices, updates)

print(f"{a_tf = }")
print(f"{b_tf = }")
print(f"{c_tf = }")
print(f"{d_tf = }")

# a_tf = <tf.Tensor: shape=(10,), dtype=int32, numpy=array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)>
# b_tf = <tf.Tensor: shape=(2, 5), dtype=int32, numpy=
# array([[0, 1, 2, 3, 4],
#        [5, 6, 7, 8, 9]], dtype=int32)>
# c_tf = <tf.Tensor: shape=(5, 2), dtype=int32, numpy=
# array([[0, 1],
#        [2, 3],
#        [4, 5],
#        [6, 7],
#        [8, 9]], dtype=int32)>
# d_tf = <tf.Tensor: shape=(5, 2), dtype=int32, numpy=
# array([[999,   1],
#        [  2,   3],
#        [  4,   5],
#        [  6,   7],
#        [  8,   9]], dtype=int32)>

https://gist.github.com/dsaint31x/9e390d73d788766af4f17e2a9e1f6159

dl_tensor_dtype_reshape.ipynb


2024.03.21 - [Python] - [DL] Storage: Memory management for PyTorch tensors


2024.09.09 - [Python] - [NumPy] Creation and initialization, basic manipulation (1)
