Deep Learning Fundamentals
Chapter 1: Tensors and Tensor Operations
1.1 Fundamentals of Tensors
Zero-dimensional Tensor (Scalar): A single number, also called a scalar.
One-dimensional Tensor (Vector): An array of numbers whose only dimension is its length.
Two-dimensional Tensor (Matrix): Has rows and columns, forming a matrix.
Three-dimensional Tensor: Has rows, columns, and depth. Particularly useful for color images with RGB channels.
Higher-dimensional Tensors: A four-dimensional tensor is a collection of 3D tensors (for example, a batch of color images), a five-dimensional tensor is a collection of 4D tensors, and so on.
Key concept: An n-dimensional tensor is a one-dimensional array where each element is an (n-1)-dimensional tensor.
Tensors in PyTorch support GPU computation and automatic gradient calculation.
Counting dimensions in PyTorch: Count the number of brackets. For example, tensor(1) is zero-dimensional, while tensor([1]) is one-dimensional.
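This can be checked with the dim() method (the ndim attribute gives the same answer):
import torch
print(torch.tensor(1).dim())     # 0 - zero-dimensional (scalar)
print(torch.tensor([1]).dim())   # 1 - one-dimensional (vector)
print(torch.tensor([[1]]).dim()) # 2 - two-dimensional (matrix)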
Examples:
# One-dimensional tensor (length 3)
tensor([0.8206, 0.6208, 0.2549])
# Two-dimensional tensor (4×4 matrix)
tensor([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
# Three-dimensional tensor (two 2×3 matrices)
tensor([[[0.8929, 0.0102, 0.2182],
[0.4855, 0.0377, 0.8328]],
[[0.4653, 0.6590, 0.4474],
[0.6367, 0.2073, 0.6972]]])
The 3D tensor above is equivalent to:
import numpy as np
import torch
a1 = np.array([[0.8929, 0.0102, 0.2182], [0.4855, 0.0377, 0.8328]])
a2 = np.array([[0.4653, 0.6590, 0.4474], [0.6367, 0.2073, 0.6972]])
t3 = torch.tensor(np.array([a1, a2])) # stack into one array first - converting a list of arrays is slow and warns
t3.shape # Output: torch.Size([2, 2, 3])
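Indexing t3 illustrates the nesting described above - each element is a tensor of one dimension less:
t3[0]       # the first 2×3 matrix (a 2D tensor)
t3[0][0]    # its first row (a 1D tensor)
t3[0][0][0] # a single element (a 0D tensor)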
Terminology:
- Scalar: A single number (0D array). Example: “5” in “5 houses”
- Vector: A list of scalars (1D array). Example: house features like [area, rooms, floor]
- Matrix: A collection of vectors (2D array). Example: m houses with n features each
- Tensor: Generalization of matrices to N dimensions
Tensors in image processing:
- 3D Tensor: A color image with dimensions channels × height × width (3 channels for RGB, in PyTorch's convention)
- 4D Tensor: A batch of images with dimensions batch_size × channels × height × width
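For example, a hypothetical batch of sixteen 32×32 RGB images in this layout:
images = torch.rand(16, 3, 32, 32) # batch_size=16, channels=3, height=32, width=32
print(images.shape)    # torch.Size([16, 3, 32, 32])
print(images[0].shape) # one image: torch.Size([3, 32, 32])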
1.2 Creating Tensors in PyTorch
Tensors can be created directly from data with automatic type inference:
import torch
import numpy as np
# From Python lists
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)
# From NumPy arrays
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
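Note that from_numpy() shares memory with the source array, so an in-place change to one is visible in the other:
np_array[0, 0] = 99
print(x_np) # tensor([[99,  2], [ 3,  4]]) - the change is reflected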
Creating tensors with similar properties - new tensors retain the shape and datatype unless overridden:
# Create tensor of ones with same shape
x_ones = torch.ones_like(x_data)
# Output: tensor([[1, 1], [1, 1]])
# Create random tensor with same shape but different type
x_rand = torch.rand_like(x_data, dtype=torch.float)
# Output: tensor([[0.4815, 0.7297], [0.8659, 0.5418]])
The rand_like() method fills the tensor with random numbers from a uniform distribution over [0, 1).
Creating tensors with specific shapes:
shape = (2, 5)
rand_tensor = torch.rand(shape) # Random values in [0, 1)
ones_tensor = torch.ones(shape) # All ones
zeros_tensor = torch.zeros(shape) # All zeros
Summary of creation methods:
- torch.tensor(data) - from existing data
- torch.from_numpy(array) - from a NumPy array
- torch.ones_like(), torch.rand_like() - same shape as an existing tensor
- torch.rand(), torch.ones(), torch.zeros() - from a given shape
1.3 Tensor Attributes
Every tensor has three key attributes: shape, data type, and device location.
tensor = torch.rand(3, 4) # Create a 3×4 tensor
print(f"Shape: {tensor.shape}") # torch.Size([3, 4])
print(f"Datatype: {tensor.dtype}") # torch.float32
print(f"Device: {tensor.device}") # cpu
Shape: tensor.shape returns dimensions. torch.Size([3, 4]) means 3 rows and 4 columns.
Data Type: Common PyTorch types include torch.float32 (default floating-point), torch.int64 (integers), and torch.bool (boolean).
Device: Indicates storage location - cpu for CPU memory or cuda:0 for GPU.
Moving tensors to GPU:
if torch.cuda.is_available():
tensor = tensor.to('cuda')
Understanding these attributes is crucial for memory management, ensuring compatible operations, and optimizing computation with GPU acceleration.
1.4 Tensor Indexing and Slicing
Tensors support various operations including transposition, indexing, slicing, mathematical operations, linear algebra, and random sampling. These operations can run on GPU for significantly faster computation.
GPU acceleration: GPUs have less computing power per core than CPUs but far more cores, making them excellent for parallel computation. In deep learning, GPU computation is typically much faster than CPU computation. CUDA (NVIDIA's parallel computing platform) and cuDNN (NVIDIA's GPU-accelerated library of deep neural network primitives) enable this acceleration.
Moving tensors to GPU:
if torch.cuda.is_available():
tensor = tensor.to('cuda')
print(f"Device tensor is stored on: {tensor.device}")
Indexing and slicing:
tensor = torch.ones(4, 4) # Create 4×4 tensor of ones
print(tensor)
# Output: 4×4 matrix of all ones
tensor[:, 1] = 0 # Set all rows, second column to 0
# Syntax: [first_dimension, second_dimension]
# : means "from start to end with step 1"
print(tensor)
# Output:
# tensor([[1., 0., 1., 1.],
# [1., 0., 1., 1.],
# [1., 0., 1., 1.],
# [1., 0., 1., 1.]])
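A few more common patterns on the same tensor (standard NumPy-style slicing applies):
print(tensor[0])       # first row: tensor([1., 0., 1., 1.])
print(tensor[:, -1])   # last column: tensor([1., 1., 1., 1.])
print(tensor[1:3, :2]) # rows 1-2, first two columns: tensor([[1., 0.], [1., 0.]])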
1.5 Tensor Concatenation
Use torch.cat() to concatenate multiple tensors along a specified dimension.
tensor = torch.ones(4, 4)
tensor[:, 1] = 0
# Concatenate along dimension 1 (columns, horizontally)
t1 = torch.cat([tensor, tensor, tensor], dim=1)
# Result: 4×12 tensor (three 4×4 tensors side by side)
Understanding dimensions: For 2D arrays, dim=0 refers to rows (vertical concatenation), dim=1 refers to columns (horizontal concatenation).
More examples:
A = torch.ones(2, 3) # 2×3 tensor
B = 2 * torch.ones(4, 3) # 4×3 tensor
# Concatenate along rows (vertically)
C = torch.cat([A, B], dim=0) # Result: 6×3 tensor
D = 2 * torch.ones(2, 4) # 2×4 tensor
# Concatenate along columns (horizontally)
E = torch.cat([A, D], dim=1) # Result: 2×7 tensor
1.6 Tensor Arithmetic Operations
Tensors support element-wise arithmetic operations with automatic broadcasting.
x = torch.tensor([[[1, 2, 3],
[4, 5, 6]],
[[7, 8, 9],
[10, 11, 12]]]) # 3D tensor
y = torch.tensor([1, 2, 3]) # 1D tensor
result = y - x
# Broadcasting applies: y is expanded to x's shape (2, 2, 3),
# then the subtraction runs element-wise - y minus each row of x
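Printing the result confirms the broadcast shape and values:
print(result.shape) # torch.Size([2, 2, 3])
print(result)
# tensor([[[ 0,  0,  0],
#          [-3, -3, -3]],
#         [[-6, -6, -6],
#          [-9, -9, -9]]])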
1.7 Broadcasting Mechanism
Broadcasting allows PyTorch operations to work on tensors of different shapes without explicitly copying data, enabling efficient computation.
Broadcasting conditions:
- Both tensors must have at least one element
x = torch.empty(0,) # Cannot broadcast - empty
y = torch.empty(2, 2)
x = torch.empty(1,) # Can broadcast - has one element
y = torch.empty(2, 2)
- Dimensions must be compatible when compared right-to-left
Dimensions are compatible if:
- a) They are equal in size
- b) One tensor is missing that dimension
- c) One tensor has size 1 in that dimension
Example:
x = torch.empty(5, 3, 4, 1)
y = torch.empty( 3, 1, 1)
Checking from right to left:
- 4th dimension: both size 1 ✓ (condition a)
- 3rd dimension: 4 vs 1 ✓ (condition c)
- 2nd dimension: both size 3 ✓ (condition a)
- 1st dimension: 5 vs missing ✓ (condition b)
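These rules can be verified directly - any element-wise operation on the two tensors produces the broadcast shape (torch.broadcast_shapes is available in recent PyTorch versions):
x = torch.empty(5, 3, 4, 1)
y = torch.empty(3, 1, 1)
print((x + y).shape) # torch.Size([5, 3, 4, 1])
print(torch.broadcast_shapes(x.shape, y.shape)) # torch.Size([5, 3, 4, 1])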
Broadcasting process:
Step 1 - Align dimensions by adding size-1 dimensions:
# Before alignment:
x = torch.empty(5, 3, 4, 1)
y = torch.empty( 3, 1, 1)
# After alignment:
x = torch.empty(5, 3, 4, 1)
y = torch.empty(1, 3, 1, 1)
Step 2 - Expand size-1 dimensions to match:
# Before expansion:
x = torch.empty(5, 3, 4, 1)
y = torch.empty(1, 3, 1, 1)
# After expansion (ready for operation):
x = torch.empty(5, 3, 4, 1)
y = torch.empty(5, 3, 4, 1)
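The same two steps can be reproduced manually with unsqueeze() and expand() - a sketch; note that expand() returns a view without copying data:
y = torch.empty(3, 1, 1)
y_aligned = y.unsqueeze(0)                # Step 1: align to shape (1, 3, 1, 1)
y_expanded = y_aligned.expand(5, 3, 4, 1) # Step 2: expand size-1 dims to (5, 3, 4, 1)
print(y_expanded.shape) # torch.Size([5, 3, 4, 1])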
Key rule: If two dimensions are unequal and neither is 1, broadcasting is not possible.
Chapter 2: Derivatives, Differentials, and Gradients
2.1 Derivatives of Composite Functions
Chain Rule: When differentiating composite functions, work from the outside in, taking derivatives layer by layer. Multiply the derivative of the outermost function by the derivative of the inner function.
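Example: For $y = \sin(x^2)$, the outer function is $\sin(\cdot)$ and the inner function is $x^2$, so $y' = \cos(x^2) \cdot 2x$.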
Product Rule (Leibniz Formula): For products of two functions like $x \cdot \ln x$:
$$(u \cdot v)' = u' \cdot v + u \cdot v'$$
Differentiate each factor while holding the other fixed, then sum the two terms.
Example: $(x \cdot \ln x)' = 1 \cdot \ln x + x \cdot \frac{1}{x} = \ln x + 1$
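Results like this can be checked numerically with PyTorch's autograd (a quick sketch; the test point $x = 2$ is arbitrary):
import math
import torch
x = torch.tensor(2.0, requires_grad=True)
y = x * torch.log(x) # y = x·ln(x)
y.backward()         # computes dy/dx at x = 2
print(x.grad)          # tensor(1.6931)
print(math.log(2) + 1) # 1.6931..., matches ln(2) + 1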
Quotient Rule: For quotients:
$$\left(\frac{v}{u}\right)' = \frac{v' \cdot u - v \cdot u'}{u^2}$$
Example 1 - Exponential with quotient: Find the derivative of $y = 2^{x/\ln x}$
Solution:
Apply chain rule: $y' = 2^{x/\ln x} \cdot \ln 2 \cdot \left(\frac{x}{\ln x}\right)'$
Apply quotient rule: $\left(\frac{x}{\ln x}\right)' = \frac{1 \cdot \ln x - x \cdot \frac{1}{x}}{(\ln x)^2} = \frac{\ln x - 1}{(\ln x)^2}$
Final result: $y' = 2^{x/\ln x} \cdot \ln 2 \cdot \frac{\ln x - 1}{(\ln x)^2}$
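The same autograd check for this result (test point $x = 3$, chosen arbitrarily; imports as above):
x = torch.tensor(3.0, requires_grad=True)
y = 2 ** (x / torch.log(x))
y.backward()
xd = x.detach() # detached copy for evaluating the closed form
manual = 2 ** (xd / torch.log(xd)) * math.log(2) * (torch.log(xd) - 1) / torch.log(xd) ** 2
print(x.grad, manual) # both ≈ 0.3759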
Example 2 - Logarithmic differentiation: Find the derivative of $y = (1+2x)^x$
Solution:
Use logarithmic identity: $y = e^{\ln((1+2x)^x)} = e^{x \cdot \ln(1+2x)}$
Apply chain and product rules: $y' = e^{x \cdot \ln(1+2x)} \cdot [1 \cdot \ln(1+2x) + x \cdot (\ln(1+2x))']$
Find inner derivative: $(\ln(1+2x))' = \frac{2}{1+2x}$
Final result: $y' = (1+2x)^x \cdot \left[\ln(1+2x) + \frac{2x}{1+2x}\right]$
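And a check for this example (test point $x = 1$):
x = torch.tensor(1.0, requires_grad=True)
y = (1 + 2 * x) ** x
y.backward()
print(x.grad) # tensor(5.2958), which equals 3·(ln 3 + 2/3)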
Key takeaway: Work layer by layer - identify the outermost function, apply the appropriate rule (chain, product, or quotient), continue differentiating inner functions, and combine all derivatives through multiplication.
