Dive into Deep Learning, Chapter 3: Summary of Softmax Regression


1. Loading data from Fashion-MNIST (d2l.load_data_fashion_mnist(batch_size))

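import torchvision
from torch.utils import data
from torchvision import transforms

# get_dataloader_workers() is the d2l helper that returns the number of loader processes.
def load_data_fashion_mnist(batch_size, resize=None):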
    """Download the Fashion-MNIST dataset and then load it into memory.

    Defined in :numref:`sec_fashion_mnist`"""
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(
        root="../data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(
        root="../data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True,
                            num_workers=get_dataloader_workers()),
            data.DataLoader(mnist_test, batch_size, shuffle=False,
                            num_workers=get_dataloader_workers()))

torchvision: a library in the PyTorch project that provides datasets, model architectures, and image transforms for computer vision. PyTorch is an open-source machine learning framework.

DataLoader: combines a dataset and a sampler, and provides an iterable over the given dataset. It supports both map-style and iterable-style datasets with single- or multi-process loading, customizable loading order, optional automatic batching (collation), and memory pinning.

Return value: a pair of DataLoader iterators, one over mnist_train and one over mnist_test.


2. Defining the network

def net(X):
    return softmax(torch.matmul(X.reshape(-1, W.shape[0]), W) + b)

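torch.matmul: matmul multiplies tensors, and its inputs may have more than two dimensions.

(1) If both inputs are 2-D, the result is the ordinary matrix product.

(2) If an input has more than two dimensions, the last two dimensions are multiplied as matrices and the leading dimensions are treated as batch dimensions, which are broadcast against each other.

(5,3,4) @ (4,2) -> (5,3,2)

(2,5,3) @ (1,3,4) -> (2,5,4)

(2,1,3,4) @ (5,4,2) -> (2,5,3,2)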
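The shape rules above can be checked directly. The following is a small sketch that only inspects result shapes; the variable names are chosen here for illustration and the tensors are filled with ones because the values do not matter.

```python
import torch

# Case 1: two 2-D inputs give an ordinary matrix product.
shape_2d = tuple(torch.matmul(torch.ones(3, 4), torch.ones(4, 2)).shape)

# Case 2: a 3-D input times a 2-D input; the same matrix is applied to each batch entry.
shape_3d_2d = tuple((torch.ones(5, 3, 4) @ torch.ones(4, 2)).shape)

# Case 3: leading (batch) dimensions are broadcast against each other.
shape_bcast_3d = tuple((torch.ones(2, 5, 3) @ torch.ones(1, 3, 4)).shape)
shape_bcast_4d = tuple((torch.ones(2, 1, 3, 4) @ torch.ones(5, 4, 2)).shape)

print(shape_2d, shape_3d_2d, shape_bcast_3d, shape_bcast_4d)
```

In the last case the batch dimensions (2, 1) and (5,) broadcast to (2, 5), while the trailing (3, 4) and (4, 2) matrices multiply to (3, 2).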

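Before the multiplication, the input X has to be reshaped to size (-1, 28*28), so that each image becomes one row vector.

With a batch size of 256, X.reshape(-1, W.shape[0]) has size [256, 28*28].

W has size [28*28, 10].

b has size [10] and is broadcast to [256, 10] when it is added.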
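To make the shapes concrete, here is a minimal sketch of the network on a random batch. The softmax helper and the parameter initialization are assumptions modeled on the from-scratch style of the chapter, since the notes do not show them, and the random input stands in for a real Fashion-MNIST batch.

```python
import torch

num_inputs, num_outputs = 28 * 28, 10

# Assumed initialization: small random weights, zero bias.
W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
b = torch.zeros(num_outputs, requires_grad=True)

def softmax(X):
    # Exponentiate, then normalize each row so that it sums to 1.
    X_exp = torch.exp(X)
    partition = X_exp.sum(1, keepdim=True)
    return X_exp / partition  # broadcasting divides every row by its own sum

def net(X):
    # Flatten each image to a row of length 784, then apply the linear layer and softmax.
    return softmax(torch.matmul(X.reshape(-1, W.shape[0]), W) + b)

X = torch.rand(256, 1, 28, 28)   # stand-in for one batch of images
y_hat = net(X)
print(y_hat.shape)
```

Each row of y_hat is a probability distribution over the 10 classes.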

3. Counting correctly classified samples

# Count the number of correct predictions
def accuracy(y_hat,y):
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = torch.argmax(y_hat,axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float( cmp.type(y.dtype).sum() )

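(1) y_hat = torch.argmax(y_hat, axis=1): y_hat has shape [256, 10]. For each row, argmax picks the index of the largest probability, and the resulting indices are stored back in y_hat.

(2) The == comparison is sensitive to data types, so y_hat is first converted to the dtype of y.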
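A tiny worked example may help here. The two-sample batch below is made up for illustration, and the function is the one from the notes, written with dim=1 in place of axis=1.

```python
import torch

def accuracy(y_hat, y):
    """Return the number of correct predictions as a float."""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = torch.argmax(y_hat, dim=1)   # predicted class per row
    cmp = y_hat.type(y.dtype) == y           # element-wise comparison after dtype conversion
    return float(cmp.type(y.dtype).sum())

# Two samples, three classes: both rows have their maximum at index 2.
y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])
y = torch.tensor([0, 2])

num_correct = accuracy(y_hat, y)
print(num_correct, num_correct / len(y))
```

Only the second sample is classified correctly, so the function returns a count of 1 and the accuracy over the batch is 0.5. Note that accuracy returns a count, not a ratio; the division happens later.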

4. Implementing the accumulator


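(1) add(self, *args): the method is defined with the variadic parameter *args, so it accepts any number of positional arguments.

Example: Accumulator(2) holds two values, the first for the number of correct predictions and the second for the total number of predictions. A call to add then also passes two arguments (number correct, total).

Suppose the Accumulator currently holds [2, 10] and add is called with the arguments (3, 6).

zip(self.data, args) yields the pairs (2, 3) and (10, 6).

In the first iteration of the for loop, a = 2 and b = 3, so the number correct becomes 5.

In the second iteration, a = 10 and b = 6, so the total becomes 16.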
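The walk-through above can be reproduced with a short class. This is a sketch of an accumulator consistent with the description (a data list, an add method taking *args, and index access as used by metric[0] in the training code); it is not necessarily identical to the version in the original post.

```python
class Accumulator:
    """Accumulate sums over n variables."""

    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        # Pair each stored value with the matching argument and add them.
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

metric = Accumulator(2)
metric.add(2, 10)   # state is now [2, 10]
metric.add(3, 6)    # pairs (2, 3) and (10, 6) are summed
print(metric[0], metric[1])
```

After the second call the accumulator holds 5 correct predictions out of 16, matching the two loop iterations described above.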

5. Design of the training procedure

def train_epoch(net,train_iter,loss,updater):
    # Set the model to training mode
    if isinstance(net,torch.nn.Module):
        net.train()
    # Accumulator with three slots: total training loss, number of correct predictions, number of samples
    metric = Accumulator(3)
    for X,y in train_iter:
        y_hat = net(X)
        l = loss(y_hat,y)
        # Update the parameters
        if isinstance(updater,torch.optim.Optimizer):
            # Case: a built-in PyTorch optimizer
            updater.zero_grad()
            l.mean().backward()
            updater.step()
        else:
            l.sum().backward()
            updater(X.shape[0])
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # Return the training loss and training accuracy
    return metric[0] / metric[2], metric[1] / metric[2]

def train(net, train_iter, test_iter, loss, num_epochs, updater):  #@save
    """Train the model."""
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        train_metrics = train_epoch(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5, train_loss
    assert train_acc <= 1 and train_acc > 0.7, train_acc
    assert test_acc <= 1 and test_acc > 0.7, test_acc

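(1) train_epoch: in one epoch we make a full pass over the training dataset (train_data), repeatedly fetching a minibatch of inputs and the corresponding labels. For each minibatch we carry out the following steps:

Generate predictions by calling net(X) and compute the loss l (forward propagation).
Compute the gradients by running backpropagation.
Update the model parameters by calling the optimizer.

To gauge training progress, we compute the loss after each epoch and report it to monitor the training process.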
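The three steps can be sketched on a toy problem. Everything below is an assumption made for illustration: the synthetic batch, the zero-initialized parameters, the learning rate, and the hand-written updater that mirrors the custom-updater branch of train_epoch (l.sum().backward() followed by updater(batch_size)). The loop reuses a single batch, so it shows the mechanics only.

```python
import torch

torch.manual_seed(0)

# Synthetic stand-in for one minibatch: 8 samples, 4 features, 3 classes.
X = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))

W = torch.zeros(4, 3, requires_grad=True)
b = torch.zeros(3, requires_grad=True)
lr = 0.1  # assumed learning rate

def net(X):
    return torch.softmax(X @ W + b, dim=1)

def cross_entropy(y_hat, y):
    # Pick the predicted probability of the true class for every sample.
    return -torch.log(y_hat[range(len(y_hat)), y])

def updater(batch_size):
    # Minibatch SGD: average the summed gradient, step, then clear the gradient.
    with torch.no_grad():
        for param in (W, b):
            param -= lr * param.grad / batch_size
            param.grad.zero_()

losses = []
for _ in range(20):
    y_hat = net(X)               # step 1: forward pass
    l = cross_entropy(y_hat, y)  # per-sample loss
    l.sum().backward()           # step 2: backpropagation
    updater(X.shape[0])          # step 3: parameter update
    losses.append(float(l.mean()))

print(losses[0], losses[-1])
```

Tracking the mean loss per iteration, as done with the losses list, is the same kind of monitoring that train_epoch performs per epoch through the accumulator.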

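Exercises:

In this section, we implemented the softmax function directly from the mathematical definition of the softmax operation. What problems might this cause?
The function cross_entropy in this section is implemented according to the definition of the cross-entropy loss. What could go wrong with it? Hint: consider the domain of the logarithm.
What solutions can you think of to fix the two problems above?
Is it always a good idea to return the most likely label? For example, would you do this for medical diagnosis?
Suppose we use softmax regression to predict the next word. What problems might arise from a large vocabulary?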

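(1) The softmax denominator is a sum of exponentials. If some input o_k is very large, exp(o_k) overflows, so the denominator (and possibly the numerator) becomes inf and the output turns into nan.

(2) The logarithm tends to negative infinity as its argument approaches 0. If softmax outputs a probability that is extremely close to 0, or one that underflows to exactly 0, the loss becomes arbitrarily large or infinite, which is again a numerical range error.

(3) See the discussion that revisits the softmax implementation in the softmax-concise section: subtract max(o_k) from every o_j before exponentiating, and compute the log of the softmax output directly instead of applying log to the softmax result: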
$\begin{aligned} \log{(\hat y_j)} & = \log\left( \frac{\exp(o_j - \max(o_k))}{\sum_k \exp(o_k - \max(o_k))}\right) \\ & = \log{(\exp(o_j - \max(o_k)))}-\log{\left( \sum_k \exp(o_k - \max(o_k)) \right)} \\ & = o_j - \max(o_k) -\log{\left( \sum_k \exp(o_k - \max(o_k)) \right)}. \end{aligned}$
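(4) In a medical setting, for example diagnosing a disease from a photo, the label with the highest probability is not necessarily the best answer. The costs of different mistakes are unequal (missing a serious disease is usually worse than a false alarm), so the decision has to be analyzed case by case.

(5) When the vocabulary is very large and the network parameters are poorly learned, the probability of every class may be tiny, so the prediction degenerates into something close to random choice. The output layer and the softmax normalization also grow with the vocabulary size, which raises the computation and memory cost.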
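The overflow in answer (1) and the max-shift fix from answer (3) can be checked numerically. The sketch below compares a naive log-softmax with the shifted form o_j - max(o_k) - log(sum_k exp(o_k - max(o_k))); the function names and the extreme logits are chosen here for illustration.

```python
import torch

def naive_log_softmax(o):
    # Direct use of the definition; exp overflows for large logits.
    exp_o = torch.exp(o)
    return torch.log(exp_o / exp_o.sum(dim=1, keepdim=True))

def stable_log_softmax(o):
    # Shift by the row maximum so that the largest exponent is exp(0) = 1.
    shifted = o - o.max(dim=1, keepdim=True).values
    return shifted - torch.log(torch.exp(shifted).sum(dim=1, keepdim=True))

o = torch.tensor([[1000.0, 0.0, -1000.0]])
naive = naive_log_softmax(o)
stable = stable_log_softmax(o)
print(naive)
print(stable)
```

The naive version produces non-finite values for these logits, while the shifted version stays finite and agrees with torch.log_softmax. In practice the built-in loss functions apply this kind of stabilization internally, which is why the concise implementation passes unnormalized logits to the loss.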