RError.com

Mixadyt
Asked: 2024-12-19 01:18:18 +0000 UTC

How do I recognize the object in this image?

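I have images (an example was attached), all of them resized to 120x80. I need to recognize what is in each image: a digit (1 to 9) or a letter (the full English alphabet). But my model does not learn. The loss simply stalls at about 3.6 (CrossEntropyLoss, 35 classes).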
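As a point of reference for the ~3.6 figure: with 35 classes, a model that assigns equal probability to every class has a cross-entropy of ln(35) ≈ 3.555. A loss stuck near that value therefore suggests the outputs carry almost no information about the input. A minimal sketch of that arithmetic, in plain Python with no PyTorch required:

```python
import math

num_classes = 35

# A model that guesses uniformly assigns probability 1/35 to the true class.
uniform_prob = 1.0 / num_classes

# Cross-entropy for one sample is -log(probability of the true class).
uniform_loss = -math.log(uniform_prob)

print(f"loss of a uniform guess over {num_classes} classes: {uniform_loss:.3f}")
```

Comparing the observed plateau against this baseline is a quick way to tell "learning slowly" apart from "not learning at all".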

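I then looked at the feature maps produced after each layer. After block 3 (see the model below) they are all identical, with isolated exceptions: only the white background remains. My objects (the digits and letters) do not make it to the next layer. I tried increasing the Conv2d kernel size and reducing the number of filters, but neither helped.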
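One way to make this kind of per-layer inspection repeatable is to attach forward hooks and record, for each layer, how much the output varies across the images in a batch. A value near zero means the layer produces nearly the same output for every input. The sketch below assumes a small stand-in network and random input; the helper `attach_stat_hooks` and the name `toy` are illustrative and not part of the original code.

```python
import torch
from torch import nn

def attach_stat_hooks(model, stats):
    """Record mean and across-batch std of each direct child's output."""
    handles = []
    for name, module in model.named_children():
        def hook(mod, inputs, output, name=name):
            stats[name] = {
                "mean": output.mean().item(),
                # Std over the batch dimension, averaged over all positions:
                # close to 0 means every image yields the same feature map.
                "batch_std": output.std(dim=0).mean().item(),
            }
        handles.append(module.register_forward_hook(hook))
    return handles

torch.manual_seed(0)
# Stand-in network (assumption): one conv block like the ones in the question.
toy = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=5, padding="same"),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

stats = {}
handles = attach_stat_hooks(toy, stats)
with torch.no_grad():
    toy(torch.rand(8, 1, 80, 120))  # random stand-in batch of 8 images
for handle in handles:
    handle.remove()  # detach hooks once the statistics are collected

for name, values in stats.items():
    print(name, values)
```

For the model in the question, the same helper could be called on a `RecognitionModel` instance, where the keys would be `block1`, `block2`, and so on.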

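Edit: I am using PyTorch. Training uses the Adam optimizer with lr = 0.001. I tried batch_size 32 and 64 and neither worked. The dataset is split into 20% validation and 80% training. I tried training for 100 and for 500 epochs, with the same result (training plot: blue is the training set, yellow is the validation set).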

Training code:

from torch.utils.data import DataLoader, random_split
from torch.nn import CrossEntropyLoss
from torch.optim import Adam
from torchvision import transforms
import matplotlib.pyplot as plt

from dataset.dataset import CellsDataset
from model import RecognitionModel

batch_size = 32
epochs = 100
lr = 0.001

transform = transforms.Compose([
    transforms.Resize((80, 120)),  # Resize images
    # transforms.RandomHorizontalFlip(),  # Random horizontal flip
    # transforms.RandomRotation(20),  # Random rotation by up to 20 degrees
    # transforms.RandomAffine(degrees=15, translate=(0.1, 0.1)),  # Random affine transform
    transforms.Grayscale(),
    transforms.ToTensor(),  # Convert to tensor
])

dataset = CellsDataset(transform)

train_dataset, valid_dataset = random_split(dataset, [0.8, 0.2])
train_dataloader = DataLoader(train_dataset, batch_size = batch_size, shuffle = True)
valid_dataloader = DataLoader(valid_dataset, batch_size = batch_size, shuffle = True)

model = RecognitionModel()

loss_func = CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr = lr)

train_losses = []
valid_losses = []

for i in range(epochs):
    # Train
    optimizer.zero_grad()
    image, label = next(iter(train_dataloader))

    pred = model(image)
    loss = loss_func(pred, label)
    train_losses.append(loss.item())

    # Validation
    image, label = next(iter(valid_dataloader))

    pred = model(image)
    loss_ = loss_func(pred, label)
    valid_losses.append(loss_.item())

    # Backward

    loss.backward()
    optimizer.step()

    print(f"Epoch {i+1}/{epochs} Loss {loss.item()} Validation loss {loss_.item()}")
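Note that `next(iter(train_dataloader))` draws a single batch, so each pass of the loop above is one optimization step on one batch rather than a full epoch, and the validation forward pass also builds a gradient graph. The following is a minimal sketch of a loop that visits every batch once per epoch and evaluates with gradients disabled. The helper `run_epoch`, the random stand-in data, and the tiny linear model are assumptions made so the example is self-contained; they are not the asker's dataset or model.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

def run_epoch(model, loader, loss_func, optimizer=None):
    """Return the average loss over all batches; train only if an optimizer is given."""
    training = optimizer is not None
    model.train(training)
    total_loss, seen = 0.0, 0
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            loss = loss_func(model(images), labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * len(images)
            seen += len(images)
    return total_loss / seen

torch.manual_seed(0)
# Random stand-in data (assumption): 80 grayscale 80x120 images, 35 classes.
data = TensorDataset(torch.rand(80, 1, 80, 120), torch.randint(0, 35, (80,)))
train_set, valid_set = random_split(data, [64, 16])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
valid_loader = DataLoader(valid_set, batch_size=32)

# Placeholder model (assumption), standing in for RecognitionModel.
model = nn.Sequential(nn.Flatten(), nn.Linear(80 * 120, 35))
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

train_losses, valid_losses = [], []
for epoch in range(3):
    train_losses.append(run_epoch(model, train_loader, loss_func, optimizer))
    valid_losses.append(run_epoch(model, valid_loader, loss_func))
    print(f"Epoch {epoch + 1}: train {train_losses[-1]:.4f}, valid {valid_losses[-1]:.4f}")
```

With this structure, 100 epochs means 100 passes over the whole training set, which is usually what a loss curve labelled by epoch is expected to show.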

Model code:

import torch
from torch.nn import Module, Conv2d, MaxPool2d, ReLU, AdaptiveMaxPool2d, Linear, LeakyReLU, Softmax

class CNNBlock(Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = Conv2d(
            in_channels = in_channels,
            out_channels = out_channels,
            kernel_size = 5,
            padding = "same"
        )
        self.act = ReLU()
        self.maxpool = MaxPool2d(
            kernel_size = 2,
            stride = 2
        )

    def forward(self, x):
        return self.maxpool(self.act(self.conv(x)))

class RecognitionModel(Module):
    def __init__(self):
        super().__init__()
        
        self.block1 = CNNBlock(1, 32)
        self.block2 = CNNBlock(32, 64)
        self.block3 = CNNBlock(64, 128)
        self.block4 = CNNBlock(128, 256)
        self.conv1 = Conv2d(
            in_channels = 256,
            out_channels = 512,
            kernel_size = 3
        )
        self.act1 = ReLU()
        self.conv2 = Conv2d(
            in_channels = 512,
            out_channels = 1024,
            kernel_size = 3
        )
        self.act2 = ReLU()
        self.globalmaxpool = AdaptiveMaxPool2d(output_size = 1)

        self.sqz = lambda x: x.squeeze()
        self.linear1 = Linear(
            in_features = 1024,
            out_features = 512
        )
        self.act3 = LeakyReLU()
        self.linear2 = Linear(
            in_features = 512,
            out_features = 256
        )
        self.act4 = LeakyReLU()
        self.linear3 = Linear(
            in_features = 256,
            out_features = 128
        )
        self.act5 = LeakyReLU()
        self.linear4 = Linear(
            in_features = 128,
            out_features = 64
        )
        self.act6 = LeakyReLU()
        self.linear5 = Linear(
            in_features = 64,
            out_features = 35
        )
        self.act7 = Softmax()

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.act1(self.conv1(x))
        x = self.act2(self.conv2(x))
        x = self.globalmaxpool(x)
        x = self.sqz(x)
        x = self.act3(self.linear1(x))
        x = self.act4(self.linear2(x))
        x = self.act5(self.linear3(x))
        x = self.act6(self.linear4(x))
        y = self.act7(self.linear5(x))

        return y
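One detail of this model worth checking: `torch.nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and applies log-softmax internally, so ending the network with `Softmax` applies softmax twice. The sketch below reproduces that arithmetic in plain Python to show the effect. The helpers `cross_entropy` and `softmax` are illustrative re-implementations, not PyTorch functions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    """Cross-entropy on raw scores: log(sum(exp(scores))) - scores[target]."""
    log_sum_exp = math.log(sum(math.exp(s) for s in scores))
    return log_sum_exp - scores[target]

num_classes = 35
# Scores that predict class 0 with very high confidence.
confident_logits = [20.0] + [0.0] * (num_classes - 1)

# Loss when the raw scores are passed to the loss (intended usage).
loss_on_logits = cross_entropy(confident_logits, 0)

# Loss when softmax outputs are passed instead (softmax applied twice).
loss_on_probs = cross_entropy(softmax(confident_logits), 0)

print(f"raw scores:      {loss_on_logits:.6f}")
print(f"softmax outputs: {loss_on_probs:.4f}")
```

Even for a fully confident, correct prediction, the loss computed on softmax outputs stays at about 2.6 here, because the values fed to the loss are confined to [0, 1]. Returning `self.linear5(x)` directly from `forward` and applying softmax only when probabilities are needed at inference time is the usual way to avoid this.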

Dataset code:

import json
import torch
from torch.utils.data import Dataset
from PIL import Image

class CellsDataset(Dataset):
    def __init__(self, transform):
        self.classes = "123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        self.transform = transform

        with open("dataset/labels.json", 'r') as labels:
            self.labels = json.load(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index: int):
        image_path = "dataset/images/" + self.labels[index]["image"]
        
        label = self.labels[index]["choice"]
        label_index = self.classes.index(label.upper())
        label_hot_encoding = torch.zeros(len(self.classes))
        label_hot_encoding[label_index] = 1

        image = Image.open(image_path)
        return self.transform(image), label_hot_encoding

I can also add some random kernels from layer 1 (image: layer 1 kernels).

neural-networks

1 Answer

  1. Best Answer
    Mixadyt
    2024-12-22T23:32:36Z

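    The problem was that the model was too large for a task like this. I removed the last few layers and it started working, more or less. The model now looks like this: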

    import torch
    from torch.nn import Module, Conv2d, MaxPool2d, ReLU, AdaptiveMaxPool2d, Linear, LeakyReLU, Softmax
    
    class CNNBlock(Module):
        def __init__(self, in_channels, out_channels):
            super().__init__()
            self.conv = Conv2d(
                in_channels = in_channels,
                out_channels = out_channels,
                kernel_size = 5,
                padding = "same"
            )
            self.act = ReLU()
            self.maxpool = MaxPool2d(
                kernel_size = 2,
                stride = 2
            )
    
        def forward(self, x):
            return self.maxpool(self.act(self.conv(x)))
    
    class RecognitionModel(Module):
        def __init__(self):
            super().__init__()
            
            self.block1 = CNNBlock(1, 32)
            self.block2 = CNNBlock(32, 64)
            self.block3 = CNNBlock(64, 128)
            self.block4 = CNNBlock(128, 256)
            self.conv1 = Conv2d(
                in_channels = 256,
                out_channels = 512,
                kernel_size = 3
            )
            self.act1 = ReLU()
            self.conv2 = Conv2d(
                in_channels = 512,
                out_channels = 1024,
                kernel_size = 3
            )
            self.act2 = ReLU()
            self.globalmaxpool = AdaptiveMaxPool2d(output_size = 1)
    
            self.sqz = lambda x: x.squeeze()
            self.linear1 = Linear(
                in_features = 1024,
                out_features = 35
            )
            self.act3 = Softmax()
    
        def forward(self, x):
            x = self.block1(x)
            x = self.block2(x)
            x = self.block3(x)
            x = self.block4(x)
            x = self.act1(self.conv1(x))
            x = self.act2(self.conv2(x))
            x = self.globalmaxpool(x)
            x = self.sqz(x)
            y = self.act3(self.linear1(x))
    
            return y
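For the 80x120 input used in the question, the spatial size after each stage of this model can be traced by hand, which helps when judging how much of the network is doing useful work. The sketch below applies the standard output-size formulas for convolution and pooling in plain Python. The helper names `conv_out` and `pool_out` are illustrative.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output length of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output length of a max pool along one dimension (floor mode)."""
    return (size - kernel) // stride + 1

height, width = 80, 120  # input size after transforms.Resize((80, 120))
shapes = []

# Each CNNBlock: 5x5 conv with padding="same" keeps the size,
# then the 2x2 max pool with stride 2 halves it (rounding down).
for name in ("block1", "block2", "block3", "block4"):
    height, width = pool_out(height), pool_out(width)
    shapes.append((name, height, width))

# conv1 and conv2: 3x3 kernels without padding shrink each side by 2.
for name in ("conv1", "conv2"):
    height, width = conv_out(height, 3), conv_out(width, 3)
    shapes.append((name, height, width))

for name, h, w in shapes:
    print(f"{name}: {h} x {w}")
```

By `block4` the map is 5 x 7, and after `conv2` it is 1 x 3, so the global max pool at the end selects among only three positions per channel.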
    