论文介绍
题目:Frequency-Adaptive Dilated Convolution for Semantic Segmentation
论文地址:https://arxiv.org/abs/2403.05369
QQ深度学习交流群:719278780
扫描下方二维码,加入深度学习论文指南星球!
加入即可获得,模块缝合、制作、写作技巧,学会“结构”创新、“创新点”创新,从三区到顶会,小论文大论文,毕业一站式服务
创新点
频率自适应空洞卷积(Frequency-Adaptive Dilated Convolution, FADC):
论文通过频谱分析,提出了一种新的空洞卷积方法(FADC),可以动态调整空洞率以适应局部的频率成分,从而解决传统固定空洞率方法在频率响应方面的局限性。三种关键策略:
自适应空洞率(Adaptive Dilation Rate, AdaDR): 根据输入的频率分布,空间上动态地分配空洞率,在高频区域使用较小的空洞率以保留高效带宽,在低频区域使用较大的空洞率以扩展感受野。
自适应卷积核(Adaptive Kernel, AdaKern): 将卷积权重分解为低频和高频分量,并动态调整这两种分量的比例,从而增强有效带宽。
频率选择模块(Frequency Selection, FreqSelect): 通过空间变化的重新加权,平衡特征表示中的高频和低频分量,鼓励学习更大的空洞率,从而进一步扩大感受野。
频率域视角下的分析:
论文从频率角度重新审视空洞卷积,分析了空洞率选择在有效带宽和感受野之间的权衡,并提出了一种优化策略以最大化两者的平衡。多任务验证:
在语义分割任务中,与多个主流方法(如PSPNet、DeepLabV3+等)相比,FADC在准确率和计算效率方面都表现出了明显的改进。
在目标检测任务中,FADC通过与现有的可变形卷积(DCNv2)和注意力机制(如DiNAT)结合,也展示了性能提升。
减少混叠伪影(Aliasing Artifacts):
动态调整空洞率并利用频率选择模块,有效减少了由于高频分量超过采样率而导致的混叠伪影(如栅格伪影),从而提高了分割和检测精度。轻量化设计:
新提出的模块如AdaKern和FreqSelect,参数量和计算开销相对较小,可以无缝集成到现有模型中,同时显著提升性能。
方法
整体架构
这篇论文提出了基于 CondConv(条件参数化卷积)的模型结构,它通过动态生成输入依赖的卷积核取代传统静态卷积核,核心是将卷积核参数化为多专家(Mixture of Experts)的线性组合,专家权重由输入通过路由函数生成。CondConv 可无缝集成到现有 CNN 架构中,作为普通卷积层的替代,显著提升了多种模型(如 MobileNet、ResNet 和 EfficientNet)的性能,同时保持较低的计算开销,非常适合对实时性和计算资源有限的应用场景。
1. 自适应空洞率模块(Adaptive Dilation Rate, AdaDR)
核心功能:
动态调整空洞率,使其基于输入的频率分布进行空间变化。细节:
根据局部区域的高频或低频分量,计算适合该区域的空洞率。
在高频区域(如边界细节)应用较小的空洞率,以提高高频信息的捕获能力。
在低频区域(如物体内部或背景)应用较大的空洞率,以扩大感受野。
优化目标:
最大化感受野的同时最小化高频信息的丢失。
2. 自适应卷积核模块(Adaptive Kernel, AdaKern)
核心功能:
调整卷积核权重中高频和低频分量的比例,动态优化频率响应。细节:
将卷积核权重分解为低频部分(平均值表示)和高频部分(残差表示)。
动态分配低频和高频权重的比例(由轻量化网络预测),以适应输入的特定频率特性。
提高对高频信息的敏感性,从而提升有效带宽。
增强效果:
AdaKern有效扩展了卷积的频率响应范围,提升了对不同频率特征的捕获能力。
3. 频率选择模块(Frequency Selection, FreqSelect)
核心功能:
平衡特征表示中的高频和低频分量,鼓励空洞卷积学习更大的感受野。细节:
通过傅里叶变换将输入特征分解为多个频率带。
动态生成空间变化的选择权重,对每个频率带进行加权。
压制背景或物体中心的高频分量,突出边界区域的高频特征。
效果:
通过对特征频率分量的选择性抑制和放大,FADC能够进一步增大感受野并提高分割性能。
4. 模块集成的整体框架
模型整体结构的设计以FADC为核心,可以直接替换标准卷积层或空洞卷积层。
在网络中:
AdaDR负责动态分配空洞率;
AdaKern优化卷积核的频率响应;
FreqSelect调整输入特征的频率分量。
这些模块能够无缝集成到现有的深度学习框架(如DeepLabV3+、PSPNet、PIDNet等)中。
即插即用模块作用
FADConv 作为一个即插即用模块:
增强边界细节捕获能力:在高频区域(如边界)分配更多带宽,提升对小目标和复杂边界的细节建模。
动态调整感受野:根据不同区域的频率分布,扩大背景和低频区域的感受野,保留高频区域的特征带宽。
减少混叠伪影:通过动态调整空洞率,缓解因频率超出采样带宽而导致的伪影(如栅格伪影)。
优化高低频信息的平衡:动态压制背景中的高频噪声,突出目标区域的重要特征。
提高实时性与效率:在保持高分辨率的同时,通过轻量化设计减少计算开销,提升处理速度。
增强多尺度特征建模能力:同时适应大物体和小物体的特征需求,提升模型对多尺度目标的鲁棒性。
消融实验结果
展示了 FADC 在 Cityscapes 数据集上的性能,与传统方法(如标准空洞卷积、DCNv2、ADC 等)相比,FADC 在多种主干网络(Dilated-ResNet-50 和 Dilated-ResNet-101)上显著提升了 mIoU(平均交并比)。例如,在 PSPNet 和 DeepLabV3+ 上,FADC 仅增加少量参数和计算开销(+0.5M 参数,+9.2G FLOPs),分别带来了 2.6 和 1.1 的 mIoU 提升。这表明 FADC 在保持高效率的同时,大幅度提高了分割精度。
展示了 FADC 在更具挑战性的 ADE20K 数据集上的表现,与多个主流模型(如 ResNet、ConvNeXt、Swin Transformer 等)相比,FADC 在提升准确率方面表现出色。例如,在 ResNet-50 上,FADC 提升了 mIoU 至 44.4(相比原始 ResNet-50 提高了 3.7),且性能超过了参数更多的 ResNet-101。同时,在 HorNet-B 等更大型模型中,FADC 也带来了 0.6 mIoU 的增益。该表验证了 FADC 在不同规模网络和数据集上的广泛适用性。
评估了 FADC 在实时语义分割任务中的表现。通过将 FADC 集成到轻量化的 PIDNet-M 模型中,FADC 实现了 81.0 mIoU,同时保持了 37.7 FPS 的较高帧率。这一结果超过了更重的 PIDNet-L 模型(80.9 mIoU,31.1 FPS),表明 FADC 在实时性和精度上的优异平衡。这使其特别适合于实时应用场景,如自动驾驶和机器人视觉任务。
即插即用模块
# 论文:Frequency-Adaptive Dilated Convolution for Semantic Segmentation[CVPR 2024]
# 论文地址:https://arxiv.org/abs/2403.05369
import torch
import torch.nn as nn
import torch.fft
class OmniAttention(nn.Module):
def __init__(self, in_planes, out_planes, kernel_size, groups=1, reduction=0.0625, kernel_num=4, min_channel=16):
super(OmniAttention, self).__init__()
attention_channel = max(int(in_planes * reduction), min_channel)
self.kernel_size = kernel_size
self.kernel_num = kernel_num
self.temperature = 1.0
self.avgpool = nn.AdaptiveAvgPool2d(1)
self.fc = nn.Conv2d(in_planes, attention_channel, 1, bias=False)
self.bn = nn.BatchNorm2d(attention_channel)
self.relu = nn.ReLU(inplace=True)
self.channel_fc = nn.Conv2d(attention_channel, in_planes, 1, bias=True)
self.func_channel = self.get_channel_attention
if in_planes == groups and in_planes == out_planes: # depth-wise convolution
self.func_filter = self.skip
else:
self.filter_fc = nn.Conv2d(attention_channel, out_planes, 1, bias=True)
self.func_filter = self.get_filter_attention
if kernel_size == 1: # point-wise convolution
self.func_spatial = self.skip
else:
self.spatial_fc = nn.Conv2d(attention_channel, kernel_size * kernel_size, 1, bias=True)
self.func_spatial = self.get_spatial_attention
if kernel_num == 1:
self.func_kernel = self.skip
else:
self.kernel_fc = nn.Conv2d(attention_channel, kernel_num, 1, bias=True)
self.func_kernel = self.get_kernel_attention
self._initialize_weights()
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
if m.bias is not None:
nn.init.constant_(m.bias, 0)
if isinstance(m, nn.BatchNorm2d):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
def update_temperature(self, temperature):
self.temperature = temperature
@staticmethod
def skip(_):
return 1.0
def get_channel_attention(self, x):
channel_attention = torch.sigmoid(self.channel_fc(x).view(x.size(0), -1, 1, 1) / self.temperature)
return channel_attention
def get_filter_attention(self, x):
filter_attention = torch.sigmoid(self.filter_fc(x).view(x.size(0), -1, 1, 1) / self.temperature)
return filter_attention
def get_spatial_attention(self, x):
spatial_attention = self.spatial_fc(x).view(x.size(0), 1, 1, 1, self.kernel_size, self.kernel_size)
spatial_attention = torch.sigmoid(spatial_attention / self.temperature)
return spatial_attention
def get_kernel_attention(self, x):
kernel_attention = self.kernel_fc(x).view(x.size(0), -1, 1, 1, 1, 1)
kernel_attention = F.softmax(kernel_attention / self.temperature, dim=1)
return kernel_attention
def forward(self, x):
x = self.avgpool(x)
x = self.fc(x)
x = self.bn(x)
x = self.relu(x)
return self.func_channel(x), self.func_filter(x), self.func_spatial(x), self.func_kernel(x)
import torch.nn.functional as F
def generate_laplacian_pyramid(input_tensor, num_levels, size_align=True, mode='bilinear'):
""""
a alternative way for feature frequency decompose
"""
pyramid = []
current_tensor = input_tensor
_, _, H, W = current_tensor.shape
for _ in range(num_levels):
b, _, h, w = current_tensor.shape
downsampled_tensor = F.interpolate(current_tensor, (h // 2 + h % 2, w // 2 + w % 2), mode=mode,
align_corners=(H % 2) == 1) # antialias=True
if size_align:
# upsampled_tensor = F.interpolate(downsampled_tensor, (h, w), mode='bilinear', align_corners=(H%2) == 1)
# laplacian = current_tensor - upsampled_tensor
# laplacian = F.interpolate(laplacian, (H, W), mode='bilinear', align_corners=(H%2) == 1)
upsampled_tensor = F.interpolate(downsampled_tensor, (H, W), mode=mode, align_corners=(H % 2) == 1)
laplacian = F.interpolate(current_tensor, (H, W), mode=mode, align_corners=(H % 2) == 1) - upsampled_tensor
# print(laplacian.shape)
else:
upsampled_tensor = F.interpolate(downsampled_tensor, (h, w), mode=mode, align_corners=(H % 2) == 1)
laplacian = current_tensor - upsampled_tensor
pyramid.append(laplacian)
current_tensor = downsampled_tensor
if size_align: current_tensor = F.interpolate(current_tensor, (H, W), mode=mode, align_corners=(H % 2) == 1)
pyramid.append(current_tensor)
return pyramid
class FrequencySelection(nn.Module):
def __init__(self,
in_channels,
k_list=[2],
# freq_list=[2, 3, 5, 7, 9, 11],
lowfreq_att=True,
fs_feat='feat',
lp_type='freq',
act='sigmoid',
spatial='conv',
spatial_group=1,
spatial_kernel=3,
init='zero',
global_selection=False,
):
super().__init__()
# k_list.sort()
# print()
self.k_list = k_list
# self.freq_list = freq_list
self.lp_list = nn.ModuleList()
self.freq_weight_conv_list = nn.ModuleList()
self.fs_feat = fs_feat
self.lp_type = lp_type
self.in_channels = in_channels
# self.residual = residual
if spatial_group > 64: spatial_group = in_channels
self.spatial_group = spatial_group
self.lowfreq_att = lowfreq_att
if spatial == 'conv':
self.freq_weight_conv_list = nn.ModuleList()
_n = len(k_list)
if lowfreq_att: _n += 1
for i in range(_n):
freq_weight_conv = nn.Conv2d(in_channels=in_channels,
out_channels=self.spatial_group,
stride=1,
kernel_size=spatial_kernel,
groups=self.spatial_group,
padding=spatial_kernel // 2,
bias=True)
if init == 'zero':
freq_weight_conv.weight.data.zero_()
freq_weight_conv.bias.data.zero_()
else:
# raise NotImplementedError
pass
self.freq_weight_conv_list.append(freq_weight_conv)
else:
raise NotImplementedError
if self.lp_type == 'avgpool':
for k in k_list:
self.lp_list.append(nn.Sequential(
nn.ReplicationPad2d(padding=k // 2),
# nn.ZeroPad2d(padding= k // 2),
nn.AvgPool2d(kernel_size=k, padding=0, stride=1)
))
elif self.lp_type == 'laplacian':
pass
elif self.lp_type == 'freq':
pass
else:
raise NotImplementedError
self.act = act
# self.freq_weight_conv_list.append(nn.Conv2d(self.deform_groups * 3 * self.kernel_size[0] * self.kernel_size[1], 1, kernel_size=1, padding=0, bias=True))
self.global_selection = global_selection
if self.global_selection:
self.global_selection_conv_real = nn.Conv2d(in_channels=in_channels,
out_channels=self.spatial_group,
stride=1,
kernel_size=1,
groups=self.spatial_group,
padding=0,
bias=True)
self.global_selection_conv_imag = nn.Conv2d(in_channels=in_channels,
out_channels=self.spatial_group,
stride=1,
kernel_size=1,
groups=self.spatial_group,
padding=0,
bias=True)
if init == 'zero':
self.global_selection_conv_real.weight.data.zero_()
self.global_selection_conv_real.bias.data.zero_()
self.global_selection_conv_imag.weight.data.zero_()
self.global_selection_conv_imag.bias.data.zero_()
def sp_act(self, freq_weight):
if self.act == 'sigmoid':
freq_weight = freq_weight.sigmoid() * 2
elif self.act == 'softmax':
freq_weight = freq_weight.softmax(dim=1) * freq_weight.shape[1]
else:
raise NotImplementedError
return freq_weight
def forward(self, x, att_feat=None):
"""
att_feat:feat for gen att
"""
# freq_weight = self.freq_weight_conv(x)
# self.sp_act(freq_weight)
# if self.residual: x_residual = x.clone()
if att_feat is None: att_feat = x
x_list = []
if self.lp_type == 'avgpool':
# for avg, freq_weight in zip(self.avg_list, self.freq_weight_conv_list):
pre_x = x
b, _, h, w = x.shape
for idx, avg in enumerate(self.lp_list):
low_part = avg(x)
high_part = pre_x - low_part
pre_x = low_part
# x_list.append(freq_weight[:, idx:idx+1] * high_part)
freq_weight = self.freq_weight_conv_list[idx](att_feat)
freq_weight = self.sp_act(freq_weight)
# tmp = freq_weight[:, :, idx:idx+1] * high_part.reshape(b, self.spatial_group, -1, h, w)
tmp = freq_weight.reshape(b, self.spatial_group, -1, h, w) * high_part.reshape(b, self.spatial_group,
-1, h, w)
x_list.append(tmp.reshape(b, -1, h, w))
if self.lowfreq_att:
freq_weight = self.freq_weight_conv_list[len(x_list)](att_feat)
# tmp = freq_weight[:, :, len(x_list):len(x_list)+1] * pre_x.reshape(b, self.spatial_group, -1, h, w)
tmp = freq_weight.reshape(b, self.spatial_group, -1, h, w) * pre_x.reshape(b, self.spatial_group, -1, h,
w)
x_list.append(tmp.reshape(b, -1, h, w))
else:
x_list.append(pre_x)
elif self.lp_type == 'laplacian':
# for avg, freq_weight in zip(self.avg_list, self.freq_weight_conv_list):
# pre_x = x
b, _, h, w = x.shape
pyramids = generate_laplacian_pyramid(x, len(self.k_list), size_align=True)
# print('pyramids', len(pyramids))
for idx, avg in enumerate(self.k_list):
# print(idx)
high_part = pyramids[idx]
freq_weight = self.freq_weight_conv_list[idx](att_feat)
freq_weight = self.sp_act(freq_weight)
# tmp = freq_weight[:, :, idx:idx+1] * high_part.reshape(b, self.spatial_group, -1, h, w)
tmp = freq_weight.reshape(b, self.spatial_group, -1, h, w) * high_part.reshape(b, self.spatial_group,
-1, h, w)
x_list.append(tmp.reshape(b, -1, h, w))
if self.lowfreq_att:
freq_weight = self.freq_weight_conv_list[len(x_list)](att_feat)
# tmp = freq_weight[:, :, len(x_list):len(x_list)+1] * pre_x.reshape(b, self.spatial_group, -1, h, w)
tmp = freq_weight.reshape(b, self.spatial_group, -1, h, w) * pyramids[-1].reshape(b, self.spatial_group,
-1, h, w)
x_list.append(tmp.reshape(b, -1, h, w))
else:
x_list.append(pyramids[-1])
elif self.lp_type == 'freq':
pre_x = x.clone()
b, _, h, w = x.shape
# b, _c, h, w = freq_weight.shape
# freq_weight = freq_weight.reshape(b, self.spatial_group, -1, h, w)
x_fft = torch.fft.fftshift(torch.fft.fft2(x, norm='ortho'))
if self.global_selection:
# global_att_real = self.global_selection_conv_real(x_fft.real)
# global_att_real = self.sp_act(global_att_real).reshape(b, self.spatial_group, -1, h, w)
# global_att_imag = self.global_selection_conv_imag(x_fft.imag)
# global_att_imag = self.sp_act(global_att_imag).reshape(b, self.spatial_group, -1, h, w)
# x_fft = x_fft.reshape(b, self.spatial_group, -1, h, w)
# x_fft.real *= global_att_real
# x_fft.imag *= global_att_imag
# x_fft = x_fft.reshape(b, -1, h, w)
# 将x_fft复数拆分成实部和虚部
x_real = x_fft.real
x_imag = x_fft.imag
# 计算实部的全局注意力
global_att_real = self.global_selection_conv_real(x_real)
global_att_real = self.sp_act(global_att_real).reshape(b, self.spatial_group, -1, h, w)
# 计算虚部的全局注意力
global_att_imag = self.global_selection_conv_imag(x_imag)
global_att_imag = self.sp_act(global_att_imag).reshape(b, self.spatial_group, -1, h, w)
# 重塑x_fft为形状为(b, self.spatial_group, -1, h, w)的张量
x_real = x_real.reshape(b, self.spatial_group, -1, h, w)
x_imag = x_imag.reshape(b, self.spatial_group, -1, h, w)
# 分别应用实部和虚部的全局注意力
x_fft_real_updated = x_real * global_att_real
x_fft_imag_updated = x_imag * global_att_imag
# 合并为复数
x_fft_updated = torch.complex(x_fft_real_updated, x_fft_imag_updated)
# 重塑x_fft为形状为(b, -1, h, w)的张量
x_fft = x_fft_updated.reshape(b, -1, h, w)
for idx, freq in enumerate(self.k_list):
mask = torch.zeros_like(x[:, 0:1, :, :], device=x.device)
mask[:, :, round(h / 2 - h / (2 * freq)):round(h / 2 + h / (2 * freq)),
round(w / 2 - w / (2 * freq)):round(w / 2 + w / (2 * freq))] = 1.0
low_part = torch.fft.ifft2(torch.fft.ifftshift(x_fft * mask), norm='ortho').real
high_part = pre_x - low_part
pre_x = low_part
freq_weight = self.freq_weight_conv_list[idx](att_feat)
freq_weight = self.sp_act(freq_weight)
# tmp = freq_weight[:, :, idx:idx+1] * high_part.reshape(b, self.spatial_group, -1, h, w)
tmp = freq_weight.reshape(b, self.spatial_group, -1, h, w) * high_part.reshape(b, self.spatial_group,
-1, h, w)
x_list.append(tmp.reshape(b, -1, h, w))
if self.lowfreq_att:
freq_weight = self.freq_weight_conv_list[len(x_list)](att_feat)
# tmp = freq_weight[:, :, len(x_list):len(x_list)+1] * pre_x.reshape(b, self.spatial_group, -1, h, w)
tmp = freq_weight.reshape(b, self.spatial_group, -1, h, w) * pre_x.reshape(b, self.spatial_group, -1, h,
w)
x_list.append(tmp.reshape(b, -1, h, w))
else:
x_list.append(pre_x)
x = sum(x_list)
return x
from mmcv.ops.deform_conv import DeformConv2dPack
from mmcv.ops.modulated_deform_conv import ModulatedDeformConv2d, modulated_deform_conv2d, ModulatedDeformConv2dPack, \
CONV_LAYERS
import torch_dct as dct
#pip install torch-dct
class AdaptiveDilatedConv(ModulatedDeformConv2d):
"""A ModulatedDeformable Conv Encapsulation that acts as normal Conv
layers.
Args:
in_channels (int): Same as nn.Conv2d.
out_channels (int): Same as nn.Conv2d.
kernel_size (int or tuple[int]): Same as nn.Conv2d.
stride (int): Same as nn.Conv2d, while tuple is not supported.
padding (int): Same as nn.Conv2d, while tuple is not supported.
dilation (int): Same as nn.Conv2d, while tuple is not supported.
groups (int): Same as nn.Conv2d.
bias (bool or str): If specified as `auto`, it will be decided by the
norm_cfg. Bias will be set as True if norm_cfg is None, otherwise
False.
"""
_version = 2
def __init__(self, *args,
offset_freq=None, # deprecated
padding_mode='repeat',
kernel_decompose='both',
conv_type='conv',
sp_att=False,
pre_fs=True, # False, use dilation
epsilon=1e-4,
use_zero_dilation=False,
use_dct=False,
fs_cfg={
'k_list': [2, 4, 8],
'fs_feat': 'feat',
'lowfreq_att': False,
'lp_type': 'freq',
# 'lp_type':'laplacian',
'act': 'sigmoid',
'spatial': 'conv',
'spatial_group': 1,
},
**kwargs):
super().__init__(*args, **kwargs)
if padding_mode == 'zero':
self.PAD = nn.ZeroPad2d(self.kernel_size[0] // 2)
elif padding_mode == 'repeat':
self.PAD = nn.ReplicationPad2d(self.kernel_size[0] // 2)
else:
self.PAD = nn.Identity()
self.kernel_decompose = kernel_decompose
self.use_dct = use_dct
if kernel_decompose == 'both':
self.OMNI_ATT1 = OmniAttention(in_planes=self.in_channels, out_planes=self.out_channels, kernel_size=1,
groups=1, reduction=0.0625, kernel_num=1, min_channel=16)
self.OMNI_ATT2 = OmniAttention(in_planes=self.in_channels, out_planes=self.out_channels,
kernel_size=self.kernel_size[0] if self.use_dct else 1, groups=1,
reduction=0.0625, kernel_num=1, min_channel=16)
elif kernel_decompose == 'high':
self.OMNI_ATT = OmniAttention(in_planes=self.in_channels, out_planes=self.out_channels, kernel_size=1,
groups=1, reduction=0.0625, kernel_num=1, min_channel=16)
elif kernel_decompose == 'low':
self.OMNI_ATT = OmniAttention(in_planes=self.in_channels, out_planes=self.out_channels, kernel_size=1,
groups=1, reduction=0.0625, kernel_num=1, min_channel=16)
self.conv_type = conv_type
if conv_type == 'conv':
self.conv_offset = nn.Conv2d(
self.in_channels,
self.deform_groups * 1,
kernel_size=self.kernel_size,
stride=self.stride,
padding=self.kernel_size[0] // 2 if isinstance(self.PAD, nn.Identity) else 0,
dilation=1,
bias=True)
self.conv_mask = nn.Conv2d(
self.in_channels,
self.deform_groups * 1 * self.kernel_size[0] * self.kernel_size[1],
kernel_size=self.kernel_size,
stride=self.stride,
padding=self.kernel_size[0] // 2 if isinstance(self.PAD, nn.Identity) else 0,
dilation=1,
bias=True)
if sp_att:
self.conv_mask_mean_level = nn.Conv2d(
self.in_channels,
self.deform_groups * 1,
kernel_size=self.kernel_size,
stride=self.stride,
padding=self.kernel_size[0] // 2 if isinstance(self.PAD, nn.Identity) else 0,
dilation=1,
bias=True)
self.offset_freq = offset_freq
# An offset is like [y0, x0, y1, x1, y2, x2, ⋯, y8, x8]
offset = [-1, -1, -1, 0, -1, 1,
0, -1, 0, 0, 0, 1,
1, -1, 1, 0, 1, 1]
offset = torch.Tensor(offset)
# offset[0::2] *= self.dilation[0]
# offset[1::2] *= self.dilation[1]
# a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
self.register_buffer('dilated_offset', torch.Tensor(offset[None, None, ..., None, None])) # B, G, 18, 1, 1
if fs_cfg is not None:
if pre_fs:
self.FS = FrequencySelection(self.in_channels, **fs_cfg)
else:
self.FS = FrequencySelection(1, **fs_cfg) # use dilation
self.pre_fs = pre_fs
self.epsilon = epsilon
self.use_zero_dilation = use_zero_dilation
self.init_weights()
def freq_select(self, x):
if self.offset_freq is None:
res = x
elif self.offset_freq in ('FLC_high', 'SLP_high'):
res = x - self.LP(x)
elif self.offset_freq in ('FLC_res', 'SLP_res'):
res = 2 * x - self.LP(x)
else:
raise NotImplementedError
return res
def init_weights(self):
super().init_weights()
if hasattr(self, 'conv_offset'):
# if isinstanace(self.conv_offset, nn.Conv2d):
if self.conv_type == 'conv':
self.conv_offset.weight.data.zero_()
# self.conv_offset.bias.data.fill_((self.dilation[0] - 1) / self.dilation[0] + 1e-4)
self.conv_offset.bias.data.fill_((self.dilation[0] - 1) / self.dilation[0] + self.epsilon)
# self.conv_offset.bias.data.zero_()
# if hasattr(self, 'conv_offset'):
# self.conv_offset_low[1].weight.data.zero_()
# if hasattr(self, 'conv_offset_high'):
# self.conv_offset_high[1].weight.data.zero_()
# self.conv_offset_high[1].bias.data.zero_()
if hasattr(self, 'conv_mask'):
self.conv_mask.weight.data.zero_()
self.conv_mask.bias.data.zero_()
if hasattr(self, 'conv_mask_mean_level'):
self.conv_mask.weight.data.zero_()
self.conv_mask.bias.data.zero_()
# @force_fp32(apply_to=('x',))
# @force_fp32
def forward(self, x):
# offset = self.conv_offset(self.freq_select(x)) + self.conv_offset_low(self.freq_select(x))
if hasattr(self, 'FS') and self.pre_fs: x = self.FS(x)
if hasattr(self, 'OMNI_ATT1') and hasattr(self, 'OMNI_ATT2'):
c_att1, f_att1, _, _, = self.OMNI_ATT1(x)
c_att2, f_att2, spatial_att2, _, = self.OMNI_ATT2(x)
elif hasattr(self, 'OMNI_ATT'):
c_att, f_att, _, _, = self.OMNI_ATT(x)
if self.conv_type == 'conv':
offset = self.conv_offset(self.PAD(self.freq_select(x)))
elif self.conv_type == 'multifreqband':
offset = self.conv_offset(self.freq_select(x))
# high_gate = self.conv_offset_high(x)
# high_gate = torch.exp(-0.5 * high_gate ** 2)
# offset = F.relu(offset, inplace=True) * self.dilation[0] - 1 # ensure > 0
if self.use_zero_dilation:
offset = (F.relu(offset + 1, inplace=True) - 1) * self.dilation[0] # ensure > 0
else:
# offset = F.relu(offset, inplace=True) * self.dilation[0] # ensure > 0
offset = offset.abs() * self.dilation[0] # ensure > 0
# offset[offset<0] = offset[offset<0].exp() - 1
# print(offset.mean(), offset.std(), offset.max(), offset.min())
if hasattr(self, 'FS') and (self.pre_fs == False): x = self.FS(x, F.interpolate(offset, x.shape[-2:],
mode='bilinear', align_corners=(
x.shape[
-1] % 2) == 1))
# print(offset.max(), offset.abs().min(), offset.abs().mean())
# offset *= high_gate # ensure > 0
b, _, h, w = offset.shape
offset = offset.reshape(b, self.deform_groups, -1, h, w) * self.dilated_offset
# offset = offset.reshape(b, self.deform_groups, -1, h, w).repeat(1, 1, 9, 1, 1)
# offset[:, :, 0::2, ] *= self.dilated_offset[:, :, 0::2, ]
# offset[:, :, 1::2, ] *= self.dilated_offset[:, :, 1::2, ]
offset = offset.reshape(b, -1, h, w)
x = self.PAD(x)
mask = self.conv_mask(x)
mask = mask.sigmoid()
# print(mask.shape)
# mask = mask.reshape(b, self.deform_groups, -1, h, w).softmax(dim=2)
if hasattr(self, 'conv_mask_mean_level'):
mask_mean_level = torch.sigmoid(self.conv_mask_mean_level(x)).reshape(b, self.deform_groups, -1, h, w)
mask = mask * mask_mean_level
mask = mask.reshape(b, -1, h, w)
if hasattr(self, 'OMNI_ATT1') and hasattr(self, 'OMNI_ATT2'):
offset = offset.reshape(1, -1, h, w)
mask = mask.reshape(1, -1, h, w)
x = x.reshape(1, -1, x.size(-2), x.size(-1))
adaptive_weight = self.weight.unsqueeze(0).repeat(b, 1, 1, 1, 1) # b, c_out, c_in, k, k
adaptive_weight_mean = adaptive_weight.mean(dim=(-1, -2), keepdim=True)
adaptive_weight_res = adaptive_weight - adaptive_weight_mean
_, c_out, c_in, k, k = adaptive_weight.shape
if self.use_dct:
dct_coefficients = dct.dct_2d(adaptive_weight_res)
# print(adaptive_weight_res.shape, dct_coefficients.shape)
spatial_att2 = spatial_att2.reshape(b, 1, 1, k, k)
dct_coefficients = dct_coefficients * (spatial_att2 * 2)
# print(dct_coefficients.shape)
adaptive_weight_res = dct.idct_2d(dct_coefficients)
# adaptive_weight_res = adaptive_weight_res.reshape(b, c_out, c_in, k, k)
# print(adaptive_weight_res.shape, dct_coefficients.shape)
# adaptive_weight = adaptive_weight_mean * (2 * c_att.unsqueeze(1)) * (2 * f_att.unsqueeze(2)) + adaptive_weight - adaptive_weight_mean
# adaptive_weight = adaptive_weight_mean * (c_att1.unsqueeze(1) * 2) * (f_att1.unsqueeze(2) * 2) + (adaptive_weight - adaptive_weight_mean) * (c_att2.unsqueeze(1) * 2) * (f_att2.unsqueeze(2) * 2)
adaptive_weight = adaptive_weight_mean * (c_att1.unsqueeze(1) * 2) * (
f_att1.unsqueeze(2) * 2) + adaptive_weight_res * (c_att2.unsqueeze(1) * 2) * (
f_att2.unsqueeze(2) * 2)
adaptive_weight = adaptive_weight.reshape(-1, self.in_channels // self.groups, 3, 3)
if self.bias is not None:
bias = self.bias.repeat(b)
else:
bias = self.bias
# print(adaptive_weight.shape)
# print(bias.shape)
# print(x.shape)
x = modulated_deform_conv2d(x, offset, mask, adaptive_weight, bias,
self.stride,
(self.kernel_size[0] // 2, self.kernel_size[1] // 2) if isinstance(self.PAD,
nn.Identity) else (
0, 0), # padding
(1, 1), # dilation
self.groups * b, self.deform_groups * b)
elif hasattr(self, 'OMNI_ATT'):
offset = offset.reshape(1, -1, h, w)
mask = mask.reshape(1, -1, h, w)
x = x.reshape(1, -1, x.size(-2), x.size(-1))
adaptive_weight = self.weight.unsqueeze(0).repeat(b, 1, 1, 1, 1) # b, c_out, c_in, k, k
adaptive_weight_mean = adaptive_weight.mean(dim=(-1, -2), keepdim=True)
# adaptive_weight = adaptive_weight_mean * (2 * c_att.unsqueeze(1)) * (2 * f_att.unsqueeze(2)) + adaptive_weight - adaptive_weight_mean
if self.kernel_decompose == 'high':
adaptive_weight = adaptive_weight_mean + (adaptive_weight - adaptive_weight_mean) * (
c_att.unsqueeze(1) * 2) * (f_att.unsqueeze(2) * 2)
elif self.kernel_decompose == 'low':
adaptive_weight = adaptive_weight_mean * (c_att.unsqueeze(1) * 2) * (f_att.unsqueeze(2) * 2) + (
adaptive_weight - adaptive_weight_mean)
adaptive_weight = adaptive_weight.reshape(-1, self.in_channels // self.groups, 3, 3)
# adaptive_bias = self.unsqueeze(0).repeat(b, 1, 1, 1, 1)
# print(adaptive_weight.shape)
# print(offset.shape)
# print(mask.shape)
# print(x.shape)
x = modulated_deform_conv2d(x, offset, mask, adaptive_weight, self.bias,
self.stride,
(self.kernel_size[0] // 2, self.kernel_size[1] // 2) if isinstance(self.PAD,
nn.Identity) else (
0, 0), # padding
(1, 1), # dilation
self.groups * b, self.deform_groups * b)
else:
x = modulated_deform_conv2d(x, offset, mask, self.weight, self.bias,
self.stride,
(self.kernel_size[0] // 2, self.kernel_size[1] // 2) if isinstance(self.PAD,
nn.Identity) else (
0, 0), # padding
(1, 1), # dilation
self.groups, self.deform_groups)
# x = modulated_deform_conv2d(x, offset, mask, self.weight, self.bias,
# self.stride, self.padding,
# self.dilation, self.groups,
# self.deform_groups)
# if hasattr(self, 'OMNI_ATT'): x = x * f_att
return x.reshape(b, -1, h, w)
if __name__ == '__main__':
input_tensor = torch.randn(2, 64, 128, 128)
adaptive_dilated_conv = AdaptiveDilatedConv(in_channels=64,out_channels=64,kernel_size=3)
output_tensor = adaptive_dilated_conv(input_tensor)
print(input_tensor.shape) print(output_tensor.shape)
便捷下载方式
浏览打开网址:https://github.com/ai-dawang/PlugNPlay-Modules
更多分析可见原文