Code Implementations for Several Recent Noteworthy Papers (with Source Code Downloads)

Tech 2024-12-05 10:02 Jiangsu

Computer Vision Research Institute

WeChat official account ID | ComputerVisionGzq


Computer Vision Research Institute Column

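This is a new section launched by the "Computer Vision Research Institute". Going forward, we will regularly share code implementations of the latest papers and techniques.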

《Towards Layer-wise Image Vectorization》

GitHub: github.com/ma-xu/LIVE

Installation

We suggest using conda to create a new Python environment.

Requirement: 5.0<GCC<6.0; nvcc >10.0.

git clone git@github.com:ma-xu/LIVE.git
cd LIVE
conda create -n live python=3.7
conda activate live
conda install -y pytorch torchvision -c pytorch
conda install -y numpy scikit-image
conda install -y -c anaconda cmake
conda install -y -c conda-forge ffmpeg
pip install svgwrite svgpathtools cssutils numba torch-tools scikit-fmm easydict visdom
pip install opencv-python==4.5.4.60  # please install this version to avoid segmentation fault.
cd DiffVG
git submodule update --init --recursive
python setup.py install
cd ..

Run Experiments

conda activate live
cd LIVE
# Please modify the parameters accordingly.
python main.py --config <config.yaml> --experiment <experiment-setting> --signature <given-folder-name> --target <input-image> --log_dir <log-dir>
# Here is a simple example:
python main.py --config config/base.yaml --experiment experiment_5x1 --signature smile --target figures/smile.png --log_dir log/
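When several images need to be vectorized, it can help to assemble the command above programmatically. The following is a minimal sketch in Python; the helper name build_live_command is hypothetical and not part of the LIVE repository, and only the flags shown in the example above are used.

```python
import shlex


def build_live_command(config, experiment, signature, target, log_dir):
    """Assemble the argument list for LIVE's main.py (hypothetical helper)."""
    return [
        "python", "main.py",
        "--config", config,
        "--experiment", experiment,
        "--signature", signature,
        "--target", target,
        "--log_dir", log_dir,
    ]


# Values taken from the simple example above.
cmd = build_live_command(
    config="config/base.yaml",
    experiment="experiment_5x1",
    signature="smile",
    target="figures/smile.png",
    log_dir="log/",
)

# shlex.join quotes each argument safely for display or for pasting into a shell.
command_line = shlex.join(cmd)
print(command_line)
```

The list form can be passed to subprocess.run(cmd, check=True) from inside the LIVE directory, assuming the live conda environment is active.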

《Multimodal Token Fusion for Vision Transformers》

GitHub: github.com/yikaiw/TokenFusion

《PointAugmenting: Cross-Modal Augmentation for 3D Object Detection》

GitHub: github.com/VISION-SJTU/PointAugmenting

《Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension》

GitHub: github.com/uci-soe/FairytaleQAData

《LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks》

GitHub: github.com/agoodge/LUNAR

First, extract data.zip.

To replicate the results on the HRSS dataset with neighbour count k = 100 and the "Mixed" negative sampling scheme:

Extract saved_models.zip
Run:

python3 main.py --dataset HRSS --samples MIXED --k 100

To train a new model:

python3 main.py --dataset HRSS --samples MIXED --k 100 --train_new_model

《Pseudo-Label Transfer from Frame-Level to Note-Level in a Teacher-Student Framework for Singing Transcription from Polyphonic Music》

GitHub: github.com/keums/icassp2022-vocal-transcription

《Robust Disentangled Variational Speech Representation Learning for Zero-shot Voice Conversion》

GitHub: github.com/jlian2/Robust-Voice-Style-Transfer

Demo: https://jlian2.github.io/Robust-Voice-Style-Transfer/

《HandoverSim: A Simulation Framework and Benchmark for Human-to-Robot Object Handovers》

GitHub: github.com/NVlabs/handover-sim

2022-06-03 16:13:46: Running evaluation for results/2022-02-28_08-57-34_yang-icra2021_s0_test
2022-06-03 16:13:47: Evaluation results:
|  success rate   |    mean accum time (s)    |                    failure (%)                     |
|      (%)        |  exec  |  plan  |  total  |  hand contact   |   object drop   |    timeout     |
|:---------------:|:------:|:------:|:-------:|:---------------:|:---------------:|:--------------:|
| 64.58 ( 93/144) | 4.864  | 0.036  |  4.900  | 17.36 ( 25/144) | 11.81 ( 17/144) | 6.25 (  9/144) |
2022-06-03 16:13:47: Printing scene ids
2022-06-03 16:13:47: Success (93 scenes):
---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---
  0    1    2    3    4    5    6    7    8    9   10   12   13   15   16   17   18   19   21   22
 23   25   26   27   28   30   33   34   35   36   37   38   42   43   46   49   50   53   54   56
 59   60   62   63   64   66   68   69   70   71   72   77   81   83   85   87   89   91   92   93
 94   95   96   98  103  106  107  108  109  110  111  112  113  114  115  116  117  120  121  123
125  126  127  128  130  131  132  133  137  138  139  141  143
---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---
2022-06-03 16:13:47: Failure - hand contact (25 scenes):
---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---
 11   14   20   29   39   40   41   44   45   47   51   55   57   58   65   67   74   80   82   88
102  105  118  124  136
---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---
2022-06-03 16:13:47: Failure - object drop (17 scenes):
---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---
 24   31   32   52   61   78   79   84   86   97  101  104  119  122  134  140  142
---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---
2022-06-03 16:13:47: Failure - timeout (9 scenes):
---  ---  ---  ---  ---  ---  ---  ---  ---
 48   73   75   76   90   99  100  129  135
---  ---  ---  ---  ---  ---  ---  ---  ---
2022-06-03 16:13:47: Evaluation complete.
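To make the summary table easier to read, the percentages can be recomputed from the scene counts in the log. The sketch below is only an arithmetic check written for this article, not code from the handover-sim repository; the variable and function names are illustrative.

```python
# Scene counts and timings copied from the evaluation log above.
TOTAL_SCENES = 144
counts = {
    "success": 93,
    "hand contact": 25,
    "object drop": 17,
    "timeout": 9,
}
mean_exec_time = 4.864  # seconds
mean_plan_time = 0.036  # seconds


def percentage(count, total=TOTAL_SCENES):
    """Return count/total as a percentage rounded to two decimals."""
    return round(100.0 * count / total, 2)


rates = {name: percentage(n) for name, n in counts.items()}

# In this log the four outcome counts add up to the number of scenes.
outcome_total = sum(counts.values())

# The "total" time column is the sum of the exec and plan columns.
mean_total_time = round(mean_exec_time + mean_plan_time, 3)

for name, rate in rates.items():
    print(f"{name:>12}: {rate:5.2f}% ({counts[name]}/{TOTAL_SCENES})")
print(f"mean total time: {mean_total_time:.3f} s")
```

The recomputed values match the table: 64.58% success, 17.36% hand contact, 11.81% object drop, 6.25% timeout, and a mean total time of 4.900 s.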

《CDLM: Cross-Document Language Modeling》

GitHub: github.com/aviclu/CDLM

You can either pretrain the model yourself or use the pretrained CDLM model weights and tokenizer files, which are available on HuggingFace.

Then, use:

from transformers import AutoTokenizer, AutoModel
# load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('biu-nlp/cdlm')
model = AutoModel.from_pretrained('biu-nlp/cdlm')

《Continual Learning for Task-Oriented Dialogue Systems》

GitHub: github.com/andreamad8/ToDCL

《Torsional Diffusion for Molecular Conformer Generation》

GitHub: github.com/gcorso/torsional-diffusion

《MMChat: Multi-Modal Chat Dataset on Social Media》

GitHub: github.com/silverriver/MMChat

《Can CNNs Be More Robust Than Transformers?》

GitHub: github.com/UCSC-VLAA/RobustCNN

《Revealing Single Frame Bias for Video-and-Language Learning》

GitHub: github.com/jayleicn/singularity

《Progressive Distillation for Fast Sampling of Diffusion Models》

GitHub: github.com/Hramchenko/diffusion_distiller

《Neural Basis Models for Interpretability》

GitHub: github.com/facebookresearch/nbm-spam

《Scalable Interpretability via Polynomials》

GitHub: github.com/facebookresearch/nbm-spam

《Infinite Recommendation Networks: A Data-Centric Approach》

GitHub: github.com/noveens/infinite_ae_cf

《The GatedTabTransformer. An enhanced deep learning architecture for tabular modeling》

GitHub: github.com/radi-cho/GatedTabTransformer

Usage:

import torch
import torch.nn as nn
from gated_tab_transformer import GatedTabTransformer

model = GatedTabTransformer(
    categories = (10, 5, 6, 5, 8),      # tuple containing the number of unique values within each category
    num_continuous = 10,                # number of continuous values
    transformer_dim = 32,               # dimension, paper set at 32
    dim_out = 1,                        # binary prediction, but could be anything
    transformer_depth = 6,              # depth, paper recommended 6
    transformer_heads = 8,              # heads, paper recommends 8
    attn_dropout = 0.1,                 # post-attention dropout
    ff_dropout = 0.1,                   # feed forward dropout
    mlp_act = nn.LeakyReLU(0),          # activation for final mlp, defaults to relu, but could be anything else (selu, etc.)
    mlp_depth=4,                        # mlp hidden layers depth
    mlp_dimension=32,                   # dimension of mlp layers
    gmlp_enabled=True                   # gmlp or standard mlp
)

x_categ = torch.randint(0, 5, (1, 5))   # category values, from 0 - max number of categories, in the order as passed into the constructor above
x_cont = torch.randn(1, 10)             # assume continuous values are already normalized individually

pred = model(x_categ, x_cont)
print(pred)

《Distract Your Attention: Multi-head Cross Attention Network for Facial Expression Recognition》

GitHub: github.com/yaoing/DAN

《Towards Principled Disentanglement for Domain Generalization》

GitHub: github.com/hlzhang109/DDG

《SoundStream: An End-to-End Neural Audio Codec》

GitHub: github.com/wesbz/SoundStream
