Bibliometric Analysis with the bibliometrix Package (Part 4)

Digest 2024-10-22 00:08 Beijing

A brief introduction to bibliometrix (Part 4)

Package website: https://www.bibliometrix.org

Tutorial: https://www.bibliometrix.org/vignettes/Introduction_to_bibliometrix.html

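Descriptive analysis of network graph characteristics

The function networkStat calculates several summary statistics.

In particular, starting from a bibliographic matrix (or an igraph object), it computes two groups of descriptive measures:

The summary statistics of the network;
The main indices of centrality and prestige of vertices.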

# An example of a classical keyword co-occurrences network

NetMatrix <- biblioNetwork(M, analysis = "co-occurrences", network = "keywords", sep = ";")
netstat <- networkStat(NetMatrix)
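
To make the idea of a keyword co-occurrence matrix concrete, here is a minimal base-R sketch on three made-up records. The keyword strings and the variable names (keywords, vocab, A, co) are assumptions for illustration only, and this is not how biblioNetwork is implemented internally. It only shows that a ";"-separated keyword field can be turned into a document-by-keyword matrix A, and that the product t(A) %*% A counts how often two keywords appear together.

```r
# Toy ";"-separated keyword fields, one string per document (invented data)
keywords <- c("DEEP LEARNING;CANCER;SURVIVAL",
              "CANCER;SURVIVAL",
              "DEEP LEARNING;CANCER")

# Split each field on the separator and collect the vocabulary
terms <- strsplit(keywords, ";", fixed = TRUE)
vocab <- sort(unique(unlist(terms)))

# Document x keyword incidence matrix (1 = keyword present in the document)
A <- t(sapply(terms, function(x) as.integer(vocab %in% x)))
colnames(A) <- vocab

# Keyword x keyword co-occurrence matrix: off-diagonal cells count joint
# appearances, the diagonal counts how many documents contain each keyword
co <- crossprod(A)
print(co)
```
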

The summary statistics of the network

This group of statistics describes the structural properties of a network:

Size is the number of vertices composing the network;
Density is the proportion of present edges out of all possible edges in the network;
Transitivity is the ratio of triangles to connected triples;
Diameter is the longest geodesic distance (length of the shortest path between two nodes) in the network;
Degree distribution is the cumulative distribution of vertex degrees;
Degree centralization is the normalized degree of the overall network;
Closeness centralization is the normalized inverse of the vertex average geodesic distance to others in the network;
Eigenvector centralization is the first eigenvector of the graph matrix;
Betweenness centralization is the normalized number of geodesics that pass through the vertex;
Average path length is the mean of the shortest distance between each pair of vertices in the network.
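
Several of the definitions above can be checked by hand on a small graph. The following is a base-R sketch on an invented four-vertex undirected network (edges 1-2, 1-3, 2-3, 3-4). It is not the code networkStat runs, and all variable names are assumptions. It computes size, density, transitivity, diameter and average path length directly from the adjacency matrix.

```r
# Toy undirected graph: a triangle (1, 2, 3) plus a pendant vertex 4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
n <- 4
A <- matrix(0, n, n)
A[edges] <- 1          # fill (row, col) pairs
A[edges[, 2:1]] <- 1   # mirror them to make the matrix symmetric

size <- n
density <- (sum(A) / 2) / choose(n, 2)   # present edges / possible edges

# Transitivity = 3 * triangles / connected triples
deg <- rowSums(A)
triangles <- sum(diag(A %*% A %*% A)) / 6
triples <- sum(choose(deg, 2))
transitivity <- 3 * triangles / triples

# All-pairs shortest paths (Floyd-Warshall) on the unweighted graph
D <- ifelse(A > 0, 1, Inf)
diag(D) <- 0
for (k in seq_len(n)) D <- pmin(D, outer(D[, k], D[k, ], "+"))

geodesics <- D[upper.tri(D)]
diameter <- max(geodesics)
avg_path_length <- mean(geodesics)

print(c(size = size, density = density, transitivity = transitivity,
        diameter = diameter, avg_path_length = avg_path_length))
```
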

names(netstat$network)

The main indices of centrality and prestige of vertices

These measures help identify the most important vertices in a network and the propensity of two vertices that are connected to be both connected to a third vertex.

The vertex-level statistics returned by networkStat are:

Degree centrality is the number of edges attached to a vertex;
Closeness centrality measures how many steps are required to access every other vertex from a given vertex;
Eigenvector centrality is a measure of being well connected to the well-connected;
Betweenness centrality measures brokerage or gatekeeping potential. It is (approximately) the number of shortest paths between vertices that pass through a particular vertex;
PageRank score approximates the probability that any message will arrive at a particular vertex. This algorithm was developed by Google's founders and originally applied to website links;
Hub Score estimates the value of the links outgoing from the vertex. It was initially applied to web pages;
Authority Score is another measure of centrality initially applied to the Web. A vertex has high authority when it is linked by many other vertices that are linking many other vertices;
Vertex Ranking is an overall vertex ranking obtained as a linear weighted combination of the centrality and prestige vertex measures. The weights are proportional to the loadings of the first component of the Principal Component Analysis.
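
As a rough illustration of four of these indices, the sketch below computes degree, closeness, eigenvector centrality and a PageRank score in base R for an invented four-vertex graph. The graph, the variable names, the damping factor of 0.85 and the fixed 100 power iterations are all assumptions made for the example. networkStat may normalize these quantities differently, so the numbers are not expected to match its output.

```r
# Toy undirected graph: triangle v1-v2-v3 plus v4 attached to v3
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
n <- 4
vnames <- paste0("v", seq_len(n))
A <- matrix(0, n, n, dimnames = list(vnames, vnames))
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Degree: number of edges attached to each vertex
degree <- rowSums(A)

# Closeness: (n - 1) divided by the sum of shortest-path distances
D <- ifelse(A > 0, 1, Inf)
diag(D) <- 0
for (k in seq_len(n)) D <- pmin(D, outer(D[, k], D[k, ], "+"))
closeness <- (n - 1) / rowSums(D)

# Eigenvector centrality: leading eigenvector, rescaled so the maximum is 1
ev <- abs(eigen(A, symmetric = TRUE)$vectors[, 1])
eigen_centrality <- setNames(ev / max(ev), vnames)

# PageRank by power iteration on the row-normalized adjacency matrix
damping <- 0.85                      # assumed damping factor
P <- A / rowSums(A)                  # each row sums to 1
pagerank <- rep(1 / n, n)
for (i in 1:100) {
  pagerank <- (1 - damping) / n + damping * as.vector(t(P) %*% pagerank)
}
names(pagerank) <- vnames

print(round(cbind(degree, closeness, eigen_centrality, pagerank), 3))
```

In this toy graph v3 comes out on top for every index, which matches the intuition that it is the only vertex connected to all the others.
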

names(netstat$vertex)

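To summarize the main results of the networkStat function, use the generic function summary.

It displays the main information about the network and the vertex description in several tables.

summary accepts one additional argument, k, a formatting value that sets the number of rows in each table. With k=10, the top 10 vertices are shown.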

summary(netstat, k=10)

## 
## 
## Main statistics about the network
## 
##  Size                                  475 
##  Density                               0.024 
##  Transitivity                          0.335 
##  Diameter                              5 
##  Degree Centralization                 0.301 
##  Average path length                   2.743 
##

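Visualizing bibliographic networks

All bibliographic networks can be graphically visualized or modeled.

Here, we show how to visualize networks using the function networkPlot and the VOSviewer software (https://www.vosviewer.com).

Use the function networkPlot, or VOSviewer, to plot a network created by biblioNetwork.

The main argument of networkPlot is type. It indicates the network map layout: circle, kamada-kawai, mds, etc. With type="vosviewer", the function automatically: (i) saves the network into a pajek network file named "vosnetwork.net"; (ii) starts an instance of VOSviewer that maps the file "vosnetwork.net". You need to declare, using the argument vos.path, the full path of the folder where the VOSviewer software is located (e.g. vos.path='c:/soft/VOSView').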
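
For readers unfamiliar with the pajek format mentioned above, here is a small base-R sketch that writes a weighted undirected network as a generic ".net" file: a "*Vertices" section with numbered labels, followed by an "*Edges" section with one "from to weight" line per link. The helper name write_pajek_sketch and the toy country matrix are hypothetical, and the file that networkPlot itself writes may differ in detail.

```r
# Hypothetical helper: write a symmetric weight matrix as a Pajek .net file
write_pajek_sketch <- function(W, file) {
  n <- nrow(W)
  labels <- rownames(W)
  if (is.null(labels)) labels <- as.character(seq_len(n))
  # Each undirected edge is listed once (upper triangle, non-zero weight)
  idx <- which(upper.tri(W) & W != 0, arr.ind = TRUE)
  lines <- c(
    sprintf("*Vertices %d", n),
    sprintf('%d "%s"', seq_len(n), labels),
    "*Edges",
    sprintf("%d %d %s", idx[, 1], idx[, 2], format(W[idx]))
  )
  writeLines(lines, file)
  invisible(lines)
}

# Toy collaboration counts between three countries (invented data)
countries <- c("USA", "CHINA", "ITALY")
W <- matrix(c(0, 2, 1,
              2, 0, 0,
              1, 0, 0), 3, 3, byrow = TRUE,
            dimnames = list(countries, countries))

net_file <- tempfile(fileext = ".net")
write_pajek_sketch(W, net_file)
written <- readLines(net_file)
cat(written, sep = "\n")
```
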

Country scientific collaboration

# Create a country collaboration network

M <- metaTagExtraction(M, Field = "AU_CO", sep = ";")

NetMatrix <- biblioNetwork(M, analysis = "collaboration", network = "countries", sep = ";")

# Plot the network
net <- networkPlot(NetMatrix, n = dim(NetMatrix)[1], Title = "Country Collaboration", type = "circle", size = TRUE, remove.multiple = FALSE, labelsize = 0.7, cluster = "none")

Co-Citation Network

# Create a co-citation network

# NetMatrix <- biblioNetwork(M, analysis = "co-citation", network = "references", sep = ";")

# Plot the network
#net=networkPlot(NetMatrix, n = 30, Title = "Co-Citation Network", type = "fruchterman", size=T, remove.multiple=FALSE, labelsize=0.7,edgesize = 5)

Keyword co-occurrences

# Create keyword co-occurrences network

NetMatrix <- biblioNetwork(M, analysis = "co-occurrences", network = "keywords", sep = ";")

# Plot the network
net <- networkPlot(NetMatrix, normalize = "association", weighted = TRUE, n = 30, Title = "Keyword Co-occurrences", type = "fruchterman", size = TRUE, edgesize = 5, labelsize = 0.7)
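
The normalize = "association" option refers to association-strength normalization of the co-occurrence counts. As a hedged sketch, assuming the common definition s_ij = c_ij / (c_i * c_j), where c_ij is the co-occurrence count and c_i, c_j are the occurrence counts on the diagonal, the calculation looks like this in base R. The toy matrix and variable names are invented, and the exact formula used inside bibliometrix is an assumption here.

```r
# Toy keyword co-occurrence matrix (invented); the diagonal holds the
# number of documents in which each keyword occurs
kw <- c("CANCER", "SURVIVAL", "NETWORK")
C <- matrix(c(4, 2, 1,
              2, 2, 0,
              1, 0, 3), 3, 3, byrow = TRUE, dimnames = list(kw, kw))

occ <- diag(C)

# Association strength: co-occurrences divided by the product of occurrences
S <- C / outer(occ, occ)
diag(S) <- 0   # self-links carry no information for the plot

print(round(S, 3))
```

Dividing by the product of occurrences keeps very frequent keywords from dominating the map just because they appear everywhere.
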

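Co-Word Analysis: The conceptual structure of a field

The aim of co-word analysis is to map the conceptual structure of a framework using the word co-occurrences in a bibliographic collection. The analysis can be performed through dimensionality reduction techniques such as Multidimensional Scaling (MDS), Correspondence Analysis (CA) or Multiple Correspondence Analysis (MCA).

Here, we show an example that uses the function conceptualStructure to draw the conceptual structure of a field, with K-means clustering to identify clusters of documents that express common concepts. The results are plotted on a two-dimensional map.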
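
The combination of correspondence analysis and K-means can be sketched in base R as follows. This is a simplified illustration on an invented document-by-term count matrix, not the procedure conceptualStructure runs internally. The data, the variable names and the choice of two clusters are assumptions. CA is carried out via a singular value decomposition of the standardized residuals, and K-means is then applied to the first two row coordinates.

```r
set.seed(1)

# Toy document x term counts (invented): docs 1-3 lean clinical,
# docs 4-6 lean bibliometric
N <- matrix(c(3, 2, 0, 0,
              2, 3, 0, 1,
              4, 1, 1, 0,
              0, 0, 3, 2,
              0, 1, 2, 3,
              1, 0, 4, 2), nrow = 6, byrow = TRUE,
            dimnames = list(paste0("doc", 1:6),
                            c("cancer", "survival", "network", "citation")))

# Correspondence analysis: SVD of the standardized residual matrix
P <- N / sum(N)
r <- rowSums(P)                      # row masses
cm <- colSums(P)                     # column masses
S <- (P - outer(r, cm)) / sqrt(outer(r, cm))
sv <- svd(S)
total_inertia <- sum(sv$d^2)

# Principal coordinates of the documents on the first two dimensions
row_coords <- sweep(sv$u[, 1:2], 2, sv$d[1:2], "*") / sqrt(r)
rownames(row_coords) <- rownames(N)

# K-means on the two-dimensional map
km <- kmeans(row_coords, centers = 2, nstart = 10)
clusters <- km$cluster

print(round(row_coords, 3))
print(clusters)
```
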

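conceptualStructure includes natural language processing (NLP) routines (see the function termExtraction) to extract terms from titles and abstracts. In addition, it implements Porter's stemming algorithm to reduce inflected (or sometimes derived) words to their word stem, base or root form.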

# Conceptual Structure using keywords (method="CA")

CS <- conceptualStructure(M,field="ID", method="CA", minDegree=4, clust=5, stemming=FALSE, labelsize=10, documents=10)

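Historical Direct Citation Network

The historiographic map is a graph proposed by E. Garfield (2004) to represent a chronological network map of the most relevant direct citations resulting from a bibliographic collection.

The function histNetwork generates a chronological direct citation network matrix, which can be plotted using histPlot: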

# Create a historical citation network
options(width=130)
histResults <- histNetwork(M, min.citations = 1, sep = ";")

# Plot the historical direct citation network
net <- histPlot(histResults, n=15, size = 10, labelsize=5)
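
To show what a chronological direct citation matrix looks like, here is a minimal base-R sketch on four invented papers. The data frame, its column names and the variable names are assumptions, and histNetwork itself works on the cited-reference field of M rather than on clean identifiers like these. Papers are sorted by year, a cell is 1 when the row paper cites the column paper, and references to works outside the collection are ignored.

```r
# Toy collection (invented): each paper lists the ids it cites, ";"-separated
papers <- data.frame(
  id   = c("P3", "P1", "P4", "P2"),
  year = c(2010, 2001, 2012, 2005),
  refs = c("P1;P2", "", "P2;P3;X9", "P1"),   # X9 is outside the collection
  stringsAsFactors = FALSE
)

# Chronological order is what makes the matrix a historiograph
papers <- papers[order(papers$year), ]

# Direct citation matrix: rows = citing paper, columns = cited paper
cites <- strsplit(papers$refs, ";", fixed = TRUE)
H <- t(sapply(cites, function(r) as.integer(papers$id %in% r)))
dimnames(H) <- list(citing = papers$id, cited = papers$id)

# Local citation score: citations received from inside the collection
lcs <- colSums(H)

print(H)
print(lcs)
```

Because a paper can only cite earlier work, every 1 falls below the diagonal once the papers are ordered by year.
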


