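Competition Overview
Welcome to the SMP 2024 LLM Graph Analysis Challenge. The 12th China National Conference on Social Media Processing will be held in Xinxiang, Henan Province, in October 2024. The conference focuses on scientific research and engineering development in social media processing and provides a broad forum for sharing the latest academic research and technical results. The 12th National Conference on Social Media Processing (SMP 2024) is organized by the Social Media Processing Technical Committee of the Chinese Information Processing Society of China (CIPS) and hosted by Henan Normal University.
The SMP 2024 LLM Graph Analysis Challenge explores how large language models (LLMs) can be applied to graph analysis. The challenge is organized by the CIPS Social Media Processing Technical Committee. Its evaluation dataset is built with six Python libraries (NetworkX, igraph, CDlib, Karateclub, Littleballoffur, graspologic) and measures how well LLMs solve graph analysis problems in three areas: basic graph theory, graph statistical learning, and graph embedding. Tianchi is the designated competition platform. The goal of this challenge is to jointly advance LLMs in the graph domain, and developers and technical teams are invited to help accelerate innovation in the LLM era.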
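Schedule
Registration: August 5 to August 31, 2024. Complete your personal registration on the Tianchi platform to enter.
Preliminary round: August 5 to August 31, 2024
Teams download the competition image from the Tianchi platform, debug their algorithms locally, and submit results online; the system evaluates submissions in real time and returns scores. Online evaluation for the preliminary round is open from August 17 to August 31. The top 10 teams in the preliminary round advance to the final round.
Final round: September 4 to September 17, 2024
Results are again submitted online in the final round; the system evaluates them and returns scores within one day.
Technical report: winners submit a detailed technical report by September 30, 2024
Award ceremony: October 10 to October 13, 2024; the exact date will be announced later (first- and second-prize winners must send a representative to attend the conference in person)
Winning teams will attend the SMP 2024 conference, present their technical reports, and receive their awards at the on-site ceremony.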
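Prizes
The total prize pool is RMB 55,000, distributed as follows (all amounts before tax):
🥇First prize (1 team): RMB 20,000
🥈Second prize (2 teams): RMB 10,000 each
🥉Third prize (3 teams): RMB 5,000 each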
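Tasks and Topics
Participants are encouraged to build a question-answering system on top of a closed-source LLM (GPT, Claude, Gemini, etc.) that answers users' graph analysis questions by writing and executing Python code. Participants may augment the model with other publicly accessible external data and may also use techniques such as vector databases.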
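A minimal sketch of the "write and execute Python code" loop described above: the model's reply is assumed to be a plain Python source string (`generated_code` below is a hypothetical stand-in for a model response), which is run in a fresh namespace while its printed output is captured as the answer. `exec` runs arbitrary code in the current process, so a real system would add sandboxing and timeouts.

```python
import contextlib
import io


def run_generated_code(source: str) -> str:
    """Execute LLM-generated Python source and return everything it printed.

    Note: exec() runs the code in this process; this sketch does no sandboxing.
    """
    buffer = io.StringIO()
    namespace: dict = {"__name__": "__main__"}  # fresh globals for each run
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)
    return buffer.getvalue().strip()


# Hypothetical model output for a "compute the density" question.
generated_code = """
n, m = 3, 2  # 3 nodes, 2 undirected edges
print(round(2 * m / (n * (n - 1)), 2))
"""

print(run_generated_code(generated_code))
```

Capturing stdout rather than a return value fits the sample questions, which ask the model to print its result.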
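The evaluation questions are divided into four types according to the model capability they target and their difficulty: true/false, computation, plotting, and comprehensive questions. The preliminary round contains only true/false and computation questions; the final round contains all four types.
The question data comes in two forms: in one, the required graph is described in the question text; in the other, the graph is stored in a file. For both forms, we recommend having the model write (and execute) code to produce the answer.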
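For questions that describe the graph inline, the edge list can often be extracted from the text directly instead of relying on the model to transcribe it. A sketch assuming the edges appear as a Python-style list of tuples, as in the sample questions on this page; the regular expression is a heuristic, not an official question format.

```python
import ast
import re

# Matches a bracketed list that starts with a tuple, e.g. "[(0, 1), (1, 2)]",
# but not a plain node list such as "[1, 2, 3]".
EDGE_LIST_RE = re.compile(r"\[\s*\(\s*[^\[\]]*\)\s*\]")


def extract_edge_list(question: str) -> list:
    """Return the first Python-style edge list found in the question text."""
    match = EDGE_LIST_RE.search(question)
    if match is None:
        raise ValueError("no edge list found in question text")
    # literal_eval parses the literal safely without executing code.
    return [tuple(edge) for edge in ast.literal_eval(match.group(0))]


question = "The node set would be [1, 2, 3], and the edge set would be [(1, 2), (1, 3)]."
print(extract_edge_list(question))
```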
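1) Competition Data
The competition provides datasets to help participants apply external augmentation techniques and improve LLM performance on the relevant tasks. Specifically: Documentation dataset: participants can use and extend the documentation dataset, with techniques such as retrieval-augmented generation (RAG), to improve LLM performance on graph tasks. The documentation dataset was crawled from the websites of the six Python libraries and organized into six JSON files, one per library. Participants may use it to augment closed-source LLMs and improve overall performance; they may also reorganize, modify, or extend it to improve performance further.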
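A minimal retrieval sketch over the documentation dataset. It assumes each JSON file holds a list of section objects shaped like the samples on this page; because those samples use both "Section ID" and "Section_id", both keys are checked. The keyword-overlap score is a simple stand-in for embedding search with a vector database, and the file name in the usage comment is hypothetical.

```python
import json
import re


def load_sections(path: str) -> list:
    """Load one documentation file, assumed to be a JSON list of section objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def section_text(section: dict) -> str:
    """Flatten a section (name plus all fields) into one searchable string."""
    name = section.get("Section ID") or section.get("Section_id") or ""
    return name + " " + json.dumps(section, ensure_ascii=False)


def tokens(text: str) -> set:
    """Lowercase word tokens; underscores are kept so API names stay whole."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(question: str, sections: list, k: int = 3) -> list:
    """Return the k sections sharing the most tokens with the question."""
    query = tokens(question)
    ranked = sorted(sections, key=lambda s: len(query & tokens(section_text(s))), reverse=True)
    return ranked[:k]


# Usage (hypothetical file name):
# sections = load_sections("networkx.json")
# context = retrieve("compute the density of this graph", sections)
```

The retrieved sections would then be pasted into the prompt before the model writes its code.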
2) Test Data
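Preliminary test data: 1,000 questions, true/false and computation only
Final test data: 512 questions, covering true/false, computation, plotting, and comprehensive questions
*For the details and evaluation method of each stage, please refer to the official competition website.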
Registration Link
https://tianchi.aliyun.com/competition/entrance/532253
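Result Submission
Preliminary round: teams debug their algorithms locally and submit results online on the Tianchi platform. The result file must be named "<team name>_result" and saved with UTF-8 encoding. Each line of the file is a JSON string containing "ID", "question", and "answer"; see the samples below. "ID" is the question number and carries no other meaning; "question" is the problem the participant's LLM must solve; "answer" is the result for that question (plotting questions may have more than one valid answer, so "answer" is only a reference answer and any response that meets the question's requirements is accepted).
Final round: teams debug their algorithms locally and submit results online on the Tianchi platform. The result file must be named "<team name>_result" and saved with UTF-8 encoding. Each line of the file is a JSON string containing "ID", "question", "code", and "answer"; see the samples below. "ID", "question", and "answer" have the same meaning as in the preliminary round (including the note on plotting questions); "code" is the code the LLM generated to solve the question.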
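A sketch of writing the result file in the one-JSON-object-per-line layout described above, with UTF-8 encoding. The team name and the record are hypothetical; the "code" field is written only when a record has one, so the same function covers the preliminary (no "code") and final ("code") formats. No file extension is specified in the rules, so none is added.

```python
import json


def write_results(team_name: str, records: list) -> str:
    """Write one JSON object per line to '<team_name>_result' and return the path."""
    path = f"{team_name}_result"
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            row = {"ID": rec["ID"], "question": rec["question"]}
            if "code" in rec:  # final round only
                row["code"] = rec["code"]
            row["answer"] = rec["answer"]
            # ensure_ascii=False keeps any non-ASCII text readable in the UTF-8 file
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path


# Hypothetical preliminary-round record (no "code" field).
path = write_results("example_team", [
    {"ID": 2, "question": "Compute the density of the graph.", "answer": 0.67},
])
```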
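After the final round, teams must submit their code, the related datasets, and documentation (see the other requirements for details). The organizers will review the winning teams' code, which must fit the competition theme and follow the relevant rules. Teams that do not submit or fail the review will be disqualified and forfeit their awards, and the next-ranked teams will be notified as replacements.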
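Evaluation Metrics
True/false and computation questions have a unique result, so each answer is compared with the reference answer: 1 point if it matches, 0 points otherwise.
Plotting and comprehensive questions have more complex outputs, so they are scored semi-automatically by GPT-4o based on the response: 1 point if fully correct, 0 points if completely wrong, and partial credit otherwise.
Note: true/false answers are only TRUE or FALSE. Computation and comprehensive results are rounded to two decimal places.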
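The exact-match rule for true/false and computation questions can be sketched as follows. The normalization (case-insensitive TRUE/FALSE, numbers compared after rounding to two decimals) is an assumption based on the note above, not the official scorer; plotting and comprehensive questions are graded by GPT-4o and are not covered.

```python
def normalize(answer) -> str:
    """Map an answer to a canonical string for exact comparison (assumed rules)."""
    if isinstance(answer, bool):
        return "TRUE" if answer else "FALSE"
    text = str(answer).strip()
    if text.upper() in ("TRUE", "FALSE"):
        return text.upper()
    try:
        return f"{float(text):.2f}"  # numeric answers: two decimal places
    except ValueError:
        return text


def score(prediction, reference) -> int:
    """1 point for a match, 0 otherwise."""
    return int(normalize(prediction) == normalize(reference))
```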
Preliminary-Round Sample Set
NetworkX example
{
"Section ID": "maximum_branching",
"Description": [
"Returns a maximum branching from G."
],
"Field List": {
"Parameters:": {
"G : (multi)digraph-like": "The graph to be searched.",
"attr : str": "The edge attribute used to in determining optimality.",
"default : float": "The value of the edge attribute used if an edge does not have\nthe attributeattr.",
"preserve_attrs : bool": "If True, preserve the other attributes of the original graph (that are not\npassed toattr)",
"partition : str": "The key for the edge attribute containing the partition\ndata on the graph. Edges can be included, excluded or open using theEdgePartitionenum."
},
"Returns:": {
"B : (multi)digraph-like": "A maximum branching."
},
"Methods": []
},
"Rubrics": {}
}
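To show how the `maximum_branching` entry above maps to an actual call, here is a small example using NetworkX's `maximum_branching` with the `attr` parameter from the entry. The graph and its weights are made up for illustration.

```python
import networkx as nx

# Small weighted digraph (hypothetical data).
G = nx.DiGraph()
G.add_weighted_edges_from([(0, 1, 5), (0, 2, 3), (1, 2, 4), (2, 1, 1)])

# A branching keeps at most one incoming edge per node and contains no cycles;
# the maximum branching maximizes the total of the "weight" attribute.
B = nx.maximum_branching(G, attr="weight")

total = sum(G[u][v]["weight"] for u, v in B.edges())
print(sorted(B.edges()), total)
```

Here node 1 keeps its heavier incoming edge (0, 1) and node 2 keeps (1, 2), for a total weight of 9.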
igraph example
{
"Section_id": "RectangleDrawer",
"Description": "Static class which draws rectangular vertices",
"Field List": {
"Methods": {
"draw_path": {
"Description": "overrides igraph.drawing.shapes.ShapeDrawer.draw_path\nDraws a rectangle-shaped path on the Cairo context without stroking or filling it.",
"Paramters": {},
"Return": [],
"References": [],
"Rasises": {},
"See Also": "ShapeDrawer.draw_path",
"example": []
},
"intersection_point": {
"Description": "overrides igraph.drawing.shapes.ShapeDrawer.intersection_point\nDetermines where the rectangle centered at (center_x, center_y) having the given width and height intersects with a line drawn from (source_x, source_y) to (center_x, center_y).",
"Paramters": {},
"Return": [],
"References": [],
"Rasises": {},
"See Also": "ShapeDrawer.intersection_point",
"example": []
}
},
"property": {}
},
"Rubric": {
"Example": []
}
}
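The `intersection_point` description above is a small geometry problem. The following is an independent sketch of that computation, not igraph's source code: scale the vector from the rectangle's center toward the source until it first touches a rectangle edge. The function name and argument order are hypothetical.

```python
import math


def rectangle_intersection(center_x, center_y, width, height, source_x, source_y):
    """Point where the segment from (source_x, source_y) to the rectangle's center
    crosses the boundary of an axis-aligned rectangle of the given width and height.
    Assumes the source lies outside the rectangle."""
    dx, dy = source_x - center_x, source_y - center_y
    if dx == 0 and dy == 0:
        return center_x, center_y  # degenerate: source coincides with the center
    # Fraction of (dx, dy) needed to reach a vertical or a horizontal edge.
    tx = (width / 2) / abs(dx) if dx else math.inf
    ty = (height / 2) / abs(dy) if dy else math.inf
    t = min(tx, ty)  # whichever edge is reached first
    return center_x + t * dx, center_y + t * dy


print(rectangle_intersection(0, 0, 4, 2, 4, 4))
```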
Karateclub example
{
"Section_id": "EgoNetSplitter",
"Description": "An implementation of \u201cEgo-Splitting\u201dfrom the KDD \u201817 paper \u201cEgo-Splitting Framework: from Non-Overlapping to Overlapping Clusters\u201d. The tool first createsthe ego-nets of nodes. A persona-graph is created which is clustered by the Louvain method. The resulting overlappingcluster memberships are stored as a dictionary.",
"Field List": {
"Parameters": {
"resolution(float)": "Resolution parameter of Python Louvain. Default 1.0.",
"seed(int)": "Random seed value. Default is 42.",
"weight(str)": "the key in the graph to use as weight. Default to \u2018weight\u2019. Specify None to force using an unweighted version of the graph."
},
"Methods": [
{
"fit": {
"Description": "Fitting an Ego-Splitter clustering model.",
"Arg types:": {
"graph(NetworkX graph)": "The graph to be clustered."
}
}
},
{
"get_memberships": {
"Description": "Getting the cluster membership of nodes.",
"Return types:": {
"memberships(dictionary of lists)": "Cluster memberships."
}
}
}
]
}
}
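The first step described for `EgoNetSplitter`, building each node's ego-net, can be illustrated with NetworkX. This sketch assumes connected components as the local clustering (a simplification, not necessarily Karateclub's exact code): a node gets one persona per component of its ego-net with the node itself removed. The later Louvain step on the persona graph is omitted.

```python
import networkx as nx


def personas(graph: nx.Graph) -> dict:
    """Map each node to the node groups that define its personas."""
    result = {}
    for node in graph.nodes():
        ego = nx.ego_graph(graph, node, radius=1)  # node plus its neighbors
        ego.remove_node(node)                     # drop the ego itself
        result[node] = [set(c) for c in nx.connected_components(ego)]
    return result


# Two triangles sharing node 0 (hypothetical example graph).
G = nx.Graph([(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (3, 4)])
print(len(personas(G)[0]))  # node 0 splits into one persona per triangle
```

Splitting node 0 into two personas is what later lets the clustering place it in two overlapping communities.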
Littleballoffur example
{
"Section_id": "RandomNodeSampler",
"Description": "An implementation of random node sampling. Nodes are sampled with uniform probability. For details about the algorithm see this paper.",
"Field List": {
"Parameters": {
"number_of_nodes": "Number of nodes. Default is 100.",
"seed": "Random seed. Default is 42."
},
"Methods": {
"sample": {
"Description": "Sampling nodes randomly.",
"Arg types": {
"graph": "NetworkX or NetworKit graph - The graph to be sampled from."
},
"Return types": {
"new_graph": "NetworkX or NetworKit graph - The graph of sampled nodes."
}
}
}
}
}
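Uniform random node sampling, as described for `RandomNodeSampler`, amounts to drawing `number_of_nodes` nodes with equal probability and keeping the subgraph they induce. A minimal NetworkX sketch under that reading; the function is hypothetical, not Littleballoffur's API, and its defaults mirror the parameters listed above.

```python
import random

import networkx as nx


def random_node_sample(graph: nx.Graph, number_of_nodes: int = 100, seed: int = 42) -> nx.Graph:
    """Sample nodes uniformly without replacement and return the induced subgraph."""
    rng = random.Random(seed)  # seeded for reproducible samples
    nodes = rng.sample(list(graph.nodes()), number_of_nodes)
    return graph.subgraph(nodes).copy()


G = nx.karate_club_graph()  # 34 nodes
sample = random_node_sample(G, number_of_nodes=10)
print(sample.number_of_nodes())
```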
CDlib example
{
"Section_id": "read_community_json",
"Description": "Read community list from JSON file.\n",
"Parameters": {
"path": "input filename",
"compress": "whether the file is in a compressed format, default False"
},
"Return": [
"a Clustering object\n"
],
"Example": [
"import networkx as nx\nfrom cdlib import algorithms, readwrite\ng = nx.karate_club_graph()\ncoms = algorithms.louvain(g)\nreadwrite.write_community_json(coms, \"communities.json\")\ncoms = readwrite.read_community_json(\"communities.json\")"
],
"References": []
}
Preliminary Test Set
True/false and computation questions only, 1,000 questions in total
True/false question example
{"ID": 1,
"question": "Imagine we're constructing a new activity scheduling system for our community rehabilitation center, aimed at promoting social interaction for our clients through various group activities. The activities are represented by nodes, and the direct pairwise overlaps in scheduling (due to shared participants or resources) are represented by edges between them. Our current activity network is comprised of the following connections: [(0, 1), (0, 2), (1, 2), (1, 3), (1, 4), (4, 5), (3, 6), (5, 7), (3, 8), (5, 9), (3, 10)].
To enhance the effectiveness of our program, we want to ensure that our activity schedule is conflict-free, enabling a seamless flow without overloading our clients or our resources. In other words, we're looking for an "Asteroidal Triple-free" (AT-free) structure within our activity network, a condition that ensures a more manageable and stress-free experience for participants as they transition from one activity to another.
Could we utilize the 'is_at_free' feature of NetworkX to verify whether our planned activity network maintains the AT-free property? This will assist us in confirming that our activity schedule is optimally structured for the well-being of our clients.
?"}
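One way a generated answer to this true/false sample could look: NetworkX's `is_at_free`, which the question names, applied to the edge list it gives. The function returns a boolean, which maps onto the TRUE/FALSE answer format.

```python
import networkx as nx

# Edge list taken verbatim from the sample question.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (1, 4), (4, 5), (3, 6), (5, 7), (3, 8), (5, 9), (3, 10)]
G = nx.Graph(edges)

answer = nx.is_at_free(G)  # True when the graph has no asteroidal triple
print("TRUE" if answer else "FALSE")
```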
Computation question example
{"ID": 2,
"question": "Captain, imagine you're tasked with reviewing the flight network efficiency for a new regional airline with a modest fleet. Presently, they have only three destinations, labeled as 1, 2, and 3. The airline operates direct flights resembling a simplified network: Flight 1 directly connects to both Flight 2 and Flight 3, yet there is no direct flight between Flight 2 and Flight 3. In aviation terms, the 'density' of this network measures the proportion of possible direct connections that are operational, between the trio of destinations.
To calculate the operational efficiency of this network or its 'density', you would be provided with the current route graph of the airline. The node set acknowledging the destinations would be [1, 2, 3], and the edge set that represents the direct connections would be [(1, 2), (1, 3)]. Captain, could you kindly compute the density of this graph using the density function in order to assess how effectively the airline is utilizing its potential for direct connections? This information would be pivotal for optimizing routing and ensuring the most streamlined service for your passengers. Please print the resulting network density as a part of your report." }
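For this computation sample, NetworkX's `density` for an undirected graph is 2m / (n(n - 1)), the number of edges divided by the number of possible edges. With n = 3 and m = 2 this is 4/6, which rounds to 0.67 under the two-decimal rule in the evaluation notes. A sketch that computes it both ways:

```python
import networkx as nx

# Node and edge sets from the sample question.
G = nx.Graph()
G.add_nodes_from([1, 2, 3])
G.add_edges_from([(1, 2), (1, 3)])

n, m = G.number_of_nodes(), G.number_of_edges()
manual = 2 * m / (n * (n - 1))  # undirected density: m / (n choose 2)

density = nx.density(G)
assert abs(density - manual) < 1e-12
print(round(density, 2))
```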