栅格数据用 GPU 计算

学术 2025-01-06 21:27 北京

前几天做了一下土地覆被二级分类转一级分类，操作就是把二级类的像元值的十位数提取出来，作为一级类的数值，这个可以在QGIS中完成，但是转换后的数据数据量极大，在ChatGPT的帮助下，我做了个Python GPU的计算，计算结果非常理想，计算速度快，数据量很小。

QGIS重分类

QGIS的重分类是我经常使用的工具，可以完成土地覆被重分类工作，但是经实验发现，QGIS转换速度很慢，转换后的文件数据量巨大，全国30米土地覆被一幅高大20GB，转换效果差。

Python设置代码如下：

processing.run("native:reclassifybytable", {'INPUT_RASTER':'G:/Geodata/CASLUCC/1980s.tif','RASTER_BAND':1,'TABLE':['10','12','1','20','24','2','30','33','3','40','46','4','50','53','5','60','67','6','90','99','9'],'NO_DATA':-9999,'RANGE_BOUNDARIES':0,'NODATA_FOR_MISSING':False,'DATA_TYPE':0,'OUTPUT':'I:/LUCC_CAS/L1_1980s.tif'})

Python CPU栅格计算

我用的服务器是至强E5双路CPU，256GB内存，运行下面的程序，居然显示内存不足了，ChatGPT表示并行计算建议将分块设置的小一点，下面代码默认是1024，我估计256可能会好一些吧，计算速度极慢，等了好半天，不想等了，突然想起GPU也是可以计算的嘛，干嘛不试试GPU计算，于是就有了后面第三部分的代码。

import os
import numpy as np
import rasterio
from concurrent.futures import ProcessPoolExecutor
from tqdm import tqdm  # 导入tqdm库

# Define input and output directories
input_dir = r'G:\Geodata\CASLUCC'
output_dir = r'I:\LUCC_CAS'

# Create output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

# Get list of all .tif files in the input directory
tif_files = [f for f in os.listdir(input_dir) if f.endswith('.tif')]

# Function to process each raster file
def process_raster(input_path, output_path):
    with rasterio.open(input_path) as src:
        # Read metadata
        meta = src.meta.copy()

        # Initialize the processed data array
        processed_data = np.zeros((src.height, src.width), dtype=np.uint8)
        
        # Define block size (you can adjust this based on available memory)
        block_size = 1024  # 建议改256，减小块减小内存使用
        
        # Calculate total number of blocks
        num_blocks_x = (src.width + block_size - 1) // block_size  # Horizontal blocks
        num_blocks_y = (src.height + block_size - 1) // block_size  # Vertical blocks

        # Process the raster in blocks to reduce memory usage
        for i in tqdm(range(0, src.height, block_size), desc=f'Processing {os.path.basename(input_path)}', total=num_blocks_y):
            for j in range(0, src.width, block_size):
                window = rasterio.windows.Window(j, i, block_size, block_size)
                data = src.read(1, window=window)
                
                # Create a mask for valid data (values between 11 and 99)
                mask = (data >= 11) & (data <= 99)

                # Process data (compute tens digit)
                processed_data[i:i+block_size, j:j+block_size] = np.where(mask, data // 10, 0)

        # Update metadata for output
        meta.update(dtype=rasterio.uint8, compress='LZW', tiled=True, bigtiff='YES')

        # Write the processed data to the output file with high compression
        with rasterio.open(output_path, 'w', **meta) as dst:
            dst.write(processed_data, 1)

# Use ProcessPoolExecutor to process each .tif file in parallel
def parallel_processing():
    with ProcessPoolExecutor() as executor:
        # Map the process_raster function to each file in the list, and track progress with tqdm
        futures = [
            executor.submit(process_raster, os.path.join(input_dir, tif_file), os.path.join(output_dir, tif_file))
            for tif_file in tif_files
        ]
        
        # Use tqdm to track the overall progress of the futures
        for future in tqdm(futures, desc="Overall progress", total=len(futures)):
            future.result()  # This will raise exceptions if there were any errors during processing

# Run the parallel processing
if __name__ == "__main__":
    parallel_processing()
    print("All files processed successfully.")

Python GPU栅格计算

这个效果最好，推荐使用，我是用的RTX3060显卡，12GB显存，计算速度非常快，一幅全国30米的影像2分半就可以算完，效率很高，而且处理结果数据量很小，只要400MB，非常理想。

要说唯一的不足，就是这个GPU计算的环境配置非常麻烦。两点要注意：

CUDA环境不要安装太高版本的，11.3版本比较好
使用conda forge来安装cupy,参考参考文献1

Conda安装cupy代码：

conda install -c conda-forge cupy

GPU栅格计算代码：

分块处理，block大小为1024
输出数据使用LZW压缩，INT8，NODATA值设为0

# -*- coding: utf-8 -*-
"""
Created on Mon Dec 30 22:06:12 2024

@author: Xu Yang
"""

import os
import cupy as cp
import rasterio
from tqdm import tqdm
import numpy as np

# Define input and output directories定义输入和输出路径
input_dir = r'G:\Geodata\CASLUCC'
output_dir = r'I:\LUCC_CAS'

# Create output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

# Get list of all .tif files in the input directory获取所有的tif文件
tif_files = [f for f in os.listdir(input_dir) if f.endswith('.tif')]

tif_files = tif_files[6:10]

# Function to process each raster file using GPU (with memory-efficient approach)
def process_raster(input_path, output_path):
    with rasterio.open(input_path) as src:
        # Read metadata
        meta = src.meta.copy()

        # Initialize the processed data array (on CPU)
        processed_data = np.zeros((src.height, src.width), dtype=np.uint8)
        
        # Define block size (you can adjust this based on available memory)
        block_size = 1024  # You can modify this to a smaller block size if needed
        
        # Calculate total number of blocks
        num_blocks_x = (src.width + block_size - 1) // block_size  # Horizontal blocks
        num_blocks_y = (src.height + block_size - 1) // block_size  # Vertical blocks

        # Process the raster in blocks to reduce memory usage
        for i in tqdm(range(0, src.height, block_size), desc=f'Processing {os.path.basename(input_path)}', total=num_blocks_y):
            for j in range(0, src.width, block_size):
                window = rasterio.windows.Window(j, i, block_size, block_size)
                data = src.read(1, window=window)

                # Transfer data to GPU using CuPy
                data_gpu = cp.asarray(data)  # Transfer data to GPU using CuPy

                # Create a mask for valid data (values between 11 and 99)
                mask = (data_gpu >= 11) & (data_gpu <= 99)

                # Compute the tens digit (L1 level) on the GPU
                processed_data_gpu = cp.zeros_like(data_gpu, dtype=cp.uint8)
                processed_data_gpu[mask] = data_gpu[mask] // 10

                # Transfer the result back to CPU (host memory) for writing
                processed_data[i:i+block_size, j:j+block_size] = cp.asnumpy(processed_data_gpu)  # Transfer to NumPy array

        # Update metadata for output
        meta.update(dtype=rasterio.uint8, nodata = 0, compress='LZW', tiled=True, bigtiff='YES')

        # Write the processed data to the output file with high compression
        with rasterio.open(output_path, 'w', **meta) as dst:
            dst.write(processed_data, 1)

# Use tqdm to show progress during parallel processing显示进度条
def parallel_processing():
    for tif_file in tqdm(tif_files, desc="Processing files"):
        input_path = os.path.join(input_dir, tif_file)
        output_path = os.path.join(output_dir, tif_file)
        process_raster(input_path, output_path)

# Run the parallel processing并行处理
if __name__ == "__main__":
    parallel_processing()
    print("All files processed successfully.")

GPU计算，基本上两分半就能把一幅全国30米的土地覆被数据计算完，而且内存使用率不高

参考文献

https://docs.cupy.dev/en/stable/install.html

锐多宝

锐多宝的地理空间原名:遥感之家

最新文章

集合卡尔曼滤波ENKF的学习笔记

遥感科研助理招聘

招收客座学生

栅格数据用 GPU 计算

遥感影像的左上角，是一个锚点，是一个区域。

遥感卫星影像和产品免费下载软件

高分二号标签样本库（GF2-Label）上线啦！

高光谱卫星EnMAP数据介绍及下载

重磅！遥感论文-SCI一区TOP，再次刷新顶刊巅峰..

全球每日无缝数据正式上线（30m）

USGS2024年盘点：Landsat精彩瞬间

地理信息科学的学生们，毕业后，一人公司等你来操刀

欧空局：2024年遥感领域重大事件

MEIC团队今日发布全球高分辨率碳排放数据集

全国10年路网、建筑物等OSM数据下载与分享

开源卫星展示杜鲁门号航母近10天航行轨迹

GEE+本地XGboost分类

宝藏级地图引擎！最低99元，买就送项目源码！

使用ERA5数据绘制风向玫瑰图的简易流程

EO速报 | 欧空局发布Sentinel-1C卫星首批图像

打破记录！真心建议各位遥感研究生读一下《Nature》这篇文章

BIL、BIP、BSQ都知道，二进制文件会读取吗

nc数据补值

GEE细碎化耕地提取

一键制作四色定理地图

中国区域2000-2022年高时空分辨率（500m）SIF数据集

大语言模型在教育中的双刃剑

AI学会大气动力学了吗？一个气候变化的角度

ArcGIS Pro 批量出图教程

24年最新影像？点位高频更新？吉林一号网APP更新内测体验分享

开放近百项空天信息技术服务星图云开放平台全新发布

用Python实现地理探测器

SCI遥感一区TOP，再次刷新顶刊巅峰！

Capella Space SAR 开放数据下载方法

CTAmap version 1.2.0版本（更新2009年-2024年初）

2024年初省市县行政区划数据

2009年初省市县数据(截止2008.12.31)

2009年-2024年CTAmap类型数量检查

2013年初省市县行政区划数据

2012年初省市县行政区划数据

2011年初省市县行政区划数据

2010年初省市县行政区划数据

环境与生态领域的高被引科学家

NASA与微软联合推出“Earth Copilot”

我国商业光学遥感卫星再获突破(0.2米)

GIS最好的十年已经结束了～

GEE下载ERA5-Land气象数据（1950-至今，降水、温度）

批量对齐栅格数据代码

适合记者的几款地图工具

目前国内最好的卫星数据采购平台

分类

时事

民生

政务

教育

文化

科技

财富

体娱

健康

情感

旅行

百科

职场

楼市

企业

乐活

学术

汽车

时尚

创业

美食

幽默

美体

文摘

原创标签

时事社会财经军事教育体育科技汽车科学房产搞笑综艺明星音乐动漫游戏时尚健康旅游美食生活摄影宠物职场育儿情感小说曲艺文化历史三农文学娱乐电影视频图片新闻宗教电视剧纪录片广告创意壁纸头像心灵鸡汤星座命理教育培训艺术文化金融财经健康医疗美妆时尚餐饮美食母婴育儿社会新闻工业农业时事政治星座占卜幽默笑话独立短篇连载作品文化历史科技互联网

发布位置

广东北京山东江苏河南浙江山西福建河北上海四川陕西湖南安徽湖北内蒙古江西云南广西甘肃辽宁黑龙江贵州新疆重庆吉林天津海南青海宁夏西藏香港澳门台湾美国加拿大澳大利亚日本新加坡英国西班牙新西兰韩国泰国法国德国意大利缅甸菲律宾马来西亚越南荷兰柬埔寨俄罗斯巴西智利卢森堡芬兰瑞典比利时瑞士土耳其斐济挪威朝鲜尼日利亚阿根廷匈牙利爱尔兰印度老挝葡萄牙乌克兰印度尼西亚哈萨克斯坦塔吉克斯坦希腊南非蒙古奥地利肯尼亚加纳丹麦津巴布韦埃及坦桑尼亚捷克阿联酋安哥拉