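I wanted to recognize the text in an online image, and my search turned up tesseract-ocr again.
Windows installer download: https://digi.bib.uni-mannheim.de/tesseract/
Chinese recognition data:
Test: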
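Tesseract conventionally looks for language data as `<lang>.traineddata` files inside a `tessdata` directory, so Simplified Chinese (`lang='chi_sim'`) would need `chi_sim.traineddata`. Below is a minimal sketch for checking that the file is in place before running OCR. The helper name `has_language_data` is hypothetical and not part of pytesseract, and the example directory is only an assumption based on the install path used in this post.

```python
from pathlib import Path


def has_language_data(tessdata_dir, lang="chi_sim"):
    """Return True if <lang>.traineddata exists in tessdata_dir."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


# Example call (the directory is an assumption, adjust it to your install):
# has_language_data(r"D:\Program Files (x86)\Tesseract-OCR\tessdata")
```

If this returns False, `image_to_string(..., lang='chi_sim')` will most likely fail because the language cannot be loaded.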
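import requests
from io import BytesIO

import pytesseract
from PIL import Image

# Suppress the InsecureRequestWarning triggered by verify=False below
requests.packages.urllib3.disable_warnings()

# Path to the Tesseract executable (a raw string takes single backslashes)
pytesseract.pytesseract.tesseract_cmd = r'D:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

headers = {
    'Connection': 'close',
    # ****************
}

# URL of the image
image_url = "***********/2024/08/0-1723173075.png"

# Download the image (certificate verification is disabled)
response = requests.get(image_url, verify=False, headers=headers)

# Check the response status code
if response.status_code == 200:
    # Check that the Content-Type is an image; default to '' if the header is missing
    content_type = response.headers.get('Content-Type', '')
    if 'image' in content_type:
        try:
            img = Image.open(BytesIO(response.content))
            # Recognize the text in the image with pytesseract (Simplified Chinese)
            text = pytesseract.image_to_string(img, lang='chi_sim')
            # Print the recognized text with the spaces removed
            print(text.replace(' ', ''))
        except Exception as e:
            print(f"Image processing failed: {e}")
    else:
        print(f"The URL's content type is not an image: {content_type}")
else:
    print(f"Request failed, status code: {response.status_code}")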
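One caveat with `text.replace(' ','')`: it removes every space, including the ones between Latin words that happen to appear in the image. Below is a minimal sketch of a more selective cleanup, assuming that only spaces or tabs directly next to a CJK character should be dropped. `clean_ocr_text` is a hypothetical helper, and the character ranges are an assumed, non-exhaustive choice (basic CJK ideographs, CJK punctuation, fullwidth forms).

```python
import re

# Assumed, non-exhaustive ranges: basic CJK Unified Ideographs,
# CJK punctuation (without the ideographic space), and fullwidth forms.
_CJK = r"\u4e00-\u9fff\u3001-\u303f\uff00-\uffef"

# Spaces or tabs that directly follow or directly precede a CJK character.
_SPACE_NEAR_CJK = re.compile(rf"(?<=[{_CJK}])[ \t]+|[ \t]+(?=[{_CJK}])")


def clean_ocr_text(text):
    """Drop spaces next to CJK characters; keep spaces between Latin words."""
    return _SPACE_NEAR_CJK.sub("", text)
```

It could be used in place of the `replace` call, for example `print(clean_ocr_text(text))`. Line breaks are left untouched, so the line structure of the recognized text is preserved.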
After finishing, I realized I had already done this before: