Project page:
https://github.com/baidu/lac
Installation:
Installing with the following command is recommended:
python38 -m pip install -U lac
Installing any other way tends to cause problems.
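The python38 -m pip form ties pip to one specific interpreter, which matters when several Python versions are installed side by side. As a minimal sketch of the same idea from inside Python, the helper below builds the command from sys.executable so the package lands in the interpreter that will later import it. The name pip_install_cmd is hypothetical and not part of pip or LAC.

```python
import sys


def pip_install_cmd(package, index_url=None):
    """Build a pip command bound to the interpreter running this script."""
    cmd = [sys.executable, "-m", "pip", "install", "-U", package]
    if index_url:
        # Optional mirror, for example the Tsinghua index used later in these notes
        cmd += ["-i", index_url]
    return cmd


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run(...) to execute it.
    print(" ".join(pip_install_cmd("lac")))
```

Printing the command first makes it easy to confirm which interpreter path is being used before anything is installed.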
Test code:
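#coding:utf-8
#python38
from LAC import LAC

def deal_Tags_ByLAC(words):
    # Set the mode: 'lac' runs segmentation plus part-of-speech tagging.
    # Note: the model is loaded on every call, which is slow.
    lac = LAC(mode='lac')
    # Load the custom dictionary
    lac.load_customization("online_sec-dict.txt")
    # lac_result is [word_list, tag_list]
    lac_result = lac.run(words)
    # Print only inputs kept as a single token whose tag contains 'n' (noun-like)
    if len(lac_result[1]) == 1 and 'n' in lac_result[1][0]:
        print(words, '---|-->', lac_result[0], '---|-->', lac_result[1])

def main():
    mstr = ['信用卡', '信息安全', '恶意软件', '软件', 'python软件库现恶意软件能盗取用户信用卡信息', 'pypi', '安全', '信息', 'python', '盗取', '库现']
    for i in mstr:
        deal_Tags_ByLAC(i)

if __name__ == '__main__':
    main()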
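The filter in the test code can be tried without loading the model. The sketch below isolates it as a pure function over the [words, tags] pair that lac.run() returns in 'lac' mode. The name is_single_noun is hypothetical, and the sample results are hand-written for illustration, not real LAC output.

```python
def is_single_noun(lac_result):
    """Return True if the input stayed one token whose tag contains 'n'."""
    words, tags = lac_result
    return len(tags) == 1 and "n" in tags[0]


if __name__ == "__main__":
    # Hand-written samples in the [words, tags] shape; not real LAC output.
    samples = [
        [["信用卡"], ["n"]],
        [["恶意", "软件"], ["a", "n"]],
        [["盗取"], ["v"]],
    ]
    for sample in samples:
        print(sample[0], is_single_noun(sample))
```

Because the check is a substring match, tags such as vn, an, nz and nw also pass. Comparing against an explicit set of tags would be stricter if only plain nouns are wanted.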
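Result:
(It takes quite a long time to run. My main goal at the time was part-of-speech analysis, which is how I found LAC. Later I discovered that jieba can output part-of-speech tags on its own, so in the end I went back to jieba for segmentation.)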
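For comparison, here is a minimal sketch of part-of-speech tagging with jieba, using its jieba.posseg module (requires pip install jieba). The helper names tag_with_jieba and noun_words are hypothetical. The filter uses startswith('n') rather than a substring match, an assumption made here because jieba labels English tokens as eng, which also contains the letter n.

```python
def tag_with_jieba(text):
    """Segment text with jieba and return (word, flag) pairs."""
    import jieba.posseg as pseg  # third-party; imported lazily

    return [(pair.word, pair.flag) for pair in pseg.cut(text)]


def noun_words(pairs):
    """Keep words whose part-of-speech flag starts with 'n' (n, nr, ns, nz, ...)."""
    return [word for word, flag in pairs if flag.startswith("n")]


if __name__ == "__main__":
    try:
        pairs = tag_with_jieba("python软件库现恶意软件能盗取用户信用卡信息")
    except ImportError:
        print("jieba is not installed")
    else:
        print(pairs)
        print(noun_words(pairs))
```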
The detours I took along the way are recorded below.
Installing paddlepaddle (飞桨, PaddlePaddle)
Installing paddle:
python37 -m pip install common -i https://pypi.tuna.tsinghua.edu.cn/simple
python37 -m pip install dual -i https://pypi.tuna.tsinghua.edu.cn/simple
python37 -m pip install data -i https://pypi.tuna.tsinghua.edu.cn/simple
python37 -m pip install prox -i https://pypi.tuna.tsinghua.edu.cn/simple
python37 -m pip install tight -i https://pypi.tuna.tsinghua.edu.cn/simple
python37 -m pip install paddle -i https://pypi.tuna.tsinghua.edu.cn/simple
Installing paddlepaddle:
python37 -m pip install paddlepaddle -i https://pypi.tuna.tsinghua.edu.cn/simple
python37 -m pip install paddlepaddle
https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/windows-pip.html
After installation, the following error was reported at runtime:
About AVX:
https://post.smzdm.com/p/a4wm098x/
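It turned out that the current CPU does not support the AVX instruction set, so the noavx build has to be downloaded, and that build requires a specific Python version:
Python 3.8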
https://www.python.org/downloads/release/python-3810/
Downloading the noavx build:
https://github.com/PaddlePaddle/Paddle/issues/31165
https://www.paddlepaddle.org.cn/whl/stable/noavx.html
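Whether a CPU advertises AVX can be checked before choosing between the regular and noavx builds. The sketch below is a minimal example that assumes a Linux x86 machine, where /proc/cpuinfo lists CPU features on a flags line. It does not apply to Windows, where a separate CPU information tool would be needed. The name has_avx is hypothetical.

```python
def has_avx(cpuinfo_text):
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists avx."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx" in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as fh:
            print("AVX supported:", has_avx(fh.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```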
After struggling with this for a long time, I uninstalled lac and reinstalled it with:
python38 -m pip install -U lac
After installing, the test ran fine:
#coding:utf-8
#python38
from LAC import LAC
# The next three imports are not used below; they only confirm the modules load.
import paddle
import paddle.fluid as fluid
import numpy as np

def main():
    # 'seg' mode performs word segmentation only
    lac = LAC(mode='seg')
    mstr = 'LAC是个优秀的分词工具'
    seg_result = lac.run(mstr)
    print(seg_result)

if __name__ == '__main__':
    main()
However, the paddle module produced many errors during installation, so problems are likely to show up in later use:
ERROR: Command errored out with exit status 1:
command: 'C:\Python38\python38.exe' -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\***\\AppData\\Local\\Temp\\pip-install-kysjemv2\\lac_30ef2c595fae4909acd1bd5c95509f47\\setup.py'"'"'; __file__='"'"'C:\\Users\\***\\AppData\\Local\\Temp\\pip-install-kysjemv2\\lac_30ef2c595fae4909acd1bd5c95509f47\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\****\AppData\Local\Temp\pip-pip-egg-info-scy5seug'
cwd: C:\Users\**\AppData\Local\Temp\pip-install-kysjemv2\lac_30ef2c595fae4909acd1bd5c95509f47\
Complete output (5 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\***\AppData\Local\Temp\pip-install-kysjemv2\lac_30ef2c595fae4909acd1bd5c95509f47\setup.py", line 33, in <module>
if paddle.__version__ < '1.6.0':
AttributeError: module 'paddle' has no attribute '__version__'
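The traceback shows setup.py failing because the imported paddle module has no __version__ attribute. One plausible reading, an assumption not confirmed by these notes, is that the module comes from the PyPI distribution named paddle that was installed earlier rather than from paddlepaddle. A minimal sketch to see which distributions are actually installed, using the standard-library importlib.metadata (available from Python 3.8); the name installed_version is hypothetical:

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("paddle", "paddlepaddle", "lac"):
        print(name, "->", installed_version(name))
```

If paddle reports a version while paddlepaddle reports None, uninstalling paddle and installing paddlepaddle again would be worth trying.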