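Yifeng WANG (1995- ), male, is a master's student at the School of Science, Harbin Institute of Technology (Shenzhen). His main research interests include natural language processing, computer vision, intelligent control, robot motion, inertial guidance, and the mathematical principles of machine learning.
Liru SUN (1994- ), female, is a master's student at the School of Science, Harbin Institute of Technology (Shenzhen). Her main research interests include natural language processing, educational big data, and clustering algorithms in machine learning.
Liangle CUI (1978- ), male, is a lecturer at the School of Science, Harbin Institute of Technology (Shenzhen). His main research interests include Western aesthetics, the dissemination of modern Chinese thought and culture, cultural studies, and educational big data related to online learning.
Yi ZHAO (1977- ), male, Ph.D., is a professor and doctoral supervisor at the School of Science, Harbin Institute of Technology (Shenzhen), and director of the Research Center for Applied Mathematics, Harbin Institute of Technology (Shenzhen). His main research interests include nonlinear time series analysis, dynamical systems, complex networks, mathematical biology, and data science.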
Published online: 2020-07
Published in print: 2020-07-15
Yifeng WANG, Liru SUN, Liangle CUI, et al. Adaptive feature spectrum neural networks for special types of natural language classification[J]. Big Data Research, 2020, 6(4): 2020036-1. DOI: 10.11959/j.issn.2096-0271.2020036.
Growth in computing power has driven the rapid development of deep learning algorithms. However, because ancient Chinese poetry and prose have distinctive word order, vocabulary, structure, sentence patterns, grammar, and modes of expression, deep learning models need more computing power for tasks such as feature extraction and have therefore not been widely applied in this field. To address this, a new neural network structure, the adaptive feature spectrum neural network, is proposed. The algorithm considerably reduces computation time and adaptively selects the features most useful for classification to form the most efficient feature spectrum, and its classification results have a degree of interpretability. Because it runs fast and uses little memory, it is well suited to applications such as learning-aid software. A corresponding personalized learning platform was developed based on the algorithm. The algorithm improves the classification accuracy for ancient Chinese poetry and prose from 93.84% to 99%.