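Wenhua ZHOU (1997- ), male, is a master's student at the College of Mathematics and Computer Science, Zhejiang Normal University. His main research interests are information retrieval and hash learning.
Huawen LIU (1977- ), male, Ph.D., is a professor at the College of Mathematics and Computer Science, Zhejiang Normal University. His main research interest is data mining.
Enhui LI (1996- ), female, is a master's student at the College of Mathematics and Computer Science, Zhejiang Normal University. Her main research interests are clustering and outlier analysis.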
Published online: 2021-11
Published in print: 2021-11-15
Wenhua ZHOU, Huawen LIU, Enhui LI. Algorithm of locality sensitive hashing bit selection based on feature selection[J]. Big Data Research, 2021, 7(6): 67-77. DOI: 10.11959/j.issn.2096-0271.2021061.
Locality sensitive hashing (LSH), a mainstream information retrieval method, often needs to generate long hash codes to meet retrieval requirements. However, long hash codes consume a large amount of storage space and carry many redundant hash bits. To address this problem, ten simple and efficient selection algorithms from feature engineering were adopted to select information-rich hash bits from long LSH codes and to remove redundant and ineffective ones. These ten algorithms characterize the performance of each hash bit, or the correlation between two hash bits, in different ways, for example by variance or Hamming distance. Hash bits are selected by removing the bits in the long code that perform poorly or are highly correlated with other bits. The selected hash codes were then compared with the original hash codes. Experimental results on four commonly used datasets show that the hash codes obtained after removing redundant bits perform almost the same as the original codes, with a bit removal ratio of 30% to 70%.
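The selection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes sign-random-projection LSH as the hashing family, uses only two of the criteria the abstract names (per-bit variance and the Hamming distance between bit columns), and the function names and thresholds (`min_variance`, `min_distance`) are hypothetical values chosen for illustration.

```python
import random


def lsh_codes(points, n_bits, seed=0):
    """Assumed LSH family: one random Gaussian hyperplane per hash bit."""
    rng = random.Random(seed)
    dim = len(points[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    return [
        [1 if sum(w * x for w, x in zip(plane, p)) >= 0 else 0 for plane in planes]
        for p in points
    ]


def bit_variance(column):
    """Variance of a 0/1 column: p * (1 - p), where p is the fraction of ones."""
    p = sum(column) / len(column)
    return p * (1.0 - p)


def column_hamming(a, b):
    """Normalized Hamming distance between two bit columns."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def select_bits(codes, min_variance=0.05, min_distance=0.1):
    """Return indices of bits kept after dropping weak and redundant bits.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    n_bits = len(codes[0])
    columns = [[row[j] for row in codes] for j in range(n_bits)]
    variances = [bit_variance(c) for c in columns]
    # Visit high-variance bits first so they win ties against redundant copies.
    order = sorted(range(n_bits), key=lambda j: variances[j], reverse=True)
    kept = []
    for j in order:
        if variances[j] < min_variance:
            continue  # nearly constant bit: carries little information
        # A distance near 0 (copy) or near 1 (complement) means high correlation.
        redundant = any(
            min(d, 1.0 - d) < min_distance
            for d in (column_hamming(columns[j], columns[k]) for k in kept)
        )
        if not redundant:
            kept.append(j)
    return sorted(kept)


# Toy codes: bit 1 copies bit 0, bit 2 is its complement, bit 3 is constant.
toy = [
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
]
print(select_bits(toy))  # [0, 4]
```

In practice the same routine would be applied to the output of `lsh_codes` on real data, and the retained indices would be used to shorten both the database codes and the query codes before Hamming-distance retrieval.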