A TAXONOMY FOR TEXT SUMMARIZATION

OTHMAN B.M.M.1*, HAGGAG M.2, BELAL M.3
1Institute of Statistical Studies and Research, Cairo University, Giza, Egypt.
2Department of Computer Sciences, Faculty of Computer Sciences and Information Systems, Helwan University, Cairo, Egypt.
3Department of Computer Sciences, Faculty of Computer Sciences and Information Systems, Helwan University, Cairo, Egypt.
* Corresponding Author : b.m.m.othman@gmail.com

Received : 03-11-2014     Accepted : 20-03-2014     Published : 31-03-2014
Volume : 3     Issue : 1       Pages : 43 - 50
Inform Sci Tech 3.1 (2014):43-50

Keywords : Text summarization, survey, taxonomy
Conflict of Interest : None declared

Cite - MLA : OTHMAN B.M.M., et al "A TAXONOMY FOR TEXT SUMMARIZATION." Information Science and Technology 3.1 (2014):43-50.

Cite - APA : OTHMAN B.M.M., HAGGAG M., BELAL M. (2014). A TAXONOMY FOR TEXT SUMMARIZATION. Information Science and Technology, 3 (1), 43-50.

Cite - Chicago : OTHMAN B.M.M., HAGGAG M., and BELAL M. "A TAXONOMY FOR TEXT SUMMARIZATION." Information Science and Technology 3, no. 1 (2014):43-50.

Copyright : © 2014, OTHMAN B.M.M., et al, Published by Bioinfo Publications. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Abstract

Text summarization is the branch of natural language processing (NLP) in which a computer summarizes a text: a text is entered into the computer, a specific technique is applied, and a summarized text is returned. This summary should be a non-redundant extract from the original text. Summarization can be categorized in many ways: single-document, multi-document, extractive, abstractive, informative, indicative, user-focused, generic, statistical, linguistic, and machine-learning based. However, most surveys of text summarization have covered only a specific perspective of the field and have not clearly illustrated the whole picture of the state of the art they covered. The purpose of this survey is to clearly illustrate the whole picture of previous work in text summarization by introducing a general taxonomy that covers all possible aspects of categorizing the field, with a clear comparison based on all the aspects and features that the field could have. This paper introduces and discusses approaches for single-document and multi-document summarization, extractive and abstractive summary construction methods, and informative and indicative information content. It also explores query-based and generic summary triggers, as well as statistical, linguistic, and machine learning methods for choosing the most relevant sentences from documents.
