Combined approach of Word2Vec and 2-grams for the retrieval of semantically related real estate ads

Authors

  • José Federico Medrano, Universidad Nacional de Jujuy, Argentina

DOI:

https://doi.org/10.33414/rtyc.39.195-206.2020

Keywords:

Semantic search, Word2Vec, Natural Language Processing, Text Mining

Abstract

The publication of real estate ads has become the preferred advertising medium for both individuals and real estate companies. The resulting growth in the number of ads makes it difficult to find a suitable property, especially in a big city. This work proposes an approach based on text mining and natural language processing techniques for retrieving semantically related classified ads. To this end, the ads published on the lavoz.com.ar website were collected with a scraper. The titles and descriptions of these ads formed a textual corpus that was modeled with Word2Vec, and similarity between ads was evaluated using Word Mover's Distance. Compared with other term grouping schemes, 2-grams offered the best results when the retrieved ads were compared with those returned by syntactic searches.
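
The pipeline outlined in the abstract (group terms into 2-grams, embed them, rank ads by a Word Mover's Distance style score) can be illustrated with a minimal sketch. It is not the author's implementation. The names `two_grams`, `relaxed_wmd` and `toy_vectors` are hypothetical, and the 2-dimensional vectors are hand-written placeholders standing in for embeddings that a trained Word2Vec model would supply. For brevity the sketch computes the relaxed lower bound of Word Mover's Distance, in which each token sends all of its weight to its nearest token in the other text, rather than solving the full transport problem.

```python
import math
from collections import Counter


def two_grams(text):
    """Lowercase the text, split on whitespace, and join adjacent words into 2-gram tokens."""
    words = text.lower().split()
    return [f"{a}_{b}" for a, b in zip(words, words[1:])]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def relaxed_wmd(doc_a, doc_b, vectors):
    """Relaxed Word Mover's Distance (a lower bound on the full distance).

    Each token moves all of its normalized frequency to the closest token
    of the other document; the larger of the two directions is returned.
    Tokens without a vector are ignored.
    """
    def one_way(src, dst):
        weights = Counter(t for t in src if t in vectors)
        targets = [t for t in set(dst) if t in vectors]
        total = sum(weights.values())
        if total == 0 or not targets:
            return float("inf")
        return sum(
            (count / total)
            * min(euclidean(vectors[tok], vectors[t]) for t in targets)
            for tok, count in weights.items()
        )

    return max(one_way(doc_a, doc_b), one_way(doc_b, doc_a))


# Hypothetical embeddings for 2-gram tokens (assumption: a real system would
# read these from a Word2Vec model trained on the ad corpus).
toy_vectors = {
    "two_bedroom": (1.0, 0.0),
    "bedroom_apartment": (1.0, 0.2),
    "two_room": (0.9, 0.1),
    "room_flat": (1.0, 0.3),
    "garage_space": (0.0, 1.0),
    "space_downtown": (0.1, 1.0),
}

query = two_grams("two bedroom apartment")
ad_similar = two_grams("two room flat")
ad_other = two_grams("garage space downtown")

# Lower scores mean the ad is semantically closer to the query.
score_similar = relaxed_wmd(query, ad_similar, toy_vectors)
score_other = relaxed_wmd(query, ad_other, toy_vectors)
```

In this toy setup the query shares no 2-gram with either ad, so a purely syntactic match on tokens would find nothing, while the distance-based score still ranks the ad with nearby embeddings ahead of the unrelated one.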

Published

2020-12-03

How to Cite

Medrano, J. F. (2020). Combined approach of Word2Vec and 2-grams for the retrieval of semantically related real estate ads. Technology and Science Magazine, (39), 195–206. https://doi.org/10.33414/rtyc.39.195-206.2020