Combined approach of Word2Vec and 2-grams for the retrieval of semantically related real estate ads
DOI:
https://doi.org/10.33414/rtyc.39.195-206.2020
Keywords:
Semantic search, Word2Vec, Natural Language Processing, Text Mining
Abstract
Publishing real estate ads online has become the preferred advertising medium for both individuals and real estate companies. This has caused significant growth in the number of ads, which makes it difficult to find a suitable property, especially in a big city. This work proposes an approach based on text mining and natural language processing techniques for retrieving semantically related classified ads. To this end, the ads published on the lavoz.com.ar website were collected with a scraper. The titles and descriptions of these ads formed a textual corpus that was modeled with Word2Vec, and similarity was evaluated using Word Mover's Distance. Compared with other term-grouping schemes, 2-grams offered the best results when the retrieved ads were compared with those of syntactic searches.
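
The pipeline described in the abstract (group terms into 2-grams, embed them, rank ads by a Word Mover's style distance) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes whitespace tokenization, uses hand-made 2-D vectors in place of a trained Word2Vec model, and replaces the full Word Mover's Distance with its relaxed lower bound, in which every term moves all of its weight to the nearest term of the other document. The names `two_grams`, `relaxed_wmd`, and `toy_vectors` are hypothetical.

```python
import math
from collections import Counter


def two_grams(text):
    """Lowercase, split on whitespace, and join adjacent tokens into 2-grams."""
    tokens = text.lower().split()
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def relaxed_wmd(doc_a, doc_b, vectors):
    """Relaxed Word Mover's Distance between two non-empty lists of terms.

    This is a lower bound on the full WMD: each term sends all of its
    normalized weight to the closest term of the other document.
    Every term must have an entry in `vectors`.
    """
    def one_way(src, dst):
        weights = Counter(src)
        total = sum(weights.values())
        return sum(
            (count / total)
            * min(math.dist(vectors[term], vectors[other]) for other in dst)
            for term, count in weights.items()
        )

    # The tighter of the two one-directional bounds
    return max(one_way(doc_a, doc_b), one_way(doc_b, doc_a))


# Assumption: made-up 2-D embeddings for illustration only; a real system
# would take these vectors from a Word2Vec model trained on the ad corpus.
toy_vectors = {
    "bright_apartment": (0.90, 0.10),
    "apartment_downtown": (0.80, 0.30),
    "sunny_flat": (0.85, 0.15),
    "flat_centre": (0.75, 0.35),
    "rural_plot": (0.10, 0.90),
    "plot_sale": (0.20, 0.80),
}

query = two_grams("bright apartment downtown")
ad_similar = two_grams("sunny flat centre")
ad_unrelated = two_grams("rural plot sale")

# Smaller distance means more semantically related
ranked = sorted(
    [ad_similar, ad_unrelated],
    key=lambda ad: relaxed_wmd(query, ad, toy_vectors),
)
```

In this toy setting the query shares no 2-gram with either ad, so a purely syntactic match would return nothing, while the distance-based ranking still places the semantically closer ad first.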