TypeError: doc2bow expects an array of unicode tokens on input, not a single string when using gensim.corpora.Dictionary()
There is a dataframe like this:
index   terms
1345    ['jays', 'place', 'great', 'subway']
1543    ['described', 'communicative', 'friendly']
9874    ['great', 'sarah
Solution 1:
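Each index needs its terms in their own sublist, and all of the sublists must be nested within one larger list (a list of token lists). Passing a single flat list of strings makes gensim treat each string as a whole document, which raises this error.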
from gensim import corpora
theterms = [['jays', 'place', 'great', 'subway'], ['described', 'communicative', 'friendly'], ['great', 'sarahs', 'apartament', 'back'], ['great', 'sarahs', 'apartament', 'back']]
dictionary = corpora.Dictionary(theterms)
Solution 2:
First convert comments['terms'] to a list using comments['terms'].tolist(), then build the dictionary from that list; it should work. You can do other preprocessing, such as stemming or stopword removal, before creating your dictionary.
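If the same error still appears after converting the column to a list, one possible cause (an assumption about the data, not something stated in the question) is that some cells hold the text form of a list, for example after a round trip through CSV. The sketch below uses a hypothetical helper, `to_token_lists`, to normalize such cells. The plain list `cells` stands in for the result of comments['terms'].tolist().

```python
import ast


def to_token_lists(cells):
    """Return one list of string tokens per cell (hypothetical helper)."""
    docs = []
    for cell in cells:
        if isinstance(cell, str):
            # Assumption: the cell is the text form of a list, e.g. "['a', 'b']".
            cell = ast.literal_eval(cell)
        docs.append([str(token) for token in cell])
    return docs


# Stand-in for comments['terms'].tolist(): one cell stored as text, one as a real list.
cells = [
    "['jays', 'place', 'great', 'subway']",
    ['described', 'communicative', 'friendly'],
]
docs = to_token_lists(cells)
```

The resulting `docs` is a list of token lists, which is the shape corpora.Dictionary expects.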