wordcloud © Karobben


The color palettes of the R package wordcloud2 are really awesome, but the package has some bugs: I could not set a mask for the word cloud. In Python, the wordcloud package is much more user-friendly.

Note that the mask picture is very important. In my experience only the RGB format works: the picture should use “0, 0, 0” (black) for the area where words are drawn and “255, 255, 255” (white) for the background, which wordcloud masks out. The rgbi format is not supported, even though it is very similar to RGB.
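One way to normalize an arbitrary picture (for example a PNG with a transparent background) into that black/white RGB layout is sketched below. It assumes Pillow and NumPy; the helper name `to_wordcloud_mask` and the threshold of 128 are illustrative choices, not part of the wordcloud package.

```python
import numpy as np
from PIL import Image


def to_wordcloud_mask(img: Image.Image, threshold: int = 128) -> np.ndarray:
    """Hypothetical helper: return an RGB uint8 mask with (0, 0, 0) where
    words may be drawn and (255, 255, 255) for the masked-out background."""
    if img.mode in ("RGBA", "LA", "P"):
        # Composite onto an opaque white canvas so transparent pixels
        # become white background instead of (possibly) black.
        rgba = img.convert("RGBA")
        white = Image.new("RGBA", rgba.size, (255, 255, 255, 255))
        img = Image.alpha_composite(white, rgba)
    # Grayscale, then threshold: dark pixels -> 0 (drawable), light -> 255.
    gray = np.array(img.convert("L"))
    binary = np.where(gray < threshold, 0, 255).astype(np.uint8)
    # Repeat the single channel three times to get an H x W x 3 RGB array.
    return np.stack([binary, binary, binary], axis=-1)
```

Usage would look like `alice_mask = to_wordcloud_mask(Image.open("cloud.png"))`. Thresholding to pure black and white also removes anti-aliased gray edges, which otherwise count as drawable area.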

```python
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

from wordcloud import WordCloud, STOPWORDS

## Read the whole text.
with open('tmp.txt', encoding='utf-8') as fh:
    text = fh.read()

## read the mask image
## taken from
## http://www.stencilry.org/stencils/movies/alice%20in%20wonderland/255fk.jpg
alice_mask = np.array(Image.open("/home/ken/Downloads/cloud.png"))

stopwords = set(STOPWORDS)
stopwords.add("said")

wc = WordCloud(background_color="white",
               max_words=512, mask=alice_mask,
               max_font_size=10,  # adjust according to the size of your image
               stopwords=stopwords)
## generate word cloud
wc.generate(text)

## store to file
wc.to_file("alice.png")

## show the word cloud and, in a second figure, the mask
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.figure()
plt.imshow(alice_mask, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis("off")
plt.show()
```


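```python
import string

## remove punctuation; string.punctuation has no curly quotes, so add them
Special = "“”‘’"
TT = text.translate(str.maketrans('', '', string.punctuation + Special))
TT = TT.translate(str.maketrans('', '', string.digits))
TT = TT.lower()
## split() breaks on any whitespace (spaces, tabs, newlines) and drops
## empty strings, so words on adjacent lines are not glued together
TT = set(TT.split())
## drop single letters such as "a" and "i"
TT -= set(string.ascii_lowercase)

## append the unique words to the file "list", one per line
with open("list", 'a', encoding='utf-8') as f:
    for i in sorted(TT):
        f.write(i + '\n')
```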
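The cleanup code above keeps only a set of unique words. If the cleaned words are meant to drive the cloud itself, a minimal sketch is to count them and pass the counts to `WordCloud.generate_from_frequencies`, which accepts a word-to-frequency mapping. The helper `word_frequencies` and its `min_len` cutoff are hypothetical; extracting runs of ASCII letters with a regex is a swapped-in simplification of the translate/split steps (it also splits "don't" into "don" and "t").

```python
import re
from collections import Counter


def word_frequencies(text: str, stopwords=frozenset(), min_len: int = 2) -> Counter:
    """Hypothetical helper: count lowercase words, skipping stopwords
    and tokens shorter than min_len (e.g. single letters)."""
    # Runs of a-z only: punctuation, digits and all whitespace act as separators.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if len(w) >= min_len and w not in stopwords)
```

With the objects from the main script this would be `wc.generate_from_frequencies(word_frequencies(text, stopwords))`; a `Counter` is a `dict` subclass, so it can be passed directly.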

https://amueller.github.io/word_cloud/auto_examples/colored.html#colored-py

Author: Karobben
Posted on: 2020-06-23
Updated on: 2023-06-06
