Step 4

Remove the part-of-speech information and total the counts for each word. (The frequency of "hand" becomes 1 + 1 = 2.)

| Rank | Word | Frequency |
|---|---|---|
| 1 | he | 5 |
| 2 | do | 3 |
| 2 | have | 3 |
| 4 | a_lot_of | 2 |
| 4 | hand | 2 |
| 4 | me | 2 |
| 4 | money | 2 |
| 4 | the | 2 |
| 9 | but | 1 |
| 9 | by | 1 |
| 9 | key | 1 |
| 9 | love | 1 |
| 9 | not | 1 |
| 9 | take | 1 |

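
The tallying in this step can be sketched in Python as follows. This is a minimal illustration, not the original tooling: the function names are made up for the example, the input rows are hypothetical stand-ins for the output of the previous step, and breaking ties alphabetically is an assumption that happens to match the order shown in the table. Tied words share a rank and the next rank skips ahead (1, 2, 2, 4, ...), which is how the table is numbered.

```python
from collections import Counter


def tally_by_word(tagged_counts):
    """Sum frequencies over part-of-speech tags, leaving one total per word."""
    totals = Counter()
    for (word, _pos), freq in tagged_counts.items():
        totals[word] += freq
    return totals


def rank_with_ties(totals):
    """Sort by descending frequency; tied words share a rank (1, 2, 2, 4, ...)."""
    # Assumption: ties are ordered alphabetically.
    ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    ranked = []
    for position, (word, freq) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == freq:
            rank = ranked[-1][0]  # same frequency as the previous row: reuse its rank
        else:
            rank = position       # otherwise the rank is the 1-based position
        ranked.append((rank, word, freq))
    return ranked


# Hypothetical (word, part of speech) -> frequency pairs, for illustration only.
sample = {
    ("he", "pronoun"): 5,
    ("do", "verb"): 3,
    ("have", "verb"): 3,
    ("hand", "noun"): 1,
    ("hand", "verb"): 1,
}

for rank, word, freq in rank_with_ties(tally_by_word(sample)):
    print(rank, word, freq)
```

With this sample, the two "hand" entries collapse into a single row with frequency 2, ranked 4th behind the two words tied at rank 2.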
▼ Processing the spoken corpus (184.17 million words) with the method above produces the following spoken-language frequency list.

| Rank | Word | Part of speech | Frequency |
|---|---|---|---|
| 1 | be | verb | 10,472,701 |
| 2 | you | pronoun | 6,449,544 |
| 3 | I | pronoun | 6,262,167 |
| 4 | the | definite article | 5,830,875 |
| 5 | to | preposition | 4,535,231 |
| 6 | a | indefinite article | 3,541,387 |
| 7 | it | pronoun | 3,268,246 |
| 8 | not | adverb | 3,095,364 |
| 9 | and | conjunction | 3,078,084 |
| 10 | do | verb | 3,054,088 |
| 11 | that | pronoun | 2,775,180 |
| 12 | have | verb | 2,338,004 |
| | (omitted) | | |
| 88,131 (last) | rongeur, sewin, serous, shanti | noun, noun, adjective, noun | 10, 10, 10, 10 |

▼ Processing the entire written corpus (523.15 million words) with the method above produces the following written-language frequency list.

| Rank | Word | Part of speech | Frequency |
|---|---|---|---|
| 1 | the | definite article | 31,711,942 |
| 2 | be | verb | 19,002,491 |
| 3 | of | preposition | 14,280,484 |
| 4 | and | conjunction | 13,583,089 |
| 5 | to | preposition | 13,279,684 |
| 6 | a | indefinite article | 11,608,071 |
| 7 | in | preposition | 10,070,093 |
| 8 | have | verb | 6,005,598 |
| 9 | that | pronoun | 5,709,325 |
| 10 | for | preposition | 5,027,641 |
| 11 | it | pronoun | 4,024,984 |
| | (omitted) | | |
| 252,785 (last) | wimpiness, widdershins, yellowwood, Yorkish | noun, noun, noun, adjective | 10, 10, 10, 10 |

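Step 5

1. Merge (combine) the spoken frequency list and the written frequency list.
2. Remove the part-of-speech information.
3. Total the frequencies for each word.
4. Sort in descending order, using the frequency as the key.

↓

Combined frequency list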

| Rank | Word | Frequency |
|---|---|---|
| 1 | the | 37,542,817 |
| 2 | be | 29,475,192 |
| 3 | to | 17,814,915 |
| 4 | and | 16,661,173 |
| 5 | of | 16,239,976 |
| 6 | a | 15,149,458 |
| 7 | in | 11,892,667 |
| 8 | I | 8,960,833 |
| 9 | you | 8,845,425 |
| 10 | that | 8,484,505 |
| 11 | have | 8,343,602 |
| 12 | it | 7,293,230 |
| 13 | for | 6,415,933 |
| 14 | not | 6,300,062 |
| 15 | do | 5,273,575 |
| 16 | on | 4,764,869 |
| 17 | with | 4,677,971 |
| 18 | he | 4,270,984 |
| 19 | this | 4,102,635 |
| 20 | say | 3,394,130 |
| | (omitted) | |
| 995 | request | 80,870 |
| 996 | Saturday | 80,839 |
| 997 | fill | 80,822 |
| 998 | award | 80,820 |
| 999 | cash | 80,804 |
| 1,000 | particularly | 80,428 |
| 1,001 | hundred | 80,330 |
| 1,002 | ability | 80,222 |
| | (omitted) | |
| 2,993 | uh-huh | 20,896 |
| 2,994 | rapid | 20,867 |
| 2,995 | apparent | 20,864 |
| 2,996 | academic | 20,855 |
| 2,997 | efficient | 20,845 |
| 2,998 | athlete | 20,825 |
| 2,999 | registration | 20,817 |
| 3,000 | impressive | 20,806 |
| | (omitted) | |
| 4,000 | happiness | 13,849 |
| | (omitted) | |
| 5,000 | allocate | 9,880 |
| | (omitted) | |
| 6,005 | rehabilitation | 7,374 |
| | (omitted) | |
| 7,995 | affirmative | 4,585 |
| | (omitted) | |
| 8,001 | extensively | 4,582 |
| | (omitted) | |
| 9,986 | Beethoven | 3,259 |
| | (omitted) | |
| 10,002 | glaze | 3,250 |
| | (omitted) | |
| 15,003 | presumption | 1,664 |
| | (remainder omitted) | |

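
The merge described in step 5 can be sketched in Python as below. This is only an illustration under stated assumptions: `merge_frequency_lists` is a hypothetical helper name, the `(word, part of speech, frequency)` row layout is chosen for the example, and ties are broken alphabetically. The sample rows are a handful of entries copied from the spoken and written lists; the real lists contain many more rows, possibly including the same word under other part-of-speech tags.

```python
from collections import Counter


def merge_frequency_lists(*lists):
    """Merge (word, pos, freq) rows, drop the POS tag, and sum per word."""
    totals = Counter()
    for rows in lists:
        for word, _pos, freq in rows:
            totals[word] += freq
    # Sort by descending frequency; alphabetical tie-break is an assumption.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# A few rows taken from the spoken and written frequency lists.
spoken = [
    ("be", "verb", 10_472_701),
    ("the", "definite article", 5_830_875),
    ("to", "preposition", 4_535_231),
    ("have", "verb", 2_338_004),
]
written = [
    ("the", "definite article", 31_711_942),
    ("be", "verb", 19_002_491),
    ("to", "preposition", 13_279_684),
    ("have", "verb", 6_005_598),
]

for word, freq in merge_frequency_lists(spoken, written):
    print(f"{word}\t{freq:,}")
```

For these four words the sums come out to 37,542,817 (the), 29,475,192 (be), 17,814,915 (to), and 8,343,602 (have), which agree with the corresponding rows of the combined frequency list.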