Mshs-Poitiers / FoReLLIS / amalgameClassifier

Commit 7cb92e6c, authored Sep 07, 2021 by Michael Nauge
Create quantiAnalyse.ipynb

Script for quantitative measurements
Parent: c076fcf9
Changes: 1
notebooks/quantiAnalyse.ipynb (new file, mode 0 → 100644)
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Amalgame (graphie)</th>\n",
" <th>Amalgame (phonie)</th>\n",
" <th>Mot 1</th>\n",
" <th>Mot 2</th>\n",
" <th>mot1clean</th>\n",
" <th>mot2clean</th>\n",
" <th>Amalgameclean</th>\n",
" <th>solvedBy</th>\n",
" <th>findedOverlap</th>\n",
" <th>deletedSegmentMot1</th>\n",
" <th>deletedSegmentMot2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>acquihire</td>\n",
" <td>[ˌækwɪˈhaɪ‿ə]</td>\n",
" <td>[ˌækwɪˈzɪʃən]</td>\n",
" <td>[ˈhaɪ‿ə]</td>\n",
" <td>ækwɪzɪʃən</td>\n",
" <td>haɪə</td>\n",
" <td>ækwɪhaɪə</td>\n",
" <td>isSolveByClip1Concat</td>\n",
" <td>NaN</td>\n",
" <td>zɪʃən</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>ASCIIbetical</td>\n",
" <td>[ˌæskiˈbetɪkəl]</td>\n",
" <td>[ˈæski]</td>\n",
" <td>[ˌӕlfəˈbetɪkəl]</td>\n",
" <td>æski</td>\n",
" <td>ælfəbetɪkəl</td>\n",
" <td>æskibetɪkəl</td>\n",
" <td>isSolveByClip2Concat</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>ælfə</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>automagically / automagicly</td>\n",
" <td>[ˌɔːtəʊˈmædʒɪkli]</td>\n",
" <td>[ˌɔːtəˈmӕtɪkli]</td>\n",
" <td>[ˈmӕdʒɪkli]</td>\n",
" <td>ɔtəmætɪkli</td>\n",
" <td>mædʒɪkli</td>\n",
" <td>ɔtəʊmædʒɪkli</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>avatard</td>\n",
" <td>[ˈævətɑːd]</td>\n",
" <td>[ˈævətɑː]</td>\n",
" <td>[ˈtɑːd]</td>\n",
" <td>ævətɑ</td>\n",
" <td>tɑd</td>\n",
" <td>ævətɑd</td>\n",
" <td>isSolveByOverlap</td>\n",
" <td>tɑ</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>babelicious</td>\n",
" <td>[ˌbeɪbəˈlɪʃəs]</td>\n",
" <td>[beɪb]</td>\n",
" <td>[diˈlɪʃəs] / [dəˈlɪʃəs]</td>\n",
" <td>beɪb</td>\n",
" <td>dilɪʃəs</td>\n",
" <td>beɪbəlɪʃəs</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>webinar</td>\n",
" <td>[ˈwebɪnɑː]</td>\n",
" <td>[web]</td>\n",
" <td>[ˈsemɪnɑː]</td>\n",
" <td>web</td>\n",
" <td>semɪnɑ</td>\n",
" <td>webɪnɑ</td>\n",
" <td>isSolveByClip2Concat</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>sem</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98</th>\n",
" <td>webstagram</td>\n",
" <td>[ˈwebstəgræm]</td>\n",
" <td>[web]</td>\n",
" <td>[ˈɪnstəgræm]</td>\n",
" <td>web</td>\n",
" <td>ɪnstəgræm</td>\n",
" <td>webstəgræm</td>\n",
" <td>isSolveByClip2Concat</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>ɪn</td>\n",
" </tr>\n",
" <tr>\n",
" <th>99</th>\n",
" <td>whack</td>\n",
" <td>[wæk]</td>\n",
" <td>[ˈwaɪ‿ələs]</td>\n",
" <td>[hæk]</td>\n",
" <td>waɪələs</td>\n",
" <td>hæk</td>\n",
" <td>wæk</td>\n",
" <td>isSolveByClip1Clip2Concat</td>\n",
" <td>NaN</td>\n",
" <td>aɪələs</td>\n",
" <td>h</td>\n",
" </tr>\n",
" <tr>\n",
" <th>100</th>\n",
" <td>wigger / whigger</td>\n",
" <td>[ˈwɪɡə]</td>\n",
" <td>[waɪt]</td>\n",
" <td>[ˈnɪɡə]</td>\n",
" <td>waɪt</td>\n",
" <td>nɪɡə</td>\n",
" <td>wɪɡə</td>\n",
" <td>isSolveByClip1Clip2Concat</td>\n",
" <td>NaN</td>\n",
" <td>aɪt</td>\n",
" <td>n</td>\n",
" </tr>\n",
" <tr>\n",
" <th>101</th>\n",
" <td>Winblows</td>\n",
" <td>[ˈwɪnbləʊz]</td>\n",
" <td>[ˈwɪndəʊz]</td>\n",
" <td>[bləʊ]</td>\n",
" <td>wɪndəʊz</td>\n",
" <td>bləʊ</td>\n",
" <td>wɪnbləʊz</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>102 rows × 11 columns</p>\n",
"</div>"
],
"text/plain": [
" Amalgame (graphie) Amalgame (phonie) Mot 1 \\\n",
"0 acquihire [ˌækwɪˈhaɪ‿ə] [ˌækwɪˈzɪʃən] \n",
"1 ASCIIbetical [ˌæskiˈbetɪkəl] [ˈæski] \n",
"2 automagically / automagicly [ˌɔːtəʊˈmædʒɪkli] [ˌɔːtəˈmӕtɪkli] \n",
"3 avatard [ˈævətɑːd] [ˈævətɑː] \n",
"4 babelicious [ˌbeɪbəˈlɪʃəs] [beɪb] \n",
".. ... ... ... \n",
"97 webinar [ˈwebɪnɑː] [web] \n",
"98 webstagram [ˈwebstəgræm] [web] \n",
"99 whack [wæk] [ˈwaɪ‿ələs] \n",
"100 wigger / whigger [ˈwɪɡə] [waɪt] \n",
"101 Winblows [ˈwɪnbləʊz] [ˈwɪndəʊz] \n",
"\n",
" Mot 2 mot1clean mot2clean Amalgameclean \\\n",
"0 [ˈhaɪ‿ə] ækwɪzɪʃən haɪə ækwɪhaɪə \n",
"1 [ˌӕlfəˈbetɪkəl] æski ælfəbetɪkəl æskibetɪkəl \n",
"2 [ˈmӕdʒɪkli] ɔtəmætɪkli mædʒɪkli ɔtəʊmædʒɪkli \n",
"3 [ˈtɑːd] ævətɑ tɑd ævətɑd \n",
"4 [diˈlɪʃəs] / [dəˈlɪʃəs] beɪb dilɪʃəs beɪbəlɪʃəs \n",
".. ... ... ... ... \n",
"97 [ˈsemɪnɑː] web semɪnɑ webɪnɑ \n",
"98 [ˈɪnstəgræm] web ɪnstəgræm webstəgræm \n",
"99 [hæk] waɪələs hæk wæk \n",
"100 [ˈnɪɡə] waɪt nɪɡə wɪɡə \n",
"101 [bləʊ] wɪndəʊz bləʊ wɪnbləʊz \n",
"\n",
" solvedBy findedOverlap deletedSegmentMot1 \\\n",
"0 isSolveByClip1Concat NaN zɪʃən \n",
"1 isSolveByClip2Concat NaN NaN \n",
"2 NaN NaN NaN \n",
"3 isSolveByOverlap tɑ NaN \n",
"4 NaN NaN NaN \n",
".. ... ... ... \n",
"97 isSolveByClip2Concat NaN NaN \n",
"98 isSolveByClip2Concat NaN NaN \n",
"99 isSolveByClip1Clip2Concat NaN aɪələs \n",
"100 isSolveByClip1Clip2Concat NaN aɪt \n",
"101 NaN NaN NaN \n",
"\n",
" deletedSegmentMot2 \n",
"0 NaN \n",
"1 ælfə \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
".. ... \n",
"97 sem \n",
"98 ɪn \n",
"99 h \n",
"100 n \n",
"101 NaN \n",
"\n",
"[102 rows x 11 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pathData = \"./../datas/classified_amalgames.xlsx\"\n",
"df = pd.read_excel(pathData)\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ɪn 3\n",
"əʊ 3\n",
"æk 3\n",
"rəʊ 3\n",
"t 3\n",
"æ 2\n",
"p 2\n",
"æp 1\n",
"eɪk 1\n",
"lɒ 1\n",
"tɑ 1\n",
"en 1\n",
"et 1\n",
"in 1\n",
"l 1\n",
"æn 1\n",
"s 1\n",
"ʌ 1\n",
"eə 1\n",
"eɪt 1\n",
"əlɒs 1\n",
"i 1\n",
"e 1\n",
"eks 1\n",
"ip 1\n",
"z 1\n",
"f 1\n",
"Name: findedOverlap, dtype: int64"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['findedOverlap'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"d 2\n",
"eɪk 2\n",
"rət 1\n",
"ɪŋ 1\n",
"kɪŋ 1\n",
"græm 1\n",
"rikɪŋ 1\n",
"eɪdʒə 1\n",
"əfə 1\n",
"stfid 1\n",
"ud 1\n",
"aɪələs 1\n",
"v 1\n",
"zɪʃən 1\n",
"əʊtəʊ 1\n",
"keɪʃən 1\n",
"g 1\n",
"ɪtəl 1\n",
"əʊk 1\n",
"t 1\n",
"ɒg 1\n",
"ɪtə 1\n",
"əʊn 1\n",
"ɒnəbi 1\n",
"bʊk 1\n",
"ʌm 1\n",
"aɪt 1\n",
"ʌt 1\n",
"aɪəl 1\n",
"ɔɪs 1\n",
"Name: deletedSegmentMot1, dtype: int64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['deletedSegmentMot1'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"p 6\n",
"t 5\n",
"ɪn 4\n",
"b 4\n",
"s 3\n",
"n 2\n",
"tʌm 1\n",
"dʒ 1\n",
"sɪl 1\n",
"ælfə 1\n",
"sem 1\n",
"ɡɒd 1\n",
"ɪk 1\n",
"sɪ 1\n",
"æd 1\n",
"ɒntrə 1\n",
"v 1\n",
"bl 1\n",
"sn 1\n",
"k 1\n",
"ɒn 1\n",
"sel 1\n",
"kl 1\n",
"mæn 1\n",
"kæn 1\n",
"self 1\n",
"pæp 1\n",
"ɒb 1\n",
"junɪ 1\n",
"pæ 1\n",
"lɪt 1\n",
"d 1\n",
"eθ 1\n",
"e 1\n",
"helɪ 1\n",
"m 1\n",
"veɪ 1\n",
"kɒm 1\n",
"ek 1\n",
"ɑ 1\n",
"entə 1\n",
"h 1\n",
"kæ 1\n",
"f 1\n",
"ri 1\n",
"Name: deletedSegmentMot2, dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['deletedSegmentMot2'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"isSolveByClip2Concat 23\n",
"isSolveByClip2Overlap 19\n",
"isSolveByClip1Clip2Concat 17\n",
"isSolveByOverlap 9\n",
"isSolveByClip1Overlap 7\n",
"isSolveByClip1Concat 4\n",
"isSolveByClip1Clip2Overlap 4\n",
"Name: solvedBy, dtype: int64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['solvedBy'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
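The last cell's `value_counts()` leaves out rows whose `solvedBy` is empty, because `dropna=True` is the default. Adding the seven printed counts gives 23 + 19 + 17 + 9 + 7 + 4 + 4 = 83, so 19 of the 102 amalgams (roughly 19%) received no strategy, like rows 2, 4 and 101 in the preview. The sketch below only redoes that arithmetic from the printed numbers; the label `"(no strategy)"` is illustrative, and on the real `df` pandas can report the NaN row and the shares directly with `value_counts(dropna=False, normalize=True)`.

```python
import pandas as pd

TOTAL_ROWS = 102  # from "[102 rows x 11 columns]" in the df output

# Counts printed by df['solvedBy'].value_counts(); NaN rows are dropped by default.
solved_by = pd.Series({
    "isSolveByClip2Concat": 23,
    "isSolveByClip2Overlap": 19,
    "isSolveByClip1Clip2Concat": 17,
    "isSolveByOverlap": 9,
    "isSolveByClip1Overlap": 7,
    "isSolveByClip1Concat": 4,
    "isSolveByClip1Clip2Overlap": 4,
})

classified = int(solved_by.sum())
unclassified = TOTAL_ROWS - classified

# Share of all 102 amalgams per strategy; "(no strategy)" is an illustrative label for NaN.
shares = pd.concat([solved_by, pd.Series({"(no strategy)": unclassified})]) / TOTAL_ROWS

print(classified, unclassified)  # 83 19
print(shares.round(3))

# On the real data:
# df['solvedBy'].value_counts(dropna=False, normalize=True)
```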
%% Cell type:code id: tags:
```python
import pandas as pd
```
%% Cell type:code id: tags:
```python
pathData = "./../datas/classified_amalgames.xlsx"
df = pd.read_excel(pathData)
df
```
%% Output
| | Amalgame (graphie) | Amalgame (phonie) | Mot 1 | Mot 2 | mot1clean | mot2clean | Amalgameclean | solvedBy | findedOverlap | deletedSegmentMot1 | deletedSegmentMot2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | acquihire | [ˌækwɪˈhaɪ‿ə] | [ˌækwɪˈzɪʃən] | [ˈhaɪ‿ə] | ækwɪzɪʃən | haɪə | ækwɪhaɪə | isSolveByClip1Concat | NaN | zɪʃən | NaN |
| 1 | ASCIIbetical | [ˌæskiˈbetɪkəl] | [ˈæski] | [ˌӕlfəˈbetɪkəl] | æski | ælfəbetɪkəl | æskibetɪkəl | isSolveByClip2Concat | NaN | NaN | ælfə |
| 2 | automagically / automagicly | [ˌɔːtəʊˈmædʒɪkli] | [ˌɔːtəˈmӕtɪkli] | [ˈmӕdʒɪkli] | ɔtəmætɪkli | mædʒɪkli | ɔtəʊmædʒɪkli | NaN | NaN | NaN | NaN |
| 3 | avatard | [ˈævətɑːd] | [ˈævətɑː] | [ˈtɑːd] | ævətɑ | tɑd | ævətɑd | isSolveByOverlap | tɑ | NaN | NaN |
| 4 | babelicious | [ˌbeɪbəˈlɪʃəs] | [beɪb] | [diˈlɪʃəs] / [dəˈlɪʃəs] | beɪb | dilɪʃəs | beɪbəlɪʃəs | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 97 | webinar | [ˈwebɪnɑː] | [web] | [ˈsemɪnɑː] | web | semɪnɑ | webɪnɑ | isSolveByClip2Concat | NaN | NaN | sem |
| 98 | webstagram | [ˈwebstəgræm] | [web] | [ˈɪnstəgræm] | web | ɪnstəgræm | webstəgræm | isSolveByClip2Concat | NaN | NaN | ɪn |
| 99 | whack | [wæk] | [ˈwaɪ‿ələs] | [hæk] | waɪələs | hæk | wæk | isSolveByClip1Clip2Concat | NaN | aɪələs | h |
| 100 | wigger / whigger | [ˈwɪɡə] | [waɪt] | [ˈnɪɡə] | waɪt | nɪɡə | wɪɡə | isSolveByClip1Clip2Concat | NaN | aɪt | n |
| 101 | Winblows | [ˈwɪnbləʊz] | [ˈwɪndəʊz] | [bləʊ] | wɪndəʊz | bləʊ | wɪnbləʊz | NaN | NaN | NaN | NaN |

[102 rows x 11 columns]
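The column names suggest how each strategy builds `Amalgameclean` from `mot1clean` and `mot2clean`, and the preview rows are consistent with that reading: for acquihire, ækwɪzɪʃən loses its final zɪʃən before haɪə is appended; for webinar, semɪnɑ loses its initial sem; for avatard, the tɑ shared by ævətɑ and tɑd is written once. The `rebuild` function below is a hypothetical sketch inferred from those rows, not the classifier's own code. It assumes that Clip1 removes `deletedSegmentMot1` from the end of Mot 1, Clip2 removes `deletedSegmentMot2` from the start of Mot 2, and Overlap keeps a single copy of `findedOverlap`. None of the preview rows uses isSolveByClip1Overlap, isSolveByClip2Overlap or isSolveByClip1Clip2Overlap, so applying the overlap after clipping is only a guess for those strategies.

```python
from typing import Optional


def rebuild(strategy: str, mot1: str, mot2: str,
            overlap: Optional[str] = None,
            cut1: Optional[str] = None,
            cut2: Optional[str] = None) -> str:
    """Hypothetical re-composition of an amalgam from the classifier's output columns.

    Assumptions inferred from the preview rows (not from the classifier's source):
    - "Clip1" in the strategy: cut1 is removed from the END of mot1;
    - "Clip2" in the strategy: cut2 is removed from the START of mot2;
    - "Overlap" in the strategy: the overlap ends the left part and starts
      the right part, and is written only once (checked after clipping).
    """
    left, right = mot1, mot2
    if "Clip1" in strategy:
        if not cut1 or not left.endswith(cut1):
            raise ValueError(f"{cut1!r} is not a suffix of {mot1!r}")
        left = left[: len(left) - len(cut1)]
    if "Clip2" in strategy:
        if not cut2 or not right.startswith(cut2):
            raise ValueError(f"{cut2!r} is not a prefix of {mot2!r}")
        right = right[len(cut2):]
    if "Overlap" in strategy:
        if not overlap or not (left.endswith(overlap) and right.startswith(overlap)):
            raise ValueError(f"{overlap!r} does not join {left!r} and {right!r}")
        right = right[len(overlap):]  # keep one copy of the shared segment
    return left + right


# (solvedBy, mot1clean, mot2clean, findedOverlap,
#  deletedSegmentMot1, deletedSegmentMot2, Amalgameclean), copied from the preview
PREVIEW_ROWS = [
    ("isSolveByClip1Concat", "ækwɪzɪʃən", "haɪə", None, "zɪʃən", None, "ækwɪhaɪə"),
    ("isSolveByClip2Concat", "æski", "ælfəbetɪkəl", None, None, "ælfə", "æskibetɪkəl"),
    ("isSolveByOverlap", "ævətɑ", "tɑd", "tɑ", None, None, "ævətɑd"),
    ("isSolveByClip2Concat", "web", "semɪnɑ", None, None, "sem", "webɪnɑ"),
    ("isSolveByClip2Concat", "web", "ɪnstəgræm", None, None, "ɪn", "webstəgræm"),
    ("isSolveByClip1Clip2Concat", "waɪələs", "hæk", None, "aɪələs", "h", "wæk"),
]

for strategy, m1, m2, ov, c1, c2, expected in PREVIEW_ROWS:
    print(expected, rebuild(strategy, m1, m2, ov, c1, c2) == expected)
```

Run over the whole sheet (for instance with `DataFrame.apply(..., axis=1)` on the rows where `solvedBy` is set), any mismatch would point either to a row the classifier handled differently or to a wrong assumption in this sketch.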
%% Cell type:code id: tags:
```python
df['findedOverlap'].value_counts()
```
%% Output
ɪn 3
əʊ 3
æk 3
rəʊ 3
t 3
æ 2
p 2
æp 1
eɪk 1
lɒ 1
tɑ 1
en 1
et 1
in 1
l 1
æn 1
s 1
ʌ 1
eə 1
eɪt 1
əlɒs 1
i 1
e 1
eks 1
ip 1
z 1
f 1
Name: findedOverlap, dtype: int64
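Most shared segments are very short. Expanding the counts above back into one value per amalgam and measuring each overlap gives 14 overlaps of length 1, 18 of length 2, 6 of length 3 and 1 of length 4, for 39 in all. Lengths here are Unicode code points, so a diphthong such as əʊ or eɪ counts as two symbols; counting phonemes would need an IPA tokenizer, which this sketch does not attempt. The helper name `segment_length_distribution` is illustrative; on the notebook's data it would be called as `segment_length_distribution(df['findedOverlap'])`.

```python
import pandas as pd


def segment_length_distribution(segments: pd.Series) -> pd.Series:
    """Number of segments per length, counted in Unicode code points; NaN rows are skipped."""
    return segments.dropna().str.len().value_counts().sort_index()


# Counts copied from the findedOverlap output above, expanded to one value per amalgam.
OVERLAP_COUNTS = {
    "ɪn": 3, "əʊ": 3, "æk": 3, "rəʊ": 3, "t": 3, "æ": 2, "p": 2,
    "æp": 1, "eɪk": 1, "lɒ": 1, "tɑ": 1, "en": 1, "et": 1, "in": 1,
    "l": 1, "æn": 1, "s": 1, "ʌ": 1, "eə": 1, "eɪt": 1, "əlɒs": 1,
    "i": 1, "e": 1, "eks": 1, "ip": 1, "z": 1, "f": 1,
}
overlaps = pd.Series([seg for seg, n in OVERLAP_COUNTS.items() for _ in range(n)])

length_dist = segment_length_distribution(overlaps)
print(length_dist)
```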
%% Cell type:code id: tags:
```python
df['deletedSegmentMot1'].value_counts()
```
%% Output
d 2
eɪk 2
rət 1
ɪŋ 1
kɪŋ 1
græm 1
rikɪŋ 1
eɪdʒə 1
əfə 1
stfid 1
ud 1
aɪələs 1
v 1
zɪʃən 1
əʊtəʊ 1
keɪʃən 1
g 1
ɪtəl 1
əʊk 1
t 1
ɒg 1
ɪtə 1
əʊn 1
ɒnəbi 1
bʊk 1
ʌm 1
aɪt 1
ʌt 1
aɪəl 1
ɔɪs 1
Name: deletedSegmentMot1, dtype: int64
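The three segment columns appear to line up with the strategy names. `findedOverlap` has 39 non-empty entries, `deletedSegmentMot1` has 32 (2 × 2 + 28 × 1 above) and `deletedSegmentMot2` has 63 in the full output stored in the notebook JSON, which are exactly the totals of the `solvedBy` strategies whose names contain Overlap, Clip1 and Clip2. The sketch below redoes that bookkeeping from the printed counts; the link between a name fragment and a column is an assumption read off the names, and `check_consistency` is a hypothetical helper for running the same test on the real DataFrame.

```python
import pandas as pd

# Counts printed by df['solvedBy'].value_counts().
SOLVED_BY = {
    "isSolveByClip2Concat": 23,
    "isSolveByClip2Overlap": 19,
    "isSolveByClip1Clip2Concat": 17,
    "isSolveByOverlap": 9,
    "isSolveByClip1Overlap": 7,
    "isSolveByClip1Concat": 4,
    "isSolveByClip1Clip2Overlap": 4,
}

# Non-empty entries per column, summed from the value_counts() outputs.
FILLED = {
    "findedOverlap": 39,       # 5 × 3 + 2 × 2 + 20 × 1
    "deletedSegmentMot1": 32,  # 2 × 2 + 28 × 1
    "deletedSegmentMot2": 63,  # 6 + 5 + 4 + 4 + 3 + 2 + 39 × 1
}

# Assumed link: a strategy whose name contains the token fills that column once.
TOKEN_FOR = {
    "findedOverlap": "Overlap",
    "deletedSegmentMot1": "Clip1",
    "deletedSegmentMot2": "Clip2",
}


def rows_with(token: str) -> int:
    """Number of amalgams whose strategy name contains `token`, from the printed counts."""
    return sum(n for name, n in SOLVED_BY.items() if token in name)


def check_consistency(df: pd.DataFrame) -> dict:
    """Hypothetical helper: the same check run directly on the notebook's DataFrame."""
    return {
        column: int(df["solvedBy"].str.contains(token, na=False).sum())
        == int(df[column].notna().sum())
        for column, token in TOKEN_FOR.items()
    }


for column, token in TOKEN_FOR.items():
    print(column, rows_with(token), FILLED[column])
```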
%% Cell type:code id: tags:
```python
df['deletedSegmentMot2'].value_counts()
```
%% Output
p 6
t 5
ɪn 4
b 4
s 3
n 2
tʌm 1
dʒ 1
sɪl 1
ælfə 1
sem 1
ɡɒd 1
ɪk 1