{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "31c4b0ff",
   "metadata": {},
   "source": [
    "# 8.3 Statistik mit Pandas\n",
    "\n",
    "## Lernziele\n",
    "\n",
    "```{admonition} Lernziele\n",
    ":class: admonition-goals\n",
    "* Sie können sich mit **describe** eine Übersicht über statistische Kennzahlen\n",
    "  verschaffen.\n",
    "* Sie wissen, wie Sie die Anzahl der gültigen Einträge mit **count** ermitteln.\n",
    "* Sie kennen die statistischen Kennzahlen Mittelwert und Standardabweichung und\n",
    "  wissen, wie diese mit **mean** und **std** berechnet werden.\n",
    "* Sie können das Minimum und das Maximum mit **min** und **max** bestimmen.\n",
    "* Sie wissen wie ein Quantil interpretiert wird und wie es mit **quantile**\n",
    "  berechnet wird.\n",
    "```\n",
    "\n",
    "\n",
    "## Schnelle Übersicht mit .describe()\n",
    "\n",
    "So wie die Methode `.info()` uns einen schnellen Überblick über die Daten eines\n",
    "DataFrame-Objektes gibt, so liefert die Methode `.describe()` eine schnelle\n",
    "Übersicht über statistische Kennzahlen. Wir bleiben bei unserem Beispiel der\n",
    "Spielerdaten der Top7-Fußballvereine der Bundesligasaison 2020/21."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "1f1c3a90",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Club</th>\n",
       "      <th>Nationality</th>\n",
       "      <th>Position</th>\n",
       "      <th>Age</th>\n",
       "      <th>Matches</th>\n",
       "      <th>Starts</th>\n",
       "      <th>Mins</th>\n",
       "      <th>Goals</th>\n",
       "      <th>Assists</th>\n",
       "      <th>Penalty_Goals</th>\n",
       "      <th>Penalty_Attempted</th>\n",
       "      <th>xG</th>\n",
       "      <th>xA</th>\n",
       "      <th>Yellow_Cards</th>\n",
       "      <th>Red_Cards</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Manuel Neuer</th>\n",
       "      <td>Bayern Munich</td>\n",
       "      <td>GER</td>\n",
       "      <td>GK</td>\n",
       "      <td>34</td>\n",
       "      <td>33</td>\n",
       "      <td>33</td>\n",
       "      <td>2970</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.01</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thomas Müller</th>\n",
       "      <td>Bayern Munich</td>\n",
       "      <td>GER</td>\n",
       "      <td>MF</td>\n",
       "      <td>30</td>\n",
       "      <td>32</td>\n",
       "      <td>31</td>\n",
       "      <td>2674</td>\n",
       "      <td>11</td>\n",
       "      <td>19</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0.24</td>\n",
       "      <td>0.39</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>David Alaba</th>\n",
       "      <td>Bayern Munich</td>\n",
       "      <td>AUT</td>\n",
       "      <td>DF,MF</td>\n",
       "      <td>28</td>\n",
       "      <td>32</td>\n",
       "      <td>30</td>\n",
       "      <td>2675</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.08</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jérôme Boateng</th>\n",
       "      <td>Bayern Munich</td>\n",
       "      <td>GER</td>\n",
       "      <td>DF</td>\n",
       "      <td>31</td>\n",
       "      <td>29</td>\n",
       "      <td>29</td>\n",
       "      <td>2368</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.02</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Robert Lewandowski</th>\n",
       "      <td>Bayern Munich</td>\n",
       "      <td>POL</td>\n",
       "      <td>FW</td>\n",
       "      <td>31</td>\n",
       "      <td>29</td>\n",
       "      <td>28</td>\n",
       "      <td>2458</td>\n",
       "      <td>41</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>1.16</td>\n",
       "      <td>0.13</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Joshua Kimmich</th>\n",
       "      <td>Bayern Munich</td>\n",
       "      <td>GER</td>\n",
       "      <td>MF</td>\n",
       "      <td>25</td>\n",
       "      <td>27</td>\n",
       "      <td>25</td>\n",
       "      <td>2194</td>\n",
       "      <td>4</td>\n",
       "      <td>10</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.10</td>\n",
       "      <td>0.27</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Kingsley Coman</th>\n",
       "      <td>Bayern Munich</td>\n",
       "      <td>FRA</td>\n",
       "      <td>FW,MF</td>\n",
       "      <td>24</td>\n",
       "      <td>29</td>\n",
       "      <td>23</td>\n",
       "      <td>1752</td>\n",
       "      <td>5</td>\n",
       "      <td>10</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.21</td>\n",
       "      <td>0.34</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Benjamin Pavard</th>\n",
       "      <td>Bayern Munich</td>\n",
       "      <td>FRA</td>\n",
       "      <td>DF</td>\n",
       "      <td>24</td>\n",
       "      <td>24</td>\n",
       "      <td>22</td>\n",
       "      <td>1943</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.09</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Alphonso Davies</th>\n",
       "      <td>Bayern Munich</td>\n",
       "      <td>CAN</td>\n",
       "      <td>DF</td>\n",
       "      <td>19</td>\n",
       "      <td>23</td>\n",
       "      <td>22</td>\n",
       "      <td>1763</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.04</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Serge Gnabry</th>\n",
       "      <td>Bayern Munich</td>\n",
       "      <td>GER</td>\n",
       "      <td>FW,MF</td>\n",
       "      <td>25</td>\n",
       "      <td>27</td>\n",
       "      <td>20</td>\n",
       "      <td>1644</td>\n",
       "      <td>10</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.44</td>\n",
       "      <td>0.25</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             Club Nationality Position  Age  Matches  Starts  \\\n",
       "Name                                                                           \n",
       "Manuel Neuer        Bayern Munich         GER       GK   34       33      33   \n",
       "Thomas Müller       Bayern Munich         GER       MF   30       32      31   \n",
       "David Alaba         Bayern Munich         AUT    DF,MF   28       32      30   \n",
       "Jérôme Boateng      Bayern Munich         GER       DF   31       29      29   \n",
       "Robert Lewandowski  Bayern Munich         POL       FW   31       29      28   \n",
       "Joshua Kimmich      Bayern Munich         GER       MF   25       27      25   \n",
       "Kingsley Coman      Bayern Munich         FRA    FW,MF   24       29      23   \n",
       "Benjamin Pavard     Bayern Munich         FRA       DF   24       24      22   \n",
       "Alphonso Davies     Bayern Munich         CAN       DF   19       23      22   \n",
       "Serge Gnabry        Bayern Munich         GER    FW,MF   25       27      20   \n",
       "\n",
       "                    Mins  Goals  Assists  Penalty_Goals  Penalty_Attempted  \\\n",
       "Name                                                                         \n",
       "Manuel Neuer        2970      0        0              0                  0   \n",
       "Thomas Müller       2674     11       19              1                  1   \n",
       "David Alaba         2675      2        4              0                  0   \n",
       "Jérôme Boateng      2368      1        1              0                  0   \n",
       "Robert Lewandowski  2458     41        7              8                  9   \n",
       "Joshua Kimmich      2194      4       10              0                  0   \n",
       "Kingsley Coman      1752      5       10              0                  0   \n",
       "Benjamin Pavard     1943      0        0              0                  0   \n",
       "Alphonso Davies     1763      1        2              0                  0   \n",
       "Serge Gnabry        1644     10        2              0                  0   \n",
       "\n",
       "                      xG    xA  Yellow_Cards  Red_Cards  \n",
       "Name                                                     \n",
       "Manuel Neuer        0.00  0.01             1          0  \n",
       "Thomas Müller       0.24  0.39             0          0  \n",
       "David Alaba         0.04  0.08             4          0  \n",
       "Jérôme Boateng      0.01  0.02             6          0  \n",
       "Robert Lewandowski  1.16  0.13             4          0  \n",
       "Joshua Kimmich      0.10  0.27             4          0  \n",
       "Kingsley Coman      0.21  0.34             1          0  \n",
       "Benjamin Pavard     0.02  0.09             3          0  \n",
       "Alphonso Davies     0.01  0.04             2          1  \n",
       "Serge Gnabry        0.44  0.25             4          0  "
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "data = pd.read_csv('bundesliga_top7_offensive.csv', index_col=0)\n",
    "data.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e17c54de",
   "metadata": {},
   "source": [
    "Die Anwendung der `.describe()`-Methode liefert fogende Ausgabe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "fded62b0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Age</th>\n",
       "      <th>Matches</th>\n",
       "      <th>Starts</th>\n",
       "      <th>Mins</th>\n",
       "      <th>Goals</th>\n",
       "      <th>Assists</th>\n",
       "      <th>Penalty_Goals</th>\n",
       "      <th>Penalty_Attempted</th>\n",
       "      <th>xG</th>\n",
       "      <th>xA</th>\n",
       "      <th>Yellow_Cards</th>\n",
       "      <th>Red_Cards</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>177.000000</td>\n",
       "      <td>177.000000</td>\n",
       "      <td>177.000000</td>\n",
       "      <td>177.000000</td>\n",
       "      <td>177.000000</td>\n",
       "      <td>177.000000</td>\n",
       "      <td>177.000000</td>\n",
       "      <td>177.000000</td>\n",
       "      <td>177.000000</td>\n",
       "      <td>177.000000</td>\n",
       "      <td>177.000000</td>\n",
       "      <td>177.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>24.903955</td>\n",
       "      <td>19.858757</td>\n",
       "      <td>14.740113</td>\n",
       "      <td>1321.604520</td>\n",
       "      <td>2.542373</td>\n",
       "      <td>2.005650</td>\n",
       "      <td>0.214689</td>\n",
       "      <td>0.271186</td>\n",
       "      <td>0.157571</td>\n",
       "      <td>0.106328</td>\n",
       "      <td>2.254237</td>\n",
       "      <td>0.056497</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>4.309983</td>\n",
       "      <td>10.116219</td>\n",
       "      <td>10.526494</td>\n",
       "      <td>899.843857</td>\n",
       "      <td>4.911681</td>\n",
       "      <td>3.117941</td>\n",
       "      <td>0.976548</td>\n",
       "      <td>1.115447</td>\n",
       "      <td>0.226989</td>\n",
       "      <td>0.130900</td>\n",
       "      <td>2.258267</td>\n",
       "      <td>0.231534</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>15.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>5.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>22.000000</td>\n",
       "      <td>13.000000</td>\n",
       "      <td>5.000000</td>\n",
       "      <td>456.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.020000</td>\n",
       "      <td>0.010000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>25.000000</td>\n",
       "      <td>22.000000</td>\n",
       "      <td>15.000000</td>\n",
       "      <td>1359.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.090000</td>\n",
       "      <td>0.080000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>28.000000</td>\n",
       "      <td>29.000000</td>\n",
       "      <td>24.000000</td>\n",
       "      <td>2044.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.220000</td>\n",
       "      <td>0.160000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>36.000000</td>\n",
       "      <td>34.000000</td>\n",
       "      <td>34.000000</td>\n",
       "      <td>3060.000000</td>\n",
       "      <td>41.000000</td>\n",
       "      <td>19.000000</td>\n",
       "      <td>8.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>2.020000</td>\n",
       "      <td>1.230000</td>\n",
       "      <td>10.000000</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Age     Matches      Starts         Mins       Goals  \\\n",
       "count  177.000000  177.000000  177.000000   177.000000  177.000000   \n",
       "mean    24.903955   19.858757   14.740113  1321.604520    2.542373   \n",
       "std      4.309983   10.116219   10.526494   899.843857    4.911681   \n",
       "min     15.000000    1.000000    0.000000     5.000000    0.000000   \n",
       "25%     22.000000   13.000000    5.000000   456.000000    0.000000   \n",
       "50%     25.000000   22.000000   15.000000  1359.000000    1.000000   \n",
       "75%     28.000000   29.000000   24.000000  2044.000000    3.000000   \n",
       "max     36.000000   34.000000   34.000000  3060.000000   41.000000   \n",
       "\n",
       "          Assists  Penalty_Goals  Penalty_Attempted          xG          xA  \\\n",
       "count  177.000000     177.000000         177.000000  177.000000  177.000000   \n",
       "mean     2.005650       0.214689           0.271186    0.157571    0.106328   \n",
       "std      3.117941       0.976548           1.115447    0.226989    0.130900   \n",
       "min      0.000000       0.000000           0.000000    0.000000    0.000000   \n",
       "25%      0.000000       0.000000           0.000000    0.020000    0.010000   \n",
       "50%      1.000000       0.000000           0.000000    0.090000    0.080000   \n",
       "75%      3.000000       0.000000           0.000000    0.220000    0.160000   \n",
       "max     19.000000       8.000000           9.000000    2.020000    1.230000   \n",
       "\n",
       "       Yellow_Cards   Red_Cards  \n",
       "count    177.000000  177.000000  \n",
       "mean       2.254237    0.056497  \n",
       "std        2.258267    0.231534  \n",
       "min        0.000000    0.000000  \n",
       "25%        0.000000    0.000000  \n",
       "50%        2.000000    0.000000  \n",
       "75%        4.000000    0.000000  \n",
       "max       10.000000    1.000000  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e16fc542",
   "metadata": {},
   "source": [
    "Da es sich eingebürgert hat, Daten zeilenweise abzuspeichern und die Eigenschaft\n",
    "pro einzelnem Datensatz in den Spalten zu speichern, wertet `.describe()` jede\n",
    "Spalte für sich aus. Für jede Eigenschaft werden dann die statistischen\n",
    "Kennzahlen\n",
    "\n",
    "* count\n",
    "* mean\n",
    "* std\n",
    "* min\n",
    "* max\n",
    "* Quantile 25 %, 50 % und 75 %\n",
    "* max\n",
    "\n",
    "ausgegeben.\n",
    "\n",
    "Die Bedeutung der Kennzahlen wird in der\n",
    "[Pandas-Dokumentation/DataFrame.describe\n",
    "](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html)\n",
    "erläutert. Wir gehen dennoch jede Kennzahl einzeln durch.\n",
    "\n",
    "\n",
    "## Anzahl count\n",
    "\n",
    "Mit `.count()` wird die Anzahl der Einträge bestimmt, die *nicht* 'NA' sind. Der\n",
    "Begriff 'NA' stammt dabei aus dem Bereich Data Science. Gemeint sind fehlende\n",
    "Einträge, wobei die fehlenden Einträge verschiedene Ursachen haben können:\n",
    "\n",
    "* NA = not available (der Messsensor hat versagt)\n",
    "* NA = not applicable (es ist sinnlos bei einem Mann nachzufragen, ob er\n",
    "  schwanger ist)\n",
    "* NA = no answer (eine Person hat bei dem Umfrage nichts angegeben)\n",
    "\n",
    "Wir können auch direkt auf diesen Wert zugreifen, wenn wir beispielsweise wissen\n",
    "wollen, bei wie vielen Fußballspielern ein Alter eingetragen ist. Wird die\n",
    "Methode `.count()` direkt auf den kompletten DataFrame angewendet, so erhalten\n",
    "wir ein Pandas-Series-Objekt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "0f40c4da",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Club                 177\n",
      "Nationality          177\n",
      "Position             177\n",
      "Age                  177\n",
      "Matches              177\n",
      "Starts               177\n",
      "Mins                 177\n",
      "Goals                177\n",
      "Assists              177\n",
      "Penalty_Goals        177\n",
      "Penalty_Attempted    177\n",
      "xG                   177\n",
      "xA                   177\n",
      "Yellow_Cards         177\n",
      "Red_Cards            177\n",
      "dtype: int64\n"
     ]
    }
   ],
   "source": [
    "print( data.count() )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a8fc8e7",
   "metadata": {},
   "source": [
    "Um jetzt an die Anzahl gültiger Altersangaben zu kommen, können wir entweder\n",
    "erst die Spalte mit dem Alter heraussgreifen und darauf `.count()` anwenden."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "8d71fef3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "177\n"
     ]
    }
   ],
   "source": [
    "methode01 = data.loc[:, 'Age'].count()\n",
    "print(methode01)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9fabbbf2",
   "metadata": {},
   "source": [
    "Oder wir wenden zuerst `.count()`an und wählen dann im Series-Objekt das Alter\n",
    "'Age' aus."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "5310740f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "177\n"
     ]
    }
   ],
   "source": [
    "methode02 = data.count().loc['Age']\n",
    "print(methode02)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1799785",
   "metadata": {},
   "source": [
    "## Mittelwert mean\n",
    "\n",
    "Mittelwert heißt auf Englisch mean. Daher ist es nicht verwunderlich, dass die Methode `.mean()` den Mittelwert der Einträge in jeder Spalte berechnet."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "10cc8d4f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Age                    24.903955\n",
      "Matches                19.858757\n",
      "Starts                 14.740113\n",
      "Mins                 1321.604520\n",
      "Goals                   2.542373\n",
      "Assists                 2.005650\n",
      "Penalty_Goals           0.214689\n",
      "Penalty_Attempted       0.271186\n",
      "xG                      0.157571\n",
      "xA                      0.106328\n",
      "Yellow_Cards            2.254237\n",
      "Red_Cards               0.056497\n",
      "dtype: float64\n"
     ]
    }
   ],
   "source": [
    "mittelwert = data.mean(numeric_only=True)\n",
    "print(mittelwert)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "495b6d28",
   "metadata": {},
   "source": [
    "An der Stelle ist es wichtig, die Option `numeric_only=True` zu setzen, damit\n",
    "nur von numerischen Werten, also Zahlen, der Mittelwert gebildet wird.\n",
    "\n",
    "Wir entnehmen der Statistik, dass Fußballer der Top7-Vereine im Mittel 24.9\n",
    "Jahre alt sind und 1321.6 Minuten im Einsatz waren.\n",
    "\n",
    "Falls Sie prinzipiell nochmal die Berechnung des Mittelwertes wiederholen\n",
    "wollen, können Sie folgendes Video ansehen.\n",
    "\n",
    "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/IKfsGPwACnU\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen></iframe>\n",
    "\n",
    "\n",
    "## Standardabweichung std\n",
    "\n",
    "Das 'st' in `.std()`für Standard steht, ist nachvollziehbar. Der dritte\n",
    "Buchstabe 'd' kommt von 'deviation', also Abweichung. Somit ist wiederum die\n",
    "Methode nach dem englischen Fachbegriff 'standard deviation' benannt.  Welche\n",
    "Standardabweichung erhalten wir beim Alter?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "6f1f1118",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Age                    4.309983\n",
      "Matches               10.116219\n",
      "Starts                10.526494\n",
      "Mins                 899.843857\n",
      "Goals                  4.911681\n",
      "Assists                3.117941\n",
      "Penalty_Goals          0.976548\n",
      "Penalty_Attempted      1.115447\n",
      "xG                     0.226989\n",
      "xA                     0.130900\n",
      "Yellow_Cards           2.258267\n",
      "Red_Cards              0.231534\n",
      "dtype: float64\n"
     ]
    }
   ],
   "source": [
    "standardabweichung = data.std(numeric_only=True)\n",
    "print(standardabweichung)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61d7254a",
   "metadata": {},
   "source": [
    "Es sind 4.3 Jahre. Das haben wir jetzt der Ausgabe abgelsen. Wenn wir den Wert\n",
    "extrahieren wollen, gibt es wieder die beiden Methoden. Entweder erst Spalte und\n",
    "dann `.std()` oder erst `.std()`und dann Selektion nach 'Age'. Probieren wir es\n",
    "aus."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "75738264",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4.309982619427105\n"
     ]
    }
   ],
   "source": [
    "alter_std = data.loc[:, 'Age'].std()\n",
    "print(alter_std) "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7828cf0a",
   "metadata": {},
   "source": [
    "Was war eigentlich nochmal die Standardabweichung? Falls Sie dazu eine kurze\n",
    "Wiederholung der Theorie benötigen, empfehle ich Ihnen dieses Video.\n",
    "\n",
    "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/QNNt7BvmUJM\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen></iframe>\n",
    "\n",
    "\n",
    "## Minimum und Maximum mit min und max\n",
    "\n",
    "Die Namen der Methoden `.min()` und `max()` sind fast schon wieder\n",
    "selbsterklärend. Die Methode `.min()` liefert den kleinsten Werte zurück, der in\n",
    "einer Spalte gefunden wird. Umgekehrt liefert `.max()` den größten Eintrag, der\n",
    "in jeder Spalte gefunden wird. Wie häufig die minimalen und maximalen Werte\n",
    "vorkommen, ist dabei egal. \n",
    "\n",
    "Schauen wir uns an, was die minimale Anzahl von Toren ist, die geschossen wurden\n",
    "(haben Sie eine Vermutung). Und dann schauen wir gleich nach, was die maximale\n",
    "Anzahl von Toren ist."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "0bf896e8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n",
      "41\n"
     ]
    }
   ],
   "source": [
    "tore_min = data.loc[:, 'Goals'].min()\n",
    "print(tore_min)\n",
    "\n",
    "tore_max = data.loc[:, 'Goals'].max()\n",
    "print(tore_max)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77645b31",
   "metadata": {},
   "source": [
    "Wenig verwunderlich ist die minimale Anzahl an Toren 0 und die maximale Anzahl\n",
    "an Toren, die ein oder mehrere Spieler der Top7 2020/21 geschossen haben, war\n",
    "41. (Wahrscheinlich wissen Sie aber, dass nur ein Spieler 41 Tore geschafft hat,\n",
    "natürlich Lewandowski).\n",
    "\n",
    "Von Verteidigern wird nicht erwartet, Tore zu schieen, sondern von Stürmern. Was\n",
    "ist denn das Minimum an Toren bei den Stürmern? Die Positionen sind in der\n",
    "Spalte 'Position'. Dabei bedeutet FW = forward = Stürmer, MF = mid field =\n",
    "Mittelfeld, DF = defensive = Verteidigung und GK = goalkeeper = Torwart. Bei\n",
    "manchen Spielern stehen zwei Positionen, konzentrieren wir uns auf diejenigen,\n",
    "bei denen nur 'FW' eingetragen ist."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "6ca72379",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Stürmer\n",
      "Name\n",
      "Robert Lewandowski    41\n",
      "Joshua Zirkzee         0\n",
      "Mickaël Cuisance       0\n",
      "Erling Haaland        27\n",
      "Steffen Tigges         0\n",
      "Yussuf Poulsen         5\n",
      "Alexander Sørloth      5\n",
      "Justin Kluivert        3\n",
      "Wout Weghorst         20\n",
      "Josip Brekalo          7\n",
      "André Silva           28\n",
      "Bas Dost               4\n",
      "Patrik Schick          9\n",
      "Lucas Alario          11\n",
      "Demarai Gray           1\n",
      "Paulinho               0\n",
      "Taiwo Awoniyi          5\n",
      "Joel Pohjanpalo        6\n",
      "Petar Musa             1\n",
      "Akaki Gogia            0\n",
      "Leon Dajaku            0\n",
      "Joshua Mees            0\n",
      "Name: Goals, dtype: int64\n",
      "==============\n",
      "Minimale Tore: 0\n"
     ]
    }
   ],
   "source": [
    "filter = data.loc[:, 'Position'] == 'FW'\n",
    "stuermer = data.loc[filter, 'Goals']\n",
    "\n",
    "print('Stürmer')\n",
    "print(stuermer)\n",
    "\n",
    "print('==============')\n",
    "print('Minimale Tore: {}'.format(stuermer.min()))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6294ae43",
   "metadata": {},
   "source": [
    "## Quantil mit quantile\n",
    "\n",
    "Das Quantil $p \\%$ ist der Wert, bei dem $p %$ der Einträge kleiner als diese\n",
    "Zahl sind und $100 \\% - p \\%$ sind größer. Meist werden nicht Prozentzahlen\n",
    "verwendet, sondern p ist zwischen 0 und 1, wobei die 1 für 100 % steht. \n",
    "\n",
    "Angenommen, wir würden gerne das 0.5-Quantil (auch Median genannt) der gelben\n",
    "Karten wissen. Mit der Methode `.quantile()` können wir diesen Wert leicht aus\n",
    "den Daten holen."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "560104a9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2.0\n"
     ]
    }
   ],
   "source": [
    "gelbe_karten_50prozent_quantil = data.loc[:, 'Yellow_Cards'].quantile(0.5)\n",
    "print(gelbe_karten_50prozent_quantil)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "206ac6a9",
   "metadata": {},
   "source": [
    "Das 50 % -Quantil liegt bei 2 gelben Karten. 50 % aller Spieler haben also\n",
    "weniger als 2 gelbe Karten kassiert. Und 50 % aller Spieler haben 2 oder mehr\n",
    "gelbe Karten kassiert. Wir schauen uns jetzt das 75 % Quantil an."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "743a0675",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4.0\n"
     ]
    }
   ],
   "source": [
    "gelbe_karten_75prozent_quantil = data.loc[:, 'Yellow_Cards'].quantile(0.75)\n",
    "print(gelbe_karten_75prozent_quantil)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d45c0800",
   "metadata": {},
   "source": [
    "75 % aller Spieler haben weniger als 4 gelbe Karten bekommen. SChauen wir uns\n",
    "die Gelbkarten-Spieler an. Ob da vielleicht mehrheitlich Defensivspieler dabei\n",
    "sind?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "a7ec9a3d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                    Position  Yellow_Cards\n",
      "Name                                      \n",
      "Sebastian Rode            MF            10\n",
      "Obite N'Dicka             DF            10\n",
      "Christopher Trimmel       DF             9\n",
      "Marcel Sabitzer           MF             8\n",
      "Grischa Prömel            MF             7\n",
      "Paulo Otávio              DF             7\n",
      "Kevin Mbabu               DF             7\n",
      "Xaver Schlager            MF             7\n",
      "Maximilian Arnold         MF             7\n",
      "Leon Bailey            FW,MF             6\n",
      "Djibril Sow               MF             6\n",
      "Mats Hummels              DF             6\n",
      "Jérôme Boateng            DF             6\n",
      "Kevin Kampl               MF             6\n",
      "Dayot Upamecano           DF             6\n",
      "Thomas Delaney            MF             6\n",
      "Jude Bellingham           MF             6\n",
      "Maxence Lacroix           DF             5\n",
      "Edmond Tapsoba            DF             5\n",
      "Robert Andrich            MF             5\n",
      "Emre Can               DF,MF             5\n",
      "Nadiem Amiri              MF             5\n",
      "Moussa Diaby           FW,MF             5\n",
      "David Abraham             DF             5\n",
      "Aymen Barkok           FW,MF             5\n",
      "John Brooks               DF             5\n",
      "Amin Younes            MF,FW             5\n",
      "Tuta                      DF             5\n",
      "Nordi Mukiele             DF             5\n",
      "Makoto Hasebe          DF,MF             5\n"
     ]
    }
   ],
   "source": [
    "filter = data.loc[:, 'Yellow_Cards'] > 4.0\n",
    "gelbkarten_spieler = data.loc[filter, ['Position', 'Yellow_Cards']]\n",
    "print(gelbkarten_spieler.sort_values(by='Yellow_Cards', ascending=False))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39203c17",
   "metadata": {},
   "source": [
    "## Zusammenfassung\n",
    "\n",
    "In diesem Abschnitt haben wir uns mit einfachen statistischen Kennzahlen\n",
    "beschäftigt, die Pandas mit der Methode `.describe()` zusammenfasst, die aber\n",
    "auch einzeln über \n",
    "\n",
    "* `.count()`\n",
    "* `.mean()`\n",
    "* `.std()`\n",
    "* `.min()` und `.max()`\n",
    "* `.quantile()`\n",
    "\n",
    "berechnet und ausgegeben werden können."
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "formats": "ipynb,md:myst",
   "text_representation": {
    "extension": ".md",
    "format_name": "myst",
    "format_version": 0.13,
    "jupytext_version": "1.13.8"
   }
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  },
  "source_map": [
   13,
   39,
   44,
   48,
   50,
   89,
   91,
   97,
   100,
   105,
   108,
   114,
   117,
   138,
   141,
   148,
   151,
   171,
   177,
   191,
   200,
   212,
   215,
   221,
   224,
   230,
   234
  ]
 },
 "nbformat": 4,
 "nbformat_minor": 5
}