{ "cells": [ { "cell_type": "markdown", "id": "31c4b0ff", "metadata": {}, "source": [ "# 8.3 Statistics with Pandas\n", "\n", "## Learning objectives\n", "\n", "```{admonition} Learning objectives\n", ":class: admonition-goals\n", "* You can use **describe** to get an overview of statistical key figures.\n", "* You know how to determine the number of valid entries with **count**.\n", "* You know the statistical key figures mean and standard deviation and\n", " know how to compute them with **mean** and **std**.\n", "* You can determine the minimum and the maximum with **min** and **max**.\n", "* You know how a quantile is interpreted and how it is computed with\n", " **quantile**.\n", "```\n", "\n", "\n", "## Quick overview with .describe()\n", "\n", "Just as the `.info()` method gives us a quick overview of the data in a\n", "DataFrame object, the `.describe()` method provides a quick overview of\n", "statistical key figures. We stay with our example of the player data of the\n", "top 7 football clubs of the 2020/21 Bundesliga season." ] }, { "cell_type": "code", "execution_count": 1, "id": "1f1c3a90", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr><th></th><th>Club</th><th>Nationality</th><th>Position</th><th>Age</th><th>Matches</th><th>Starts</th><th>Mins</th><th>Goals</th><th>Assists</th><th>Penalty_Goals</th><th>Penalty_Attempted</th><th>xG</th><th>xA</th><th>Yellow_Cards</th><th>Red_Cards</th></tr>\n",
" <tr><th>Name</th><th colspan=\"15\"></th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>Manuel Neuer</th><td>Bayern Munich</td><td>GER</td><td>GK</td><td>34</td><td>33</td><td>33</td><td>2970</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0.00</td><td>0.01</td><td>1</td><td>0</td></tr>\n",
" <tr><th>Thomas Müller</th><td>Bayern Munich</td><td>GER</td><td>MF</td><td>30</td><td>32</td><td>31</td><td>2674</td><td>11</td><td>19</td><td>1</td><td>1</td><td>0.24</td><td>0.39</td><td>0</td><td>0</td></tr>\n",
" <tr><th>David Alaba</th><td>Bayern Munich</td><td>AUT</td><td>DF,MF</td><td>28</td><td>32</td><td>30</td><td>2675</td><td>2</td><td>4</td><td>0</td><td>0</td><td>0.04</td><td>0.08</td><td>4</td><td>0</td></tr>\n",
" <tr><th>Jérôme Boateng</th><td>Bayern Munich</td><td>GER</td><td>DF</td><td>31</td><td>29</td><td>29</td><td>2368</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0.01</td><td>0.02</td><td>6</td><td>0</td></tr>\n",
" <tr><th>Robert Lewandowski</th><td>Bayern Munich</td><td>POL</td><td>FW</td><td>31</td><td>29</td><td>28</td><td>2458</td><td>41</td><td>7</td><td>8</td><td>9</td><td>1.16</td><td>0.13</td><td>4</td><td>0</td></tr>\n",
" <tr><th>Joshua Kimmich</th><td>Bayern Munich</td><td>GER</td><td>MF</td><td>25</td><td>27</td><td>25</td><td>2194</td><td>4</td><td>10</td><td>0</td><td>0</td><td>0.10</td><td>0.27</td><td>4</td><td>0</td></tr>\n",
" <tr><th>Kingsley Coman</th><td>Bayern Munich</td><td>FRA</td><td>FW,MF</td><td>24</td><td>29</td><td>23</td><td>1752</td><td>5</td><td>10</td><td>0</td><td>0</td><td>0.21</td><td>0.34</td><td>1</td><td>0</td></tr>\n",
" <tr><th>Benjamin Pavard</th><td>Bayern Munich</td><td>FRA</td><td>DF</td><td>24</td><td>24</td><td>22</td><td>1943</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0.02</td><td>0.09</td><td>3</td><td>0</td></tr>\n",
" <tr><th>Alphonso Davies</th><td>Bayern Munich</td><td>CAN</td><td>DF</td><td>19</td><td>23</td><td>22</td><td>1763</td><td>1</td><td>2</td><td>0</td><td>0</td><td>0.01</td><td>0.04</td><td>2</td><td>1</td></tr>\n",
" <tr><th>Serge Gnabry</th><td>Bayern Munich</td><td>GER</td><td>FW,MF</td><td>25</td><td>27</td><td>20</td><td>1644</td><td>10</td><td>2</td><td>0</td><td>0</td><td>0.44</td><td>0.25</td><td>4</td><td>0</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Club Nationality Position Age Matches Starts \\\n", "Name \n", "Manuel Neuer Bayern Munich GER GK 34 33 33 \n", "Thomas Müller Bayern Munich GER MF 30 32 31 \n", "David Alaba Bayern Munich AUT DF,MF 28 32 30 \n", "Jérôme Boateng Bayern Munich GER DF 31 29 29 \n", "Robert Lewandowski Bayern Munich POL FW 31 29 28 \n", "Joshua Kimmich Bayern Munich GER MF 25 27 25 \n", "Kingsley Coman Bayern Munich FRA FW,MF 24 29 23 \n", "Benjamin Pavard Bayern Munich FRA DF 24 24 22 \n", "Alphonso Davies Bayern Munich CAN DF 19 23 22 \n", "Serge Gnabry Bayern Munich GER FW,MF 25 27 20 \n", "\n", " Mins Goals Assists Penalty_Goals Penalty_Attempted \\\n", "Name \n", "Manuel Neuer 2970 0 0 0 0 \n", "Thomas Müller 2674 11 19 1 1 \n", "David Alaba 2675 2 4 0 0 \n", "Jérôme Boateng 2368 1 1 0 0 \n", "Robert Lewandowski 2458 41 7 8 9 \n", "Joshua Kimmich 2194 4 10 0 0 \n", "Kingsley Coman 1752 5 10 0 0 \n", "Benjamin Pavard 1943 0 0 0 0 \n", "Alphonso Davies 1763 1 2 0 0 \n", "Serge Gnabry 1644 10 2 0 0 \n", "\n", " xG xA Yellow_Cards Red_Cards \n", "Name \n", "Manuel Neuer 0.00 0.01 1 0 \n", "Thomas Müller 0.24 0.39 0 0 \n", "David Alaba 0.04 0.08 4 0 \n", "Jérôme Boateng 0.01 0.02 6 0 \n", "Robert Lewandowski 1.16 0.13 4 0 \n", "Joshua Kimmich 0.10 0.27 4 0 \n", "Kingsley Coman 0.21 0.34 1 0 \n", "Benjamin Pavard 0.02 0.09 3 0 \n", "Alphonso Davies 0.01 0.04 2 1 \n", "Serge Gnabry 0.44 0.25 4 0 " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "data = pd.read_csv('bundesliga_top7_offensive.csv', index_col=0)\n", "data.head(10)" ] }, { "cell_type": "markdown", "id": "e17c54de", "metadata": {}, "source": [ "Die Anwendung der `.describe()`-Methode liefert fogende Ausgabe:" ] }, { "cell_type": "code", "execution_count": 2, "id": "fded62b0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr><th></th><th>Age</th><th>Matches</th><th>Starts</th><th>Mins</th><th>Goals</th><th>Assists</th><th>Penalty_Goals</th><th>Penalty_Attempted</th><th>xG</th><th>xA</th><th>Yellow_Cards</th><th>Red_Cards</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>count</th><td>177.000000</td><td>177.000000</td><td>177.000000</td><td>177.000000</td><td>177.000000</td><td>177.000000</td><td>177.000000</td><td>177.000000</td><td>177.000000</td><td>177.000000</td><td>177.000000</td><td>177.000000</td></tr>\n",
" <tr><th>mean</th><td>24.903955</td><td>19.858757</td><td>14.740113</td><td>1321.604520</td><td>2.542373</td><td>2.005650</td><td>0.214689</td><td>0.271186</td><td>0.157571</td><td>0.106328</td><td>2.254237</td><td>0.056497</td></tr>\n",
" <tr><th>std</th><td>4.309983</td><td>10.116219</td><td>10.526494</td><td>899.843857</td><td>4.911681</td><td>3.117941</td><td>0.976548</td><td>1.115447</td><td>0.226989</td><td>0.130900</td><td>2.258267</td><td>0.231534</td></tr>\n",
" <tr><th>min</th><td>15.000000</td><td>1.000000</td><td>0.000000</td><td>5.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td></tr>\n",
" <tr><th>25%</th><td>22.000000</td><td>13.000000</td><td>5.000000</td><td>456.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.020000</td><td>0.010000</td><td>0.000000</td><td>0.000000</td></tr>\n",
" <tr><th>50%</th><td>25.000000</td><td>22.000000</td><td>15.000000</td><td>1359.000000</td><td>1.000000</td><td>1.000000</td><td>0.000000</td><td>0.000000</td><td>0.090000</td><td>0.080000</td><td>2.000000</td><td>0.000000</td></tr>\n",
" <tr><th>75%</th><td>28.000000</td><td>29.000000</td><td>24.000000</td><td>2044.000000</td><td>3.000000</td><td>3.000000</td><td>0.000000</td><td>0.000000</td><td>0.220000</td><td>0.160000</td><td>4.000000</td><td>0.000000</td></tr>\n",
" <tr><th>max</th><td>36.000000</td><td>34.000000</td><td>34.000000</td><td>3060.000000</td><td>41.000000</td><td>19.000000</td><td>8.000000</td><td>9.000000</td><td>2.020000</td><td>1.230000</td><td>10.000000</td><td>1.000000</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Age Matches Starts Mins Goals \\\n", "count 177.000000 177.000000 177.000000 177.000000 177.000000 \n", "mean 24.903955 19.858757 14.740113 1321.604520 2.542373 \n", "std 4.309983 10.116219 10.526494 899.843857 4.911681 \n", "min 15.000000 1.000000 0.000000 5.000000 0.000000 \n", "25% 22.000000 13.000000 5.000000 456.000000 0.000000 \n", "50% 25.000000 22.000000 15.000000 1359.000000 1.000000 \n", "75% 28.000000 29.000000 24.000000 2044.000000 3.000000 \n", "max 36.000000 34.000000 34.000000 3060.000000 41.000000 \n", "\n", " Assists Penalty_Goals Penalty_Attempted xG xA \\\n", "count 177.000000 177.000000 177.000000 177.000000 177.000000 \n", "mean 2.005650 0.214689 0.271186 0.157571 0.106328 \n", "std 3.117941 0.976548 1.115447 0.226989 0.130900 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 0.000000 0.000000 0.020000 0.010000 \n", "50% 1.000000 0.000000 0.000000 0.090000 0.080000 \n", "75% 3.000000 0.000000 0.000000 0.220000 0.160000 \n", "max 19.000000 8.000000 9.000000 2.020000 1.230000 \n", "\n", " Yellow_Cards Red_Cards \n", "count 177.000000 177.000000 \n", "mean 2.254237 0.056497 \n", "std 2.258267 0.231534 \n", "min 0.000000 0.000000 \n", "25% 0.000000 0.000000 \n", "50% 2.000000 0.000000 \n", "75% 4.000000 0.000000 \n", "max 10.000000 1.000000 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.describe()" ] }, { "cell_type": "markdown", "id": "e16fc542", "metadata": {}, "source": [ "Da es sich eingebürgert hat, Daten zeilenweise abzuspeichern und die Eigenschaft\n", "pro einzelnem Datensatz in den Spalten zu speichern, wertet `.describe()` jede\n", "Spalte für sich aus. 
For each numeric property, the statistical key figures\n", "\n", "* count\n", "* mean\n", "* std\n", "* min\n", "* quantiles 25 %, 50 % and 75 %\n", "* max\n", "\n", "are then displayed. By default, non-numeric columns such as 'Club' are left out.\n", "\n", "The meaning of the key figures is explained in the\n", "[Pandas documentation/DataFrame.describe](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html).\n", "We nevertheless go through each key figure individually.\n", "\n", "\n", "## Count with count\n", "\n", "The `.count()` method determines the number of entries that are *not* 'NA'. The\n", "term 'NA' comes from the field of data science. It refers to missing entries,\n", "which can be missing for various reasons:\n", "\n", "* NA = not available (the measuring sensor failed)\n", "* NA = not applicable (it makes no sense to ask a man whether he is\n", " pregnant)\n", "* NA = no answer (a person gave no answer in the survey)\n", "\n", "We can also access this value directly, for example if we want to know for how\n", "many football players an age is recorded. If the `.count()` method is applied\n", "directly to the complete DataFrame, we get a Pandas Series object with one\n", "count per column." ] }, { "cell_type": "code", "execution_count": 3, "id": "0f40c4da", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Club 177\n", "Nationality 177\n", "Position 177\n", "Age 177\n", "Matches 177\n", "Starts 177\n", "Mins 177\n", "Goals 177\n", "Assists 177\n", "Penalty_Goals 177\n", "Penalty_Attempted 177\n", "xG 177\n", "xA 177\n", "Yellow_Cards 177\n", "Red_Cards 177\n", "dtype: int64\n" ] } ], "source": [ "print(data.count())" ] }, { "cell_type": "markdown", "id": "7a8fc8e7", "metadata": {}, "source": [ "To get the number of valid age entries, we can either first select the column\n", "with the age and then apply `.count()` to it."
] }, { "cell_type": "code", "execution_count": 4, "id": "8d71fef3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "177\n" ] } ], "source": [ "methode01 = data.loc[:, 'Age'].count()\n", "print(methode01)" ] }, { "cell_type": "markdown", "id": "9fabbbf2", "metadata": {}, "source": [ "Or we first apply `.count()` and then select the entry 'Age' from the resulting\n", "Series object." ] }, { "cell_type": "code", "execution_count": 5, "id": "5310740f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "177\n" ] } ], "source": [ "methode02 = data.count().loc['Age']\n", "print(methode02)" ] }, { "cell_type": "markdown", "id": "e1799785", "metadata": {}, "source": [ "## Mean with mean\n", "\n", "The `.mean()` method computes the mean, i.e. the arithmetic average, of the entries in each column." ] }, { "cell_type": "code", "execution_count": 6, "id": "10cc8d4f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Age 24.903955\n", "Matches 19.858757\n", "Starts 14.740113\n", "Mins 1321.604520\n", "Goals 2.542373\n", "Assists 2.005650\n", "Penalty_Goals 0.214689\n", "Penalty_Attempted 0.271186\n", "xG 0.157571\n", "xA 0.106328\n", "Yellow_Cards 2.254237\n", "Red_Cards 0.056497\n", "dtype: float64\n" ] } ], "source": [ "mittelwert = data.mean(numeric_only=True)\n", "print(mittelwert)" ] }, { "cell_type": "markdown", "id": "495b6d28", "metadata": {}, "source": [ "Here it is important to set the option `numeric_only=True` so that the mean is\n", "computed only for columns with numeric values, i.e. numbers. Without it, recent\n", "Pandas versions also try to average text columns such as 'Club' and raise an\n", "error.\n", "\n", "From the statistics we see that the football players of the top 7 clubs are on\n", "average 24.9 years old and were on the pitch for 1321.6 minutes.\n", "\n", "If you would like to review how the mean is calculated in general, you can\n", "watch the following video.\n", "\n", "\n", "\n", "\n", "## 
Standard deviation with std\n", "\n", "That the 'st' in `.std()` stands for 'standard' is easy to see. The third\n", "letter 'd' comes from 'deviation'. So this method, too, is named after its\n", "technical term, 'standard deviation'. What standard deviation do we get for the\n", "age?" ] }, { "cell_type": "code", "execution_count": 7, "id": "6f1f1118", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Age 4.309983\n", "Matches 10.116219\n", "Starts 10.526494\n", "Mins 899.843857\n", "Goals 4.911681\n", "Assists 3.117941\n", "Penalty_Goals 0.976548\n", "Penalty_Attempted 1.115447\n", "xG 0.226989\n", "xA 0.130900\n", "Yellow_Cards 2.258267\n", "Red_Cards 0.231534\n", "dtype: float64\n" ] } ], "source": [ "standardabweichung = data.std(numeric_only=True)\n", "print(standardabweichung)" ] }, { "cell_type": "markdown", "id": "61d7254a", "metadata": {}, "source": [ "It is 4.3 years. We read that value off the output. If we want to extract the\n", "value, there are again the two approaches: either first select the column and\n", "then apply `.std()`, or first apply `.std()` and then select 'Age'. Let us try\n", "it out." ] }, { "cell_type": "code", "execution_count": 8, "id": "75738264", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.309982619427105\n" ] } ], "source": [ "alter_std = data.loc[:, 'Age'].std()\n", "print(alter_std)" ] }, { "cell_type": "markdown", "id": "7828cf0a", "metadata": {}, "source": [ "What was the standard deviation again? It measures how far the values typically\n", "scatter around the mean; by default, Pandas computes the sample standard\n", "deviation (dividing by n - 1). If you need a short refresher of the theory, I\n", "recommend this video.\n", "\n", "\n", "\n", "\n", "## Minimum and maximum with min and max\n", "\n", "The names of the methods `.min()` and `.max()` are almost self-explanatory\n", "again. The `.min()` method returns the smallest value found in\n", "a column. 
Conversely, `.max()` returns the largest entry found in\n", "each column. How often the minimum and maximum values occur does not\n", "matter.\n", "\n", "Let us look at the minimum number of goals that were scored (do you have a\n", "guess?). Then we check right away what the maximum number of goals\n", "is." ] }, { "cell_type": "code", "execution_count": 9, "id": "0bf896e8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "41\n" ] } ], "source": [ "tore_min = data.loc[:, 'Goals'].min()\n", "print(tore_min)\n", "\n", "tore_max = data.loc[:, 'Goals'].max()\n", "print(tore_max)" ] }, { "cell_type": "markdown", "id": "77645b31", "metadata": {}, "source": [ "Unsurprisingly, the minimum number of goals is 0, and the maximum number of\n", "goals scored by one or more players of the top 7 in 2020/21 was 41. (You\n", "probably know, though, that only one player managed 41 goals: Lewandowski, of\n", "course.)\n", "\n", "Defenders are not expected to score goals; forwards are. What, then, is the\n", "minimum number of goals among the forwards? The positions are stored in the\n", "column 'Position', where FW = forward, MF = midfield, DF = defender and\n", "GK = goalkeeper. Some players have two positions listed; let us concentrate on\n", "those for whom only 'FW' is entered."
] }, { "cell_type": "code", "execution_count": 10, "id": "6ca72379", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Stürmer\n", "Name\n", "Robert Lewandowski 41\n", "Joshua Zirkzee 0\n", "Mickaël Cuisance 0\n", "Erling Haaland 27\n", "Steffen Tigges 0\n", "Yussuf Poulsen 5\n", "Alexander Sørloth 5\n", "Justin Kluivert 3\n", "Wout Weghorst 20\n", "Josip Brekalo 7\n", "André Silva 28\n", "Bas Dost 4\n", "Patrik Schick 9\n", "Lucas Alario 11\n", "Demarai Gray 1\n", "Paulinho 0\n", "Taiwo Awoniyi 5\n", "Joel Pohjanpalo 6\n", "Petar Musa 1\n", "Akaki Gogia 0\n", "Leon Dajaku 0\n", "Joshua Mees 0\n", "Name: Goals, dtype: int64\n", "==============\n", "Minimale Tore: 0\n" ] } ], "source": [ "# Boolean mask: True for players whose only listed position is 'FW'\n", "nur_stuermer = data.loc[:, 'Position'] == 'FW'\n", "stuermer = data.loc[nur_stuermer, 'Goals']\n", "\n", "print('Stürmer')\n", "print(stuermer)\n", "\n", "print('==============')\n", "print('Minimale Tore: {}'.format(stuermer.min()))" ] }, { "cell_type": "markdown", "id": "6294ae43", "metadata": {}, "source": [ "## Quantile with quantile\n", "\n", "The $p \\%$ quantile is the value for which $p \\%$ of the entries are less than\n", "or equal to this number and $100 \\% - p \\%$ are greater than or equal to it.\n", "Usually, percentages are not used; instead p lies between 0 and 1, where 1\n", "stands for 100 %.\n", "\n", "Suppose we would like to know the 0.5 quantile (also called the median) of the\n", "yellow cards. With the `.quantile()` method we can easily obtain this value\n", "from the data." ] }, { "cell_type": "code", "execution_count": 11, "id": "560104a9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.0\n" ] } ], "source": [ "gelbe_karten_50prozent_quantil = data.loc[:, 'Yellow_Cards'].quantile(0.5)\n", "print(gelbe_karten_50prozent_quantil)" ] }, { "cell_type": "markdown", "id": "206ac6a9", "metadata": {}, "source": [ "The 50 % quantile is 2 yellow cards. 
So at least 50 % of all players received\n", "at most 2 yellow cards, and at least 50 % of all players received 2 or more\n", "yellow cards. We now look at the 75 % quantile." ] }, { "cell_type": "code", "execution_count": 12, "id": "743a0675", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.0\n" ] } ], "source": [ "gelbe_karten_75prozent_quantil = data.loc[:, 'Yellow_Cards'].quantile(0.75)\n", "print(gelbe_karten_75prozent_quantil)" ] }, { "cell_type": "markdown", "id": "d45c0800", "metadata": {}, "source": [ "At least 75 % of all players received at most 4 yellow cards. Let us look at\n", "the players with more than 4 yellow cards. Are they perhaps mostly defensive\n", "players?" ] }, { "cell_type": "code", "execution_count": 13, "id": "a7ec9a3d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Position Yellow_Cards\n", "Name \n", "Sebastian Rode MF 10\n", "Obite N'Dicka DF 10\n", "Christopher Trimmel DF 9\n", "Marcel Sabitzer MF 8\n", "Grischa Prömel MF 7\n", "Paulo Otávio DF 7\n", "Kevin Mbabu DF 7\n", "Xaver Schlager MF 7\n", "Maximilian Arnold MF 7\n", "Leon Bailey FW,MF 6\n", "Djibril Sow MF 6\n", "Mats Hummels DF 6\n", "Jérôme Boateng DF 6\n", "Kevin Kampl MF 6\n", "Dayot Upamecano DF 6\n", "Thomas Delaney MF 6\n", "Jude Bellingham MF 6\n", "Maxence Lacroix DF 5\n", "Edmond Tapsoba DF 5\n", "Robert Andrich MF 5\n", "Emre Can DF,MF 5\n", "Nadiem Amiri MF 5\n", "Moussa Diaby FW,MF 5\n", "David Abraham DF 5\n", "Aymen Barkok FW,MF 5\n", "John Brooks DF 5\n", "Amin Younes MF,FW 5\n", "Tuta DF 5\n", "Nordi Mukiele DF 5\n", "Makoto Hasebe DF,MF 5\n" ] } ], "source": [ "# Boolean mask: True for players above the 75 % quantile of 4 yellow cards\n", "mehr_als_vier = data.loc[:, 'Yellow_Cards'] > 4.0\n", "gelbkarten_spieler = data.loc[mehr_als_vier, ['Position', 'Yellow_Cards']]\n", "print(gelbkarten_spieler.sort_values(by='Yellow_Cards', ascending=False))" ] }, { "cell_type": "markdown", "id": "39203c17", "metadata": {}, "source": [ "## Summary\n", "\n", "In this 
section we looked at simple statistical key figures, which Pandas\n", "summarizes with the `.describe()` method, but which can also be computed and\n", "displayed individually via\n", "\n", "* `.count()`\n", "* `.mean()`\n", "* `.std()`\n", "* `.min()` and `.max()`\n", "* `.quantile()`\n", "\n", "." ] } ], "metadata": { "jupytext": { "formats": "ipynb,md:myst", "text_representation": { "extension": ".md", "format_name": "myst", "format_version": 0.13, "jupytext_version": "1.13.8" } }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0" }, "source_map": [ 13, 39, 44, 48, 50, 89, 91, 97, 100, 105, 108, 114, 117, 138, 141, 148, 151, 171, 177, 191, 200, 212, 215, 221, 224, 230, 234 ] }, "nbformat": 4, "nbformat_minor": 5 }