{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"lang": "sl"
},
"source": [
"## Predicting values\n",
"\n",
"Data Mining, assignment, `27 April 2025`\n",
"**`Gašper Dobrovoljc`**"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "sl"
},
"source": [
"We will learn the practical use of simple supervised modelling (prediction)\n",
"methods. What all of these methods have in common is that they use\n",
"random variables (attributes) to model the values of a special variable,\n",
"called the *class* (in the context of classification)\n",
"or the *response* (in the context of regression). The basic differences between the two contexts\n",
"were covered in the lectures and lab sessions.\n",
"\n",
"The two practical goals we will pursue are:\n",
"* modelling the ratings of an individual user (the response) using all other users,\n",
"* comparing supervised modelling methods."
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "sl"
},
"source": [
"### Data\n",
"\n",
"The description of the MovieLens dataset is the same as in the first assignment."
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "sl"
},
"source": [
"### Data preprocessing\n",
"\n",
"For this assignment, we will prepare the data as follows:\n",
"1. Select the $m$ films with at least 100 views.\n",
"2. Select the $n$ users who have watched at least 100 films.\n",
"3. Build a matrix $X$ of size $m \\times n$, in which rows represent films and columns represent users. Replace unknown values with $0$.\n",
"\n",
"A regression model will be built for each of the selected $n$ users, \n",
"with the goal of predicting that user's film ratings. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<table>\n",
"  <tr style=\"background-color: white;\">\n",
"    <td style=\"border-right: 1px solid #000;\"></td>\n",
"    <td></td>\n",
"    <td style=\"border-right: 1px solid #000; border-left: 1px solid #000;\">$y^{(0)}$</td>\n",
"    <td colspan=3 style=\"text-align:center;\">$X^{(0)}$</td>\n",
"  </tr>\n",
"  <tr style=\"border-bottom: 1px solid #000;\">\n",
"    <td style=\"border-right: 1px solid #000;\"></td>\n",
"    <td>Film/user</td>\n",
"    <td style=\"border-right: 1px solid #000; border-left: 1px solid #000;\">$u_0$</td>\n",
"    <td>$u_1$</td>\n",
"    <td>$u_2$</td>\n",
"    <td>$\\cdots$</td>\n",
"  </tr>\n",
"  <tr>\n",
"    <td style=\"border-right: 1px solid #000;\">${f_1}$</td>\n",
"    <td>Twelve Monkeys (a.k.a. 12 Monkeys) (1995)</td>\n",
"    <td style=\"border-right: 1px solid #000; border-left: 1px solid #000;\">0</td>\n",
"    <td>0</td>\n",
"    <td>2.5</td>\n",
"    <td>$\\cdots$</td>\n",
"  </tr>\n",
"  <tr>\n",
"    <td style=\"border-right: 1px solid #000;\">${f_2}$</td>\n",
"    <td>Dances with Wolves (1990) </td>\n",
"    <td style=\"border-right: 1px solid #000; border-left: 1px solid #000;\">4</td>\n",
"    <td>0</td>\n",
"    <td>0</td>\n",
"    <td>$\\cdots$</td>\n",
"  </tr>\n",
"  <tr>\n",
"    <td style=\"border-right: 1px solid #000;\">${f_3}$</td>\n",
"    <td>Apollo 13 (1995)</td>\n",
"    <td style=\"border-right: 1px solid #000; border-left: 1px solid #000;\">0</td>\n",
"    <td>2</td>\n",
"    <td>0</td>\n",
"    <td>$\\cdots$</td>\n",
"  </tr>\n",
"  <tr>\n",
"    <td style=\"border-right: 1px solid #000;\">${f_4}$</td>\n",
"    <td>Sixth Sense, The (1999)</td>\n",
"    <td style=\"border-right: 1px solid #000; border-left: 1px solid #000;\">3</td>\n",
"    <td>0</td>\n",
"    <td>4</td>\n",
"    <td>$\\cdots$</td>\n",
"  </tr>\n",
"  <tr>\n",
"    <td style=\"border-right: 1px solid #000;\">$\\cdots$</td>\n",
"    <td>$\\cdots$</td>\n",
"    <td style=\"border-right: 1px solid #000; border-left: 1px solid #000;\">$\\cdots$</td>\n",
"    <td>$\\cdots$</td>\n",
"    <td>$\\cdots$</td>\n",
"    <td>$\\cdots$</td>\n",
"  </tr>\n",
"</table>\n",
"\n",
"<table>\n",
"  <tr style=\"background-color: white;\">\n",
"    <td style=\"border-right: 1px solid #000;\"></td>\n",
"    <td></td>\n",
"    <td style=\"border-right: 1px solid #000; border-left: 1px solid #000;\">$y^{(1)}$</td>\n",
"    <td colspan=3 style=\"text-align:center;\">$X^{(1)}$</td>\n",
"  </tr>\n",
"  <tr style=\"border-bottom: 1px solid #000;\">\n",
"    <td style=\"border-right: 1px solid #000;\"></td>\n",
"    <td>Film/user</td>\n",
"    <td style=\"border-right: 1px solid #000; border-left: 1px solid #000;\">$u_1$</td>\n",
"    <td>$u_0$</td>\n",
"    <td>$u_2$</td>\n",
"    <td>$\\cdots$</td>\n",
"  </tr>\n",
"  <tr>\n",
"    <td style=\"border-right: 1px solid #000;\">${f_1}$</td>\n",
"    <td>Twelve Monkeys (a.k.a. 12 Monkeys) (1995)</td>\n",
"    <td style=\"border-right: 1px solid #000; border-left: 1px solid #000;\">0</td>\n",
"    <td>0</td>\n",
"    <td>2.5</td>\n",
"    <td>$\\cdots$</td>\n",
"  </tr>\n",
"  <tr>\n",
"    <td style=\"border-right: 1px solid #000;\">${f_2}$</td>\n",
"    <td>Dances with Wolves (1990) </td>\n",
"    <td style=\"border-right: 1px solid #000; border-left: 1px solid #000;\">0</td>\n",
"    <td>4</td>\n",
"    <td>0</td>\n",
"    <td>$\\cdots$</td>\n",
"  </tr>\n",
"  <tr>\n",
"    <td style=\"border-right: 1px solid #000;\">${f_3}$</td>\n",
"    <td>Apollo 13 (1995)</td>\n",
"    <td style=\"border-right: 1px solid #000; border-left: 1px solid #000;\">2</td>\n",
"    <td>0</td>\n",
"    <td>0</td>\n",
"    <td>$\\cdots$</td>\n",
"  </tr>\n",
"  <tr>\n",
"    <td style=\"border-right: 1px solid #000;\">${f_4}$</td>\n",
"    <td>Sixth Sense, The (1999)</td>\n",
"    <td style=\"border-right: 1px solid #000; border-left: 1px solid #000;\">0</td>\n",
"    <td>3</td>\n",
"    <td>4</td>\n",
"    <td>$\\cdots$</td>\n",
"  </tr>\n",
"  <tr>\n",
"    <td style=\"border-right: 1px solid #000;\">$\\cdots$</td>\n",
"    <td>$\\cdots$</td>\n",
"    <td style=\"border-right: 1px solid #000; border-left: 1px solid #000;\">$\\cdots$</td>\n",
"    <td>$\\cdots$</td>\n",
"    <td>$\\cdots$</td>\n",
"    <td>$\\cdots$</td>\n",
"  </tr>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Data split for the model of user $u_0$ (upper table) and of user $u_1$ (lower table).\n"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-04-27T16:16:12.001788Z",
"start_time": "2025-04-27T16:16:11.631245Z"
}
},
"cell_type": "code",
"source": [
"import pandas as pd\n",
"\n",
"ratings = pd.read_csv('./podatki/ml-latest-small/ratings.csv')\n",
"movies = pd.read_csv('./podatki/ml-latest-small/movies.csv')"
],
"outputs": [],
"execution_count": 1
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-04-27T16:16:12.098867Z",
"start_time": "2025-04-27T16:16:12.082840Z"
}
},
"cell_type": "code",
"source": [
"# Keep films with at least 100 ratings.\n",
"movie_rating_count = ratings.groupby('movieId')['rating'].count().reset_index()\n",
"movie_rating_count = movie_rating_count[movie_rating_count['rating'] >= 100]\n",
"\n",
"# Keep users who rated at least 100 films.\n",
"user_rating_count = ratings.groupby('userId')['rating'].count().reset_index()\n",
"user_rating_count = user_rating_count[user_rating_count['rating'] >= 100]\n",
"\n",
"filtered_ratings = ratings[ratings['movieId'].isin(movie_rating_count['movieId'])]\n",
"filtered_ratings = filtered_ratings[filtered_ratings['userId'].isin(user_rating_count['userId'])]\n",
"\n",
"# Rows are films, columns are users; missing ratings become 0.\n",
"matrix = filtered_ratings.pivot_table(index='movieId', columns='userId', values='rating', fill_value=0)"
],
"outputs": [],
"execution_count": 2
},
{
"cell_type": "markdown",
"metadata": {
"lang": "sl"
},
"source": [
"### Questions"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "sl"
},
"source": [
"#### 1. Regression (100%)\n",
"For each user, build a regression model. Use one or more methods for learning regression models (linear regression, Ridge, Lasso, etc.).\n",
"\n",
"For each of the $n$ users, select the corresponding column of the data matrix. For user $i$ we therefore have:\n",
"\n",
"* the response vector $y^{(i)}$,\n",
"* the data matrix $X^{(i)}$, which contains all columns *except* column $i$.\n",
"  \n",
"For a clearer picture, see the two tables above. Repeat the evaluation procedure with a training and a test set several times (e.g. three times):\n",
"\n",
"\n",
"* *Randomly* split the set of films the user has watched in a ratio of 75% (training set) to 25% (test set).\n",
"* Train the regression model on the training set (select the corresponding rows of $X$ and $y$).\n",
"* Evaluate the model on the test set (again select the corresponding rows of $X$ and $y$).\n",
"\n",
"Then sum the evaluation scores and divide by the number of trials to obtain the final score.\n",
"\n",
"Report on the performance of your model, focusing on the following questions:\n",
"* Justify an appropriate evaluation measure. Does the model predict the ratings well?\n",
"* Use the chosen measure to evaluate the models for all $n$ users.\n",
"\n",
"You may split the code for your answers across several cells."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-04-27T16:16:12.948299Z",
"start_time": "2025-04-27T16:16:12.110729Z"
}
},
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression, Ridge, Lasso\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"def eval_user(user_id, model_cls, model_kwargs=None, num_trials=3):\n",
"    \"\"\"Return the mean test RMSE for one user over num_trials random 75/25 splits.\"\"\"\n",
"    if model_kwargs is None:\n",
"        model_kwargs = {}\n",
"\n",
"    # Response: this user's ratings; features: every other user's ratings.\n",
"    y = matrix[user_id].values\n",
"    x_others = matrix.drop(columns=user_id).values\n",
"\n",
"    # Use only the films this user actually rated (0 marks a missing rating).\n",
"    rated_indices = np.where(y > 0)[0]\n",
"\n",
"    rmse_list = []\n",
"\n",
"    for _ in range(num_trials):\n",
"        train_idx, test_idx = train_test_split(rated_indices, test_size=0.25, random_state=None)\n",
"\n",
"        x_train = x_others[train_idx]\n",
"        y_train = y[train_idx]\n",
"\n",
"        x_test = x_others[test_idx]\n",
"        y_test = y[test_idx]\n",
"\n",
"        model = model_cls(**model_kwargs)\n",
"        model.fit(x_train, y_train)\n",
"\n",
"        y_pred = model.predict(x_test)\n",
"        # mean_squared_error returns the MSE; take the square root to get the RMSE.\n",
"        rmse = np.sqrt(mean_squared_error(y_test, y_pred))\n",
"        rmse_list.append(rmse)\n",
"\n",
"    return np.mean(rmse_list)"
],
"outputs": [],
"execution_count": 3
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-04-27T16:16:14.998843Z",
"start_time": "2025-04-27T16:16:12.962010Z"
}
},
"cell_type": "code",
"source": [
"from tqdm import tqdm\n",
"\n",
"# Maps model name -> {user_id: mean score returned by eval_user}.\n",
"user_rmse_results = {}\n",
"\n",
"models = {\n",
"    \"Linear\": (LinearRegression, {}),\n",
"    \"Ridge\": (Ridge, {\"alpha\": 1.0}),\n",
"    \"Lasso\": (Lasso, {\"alpha\": 0.1, \"max_iter\": 10000}),\n",
"}\n",
"\n",
"for model_name, (model_cls, model_kwargs) in models.items():\n",
"    print(f\"Evaluating model: {model_name}\")\n",
"    model_rmse = {}\n",
"\n",
"    for user_id in tqdm(matrix.columns):\n",
"        avg_rmse = eval_user(user_id, model_cls, model_kwargs)\n",
"        if avg_rmse is not None:\n",
"            model_rmse[user_id] = avg_rmse\n",
"\n",
"    user_rmse_results[model_name] = model_rmse"
],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Evaluating model: Linear\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 263/263 [00:00<00:00, 369.67it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Evaluating model: Ridge\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 263/263 [00:00<00:00, 526.27it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Evaluating model: Lasso\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 263/263 [00:00<00:00, 328.20it/s]\n"
]
}
],
"execution_count": 4
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-04-27T16:16:15.018908Z",
"start_time": "2025-04-27T16:16:15.015890Z"
}
},
"cell_type": "code",
"source": [
"for model_name, result in user_rmse_results.items():\n",
"    rmse_values = list(result.values())\n",
"    avg_rmse = np.mean(rmse_values)\n",
"    print(f\"{model_name} – Mean RMSE across all users: {avg_rmse:.4f}\")"
],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear – Mean RMSE across all users: 0.7776\n",
"Ridge – Mean RMSE across all users: 0.7747\n",
"Lasso – Mean RMSE across all users: 0.8388\n"
]
}
],
"execution_count": 5
},
{
"metadata": {},
"cell_type": "markdown",
"source": "An appropriate evaluation measure is the root mean squared error (RMSE), because it penalises larger deviations more heavily and is expressed in the same units as the ratings."
},
{
"cell_type": "markdown",
"metadata": {
"lang": "sl"
},
"source": [
"#### Bonus question (15%)\n",
"Create a new user that represents your own film\n",
"ratings. Rate a few films according to your own taste and check how the models rate the films you did not select.\n",
"Do the predictions seem appropriate to you?\n",
"\n",
"You may split the code for your answers across several cells."
]
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-04-27T16:16:15.055360Z",
"start_time": "2025-04-27T16:16:15.041567Z"
}
},
"cell_type": "code",
"source": [
"# Personal ratings keyed by movieId. Named my_ratings so the `ratings`\n",
"# DataFrame loaded earlier is not overwritten.\n",
"my_ratings = {\n",
"    75: 4.0,\n",
"    1: 4.8,\n",
"    316: 4.5,\n",
"    364: 4.9,\n",
"    541: 4.7,\n",
"    124: 3.7,\n",
"    3114: 4.7,\n",
"    4306: 5.0,\n",
"    5349: 4.7\n",
"}\n",
"\n",
"# Start from all zeros (unrated) and fill in only films present in the matrix.\n",
"ratings_series = pd.Series(0, index=matrix.index, dtype=float)\n",
"for movie_id, rating in my_ratings.items():\n",
"    if movie_id in ratings_series.index:\n",
"        ratings_series.loc[movie_id] = rating\n",
"\n",
"matrix2 = matrix.copy()\n",
"matrix2[\"me\"] = ratings_series\n",
"\n",
"y_me = matrix2[\"me\"].values\n",
"x_other = matrix2.drop(columns=\"me\").values\n",
"\n",
"unrated_idx = np.where(y_me == 0)[0]\n",
"\n",
"# Train on the films I rated, then predict the films I did not rate.\n",
"rated_idx = np.where(y_me > 0)[0]\n",
"x_train = x_other[rated_idx]\n",
"y_train = y_me[rated_idx]\n",
"\n",
"x_test = x_other[unrated_idx]\n",
"\n",
"model = Ridge(alpha=1.0)\n",
"model.fit(x_train, y_train)\n",
"\n",
"y_pred = model.predict(x_test)\n",
"\n",
"# Show the ten unrated films with the highest predicted rating.\n",
"(pd.DataFrame({\n",
"    \"movieId\": matrix.index[unrated_idx],\n",
"    \"predictedRating\": y_pred,\n",
"})\n",
" .merge(movies, on=\"movieId\")\n",
" .sort_values(by=\"predictedRating\", ascending=False)\n",
" .head(10))"
],
"outputs": [
{
"data": {
"text/plain": [
" movieId predictedRating \\\n",
"106 2571 4.935168 \n",
"30 356 4.903377 \n",
"129 4993 4.884689 \n",
"24 296 4.876314 \n",
"117 2959 4.859066 \n",
"133 5952 4.856036 \n",
"25 318 4.847719 \n",
"111 2762 4.834292 \n",
"40 527 4.832231 \n",
"114 2858 4.825916 \n",
"\n",
" title \\\n",
"106 Matrix, The (1999) \n",
"30 Forrest Gump (1994) \n",
"129 Lord of the Rings: The Fellowship of the Ring,... \n",
"24 Pulp Fiction (1994) \n",
"117 Fight Club (1999) \n",
"133 Lord of the Rings: The Two Towers, The (2002) \n",
"25 Shawshank Redemption, The (1994) \n",
"111 Sixth Sense, The (1999) \n",
"40 Schindler's List (1993) \n",
"114 American Beauty (1999) \n",
"\n",
" genres \n",
"106 Action|Sci-Fi|Thriller \n",
"30 Comedy|Drama|Romance|War \n",
"129 Adventure|Fantasy \n",
"24 Comedy|Crime|Drama|Thriller \n",
"117 Action|Crime|Drama|Thriller \n",
"133 Adventure|Fantasy \n",
"25 Crime|Drama \n",
"111 Drama|Horror|Mystery \n",
"40 Drama|War \n",
"114 Drama|Romance "
],
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>movieId</th>\n",
" <th>predictedRating</th>\n",
" <th>title</th>\n",
" <th>genres</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>106</th>\n",
" <td>2571</td>\n",
" <td>4.935168</td>\n",
" <td>Matrix, The (1999)</td>\n",
" <td>Action|Sci-Fi|Thriller</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>356</td>\n",
" <td>4.903377</td>\n",
" <td>Forrest Gump (1994)</td>\n",
" <td>Comedy|Drama|Romance|War</td>\n",
" </tr>\n",
" <tr>\n",
" <th>129</th>\n",
" <td>4993</td>\n",
" <td>4.884689</td>\n",
" <td>Lord of the Rings: The Fellowship of the Ring,...</td>\n",
" <td>Adventure|Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>296</td>\n",
" <td>4.876314</td>\n",
" <td>Pulp Fiction (1994)</td>\n",
" <td>Comedy|Crime|Drama|Thriller</td>\n",
" </tr>\n",
" <tr>\n",
" <th>117</th>\n",
" <td>2959</td>\n",
" <td>4.859066</td>\n",
" <td>Fight Club (1999)</td>\n",
" <td>Action|Crime|Drama|Thriller</td>\n",
" </tr>\n",
" <tr>\n",
" <th>133</th>\n",
" <td>5952</td>\n",
" <td>4.856036</td>\n",
" <td>Lord of the Rings: The Two Towers, The (2002)</td>\n",
" <td>Adventure|Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>318</td>\n",
" <td>4.847719</td>\n",
" <td>Shawshank Redemption, The (1994)</td>\n",
" <td>Crime|Drama</td>\n",
" </tr>\n",
" <tr>\n",
" <th>111</th>\n",
" <td>2762</td>\n",
" <td>4.834292</td>\n",
" <td>Sixth Sense, The (1999)</td>\n",
" <td>Drama|Horror|Mystery</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>527</td>\n",
" <td>4.832231</td>\n",
" <td>Schindler's List (1993)</td>\n",
" <td>Drama|War</td>\n",
" </tr>\n",
" <tr>\n",
" <th>114</th>\n",
" <td>2858</td>\n",
" <td>4.825916</td>\n",
" <td>American Beauty (1999)</td>\n",
" <td>Drama|Romance</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 6
},
{
"metadata": {},
"cell_type": "markdown",
"source": "The predictions seem reasonable to me, since I would rate the suggested films much as the model predicts."
},
{
"cell_type": "markdown",
"metadata": {
"lang": "sl"
},
"source": [
"### Notes\n",
"\n",
"Implementations, descriptions and evaluation tools for supervised learning methods are available in the `sklearn` and `Orange` libraries."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
},
"nbTranslate": {
"displayLangs": [
"sl"
],
"hotkey": "alt-t",
"langInMainMenu": true,
"sourceLang": "sl",
"targetLang": "en",
"useGoogleTranslate": true
}
},
"nbformat": 4,
"nbformat_minor": 2
}