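{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Final Project\n", "\n", "In this project, you will play detective and put your machine learning skills to use by building an algorithm that identifies Enron employees who may have committed fraud, based on the public Enron financial and email dataset.\n", "\n", "## How do I complete this project?\n", "\n", "Before you start, be aware that a project like this needs many data points to give intuitive results and to run well. **What makes this project trickier is that we use real data, which can be messy and does not contain as many data points as we would like for machine learning.**\n", "\n", "Don't lose heart: as a data analyst, you simply need to get used to imperfect data! If you run into something you have not seen before, take a step back and think about a smart way around it. Trust yourself!\n", "\n", "## Project Overview\n", "\n", "In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed because of widespread corporate fraud. In the federal investigation that followed, a significant amount of typically confidential information entered the public record, including tens of thousands of emails involving top executives and detailed financial data.\n", "\n", "In this project, you will play detective and put your new skills to use by **building a person of interest (POI) identifier based on the financial and email data made public as a result of the Enron scandal.** To assist you in your detective work, we have combined this data with a hand-generated list of persons of interest in the fraud case, meaning individuals who were indicted, reached a settlement or plea deal with the government, or testified in exchange for immunity from prosecution.\n", "\n", "### The relevant files are:\n", "\n", "*poi_id.py*: Starter code for the POI identifier; this is where you will write your analysis. You will also submit a copy of this file so that the evaluator can verify your algorithm and results.\n", "\n", "*final_project_dataset.pkl*: The dataset for the project; details below.\n", "\n", "*tester.py*: When you turn in your analysis for evaluation by Udacity, you will submit the algorithm, dataset, and list of features that you use (these are created automatically in poi_id.py). The evaluator will then use this code to test your result and make sure the performance is similar to what you report. You do not need to do anything with this code; we provide it only for your reference.\n", "\n", "*emails_by_address*: This directory contains many text files, each of which contains all the messages to or from a particular email address. It is there for reference, and you can use it to create more advanced features based on the details of the emails dataset. You do not need to process the email corpus in order to complete the project.\n", "\n", "## Steps to Success\n", "\n", "We will give you starter code that reads in the data and puts your features of choice into a numpy array, which is the input form that most sklearn functions assume. **Your job is to engineer the features, pick and tune an algorithm, and test and evaluate your identifier.** We designed several of the mini-projects with this final project in mind, so remember to build on the work you have already done.\n", "\n", "As preprocessing for this project, we have combined the Enron email and financial data into a dictionary, where each key-value pair corresponds to one person. The dictionary key is the person's name, and the value is another dictionary that contains the names of all the features and their values for that person. The features in the data fall into three major types: financial features, email features, and POI labels.\n", "\n", "**Financial features**: ['salary', 'deferral_payments', 'total_payments', 'loan_advances', 'bonus', 'restricted_stock_deferred', 'deferred_income', 'total_stock_value', 'expenses', 'exercised_stock_options', 'other', 'long_term_incentive', 'restricted_stock', 'director_fees'] (all units are US dollars)\n", "\n", "**Email features**: ['to_messages', 'email_address', 'from_poi_to_this_person', 'from_messages', 'from_this_person_to_poi', 'shared_receipt_with_poi'] (units are generally the number of email messages; the notable exception is 'email_address', which is a text string)\n", "\n", "**POI label**: ['poi'] (boolean, represented as an integer)\n", "\n", 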
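"The nested structure described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not part of the starter code: both entries and all of their values are invented, and `toy_data`, `count_pois`, and `known_values` are hypothetical names. Missing values are written as the string 'NaN', which is what the analysis code later in this notebook tests for.\n",
"\n",
"```python\n",
"# Hypothetical miniature of the dataset: person name -> {feature name -> value}.\n",
"# All names and numbers below are invented for illustration.\n",
"toy_data = {\n",
"    'DOE JANE': {'poi': True, 'salary': 250000, 'bonus': 'NaN',\n",
"                 'email_address': 'jane.doe@example.com'},\n",
"    'ROE RICHARD': {'poi': False, 'salary': 'NaN', 'bonus': 50000,\n",
"                    'email_address': 'NaN'},\n",
"}\n",
"\n",
"\n",
"def count_pois(dataset):\n",
"    # Count the entries whose 'poi' label is True.\n",
"    return sum(1 for features in dataset.values() if features['poi'])\n",
"\n",
"\n",
"def known_values(dataset, feature_name):\n",
"    # Collect one feature across all people, skipping missing ('NaN') entries.\n",
"    return {name: features[feature_name]\n",
"            for name, features in dataset.items()\n",
"            if features[feature_name] != 'NaN'}\n",
"\n",
"\n",
"print(count_pois(toy_data))\n",
"print(known_values(toy_data, 'salary'))\n",
"```\n",
"\n",
"Checking for 'NaN' before using a value matters because the missing-value marker is a string, so it cannot be compared or plotted together with the numeric entries.\n",
"\n",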
"我们鼓励你在启动器功能中制作,转换或重新调整新函数功能。如果这样做,你应该把新功能存储到my_dataset,如果你想在最终算法中使用新功能,你还应该将功能名称添加到 my_feature_list,以便于你的评估者可以在测试期间访问它。关于如何在数据集中添加具体的新要素的例子,可以参考“特征选择”这一课。\n", "\n", "此外,我们还建议你可以在完成项目过程中做一些记号。你可以写出系列问题的答案(在下一页),将这个作为提交的项目的一部分,以便于评估者了解到你对于不同方面分析的方法。你的思维过程在很大程度上比你的最终项目更重要,我们将通过你在这些问题的解答中了解你的思维过程。\n", "\n", "# 开始\n", "\n", "首先把所有需要导入的模块都放在第一个 cell,只需要导入一次。" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-09-15T08:57:42.573006Z", "start_time": "2019-09-15T08:57:40.517186Z" } }, "outputs": [], "source": [ "#!/usr/bin/python\n", "\n", "# poi_id.py\n", "import sys\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.naive_bayes import GaussianNB\n", "from tester import dump_classifier_and_data\n", "sys.path.append(\"../tools/\")\n", "from feature_format import featureFormat, targetFeatureSplit\n", "import pickle\n", "from poi_email_addresses import poiEmails\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task1:确定需要使用的特征\n", "\n", "这个部分我初步认为欺诈案嫌疑人的特征首先会体现在收入水平上,所以先使用薪资、奖金和总股票持有量这三个财务特征。稍后再看邮件特征。" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2019-09-15T08:57:54.296374Z", "start_time": "2019-09-15T08:57:54.287372Z" } }, "outputs": [], "source": [ "# Task 1: Select what features you'll use.\n", "# features_list is a list of strings, each of which is a feature name.\n", "# The first feature must be \"poi\".\n", "\n", "features_list = ['poi', 'salary', 'bonus', 'total_payments'] # You will need to use more features\n", "\n", "# Load the dictionary containing the dataset\n", "with open(\"final_project_dataset.pkl\", \"rb\") as data_file:\n", " data_dict = pickle.load(data_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task2:移除数据中的异常值" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": 
"2019-09-15T08:59:49.667494Z", "start_time": "2019-09-15T08:59:48.491166Z" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAABI8AAABzCAYAAAAR6iJzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAR9klEQVR4nO3de7BdZX3G8e9DEtNSEQTiDZHQGsF4QzlArbY1RQZEx8BUJdQLVhwFxaKUaakdL4Ojo9bGesViRYMtl4C3SK0OkeBlRjQniCLG1KhRolEiUQLeMPDrH2ulHI97kZ2cfc5J9vl+ZvbsvdZ617t+e3jPYu8n71o7VYUkSZIkSZLUy17TXYAkSZIkSZJ2X4ZHkiRJkiRJ6mR4JEmSJEmSpE6GR5IkSZIkSepkeCRJkiRJkqROhkeSJEmSJEnqZHgkSZI0QEkqycOnuw5JkqRBMTySJEmSJElSJ8MjSZKk3UCS2dNdgyRJUi+GR5IkSR2S/GOSHya5Pcm6JMcmOTrJl5L8PMmmJO9Ocp+O/Z+e5KtJtia5Ocnrx2yb317idnqSHwDXJPnvJK8Y18fXk5w0ue9UkiSpm+GRJElSD0kOA84CjqqqfYDjgQ3AXcCrgAOBJwLHAi/r6OYXwAuA/YCnA2f2CIL+Enhk2/8y4HljangccBDwqYG8KUmSpF1geCRJktTbXcBcYGGSOVW1oaq+U1Vrquq6qtpWVRuAf6cJgH5PVV1bVTdW1d1V9XXg0h5tX19Vv6iqXwGfABYkWdBuez5weVXdORlvUJIkqR+GR5IkST1U1XrglcDrgVuSXJbkIUkekeSqJD9OshV4E80spN+T5Jgkq5JsTnIbcEaPtjePOeZvgOXA85LsBZwKfHjgb06SJGknGB5JkiR1qKpLqurJwCFAAW8BLgC+BSyoqvsBrwbS0cUlwArg4KraF3hfj7Y1bnkZ8Fyay+F+WVVfGsR7kSRJ2lWGR5IkST0kOSzJXyWZC/wa+BXNpWz7AFuBO5IcDpx5L93sA2ypql8nORr4mx0dtw2L7gb+FWcdSZKk3YDhkSRJUm9zgTcDPwV+DDyAZpbRuTQh0O3A+4HL76WPlwHnJ7kdeC3NJWn9uBh4DPCfu1S5JEnSAKVq/ExpSZIkTackLwBe0l4yJ0mSNK2ceSRJkrQbSbI3zYylC6e7FkmSJDA8kiRJ2m0kOR7YDPyE5mbbkiRJ087L1iRJkiRJktTJmUeSJEmSJEnqNHsQnSQ5AXgHMAv4j6p687jtc2l+NeRI4FbglKrakGQ+sBZY1za9rqrO2NHxDjzwwJo/f/4gSpckSZIkSRKwZs2an1bVvPHrJxweJZkFvAc4DtgIrE6yoqq+OabZ6cDPqurhSZYAbwFOabd9p6qO2Jljzp8/n9HR0YmWLkmSJEmSpFaS7/daP4jL1o4G1lfVd6vqTuAyYPG4NouBZe3rK4Fjk2QAx5YkSZIkSdIkGkR4dBBw85jlje26nm2qahtwG3BAu+3QJF9N8rkkf951kCQvSTKaZHTz5s0DKFuSJEmSJEk7MojwqNcMovE/4dbVZhPwsKp6PHAOcEmS+/U6SFVdWFUjVTUyb97vXX4nSZIkSZKkSTCI8GgjcPCY5YcCP+pqk2Q2sC+wpap+U1W3AlTVGuA7wCMGUJMkSZIkSZIGYBDh0WpgQZJDk9wHWAKsGNdmBXBa+/pZwDVVVUnmtTfcJskfAwuA7w6gJkmSJEmSJA3AhH9traq2JTkL+AwwC7ioqm5Kcj4wWlUrgA8AH06yHthCEzAB/AVwfpJtwF3AGVW1ZaI1SZIkSZIkaTBSNf72RLu/kZGRGh0dne4yJEmSJEmShkaSNVU1Mn79IC5bk
yRJkiRJ0pAyPJIkSZIkSVInwyNJkiRJkiR1MjySJEmSJElSJ8MjSZIkSZIkdTI8kiRJkiRJUifDI0mSJEmSJHUyPJIkSZIkSVInwyNJkiRJkiR1MjySJEmSJElSJ8MjSZIkSZIkdTI8kiRJkiRJUifDI0mSJEmSJHUyPJIkSZIkSVInwyNJkiRJkiR1Gkh4lOSEJOuSrE9yXo/tc5Nc3m7/cpL5Y7b9U7t+XZLjB1GPJEmSJEmSBmPC4VGSWcB7gKcBC4FTkywc1+x04GdV9XDg7cBb2n0XAkuARwEnAO9t+xtab30rvPSlsGoVHH447LcfLFwIj3wkzJ4NSX+PvfaCWbOa54UL4QEPgBNPbPo95pjmGGOtWtUcux8nnghLlzbtV61q1i1dek///fYjSZIkSdIwGvt9ebth/r48iJlHRwPrq+q7VXUncBmweFybxcCy9vWVwLFJ0q6/rKp+U1XfA9a3/Q2to46Cyy6Dk0+Gww6D226DtWvhW9+Cu+7qv58quPvu5nntWti8GebPb/r9xjfg8svvGcirVsFzntMcux9PfSqcey5s2NDs97KXNcvz5+9cP5IkSZIkDaOjjmq+H+/q9+49zewB9HEQcPOY5Y3AMV1tqmpbktuAA9r1143b96AB1LTbWrQIPv5xOOkkWLmymT20M6FRlzlzYNmyZvbSVVc1657zHDjzTLjgAli+vDl2P845p3k+91x49KOb/Y87Dq64Yuf6kSRJkiRpGC1a1Hw/3tXv3XuaQcw8So911WebfvZtOkhekmQ0yejmzZt3ssTdy6JFcPbZ8MtfNsHR/e43sf4OPhh++9umv7PPbvpftKgZwG94Q/O8swP4nHPgyU+GG29s+r/66l3rR5IkSZKkYTTR7917kkGERxuBg8csPxT4UVebJLOBfYEtfe4LQFVdWFUjVTUyb968AZQ9fVatgne8A/beu5l5tHXrxPq7+eZm5tHeezf9rlrVPC64AF7zmuZ5/LWYO7J0KXzxi/CYxzT9H3fcrvUjSZIkSdIwmuj37j3JIMKj1cCCJIcmuQ/NDbBXjGuzAjitff0s4Jqqqnb9kvbX2A4FFgBfGUBNu61Vq5pL1pLm3kKDuGQNmplHp53W9PuMZzT3Plq+HM4//56pdP0O5KVLm0vWzjgDNm1qEtSVK+HZz965fiRJkiRJGkbb73G0q9+79zQTDo+qahtwFvAZYC2wvKpuSnJ+kme2zT4AHJBkPXAOcF67703AcuCbwKeBl1fVgOKU3dPq1bBkCXzsY7BuHey7b/NLa4cf3sxC6tf2X1xLmv3nzWtucP2xjzX3KTrllHumzG2/FnP16v76XrkS3va25gbZy5fDe9/bLG/YsHP9SJIkSZI0jFav/t17HO3s9+49TZoJQHuWkZGRGh0dne4yJEmSJEmShkaSNVU1Mn79IC5bkyRJkiRJ0pAyPJIkSZIkSVInwyNJkiRJkiR1MjySJEmSJElSJ8MjSZIkSZIkdTI8kiRJkiRJUifDI0mSJEmSJHUyPJIkSZIkSVInwyNJkiRJkiR1MjySJEmSJElSJ8MjSZIkSZIkdTI8kiRJkiRJUifDI0mSJEmSJHUyPJIkSZIkSVInwyNJkiRJkiR1MjySJEmSJElSpwmFR0n2T3J1km+3z/fvaHda2+bbSU4bs/7aJOuS3NA+HjCReiRJkiRJkjRYE515dB7w2apaAHy2Xf4dSfYHXgccAxwNvG5cyPTcqjqifdwywXokSZIkSZI0QBMNjxYDy9rXy4CTerQ5Hri6qrZU1c+Aq4ETJnhcSZIkSZIkTYGJhkcPrKpNAO1zr8vODgJuHrO8sV233QfbS9ZekyRdB0rykiSjSUY3b948wbIlSZIkSZLUj9k7apBkJfCgHpv+uc9j9AqEqn1+blX9MMk+wEeA5wMX9+qkqi4ELgQYGRmpXm0kSZIkSZI0WDsMj6rqqV3bkvwkyYOralOSBwO97lm0EXjKmOWHAte2ff+wfb49y
SU090TqGR5JkiRJkiRp6qVq1yfxJPkX4NaqenOS84D9q+ofxrXZH1gDPKFddT1wJLAV2K+qfppkDnApsLKq3tfHcTcD39/lwncfBwI/ne4ipCnmuNdM5LjXTOS410zkuNdM5LgfLodU1bzxKycaHh0ALAceBvwAeHZVbUkyApxRVS9u270IeHW72xur6oNJ/gj4PDAHmAWsBM6pqrt2uaA9TJLRqhqZ7jqkqeS410zkuNdM5LjXTOS410zkuJ8ZdnjZ2r2pqluBY3usHwVePGb5IuCicW1+QTMDSZIkSZIkSbupif7amiRJkiRJkoaY4dH0unC6C5CmgeNeM5HjXjOR414zkeNeM5HjfgaY0D2PJEmSJEmSNNyceSRJkiRJkqROhkeSJEmSJEnqZHg0BZKckGRdkvVJzuuxfW6Sy9vtX04yf+qrlAarj3H/wiSbk9zQPl7cqx9pT5LkoiS3JPlGx/YkeWf7d/H1JE+Y6hqlQepjzD8lyW1jzvWvneoapUFLcnCSVUnWJrkpydk92ni+11Dpc9x7zh9is6e7gGGXZBbwHuA4YCOwOsmKqvrmmGanAz+rqocnWQK8BThl6quVBqPPcQ9weVWdNeUFSpPnQ8C7gYs7tj8NWNA+jgEuaJ+lPdWHuPcxD/CFqnrG1JQjTYltwN9X1fVJ9gHWJLl63Occz/caNv2Me/CcP7SceTT5jgbWV9V3q+pO4DJg8bg2i4Fl7esrgWOTZAprlAatn3EvDZ2q+jyw5V6aLAYursZ1wH5JHjw11UmD18eYl4ZOVW2qquvb17cDa4GDxjXzfK+h0ue41xAzPJp8BwE3j1neyO//kf1/m6raBtwGHDAl1UmTo59xD/DX7VTuK5McPDWlSdOq378NaZg8McnXkvxPkkdNdzHSILW3m3g88OVxmzzfa2jdy7gHz/lDy/Bo8vWaQVS70Ebak/Qzpj8JzK+qxwIruWf2nTTMPN9rprkeOKSqHge8C/j4NNcjDUyS+wIfAV5ZVVvHb+6xi+d77fF2MO495w8xw6PJtxEYO6PiocCPutokmQ3si1PAtWfb4bivqlur6jft4vuBI6eoNmk69fP/BGloVNXWqrqjff0pYE6SA6e5LGnCksyh+QL9X1X10R5NPN9r6Oxo3HvOH26GR5NvNbAgyaFJ7gMsAVaMa7MCOK19/SzgmqryXya0J9vhuB933f8zaa6blobdCuAF7a/w/ClwW1Vtmu6ipMmS5EHb7+OY5Giaz563Tm9V0sS0Y/oDwNqqWtrRzPO9hko/495z/nDz19YmWVVtS3IW8BlgFnBRVd2U5HxgtKpW0PwRfjjJepoZR0umr2Jp4voc93+X5Jk0v9ywBXjhtBUsDUiSS4GnAAcm2Qi8DpgDUFXvAz4FnAisB34J/O30VCoNRh9j/lnAmUm2Ab8ClvgPZBoCTwKeD9yY5IZ23auBh4Hnew2tfsa95/whFv9bSpIkSZIkqYuXrUmSJEmSJKmT4ZEkSZIkSZI6GR5JkiRJkiSpk+GRJEmSJEmSOhkeSZIkSZIk7cGSXJTkliTf6KPt25Pc0D7+N8nPd7SP4ZEkSVIfklyb5Phx616Z5L33ss8dk1+ZJEkSHwJO6KdhVb2qqo6oqiOAdwEf3dE+hkeSJEn9uRRYMm7dkna9JEnStKmqzwNbxq5L8idJPp1kTZIvJDm8x66n0sdnGcMjSZKk/lwJPCPJXIAk84GHADck+WyS65PcmGTx+B2TPCXJVWOW353khe3rI5N8rv1g95kkD56KNyNJkobehcArqupI4Fzgd2ZLJzkEOBS4ZkcdzZ6U8iRJkoZMVd2a5Cs0U8I/QTPr6HLgV8DJVbU1yYHAdUlWVFXtqM8kc2imiy+uqs1JTgHeCLxo0t6IJEkaeknuC/wZcEWS7avnjmu2BLiyqu7aUX+GR5IkSf3bfuna9vDoRUCANyX5C+Bu4CDggcCP++jvMODRwNXtB7tZwKbBly1JkmaYvYCft/c16
rIEeHk/nRkeSZIk9e/jwNIkTwD+sKquby8/mwccWVW/TbIB+INx+23jd28XsH17gJuq6omTW7YkSZpJ2hnR30vy7Kq6Is2/Uj22qr4GkOQw4P7Al/rpz3seSZIk9amq7gCuBS7inptL7gvc0gZHi4BDeuz6fWBhkrlJ9gWObdevA+YleSI0l7EledRkvgdJkjR8klxKEwQdlmRjktOB5wKnJ/kacBMw9r6MpwKX9XOZPUD6bCdJkiQgyck0P2n7yKr6Vnufo08Cc4AbgCcBT6uqDUnuqKr7tvu9leZD27eBO4EVVfWhJEcA76QJoWYD/1ZV75/yNyZJktTB8EiSJEmSJEmdvGxNkiRJkiRJnQyPJEmSJEmS1MnwSJIkSZIkSZ0MjyRJkiRJktTJ8EiSJEmSJEmdDI8kSZIkSZLUyfBIkiRJkiRJnf4P2g6M3BfXPLYAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAABI8AAABzCAYAAAAR6iJzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAToElEQVR4nO3dfbRddX3n8ffHJDxrqCStPBqr1AqoAbJA22knTpwxqJh2xtZQFKhYhj4446oul2MftHTsSGf65ENHY5vyUEGQdtlYoRapka7VqFyQqhEYIuUhgBAIBKISCH7nj7OvHk/Ozj3JPffcm3vfr7XOuvvht3/7e876sdnnk733SVUhSZIkSZIk9fOM6S5AkiRJkiRJM5fhkSRJkiRJkloZHkmSJEmSJKmV4ZEkSZIkSZJaGR5JkiRJkiSpleGRJEmSJEmSWhkeSZKkOS/JRUn+53TXIUmSNBMZHkmSpH1CkjuTvHLYbeeSJMuTbJ7uOiRJ0r7F8EiSJEmSJEmtDI8kSdKMl+RS4Bjg00m2J3lnktcl2Zjk0STrk7yorW2z/JNJvpVkW5Lrkxy/hzUsT7I5ybuTPNRc3XRm1/rXJPlKkseS3JPkvV3rPpPkrT39fTXJzzXTleTXktye5PEkv5/k+Uk2NP1dmWS/rm1fm+Tm5r3/S5KXdK27M8k7mv63JbkiyQFJDgauAY5oPpftSY5IckqSsWY/DyT54z35XCRJ0uxneCRJkma8qnoTcDdwelUdAnwKuBx4G7AYuJpOWLRfb9uq+sOmm2uAY4EfBW4CPr4XpTwHWAQcCZwNrEnywmbdt4GzgEOB1wC/Oh4OARcDbxzvJMlLmz6u7up7JXAy8DLgncAa4EzgaOAE4Ixm25OAtcB/BQ4DPgqsS7J/V1+/2PT3POAlwDlV9W3gNOC+5nM5pKruA/4M+LOqehbwfODKvfhcJEnSLGZ4JEmS9kVvAD5TVddW1VPA/wEOBH6qbYOqWltVj1fVDuC9wEuTLNyLff9OVe2oqi8An6ET1FBV66vqa1X1var6Kp1w69832/wdcGySY5v5NwFXVNWTXf1eWFWPVdVG4OvAP1bVHVW1jU7wdWLT7leAj1bVl6rq6aq6GNhBJ3Qa94Gquq+qtgKfBpbu5v08BbwgyaKq2l5VX9yLz0SSJM1ihkeSJGlfdARw1/hMVX0PuIfO1Ty7SDIvyfuTfDPJY8CdzapFe7jfR5oreMbd1dRCklOTfD7JliTbgPPH+28CqyuBNyZ5Bp2riC7t6fuBrunv9pk/pJl+LvD25pa1R5M8SufqpCO62n+ra/o7Xdv2cy7wE8CtSW5I8trdtJUkSXOQ4ZEkSdpXVNf0fXRCFACShE6Acm+ftgC/BKwCXgksBJaMb7qHNfxI8+ygccc0tQBcBqwDjq6qhcBHevq/mM5taCuA71TVhj3c97h7gPdV1aFdr4Oq6vIBtu39XKiq26vqDDq3810IXNXzHiVJ0hxneCRJkvYVDwA/3kxfCbwmyYokC4C307l161/6tAV4ZrP+YeAg4A8mUcfvJdkvyc8ArwU+2bWPrVX1RJJT6ARW39eERd8D/ohdrzraEx8Dzm+udEqSg5uHdT9zgG0fAA7rvl0vyRuTLG6u3nq0Wfz0JOqTJEmzjOGRJEnaV/wv4Leb27ROp/MA6g8CDzXzp3c9Q+j7bZO8A7iEzi1m9wLfAPb2uT7fAh6hc7XRx4Hzq+rWZt2vARckeRz4Xfo/ePoS4MXAX+/l/qmqMTrPPfpQU8sm4JwBt72VzrOY7mg+myPoPFh7Y5LtdB6evbqqntjb+iRJ0uyTql2uXpYkSVKPJMuBv66qoybRx1nAeVX174ZWmCRJ0hTzyiNJkqQRS
HIQnauT1kx3LZIkSXvC8EiSJKmR5N1Jtvd5XTPJfl8FbKHzzKHLhlKsJEnSiHjbmiRJkiRJklp55ZEkSZIkSZJazR9GJ0lW0vl1jnnAX1TV+3vW70/n10VOpvMTuW+oqjuTLAFuAW5rmn6xqs6faH+LFi2qJUuWDKN0SZIkSZIkATfeeONDVbW4d/mkw6Mk84APA/8R2AzckGRdVX2jq9m5wCNV9YIkq4ELgTc0675ZVUv3ZJ9LlixhbGxssqVLkiRJkiSpkeSufsuHcdvaKcCmqrqjqp4EPgGs6mmzCri4mb4KWJEkQ9i3JEmSJEmSptAwwqMjgXu65jc3y/q2qaqdwDbgsGbd85J8JckXkvxM206SnJdkLMnYli1bhlC2JEmSJEmSJjKM8KjfFUS9P+HW1uZ+4JiqOhH4TeCyJM/qt5OqWlNVy6pq2eLFu9x+J0mSJEmSpCkwjPBoM3B01/xRwH1tbZLMBxYCW6tqR1U9DFBVNwLfBH5iCDVJkiRJkiRpCIYRHt0AHJvkeUn2A1YD63rarAPObqZfD/xTVVWSxc0Dt0ny48CxwB1DqEmSJEmSJElDMOlfW6uqnUl+A/gsMA9YW1Ubk1wAjFXVOuAvgUuTbAK20gmYAH4WuCDJTuBp4Pyq2jrZmiRJkiRJkjQcqep9PNHMt2zZshobG5vuMiRJkiRJkmaNJDdW1bLe5cO4bU2SJEmSJEmzlOGRJEmSJEmSWhkeSZIkSZIkqZXhkSRJkiRJkloZHkmSJEmSJKmV4ZEkSZIkSZJaGR5JkiRJkiSpleGRJEmSJEmSWhkeSZIkSZIkqZXhkSRJkiRJkloZHkmSJEmSJKmV4ZEkSZIkSZJaGR5JkiRJkiSpleGRJEmSJEmSWhkeSZIkSZIkqdVQwqMkK5PclmRTknf1Wb9/kiua9V9KsqRr3f9olt+W5FXDqEeSJEmSJEnDMenwKMk84MPAacBxwBlJjutpdi7wSFW9APgT4MJm2+OA1cDxwErgz5v+JEmSJEmSNAPMH0IfpwCbquoOgCSfAFYB3+hqswp4bzN9FfChJGmWf6KqdgD/lmRT09+GIdQ1s516Knz5yxO3W7AADj4YnngCDjoITjihs/zuu+HAA+H00+GxxzrLzjoLXv7y/v1s2ACXXNK/3YYNsH49LF/evv3utG0/2X4laSIeZyRJkjSd5sj56DDCoyOBe7rmNwOntrWpqp1JtgGHNcu/2LPtkUOoaWYbNDgCeOopePTRzvQTT8D11//w+ltu+cH02rWdQds7YDdsgFe8Anbs2LXdhg2wYgU8+STstx9cd92eDfi27SfbryRNxOOMJEmSptMcOh8dxjOP0mdZDdhmkG07HSTnJRlLMrZly5Y9LHGGuemmqen3qac6oVCv9es7g7lfu/F1Tz/d+dtv+91p236y/UrSRDzOSJIkaTrNofPRYYRHm4Gju+aPAu5ra5NkPrAQ2DrgtgBU1ZqqWlZVyxYvXjyEsqfRSSdNTb8LFnQuleu1fHknBe3XbnzdvHmdv/2235227SfbryRNxOOMJEmSptMcOh9NVd8LfQbvoBMG/T9gBXAvcAPwS1W1savNrwMvrqrzk6wG/nNV/WKS44HL6Dzn6AjgOuDYqnp6d/tctmxZjY2NTaruaeczjyRp8jzOSJIkaTrNsvPRJDdW1bJdlk82PGo6fzXwp8A8YG1VvS/JBcBYVa1LcgBwKXAinSuOVnc9YPu3gDcDO4G3VdU1E+1vVoRHkiRJkiRJM8iUhkejZngkSZIkSZI0XG3h0TCeeSRJkiRJkqRZyvBIkiRJkiRJrQyPJEmSJEmS1MrwSJIkSZIkSa0MjyRJkiRJktTK8EiSJEmSJEmtDI8kSZIkSZLUyvBIkiRJkiRJrQyPJEmSJEmS1MrwSJIkSZIkSa0MjyRJkiRJktTK8EiSJEmSJEmtDI8kSZIkSZLUyvBIkiRJkiRJrQyPJEmSJEmS1MrwSJIkSZIkSa0mFR4leXaSa
5Pc3vz9kZZ2Zzdtbk9ydtfy9UluS3Jz8/rRydQjSZIkSZKk4ZrslUfvAq6rqmOB65r5H5Lk2cB7gFOBU4D39IRMZ1bV0ub14CTrkSRJkiRJ0hBNNjxaBVzcTF8M/FyfNq8Crq2qrVX1CHAtsHKS+5UkSZIkSdIITDY8+rGquh+g+dvvtrMjgXu65jc3y8b9VXPL2u8kSduOkpyXZCzJ2JYtWyZZtiRJkiRJkgYxf6IGST4HPKfPqt8acB/9AqFq/p5ZVfcmeSbwN8CbgEv6dVJVa4A1AMuWLat+bSRJkiRJkjRcE4ZHVfXKtnVJHkhyeFXdn+RwoN8zizYDy7vmjwLWN33f2/x9PMlldJ6J1Dc8kiRJkiRJ0uilau8v4knyv4GHq+r9Sd4FPLuq3tnT5tnAjcBJzaKbgJOBx4BDq+qhJAuAy4HPVdVHBtjvFuCuvS585lgEPDTdRUgNx6NmEsejZhLHo2YKx6JmEsejZhLH4/A8t6oW9y6cbHh0GHAlcAxwN/ALVbU1yTLg/Kp6S9PuzcC7m83eV1V/leRg4HpgATAP+Bzwm1X19F4XtI9JMlZVy6a7Dgkcj5pZHI+aSRyPmikci5pJHI+aSRyPU2/C29Z2p6oeBlb0WT4GvKVrfi2wtqfNt+lcgSRJkiRJkqQZarK/tiZJkiRJkqRZzPBoeq2Z7gKkLo5HzSSOR80kjkfNFI5FzSSOR80kjscpNqlnHkmSJEmSJGl288ojSZIkSZIktTI8kiRJkiRJUivDoxFIsjLJbUk2JXlXn/X7J7miWf+lJEtGX6XmigHG4zlJtiS5uXm9pV8/0mQlWZvkwSRfb1mfJB9oxupXk5w06ho1dwwwHpcn2dZ1bPzdUdeouSHJ0Uk+n+SWJBuT/Pc+bTw+aiQGHI8eHzXlkhyQ5MtJ/rUZi7/Xp43fq6eQ4dEUSzIP+DBwGnAccEaS43qanQs8UlUvAP4EuHC0VWquGHA8AlxRVUub11+MtEjNJRcBK3ez/jTg2OZ1HvB/R1CT5q6L2P14BPjnrmPjBSOoSXPTTuDtVfUi4GXAr/f5f7XHR43KIOMRPD5q6u0A/kNVvRRYCqxM8rKeNn6vnkKGR1PvFGBTVd1RVU8CnwBW9bRZBVzcTF8FrEiSEdaouWOQ8SiNRFVdD2zdTZNVwCXV8UXg0CSHj6Y6zTUDjEdpJKrq/qq6qZl+HLgFOLKnmcdHjcSA41Gacs3xbnszu6B59f76l9+rp5Dh0dQ7Erina34zux5wv9+mqnYC24DDRlKd5ppBxiPAf2kug78qydGjKU3axaDjVRqVlzeXy1+T5PjpLkazX3PLxYnAl3pWeXzUyO1mPILHR41AknlJbgYeBK6tqtZjo9+rh8/waOr1Szp7E9JB2kjDMMhY+zSwpKpeAnyOH6T30qh5bNRMchPw3OZy+Q8Cn5rmejTLJTkE+BvgbVX1WO/qPpt4fNSUmWA8enzUSFTV01W1FDgKOCXJCT1NPDZOIcOjqbcZ6L5y4yjgvrY2SeYDC/HSeU2NCcdjVT1cVTua2Y8BJ4+oNqnXIMdPaSSq6rHxy+Wr6mpgQZJF01yWZqkkC+h8Uf94Vf1tnyYeHzUyE41Hj48atap6FFjPrs8q9Hv1FDI8mno3AMcmeV6S/YDVwLqeNuuAs5vp1wP/VFUmpJoKE47HnmcmvI7Ove3SdFgHnNX8qtDLgG1Vdf90F6W5Kclzxp+bkOQUOudQD09vVZqNmnH2l8AtVfXHLc08PmokBhmPHh81CkkWJzm0mT4QeCVwa08zv1dPofnTXcBsV1U7k/wG8FlgHrC2qjYmuQAYq6p1dA7IlybZRCcZXT19FWs2G3A8/rckr6Pz6xpbgXOmrWDNakkuB5YDi5JsBt5D5+GHVNVHgKuBVwObgO8Avzw9lWouGGA8vh741SQ7ge8Cqz0h1RT5aeBNwNeaZ3sAvBs4Bjw+auQGGY8eHzUKhwMXN78e/Qzgyqr6e79Xj07871qSJ
EmSJEltvG1NkiRJkiRJrQyPJEmSJEmS1MrwSJIkSZIkSa0MjyRJkiRJktTK8EiSJEmSJGkflmRtkgeTfH2Atsck+XySryT5apJXT7SN4ZEkSdIAkqxP8qqeZW9L8ue72Wb71FcmSZLERcDKAdv+NnBlVZ0IrAZaz2XGGR5JkiQN5nI6J1jdVjfLJUmSpk1VXQ9s7V6W5PlJ/iHJjUn+OclPjjcHntVMLwTum6h/wyNJkqTBXAW8Nsn+AEmWAEcANye5LslNSb6WZFXvhkmWJ/n7rvkPJTmnmT45yReaE7vPJjl8FG9GkiTNemuAt1bVycA7+MEVRu8F3phkM3A18NaJOjI8kiRJGkBVPQx8mR9cEr4auAL4LvDzVXUS8Argj5JkkD6TLAA+CLy+ObFbC7xv2LVLkqS5JckhwE8Bn0xyM/BRYPwfqM4ALqqqo4BXA5cm2W0+NH8qi5UkSZplxm9d+7vm75uBAH+Q5GeB7wFHAj8GfGuA/l4InABc2+RN84D7h1+2JEmaY54BPFpVS/usO5fmH8OqakOSA4BFwIO760ySJEmD+RSwIslJwIFVdRNwJrAYOLk5QXsAOKBnu5388HnX+PoAG6tqafN6cVX9p6l9C5IkabarqseAf0vyCwDpeGmz+m5gRbP8RXTOS7bsrj/DI0mSpAFV1XZgPZ3by8YflL0QeLCqnkryCuC5fTa9Czguyf5JFtKcsAG3AYuTvBw6t7ElOX4q34MkSZp9klwObABemGRzknPp/APXuUn+FdgIjD+X8e3ArzTLLwfOqarabf8TrJckSVKXJD8P/C3woqq6Ncki4NPAAuBm4KeB06rqziTbq+qQZrs/pHPSdjvwJLCuqi5KshT4AJ0Qaj7wp1X1sZG/MUmSpBaGR5IkSZIkSWrlbWuSJEmSJElqZXgkSZIkSZKkVoZHkiRJkiRJamV4JEmSJEmSpFaGR5IkSZIkSWpleCRJkiRJkqRWhkeSJEmSJElq9f8BnjKw5XdDq7AAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAABI8AAABzCAYAAAAR6iJzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAASIklEQVR4nO3de7BdZX3G8e9jSAAvoJioQICoIBW0gpxB0alFYzVeqfUWxVvFOs6oI6PW8dbROmK91Gq9jUUN4AUQxVqkogIa0Wm8nABaESl4QQLRBAIqSkkCv/6x1ik7J3udc5Kzw945+X5m9ux1edf7/vaZrMnez17r3akqJEmSJEmSpH7uMuwCJEmSJEmSNLoMjyRJkiRJktTJ8EiSJEmSJEmdDI8kSZIkSZLUyfBIkiRJkiRJnQyPJEmSJEmS1MnwSJIkSZIkSZ0MjyRJ0i4lyalJ3jkCdbwkyXfvxPFWJnnZnTWeJEmaOwyPJEnSyEnyqySPH3TbQbizx5MkSRo2wyNJkiRJkiR1MjySJEkjJclngAOBryS5Ockbkjw9yWVJbmpvv3pwV9t2+xeS/CbJ75JclOTwbaxhYZJz2/E2JPlOkrtMMV7f+tp9ByT5UpL1SW5I8pGOMd+X5LtJ9u7Yv3vb/0N6ti1KckuS+yS5V1vz+iQ3tsuLO/p6e5LP9qwvSVJJdmvX907yqSRrk1yb5J1J5m3L31CSJM0dhkeSJGmkVNULgV8DT6uquwNfBs4ATgQWAV+lCW8WTG5bVe9tuzkPOAS4D3Ax8LltLON1wJp2vPsCb25K23q8JA/qqq8NXM4FrgaWAPsDZ/YO1IZSnwD+HHhCVf2u4+9yK/Al4Hk9m58DfLuq1tG8rzsFOIgm4LoF6BtUzcBpwGbgYOBI4AmA8yVJkrSLMjySJEmj7rnAf1bV+VW1CfhnYE/gUV0HVNWKqvpDG7i8HXhY1xU9HTYB+wIHVdWmqvpOVdV21Hc0sB/w91X1x6r636rqnSR7Pk3wtA9NIPWnaeo6nS3Do+e326iqG6rq7Kr6U1X9ATgJ+MtteM0AJLkv8CTgxLbmdcAHgOXb2pckSZobdht2AZIkSdPYj+bKHQCq6vYk19BcxbOV9mqfk4Bn01wJdHu7ayHQ96qePt5HEzp9IwnAyVX17u2obxNwdVVt7jj2YOBhwNFVtXEGdX0T2DPJI4DfAEcA/w6Q5K40Ic8y4F5t+3skmVdVt82g7wkH0YRaa9vXDs0XjtdsQx+SJGkO8cojSZI0inqv8rmOJtAAIE2icQBwbZ+20FyNcxzweGBvmtvFAMIMtVctva6qHgA8DXhtkqUd401V3zXAgRNzCfVxOfC3wHlJDp1BXbcDZ9FcffR84Nz2KiNobrU7FHhEVe0FPGaipD5d/RG4a8/6/XqWrwFuBRZW1T3bx15VtU3zRkmSpLnD8EiSJI2i3wIPaJfPAp6SZGmS+TQhya3Af/VpC3CPdv8NNAHJu7Z18CRPTXJwGwT9HritffQbb6r6fgCsBd6d5G5J9kjy6N6xquoMmjmVLkjywBmUdzrNrXLHt8sT7kEzz9FNSfYB3jZFH5cCj0lyYHs735t66lkLfAN4f5K92jmZHphkm2+BkyRJc4PhkSRJGkX/BLw1yU00V/68APgwcH27/rSe27z+v22S1wOfprmN7Frgp8D3tmP8Q4ALgJuBVcDHqmplv/Gq6oqu+trbxZ5Gc3var2km4X7u5MGq6jTgHcA3kyyZqrCq+j7NlUP70UwMPuGDNHMtXU/zmr82RR/nA58HfgysppnUu9eLgAU0f78bgS/SzAElSZJ2Qeme+1GSJEmSJEm7Oq88kiRJkiRJUifDI0mStEtK8uYkN/d5nDf90
Tu0ro931PXxYdYlSZJ2Xd62JkmSJEmSpE5dPxu7TZIsA/4VmAd8sqrePWn/7jSTVx5F88snz62qX7UTQl4OXNE2/V5VvWK68RYuXFhLliwZROmSJEmSJEkCVq9efX1VLZq8fdbhUZJ5wEeBv6L5BZEfJjmnqn7a0+wE4MaqOjjJcuA93PFLIz+vqiO2ZcwlS5YwPj4+29IlSZIkSZLUSnJ1v+2DmPPoaOCqqvpF+5O5ZwLHTWpzHHBau/xFYGmSDGBsSZIkSZIk7UCDCI/2B67pWV/Tbuvbpqo2A78D7t3uu3+SS5J8O8lfdA2S5OVJxpOMr1+/fgBlS5IkSZIkaTqDCI/6XUE0eRburjZrgQOr6kjgtcDpSfbqN0hVnVxVY1U1tmjRVrffSZIkSZIkaQcYRHi0BjigZ30xcF1XmyS7AXsDG6rq1qq6AaCqVgM/Bx40gJokSZIkSZI0AIMIj34IHJLk/kkWAMuBcya1OQd4cbv8LOCbVVVJFrUTbpPkAcAhwC8GUJMkSZIkSZIGYNa/tlZVm5O8Cvg6MA9YUVWXJXkHMF5V5wCfAj6T5CpgA03ABPAY4B1JNgO3Aa+oqg2zrUmSJEmSJEmDkarJ0xONvrGxsRofHx92GZIkSZIkSXNGktVVNTZ5+yBuW5MkSZIkSdIcZXgkSZIkSZKkToZHkiRJkiRJ6mR4JEmSJEmSpE6GR5IkSZIkSepkeCRJkiRJkqROhkeSJEmSJEnqZHgkSZIkSZKkToZHkiRJkiRJ6mR4JEmSJEmSpE6GR5IkSZIkSepkeCRJkiRJkqROhkeSJEmSJEnqZHgkSZIkSZKkToZHkiRJkiRJ6jSQ8CjJsiRXJLkqyRv77N89yefb/d9PsqRn35va7VckeeIg6pEkSZIkSdJgzDo8SjIP+CjwJOAw4HlJDpvU7ATgxqo6GPgA8J722MOA5cDhwDLgY21/kiRJkiRJGgG7DaCPo4GrquoXAEnOBI4DftrT5jjg7e3yF4GPJEm7/cyquhX4ZZKr2v5WDaCu0bZqFTzqUd3799wT5s2DBI48Eo4/Hs47Dy65BG68ETZuhAMOgMMP3/K4+90PXvQiOOaYZoyVK+HYY5t9E8vHHLN1LZ/+dLM8cez2vqapxuvdP90Y29J22HamWiVJkiRJg7OLfB4cRHi0P3BNz/oa4BFdbapqc5LfAfdut39v0rH7D6Cm0TZdcARwyy13LF90UfOY7Morm8dkp5wCH/oQnHhiEzJNhFCbN8OCBXDhhVsGOo99LNx6a7O+YkXzD39b/9GvWgVLl3aPB3fsn1zDVH1N13bYdqZaJUmSJEmDswt9HhzEnEfps61m2GYmxzYdJC9PMp5kfP369dtY4ohZuXLH9r9xI5x9dvN8222wadMdyxs3bjn+ypXNtgmbNm1ffRP9dI3Xu39yDVP1NV3bYduZapUkSZIkDc4u9HlwEOHRGuCAnvXFwHVdbZLsBuwNbJjhsQBU1clVNVZVY4sWLRpA2UM0cVvXjrJgATzzmc3zvHkwf/4dywsWbDn+scc22ybMn7999U300zVe7/7JNUzV13Rth21nqlWSJEmSNDi70OfBVPW90GfmHTRh0P8AS4FrgR8Cz6+qy3ravBJ4aFW9Isly4G+q6jlJDgdOp5nnaD/gQuCQqrptqjHHxsZqfHx8VnUPnXMezbyvUb/sb2eqVZIkSZI0OHPs82CS1VU1ttX22YZHbedPBj4IzANWVNVJSd4BjFfVOUn2AD4DHElzxdHyngm23wK8FNgMnFhV50033pwIjyRJkiRJkkbIDg2P7myGR5IkSZIkSYPVFR4NYs4jSZIkSZIkzVGGR5IkSZIkSepkeCRJkiRJkqROhkeSJEmSJEnqZHgkSZIkSZKkToZHkiRJkiRJ6mR4JEmSJEmSpE6GR5IkSZIkSepkeCRJkiRJkqROhkeSJEmSJEnqZHgkSZIkSZKkToZHkiRJkiRJ6mR4JEmSJEmSpE6GR5IkSZIkS
epkeCRJkiRJkqROhkeSJEmSJEnqNKvwKMk+Sc5PcmX7fK+Odi9u21yZ5MU921cmuSLJpe3jPrOpR5IkSZIkSYM12yuP3ghcWFWHABe261tIsg/wNuARwNHA2yaFTMdX1RHtY90s65EkSZIkSdIAzTY8Og44rV0+DfjrPm2eCJxfVRuq6kbgfGDZLMeVJEmSJEnSnWC24dF9q2otQPvc77az/YFretbXtNsmnNLesvYPSdI1UJKXJxlPMr5+/fpZli1JkiRJkqSZ2G26BkkuAO7XZ9dbZjhGv0Co2ufjq+raJPcAzgZeCHy6XydVdTJwMsDY2Fj1ayNJkiRJkqTBmjY8qqrHd+1L8tsk+1bV2iT7Av3mLFoDHNuzvhhY2fZ9bfv8hySn08yJ1Dc8kiRJkiRJ0p0vVdt/EU+S9wE3VNW7k7wR2Keq3jCpzT7AauDh7aaLgaOA3wP3rKrrk8wHzgAuqKqPz2Dc9cDV2134aFkIXD/sIiR18hyVRpvnqDT6PE+l0eY5ql4HVdWiyRtnGx7dGzgLOBD4NfDsqtqQZAx4RVW9rG33UuDN7WEnVdUpSe4GXATMB+YBFwCvrarbtrugnVCS8aoaG3YdkvrzHJVGm+eoNPo8T6XR5jmqmZj2trWpVNUNwNI+28eBl/WsrwBWTGrzR5orkCRJkiRJkjSiZvtra5IkSZIkSZrDDI+G7+RhFyBpSp6j0mjzHJVGn+epNNo8RzWtWc15JEmSJEmSpLnNK48kSZIkSZLUyfBIkiRJkiRJnQyPhiTJsiRXJLkqyRuHXY+kLSVZkWRdkp8MuxZJW0tyQJJvJbk8yWVJXjPsmiRtKckeSX6Q5EftefqPw65J0taSzEtySZJzh12LRpfh0RAkmQd8FHgScBjwvCSHDbcqSZOcCiwbdhGSOm0GXldVDwYeCbzS/0ulkXMr8LiqehhwBLAsySOHXJOkrb0GuHzYRWi0GR4Nx9HAVVX1i6raCJwJHDfkmiT1qKqLgA3DrkNSf1W1tqoubpf/QPOmd//hViWpVzVublfntw9/rUcaIUkWA08BPjnsWjTaDI+GY3/gmp71NfiGV5Kk7ZJkCXAk8P3hViJpsvZ2mEuBdcD5VeV5Ko2WDwJvAG4fdiEabYZHw5E+2/wWRpKkbZTk7sDZwIlV9fth1yNpS1V1W1UdASwGjk7ykGHXJKmR5KnAuqpaPexaNPoMj4ZjDXBAz/pi4Loh1SJJ0k4pyXya4OhzVfWlYdcjqVtV3QSsxPkEpVHyaODpSX5FM5XK45J8drglaVQZHg3HD4FDktw/yQJgOXDOkGuSJGmnkSTAp4DLq+pfhl2PpK0lWZTknu3ynsDjgZ8NtypJE6rqTVW1uKqW0Hwm/WZVvWDIZWlEGR4NQVVtBl4FfJ1mgs+zquqy4VYlqVeSM4BVwKFJ1iQ5Ydg1SdrCo4EX0nxLemn7ePKwi5K0hX2BbyX5Mc2Xp+dXlT8FLkk7oVQ51Y4kSZIkSZL688ojSZIkSZIkdTI8kiRJkiRJUifDI0mSJEmSJHUyPJIkSZIkSVInwyNJkiRJkqSdWJIVSdYl+ckM2h6Y5FtJLkny45n8Yq3hkSRJ0gwkWZnkiZO2nZjkY1Mcc/OOr0ySJIlTgWUzbPtW4KyqOhJYDnS+l5lgeCRJkjQzZ9C8weq1vN0uSZI0NFV1EbChd1uSByb5WpLVSb6T5M8mmgN7tct7A9dN17/hkSRJ0sx8EXhqkt0BkiwB9gMuTXJhkouT/HeS4yYfmOTYJOf2rH8kyUva5aOSfLt9Y/f1JPveGS9GkiTNeScDr66qo4DXc8cVRm8HXpBkDfBV4NXTdWR4JEmSNANVdQPwA+64JHw58HngFuAZVfVw4LHA+5NkJn0mmQ98GHhW+8ZuBXDSoGuXJEm7liR3Bx4FfCHJpcC/ARNfUD0POLWqFgNPBj6TZMp8aLcdWawkSdIcM3Hr2n+0zy8FArwryWOA24H9gfsCv5lBf4cCDwHOb
/OmecDawZctSZJ2MXcBbqqqI/rsO4H2y7CqWpVkD2AhsG6qziRJkjQzXwaWJnk4sGdVXQwcDywCjmrfoP0W2GPScZvZ8n3XxP4Al1XVEe3joVX1hB37EiRJ0lxXVb8Hfpnk2QBpPKzd/Wtgabv9wTTvS9ZP1Z/hkSRJ0gxV1c3ASprbyyYmyt4bWFdVm5I8Fjioz6FXA4cl2T3J3rRv2IArgEVJjoHmNrYkh+/I1yBJkuaeJGcAq4BDk6xJcgLNF1wnJPkRcBkwMS/j64C/a7efAbykqmrK/qfZL0mSpB5JngF8CXhwVf0syULgK8B84FLg0cCTqupXSW6uqru3x72X5k3blcBG4JyqOjXJEcCHaEKo3YAPVtUn7vQXJkmS1MHwSJIkSZIkSZ28bU2SJEmSJEmdDI8kSZIkSZLUyfBIkiRJkiRJnQyPJEmSJEmS1MnwSJIkSZIkSZ0MjyRJkiRJktTJ8EiSJEmSJEmd/g9e5LNVwG21XgAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Task 2: Remove outliers\n", "\n", "\n", "#这里定义了一个可以画出某个数据一维的分布情况的函数,方便观察一下异常值\n", "def draw_1d(*,dataset, feature_name, style = \"r.\", figsize = 20):\n", " plt.figure(figsize=(figsize,1))\n", " for name in dataset.keys():\n", " f1 = dataset[name][feature_name]\n", " if f1 != \"NaN\":\n", " plt.plot(f1,0,style)\n", " plt.title(feature_name)\n", " plt.xlabel(\"Value\")\n", " plt.show()\n", "\n", "draw_1d(dataset = data_dict,feature_name=\"salary\",style=\"bx\")\n", "draw_1d(dataset = data_dict,feature_name='total_payments')\n", "draw_1d(dataset = data_dict,feature_name='total_stock_value')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "首先遍历了整个字典中的人名,然后画出各种数据的一维图像,发现有一个点远超所有的数据,我要把他找出来看看是谁的工资和其他人差这么多个数量级。" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2019-09-07T07:33:59.237214Z", "start_time": "2019-09-07T07:33:59.212915Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The biggest data: SKILLING JEFFREY K\n", "The people who has taken the highest amount: ['SKILLING JEFFREY K', 'LAY KENNETH L', 'FREVERT MARK A', 'PICKERING MARK R', 'WHALLEY LAWRENCE G', 'DERRICK JR. 
JAMES V', 'FASTOW ANDREW S', 'SHERRIFF JOHN R', 'RICE KENNETH D', 'CAUSEY RICHARD A', 'KEAN STEVEN J', 'HAEDICKE MARK E', 'MCMAHON JEFFREY', 'METTS MARK']\n" ] } ], "source": [ "# Return the names with the largest values of one feature: the top `ratio`\n", "# fraction of people (out_type=\"r\") or the top `num` people (out_type=\"n\")\n", "def find_max(ori_dataset, *, features, out_type=\"r\", ratio=0.1, num=1):\n", "\n", "    from copy import deepcopy\n", "    # Deep-copy so that replacing NaN values below does not modify the original dictionary\n", "    dataset = deepcopy(ori_dataset)\n", "\n", "    for key in dataset.keys():\n", "        if dataset[key][features] == \"NaN\":\n", "            dataset[key][features] = 0.0\n", "\n", "    sort = sorted(dataset, reverse=True,\n", "                  key=lambda name: dataset[name][features])\n", "\n", "    if out_type == \"r\":\n", "        data_point_num = int(len(sort)*ratio)\n", "        #print(\"Should return %d data point(s)\" % data_point_num)\n", "    elif out_type == \"n\":\n", "        data_point_num = num\n", "    else:\n", "        raise ValueError(\"out_type must be 'r' (ratio) or 'n' (count)\")\n", "\n", "    return sort[:data_point_num]\n", "\n", "\n", "print(\"The biggest data:\", find_max(\n", "    data_dict, features=\"salary\", out_type=\"n\", num=1)[0])\n", "\n", "print(\"The people who has taken the highest amount:\",find_max(data_dict, features=\"salary\", out_type=\"r\", ratio=0.1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The largest outlier turns out to be the dictionary key named TOTAL, that is, the sum over all the other entries in the data rather than a person. I do not need this value, so I delete it and save the remaining data as dataset_processed.pkl. The output recorded above already lists SKILLING JEFFREY K first, presumably because that cell was last re-run after TOTAL had been removed." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2019-09-06T12:14:20.726454Z", "start_time": "2019-09-06T12:14:20.717299Z" } }, "outputs": [], "source": [ "# The TOTAL entry only needs to be removed and the result saved once\n", "# data_dict.pop(\"TOTAL\")\n", "\n", "# with open('dataset_processed.pkl', 'wb') as dataset:\n", "#     pickle.dump(data_dict, dataset, protocol=pickle.HIGHEST_PROTOCOL)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2019-09-15T09:00:09.512022Z", "start_time": "2019-09-15T09:00:09.504954Z" } }, "outputs": [], "source": [ "# Load the previously saved dataset from which TOTAL has been removed\n", "with open(\"dataset_processed.pkl\", \"rb\") as data_file:\n", "    data_dict = pickle.load(data_file)" ] }, { "cell_type": "code", 
"execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2019-09-15T09:00:37.410984Z", "start_time": "2019-09-15T09:00:37.082927Z" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAABI8AAABzCAYAAAAR6iJzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAanklEQVR4nO3de5RcVZ3o8e+vqrvDIyEJSYgQQprIQ0A0QBMQBhVRQB3FexfjoIxyFfU6OnN93Jm5Os5cGWc5S++M42NQRkQc5MpD8RXxAYgysuSRdEgur/BIQodOgCRAnoDp7qrf/aNO2k6nqtOhK+mk/X7WOqvq7LPP2ftU/c7uql+fcyoyE0mSJEmSJKme0mh3QJIkSZIkSXsuk0eSJEmSJElqyOSRJEmSJEmSGjJ5JEmSJEmSpIZMHkmSJEmSJKkhk0eSJEmSJElqyOSRJElSE0VERsQRo90PSZKkZjF5JEmSJEmSpIZMHkmSJO0BIqJltPsgSZJUj8kjSZKkBiLif0XEqojYFBEPR8RZETE3Iu6MiPUR8WREXBoRbQ3Wf3NELIqIjRHRHRGXDFjWXlzidnFEPA78KiJ+GhF/OWgb90bE23btnkqSJDVm8kiSJKmOiDga+Avg5MycAJwDdAEV4GPAVOBVwFnAhxps5jng3cAk4M3An9dJBL0GOKbY/lXAnw3owyuBGcDPmrJTkiRJL4LJI0mSpPoqwDjg2IhozcyuzFyWmQsz867M7MvMLuDr1BJA28nM2zLzvsysZua9wLV16l6Smc9l5gvAj4EjI+LIYtm7gOszs2dX7KAkSdJwmDySJEmqIzOXAh8FLgHWRMR1EXFIRBwVETdGxFMRsRH4J2pnIW0nIk6JiF9HxNqI2AB8sE7d7gFtbgG+C/xZRJSAdwBXN33nJEmSdoLJI0mSpAYy85rM/CNgFpDA54HLgIeAIzPzAOBvgWiwiWuAecDMzJwI/Hudujlo/irgQmqXwz2fmXc2Y18kSZJeLJNHkiRJdUTE0RHxuogYB/wOeIHapWwTgI3A5oh4GfDnQ2xmAvBsZv4uIuYC79xRu0WyqAp8Ac86kiRJewCTR5IkSfWNAz4HPA08BRxE7Syjv6KWBNoEfAO4fohtfAj4TERsAv43tUvShuPbwPHA/31RPZckSWqiyBx8prQkSZJGU0S8G/hAccmcJEnSqPLMI0mSpD1IROxH7Yyly0e7L5IkSWDySJIkaY8REecAa4HV1G62LUmSNOq8bE2SJEmSJEkNeeaRJEmSJEmSGmppxkYi4lzgy0AZuCIzPzdo+ThqvxpyEvAM8KeZ2RUR7cAS4OGi6l2Z+cEdtTd16tRsb29vRtclSZIkSZIELFy48OnMnDa4fMTJo4goA18F3gCsBBZExLzMfHBAtYuBdZl5RERcAHwe+NNi2bLMnLMzbba3t9PZ2TnSrkuSJEmSJKkQESvqlTfjsrW5wNLMXJ6ZPcB1wHmD6pwHXFU8vwE4KyKiCW1LkiRJkiRpF2pG8mgG0D1gfmVRVrdOZvYBG4ApxbLDI2JRRPxnRJzRqJGI+EBEdEZE59q1a5vQbUmSJEmSJO1IM5JH9c4gGvwTbo3qPAkclpknAB8HromIA+o1kpmXZ2ZHZnZMm7bd5XeSJEmSJEnaBZqRPFoJzBwwfyjwRKM6EdECTASezcwtmfkMQGYuBJYBRzWhT5IkSZIkSWqCZiSPFgBHRsThEdEGXADMG1RnHnBR8fx84FeZmRExrbjhNhExGzgSWN6EPkmSJEmSJKkJRvxra5nZFxF/AdwElIErM/O
BiPgM0JmZ84BvAldHxFLgWWoJJoBXA5+JiD6gAnwwM58daZ8kSZIkSZLUHJE5+PZEe76Ojo7s7Owc7W5IkiRJkiSNGRGxMDM7Bpc347I1SZIkSZIkjVEmjyRJkiRJktSQySNJkiRJkiQ1ZPJIkiRJkiRJDZk8kiRJkiRJUkMmjyRJkiRJktSQySNJkiRJkiQ1ZPJIkiRJkiRJDZk8kiRJkiRJUkMmjyRJkiRJktSQySNJkiRJkiQ1ZPJIkiRJkiRJDZk8kiRJkiRJUkMmjyRJkiRJktSQySNJkiRJkiQ11JTkUUScGxEPR8TSiPhEneXjIuL6YvndEdE+YNkni/KHI+KcZvRHkiRJkiRJzTHi5FFElIGvAm8EjgXeERHHDqp2MbAuM48Avgh8vlj3WOAC4DjgXOBrxfYkSZIkSZK0B2hpwjbmAkszczlARFwHnAc8OKDOecAlxfMbgEsjIory6zJzC/BYRCwttndnE/q1R1u4Yh13LX+GU2dP4aRZk/vLn1q+gVWPrGPGUZN5yeyJo9jD5tjZ/Vm8ZjGdqzvpmN7BnIPmQPd86Lod2s+AmXPZsmIjW5ZvYNzsiYybdUD9jXTPZ9Xim7mzciyHn3DmNq/v7rBhwz2sW3c3kyefwsSJJ+6ydjo3PMcd6zdz2qTxdEzcf7v5gQa+rgetH0f3A/cx87jjOeSoYxrWe8nmw/vfu32nLOvfp+dW77vd+t3d3XR1dbFx+gyWjdufyS1l1vVV+vsyOA6W3jufZTfeyZQVS3nstBn8/PBxnD3zVZzeMoOuri7a29uZOXMmGzbcw4pHf8LmJ/ZjwoQz6dkytX8b/bEwfhVrVv+GLg6l/ZVnMHPmzG2Or5WPLODB+7o49vh23jB1Gs/PX8B+c09mvxNOoLu7m+/ffBv3PvYsT+x/OGeefCBTpj/0+/jj98fqUeN72Od3z/C7Bx8mftXJ1H2rHHPyRsbNOphnyi+nq2sc7e1bmDJ+PT37nMidm2dxW6xjyv0LmL5mDS89+w28MKObO7pv5rSZZ3P64W9v+N4uv/rLrPjt7cw6/Qxmv+sjw4+Jzk6WLFnCMcccQ0dHBwBPPLJkm/dr4GtzxKTHhh2rzy9axB33LmHxUcfw2mOP2i7GgGEdn0MdH4P7OlB3dzd33/tTtsQKjnjpuZxy9Gu2qz+s8aFJhttWvbF+R2PEduNgE/qxRxo0vu+svXrf1XTPL1q0zfguSdr7DPsz0Ag/QzTbH8rfoGYkj2YA3QPmVwKnNKqTmX0RsQGYUpTfNWjdGU3o0x5t4Yp1XHjFXfT0VWlrKfGd953KSbMm89TyDfz4i4uo9FUpt5Q472Mn7NUJpJ3dn8VrFvP+m99PT6WHtnIb35jzMeb86ONQ6YFyG1vO+RFPz6uQfVWipcTU9x2//ReG7vlU/+MtTO/r4c208J6Ff8dfv+/duy2BtGHDPdyz6F1Uqz2USm2ceMLVuySB1LnhOc5fvJTeatJaCv7xiBn8/dJV/fM3zDmi/8v9wNf14A37c8786WSlQrmlhT/5+8/2f0kfWG/Gc0fwlgc/TFag1AIzX/MF9jnwEZ5fM4FlNx5GtVLtX7+y73iuuuoqVu13AD95xX5UyhtJaqc1tpWCrx/0Erq+tqQ/Dk56yzjm/3ADWT2WZXk089ZdyqpcwYLHruZ1q1/NgS9Mplwu8/a3d7By1ceoVnrIfYL/98Ob6Xv+Itr2PZS3vvMoqj97jOyrsoa1/KxtE5V4iPKipZxy7vn85bwV9PRVOaa1m7Ofbmd89XC6lla4fv3/4dTF9xJtbZS/8C9c89vfUq1UOKRc4v6Nz3HZLcE+B9/JhKlf5xtnf4PKC7O48Iq7OKCykbNbH+aQpzbz6tt/QVT72GdahdbDVpNrk4nZwsaNF3PAU98io48yrXzpFV9gwzNtfOHrl9Ha10fvD7/Hl9+RPDQDru26i3+Dugmk5Vd/mR/Pu4lqBIvn3cR5MKwEUmdnJzfeeCMAy5YtA+C
QA/bne//4KSp9fZRbWjj+/Z/gQ7c8Q09flaMP7OKvOr4K2bvDWH1+0SJ+fsln+fiH/pre56pces8j3HDitgmkLSs28vQV9w15fA51fDzxyJJt+jowNru7u7nh+//Mccf9gihVePbxH3LrU3/H/Vf8oL/++f/9Erhp09DjQ5MMZ1+h/lh/xKTHhhwjthsHz/5Gww9Pw+3HHql7Plz11v7xnYvm7dSHv71639V0zy9axOPveS/Z00O0tXHYt64c0x/eJWksGvZnoBF+hmi2P6S/Qc2451HUKcth1hnOurUNRHwgIjojonPt2rU72cU9y13La1/eqgm9fVXuWv4MAKseWUelr0omVCpVVj2ybpR7OjI7uz+dqzvpqfRQpUpvtZfO5TfVBoWsQKWHLfcvJ/uqkJB9VbYs37D9Rrpuh0oPLVGllT5Oygf6X9/dYd26u6lWe4Aq1Wov69bdvUvauWP9ZnqrSQXorSY/Xbthm/k71m/urzvwdZ36dJlqXy9ZrVLp66P7gfvq1pu2vp1qJWvvXV/y3OrZQJVNq1qp9PVts35XVxeVSoVVE6dQKUX/AVwt+vKrldvGwbJFa8ksQ5SpRpnpm48sRoI+VretJjOpVCo88cSvyWovUYIoJeMP3ky1dyWVSpX19z7dHwtPlDZRoURSolKpsvjBR/qPr0MrL1CqlilRplQt8/i0l0K1Svb2smzxYqrVKhFBiWR6aTMQ9G56eS3+Vnf2H6vTYyMlqkxfvZ6o9lEiGT/1eSKSAMpUmL3P3ZTpI2p7zskbFzNn6RJa+/ooZ5WWvgpHr0iSoJJwR/fNdd/bFb+9nWoERFCNYMVvbx9WTCxZsmS7+e4H7tvm/bpvwcL+12b2AY+Q2ctwYvX5+QtYdPiR9JZbqJbL9CbbxBjAluUbdnh8DnV8DO7rwNjs6upiwoRVlEoVSgEt0cdT3b/cpv76xd07Hh+aZDj7CvXH+h2NEduNg6s7R9yPPVIxVm8d3+kaXpxvtVfvu5ru+fkLyJ6e/vH9+fkLRrtLkqSdNOzPQCP8DNFsf0h/g5qRPFoJzBwwfyjwRKM6EdECTASeHea6AGTm5ZnZkZkd06ZNa0K3R8+ps6fQ1lKiHNDaUuLU2VMAmHHUZMotJaIE5XKJGUft3sutmm1n96djegdt5TbKUaa11ErH7HNq2eQoQ7mNcS+fTbSUICBaSoyrdxZT+xlQbqMvS/TSwsI4rv/13R0mTz6FUqkNKFMqtTJ58uCT8JrjtEnjaS0FZaC1FLx52sRt5k+bNL6/7sDX9empFUotrUSpRLmlhZnHHV+33tpJXZTKUXvvWoL9py8HykyY0Uu5pWWb9dvb2ymXy8zY8AzlavYPKqWiL687dNs4eOkJ04ioQLVCKSusHv8oZAAtTO+ZTkRQLpc55JAziVIrWYWsBpufHE+p7VDK5RKTXjG1PxYOqU6gTJWgSrlcYs6xR/UfXyvL+1ItVahSoVqqcNjaZVAuE62tvHTOHEqlEplJlWB1dTyQtE14oBZ/0zv6j9U1eQBVSqyePokstVClxOan9yOzliyrUGb57+ZSoYXaK9DKggPmsPiIl9Hb0kJfqURfS5mHZ9USVeWA02aeXfe9nXX6GZQyIZNSJrNOP2NYMXHMMcdsNz/zuOO3eb+OP/mk/tdm+cajiGhlOLG639yTOeGxR2mt9FGqVGgNtokxgHGzJ+7w+Bzq+Bjc14Gx2d7ezqZNM6hWy1SrQV+28JKZr9+m/qQ5M3c8PjTJcPYV6o/1OxojthsHp3eMuB97pGKs3jq+0z68ON9qr953Nd1+c08m2tr6x/f95p482l2SJO2kYX8GGuFniGb7Q/obFJl1T/QZ/gZqyaBHgLOAVcAC4J2Z+cCAOh8Gjs/MD0bEBcB/zcy3R8RxwDXU7nN0CHArcGRmVoZqs6OjIzs7G/83dm/gPY/q855Hw+c9j7zn0XYx4T2PvOfR3sR7HqmJ/lDuNyFJY5n3PNozRMTCzNwuezf
i5FGx8TcBXwLKwJWZ+dmI+AzQmZnzImIf4GrgBGpnHF0w4AbbnwLeC/QBH83Mn++ovbGQPJIkSZIkSdqT7NLk0e5m8kiSJEmSJKm5GiWPmnHPI0mSJEmSJI1RJo8kSZIkSZLUkMkjSZIkSZIkNWTySJIkSZIkSQ2ZPJIkSZIkSVJDJo8kSZIkSZLUkMkjSZIkSZIkNWTySJIkSZIkSQ2ZPJIkSZIkSVJDJo8kSZIkSZLUkMkjSZIkSZIkNWTySJIkSZIkSQ2ZPJIkSZIkSVJDJo8kSZIkSZLUkMkjSZIkSZIkNWTySJIkSZIkSQ2NKHkUEQdGxC0R8WjxOLlBvYuKOo9GxEUDym+LiIcjYnExHTSS/kiSJEmSJKm5Rnrm0SeAWzPzSODWYn4bEXEg8GngFGAu8OlBSaYLM3NOMa0ZYX8kSZIkSZLURCNNHp0HXFU8vwp4W5065wC3ZOazmbkOuAU4d4TtSpIkSZIkaTcYafJoemY+CVA81rvsbAbQPWB+ZVG21beKS9b+PiKiUUMR8YGI6IyIzrVr146w25IkSZIkSRqOlh1ViIhfAi+ps+hTw2yjXkIoi8cLM3NVREwAvg+8C/h2vY1k5uXA5QAdHR1Zr44kSZIkSZKaa4fJo8x8faNlEbE6Ig7OzCcj4mCg3j2LVgKvHTB/KHBbse1VxeOmiLiG2j2R6iaPJEmSJEmStPtF5os/iSci/hl4JjM/FxGfAA7MzL8ZVOdAYCFwYlF0D3ASsBGYlJlPR0QrcC3wy8z892G0uxZY8aI7vueYCjw92p2QdgFjW2ORca2xytjWWGVsaywyrrWrzcrMaYMLR5o8mgJ8FzgMeBz4k8x8NiI6gA9m5vuKeu8F/rZY7bOZ+a2I2B/4DdAKlIFfAh/PzMqL7tBeJiI6M7NjtPshNZuxrbHIuNZYZWxrrDK2NRYZ1xotO7xsbSiZ+QxwVp3yTuB9A+avBK4cVOc5amcgSZIkSZIkaQ810l9bkyRJkiRJ0hhm8mh0XT7aHZB2EWNbY5FxrbHK2NZYZWxrLDKuNSpGdM8jSZIkSZIkjW2eeSRJkiRJkqSGTB5JkiRJkiSpIZNHoyAizo2IhyNiaUR8YrT7I20VETMj4tcRsSQiHoiIjxTlB0bELRHxaPE4uSiPiPhKEcv3RsSJA7Z1UVH/0Yi4aED5SRFxX7HOVyIihmpDapaIKEfEooi4sZg/PCLuLmLu+ohoK8rHFfNLi+XtA7bxyaL84Yg4Z0B53XG9URtSs0TEpIi4ISIeKsbuVzlmayyIiI8Vn0Xuj4hrI2Ifx23tjSLiyohYExH3DygbtXF6qDakoZg82s0iogx8FXgjcCzwjog4dnR7JfXrA/5nZh4DnAp8uIjPTwC3ZuaRwK3FPNTi+Mhi+gBwGdT+WAGfBk4B5gKfHvDF4rKi7tb1zi3KG7UhNctHgCUD5j8PfLGIuXXAxUX5xcC6zDwC+GJRj+JYuAA4jlrcfi1qCamhxvVGbUjN8mXgF5n5MuCV1GLcMVt7tYiYAfwPoCMzXw6UqY2/jtvaG/0Hvx87txrNcbpuG9KOmDza/eYCSzNzeWb2ANcB541ynyQAMvPJzLyneL6J2peQGdRi9Kqi2lXA24rn5wHfzpq7gEkRcTBwDnBLZj6bmeuAW4Bzi2UHZOadWbtb/7cHbateG9KIRcShwJuBK4r5AF4H3FBUGRzXW2PxBuCsov55wHWZuSUzHwOWUhvT647rO2hDGrGIOAB4NfBNgMzsycz1OGZrbGgB9o2IFmA/4Ekct7UXyszfAM8OKh7NcbpRG9KQTB7tfjOA7gHzK4syaY9SnPJ9AnA3MD0zn4Raggk4qKjWKJ6HKl9Zp5wh2pCa4UvA3wDVYn4KsD4z+4r5gbHYH7/F8g1F/Z2N96HakJphNrAW+FbULsm8IiL2xzFbe7nMXAX8C/A4taTRBmAhjtsaO0ZznPb7qF4Uk0e7X9Qpy93eC2kIETEe+D7w0czcOFTVOmX
5IsqlXSYi/hhYk5kLBxbXqZo7WGa8a0/TApwIXJaZJwDPMfTlY8aw9grF5TjnAYcDhwD7U7vUZjDHbY01uyNmjXO9KCaPdr+VwMwB84cCT4xSX6TtREQrtcTRdzLzB0Xx6q2nsxaPa4ryRvE8VPmhdcqHakMaqdOBt0ZEF7VLE15H7UykScXlELBtLPbHb7F8IrXTzXc23p8eog2pGVYCKzPz7mL+BmrJJMds7e1eDzyWmWszsxf4AXAajtsaO0ZznPb7qF4Uk0e73wLgyOKXHNqo3cRv3ij3SQL67wPzTWBJZv7rgEXzgK2/6nAR8OMB5e8ufrXhVGBDcVrsTcDZETG5+O/h2cBNxbJNEXFq0da7B22rXhvSiGTmJzPz0Mxspzbm/iozLwR+DZxfVBsc11tj8fyifhblF0TtV30Op3ajyfk0GNeLdRq1IY1YZj4FdEfE0UXRWcCDOGZr7/c4cGpE7FfE3tbYdtzWWDGa43SjNqShZabTbp6ANwGPAMuAT412f5yctk7AH1E7bfVeYHExvYnaPQBuBR4tHg8s6ge1XytZBtxH7VdRtm7rvdRuTLkUeM+A8g7g/mKdS4Eoyuu24eTUzAl4LXBj8Xw2tS8RS4HvAeOK8n2K+aXF8tkD1v9UEbsPA28cUF53XG/UhpNTsyZgDtBZjNs/AiY7ZjuNhQn4B+ChIv6uBsY5bjvtjRNwLbV7d/VSO+vn4tEcp4dqw8lpqGlrYEmSJEmSJEnb8bI1SZIkSZIkNWTySJIkSZIkSQ2ZPJIkSZIkSVJDJo8kSZIkSZLUkMkjSZIkSZIkNWTySJIkaRgi4raIOGdQ2Ucj4mtDrLN51/dMkiRp1zJ5JEmSNDzXAhcMKrugKJckSRqzTB5JkiQNzw3AH0fEOICIaAcOARZHxK0RcU9E3BcR5w1eMSJeGxE3Dpi/NCL+W/H8pIj4z4hYGBE3RcTBu2NnJEmShsvkkSRJ0jBk5jPAfODcougC4HrgBeC/ZOaJwJnAFyIihrPNiGgF/g04PzNPAq4EPtvsvkuSJI1Ey2h3QJIkaS+y9dK1HxeP7wUC+KeIeDVQBWYA04GnhrG9o4GXA7cU+aYy8GTzuy1JkvTimTySJEkavh8B/xoRJwL7ZuY9xeVn04CTMrM3IrqAfQat18e2Z3xvXR7AA5n5ql3bbUmSpBfPy9YkSZKGKTM3A7dRu7xs642yJwJrisTRmcCsOquuAI6NiHERMRE4qyh/GJgWEa+C2mVsEXHcrtwHSZKkneWZR5IkSTvnWuAH/P6X174D/CQiOoHFwEODV8jM7oj4LnAv8CiwqCjviYjzga8USaUW4EvAA7t8LyRJkoYpMnO0+yBJkiRJkqQ9lJetSZIkSZIkqSGTR5IkSZIkSWrI5JEkSZIkSZIaMnkkSZIkSZKkhkweSZIkSZIkqSGTR5IkSZIkSWrI5JEkSZIkSZIa+v8cvwo5k3B8JAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "draw_1d(dataset = data_dict,feature_name=\"salary\",style=\".\")" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2019-09-15T09:01:25.221720Z", "start_time": "2019-09-15T09:01:24.632969Z" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXgAAAEWCAYAAABsY4yMAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAeY0lEQVR4nO3de5gcdZ3v8fdnOiGAXAZCVDDBCLjug/gkwCxLFtwTLwuIeHtY94FVgogbd8V1yK54zLqeM7ruQX12DfGoRziAEi94Q1lEED1oFjlGZALhDnILBmFNiDtcjxMy/T1/VPWkptPd05np6kvN5/U8/aS7rr/qmnzqV79fVbUiAjMzK56+ThfAzMzy4YA3MysoB7yZWUE54M3MCsoBb2ZWUA54M7OCcsCbTULSM5IOyXH5CyWFpFnp52slnZnDeu6StLTVy7XuJV8Hb41I2gi8CBjLDP6DiHhsGstcCnw1IuZPr3TTI+klwCPAKyLiwapx3wMejIgPtqEcC4GHgdkRsb1Fy/wy8GhE/GMrlme9yTV4a8abImKvzGvK4d4KlZrudEXEb4DrgTOqlr8/cDJwWSvWY9YpDnibMknHSvq5pBFJt2VP/yWdJekeSU9LekjSe9PhLwCuBQ5Kmz6ekXSQpC9L+kRm/qWSHs183ijpv0q6HXhW0qx0viskbZH0sKQPZKY/RtKwpKck/VbSZ+psxmVUBTxwGnBXRNyRLiskHZa+P1nS3el2/UbSB9Ph75J0Y9X3k53vjZJuTcuzSdJQg+91raT3pO9vy3xPz6TLXJqO+7ak/5D0pKQbJL0yHb4ceAfwoXSe72e+w9en7+dIukDSY+nrAklzst+9pL+XtFnS45LOqlde614OeJuStHnjB8AngP2BDwJXSJqXTrIZOAXYBzgLWCXpqIh4FngD8NgUzghOB94I9ANl4PvAbcBLgNcB50o6MZ12NbA6IvYBDgW+VWeZ3wMOkHR8ZtgZwJo6018CvDci9gaOAH7SZNmfBZalZX8j8DeS3jrZTBGxqPI9AX8H3Afcko6+Fng58MJ02NfSeS5K3386nfdNNRb9EeBYYDGwCDgGyDbnvBjYl+S7PRv4vKT9mtxW6xJdF/CSLk1rDXc2Me0qSRvS168kjbSjjDPQlWktfUTSlemwdwLXRMQ1EVGOiB8DwyRNG0TEDyLiwUj8O/Aj4NXTLMdnI2JTRPw/4I+AeRHx8YjYFhEPAf+bpPYN8DxwmKQDIuKZiPhFrQWmy/o2Sfgi6eXA0cDX65TheeBwSftExH9GxC11pqtez9qIuCP9rm4HLgf+S3ObDekB6BPAmyPiqXSZl0bE0xExCgwBiyTt2+Qi3wF8PCI2R8QW4GNMPJN5Ph3/fERcAzwDvKLZ8lp36LqAB74MnNTMhBGxIiIWR8Ri4H8C382zYDPYWyOiP31Vap0vBd6eCf4R4HjgQABJb5D0C0m/S8edDBwwzXJsyrx/KUkzT3b9/0DSIQxJrfMPgHsl3SzplAbLvQz4C0m7k4TcDyNic51pTyXZlkck/bukJc0UXNIfS/pp2pz0JPDXNPl9SFpAcgZyZkT8Kh1WkvRJSQ9KegrYmE7e7Hd8E
EkHc8Uj6bCKrVUdvs8BezW5bOsSXRfwEXED8LvsMEmHSvqhpPWSfibpD2vMejpJrcjaYxPwlUzw90fECyLik2lb7hXAvwAvioh+4BpA6by1Lt16Ftgz8/nFNabJzrcJeLhq/XtHROUM4v6IOJ2k+eJTwHfS9v+dFxrxM2Ar8BaSM5N6zTNExM0R8ZZ0uVeyo+lnQvklVZf/68BVwIKI2Bf4Iju+j7ok7ZGu54KIuDYz6i/T8r6epCllYWWWSlEnWfRjJAfJioPTYVYgXRfwdVwE/G1EHE3S1vuF7EhJLwVeRvPtoTZ9XwXeJOnEtDa5e9o5Nx/YDZgDbAG2S3oDcEJm3t8Cc6uaEzYAJ0vaPw3HcydZ/y+Bp9KO1z3SMhwh6Y8AJL1T0ryIKAOVpruxuktLQv1TJG3k3681gaTdJL1D0r4R8TzwVGaZtwGvlLQ4PRMYqpp9b+B3EfF7SceQBHQzLgXujYhP11jeKMmBaU/gf1SN/y3Q6Nr9y4F/lDRP0gHAfyPZp1YgXR/wkvYC/gT4tqQNwIWkzQAZpwHfiYhG/4GthSJiE0kN8h9IgnwTcB7QFxFPAx8gqd3+J0mYXZWZ916SgHkobV45CPgKSUhuJGmv/+Yk6x8D3kTSSfgw8ARwMUltFpJmvrskPUPS4XpaRPy+wSLXkNRiv5m2addzBrAxbRb5a5IaP2nTyceB/wPcD9xYNd/7gI9LepokTOt1+lY7DXhb1ZU0r07L+wjwG+BuoLqP4RKSvoJsv0nWJ0j6TG4H7iDppP1Ejemsh3XljU5Kbvy4OiKOkLQPcF9EVId6dvpbgXMi4udtKqKZWdfr+hp8esXAw5LeDqDEosp4Sa8A9gPWdaiIZmZdqesCXtLlJGH9ivRmi7NJLuk6W9JtwF0kTQMVpwPfiG48FTEz66CubKIxM7Pp67oavJmZtUZLHtrUKgcccEAsXLiw08UwM+sZ69evfyIi5tUa11UBv3DhQoaHhztdDDOzniHpkXrj3ERjZlZQDngzs4LKtYlGya8BPU1yO/f2iBjIc31mZrZDO9rgXxMRT7RhPWZmluEmGjOzgso74AP4UfqY3+W1JpC0XMlPqw1v2bIl5+KYmeWk+qbRLriJNO+APy4ijiL5ibZzJP1p9QQRcVFEDETEwLx5NS/lNDPrbkNDsGLFjlCPSD4PDXWyVPkGfOW3NtNfx/keye8+mpkVRwSMjMDq1TtCfsWK5PPISEdr8rl1sqa/ntMXEU+n708geV62mVlxSLBqVfJ+9erkBTA4mAzXpD/clV/R8nrYmKRDSGrtkBxIvh4R/9xonoGBgfCdrGbWkyKgL9MoUi63Jdwlra93CXpuTTQR8VBELEpfr5ws3M3MelalWSYr2ybfIb5M0sxsOrJt7oODSc19cHBim3yHdNXDxszMeo4E/f0T29wrbfL9/cVsg58Kt8GbWc+KmBjm1Z9z0pE2eDOzGaU6zDtYc69wwJuZFZQD3sysoBzwZmYF5YA3MysoB7yZWUE54M3MCsoBb2ZWUA54M7OCcsCbmRWUA97MrKAc8GZmBeWANzMrKAe8mVlBOeDNzArKAW9mVlAOeDOzgnLAm5kVlAPezKygHPBmZgXlgDczKygHvJlZQTngzcwKygFvZlZQDngzs4JywJuZFZQD3sysoBzwZmYFlXvASypJulXS1Xmvy8zMdmhHDX4QuKcN6zEzs4xcA17SfOCNwMV5rsfMzHaWdw3+AuBDQLneBJKWSxqWNLxly5aci2NmNnPkFvCSTgE2R8T6RtNFxEURMRARA/PmzcurOGZmM06eNfjjgDdL2gh8A3itpK/muD4zM8vILeAjYmVEzI+IhcBpwE8i4p15rc/MzCbydfBmZgU1qx0riYi1wNp2rMvMzBKuwZuZFZQD3sysoBzwZmYF5YA3MysoB7yZWUE54M3MCsoBb2ZWUA54M7OCcsCbmRWUA97MrKAc8GZmBeWANzMrKAe8mVlBOeDNzArKAW9mV
lAOeDOzgnLAm5kVlAPezKygHPBmZgXlgDczKygHvJlZQTngzcwKygFvZlZQDngzs4JywJuZFZQD3sysoBzwZmYF5YA3MysoB7yZWUE54M3MCiq3gJe0u6RfSrpN0l2SPpbXuszMbGezclz2KPDaiHhG0mzgRknXRsQvclynmZmlcgv4iAjgmfTj7PQVea3PzMwmyrUNXlJJ0gZgM/DjiLipxjTLJQ1LGt6yZUuexTEzm1FyDfiIGIuIxcB84BhJR9SY5qKIGIiIgXnz5uVZHDOzGaUtV9FExAiwFjipHeszM7N8r6KZJ6k/fb8H8Hrg3rzWZ2ZmE+V5Fc2BwGWSSiQHkm9FxNU5rs/MzDLyvIrmduDIvJZvZmaN+U5WM7OCcsCbmRWUA97MrKAc8GZmBeWANzMrKAe8mVlBOeDNzNopovHnFnLAm5m1y9AQrFixI9Qjks9DQ7mszgFvZtYOETAyAqtX7wj5FSuSzyMjudTk83xUgZmZVUiwalXyfvXq5AUwOJgMl1q/ysix/WdXDQwMxPDwcKeLYWaWnwjoyzSelMvTCndJ6yNioNY4N9GYmbVLpVkmK9sm32IOeDOzdsi2uQ8OJjX3wcGJbfIt5jZ4M7N2kKC/f2Kbe6VNvr+/u9rgJf1hRLT0BzzcBm9mhRcxMcyrP++ivNrgfzSNea0d2nhDhZk1qTrMc6i5VzRsopH02XqjgP7WF8daZmgouba2cipYaf/r78/tpgoz6y6T1eDPAu4E1le9hoFt+RbNpqwDN1SYWfeZrJP1ZuDOiPh59QhJQ7mUyKavAzdUmFn3adjJKml/4PcR8Vw7CuNO1hZr8Q0VZtZ9ptPJule7wt1arM03VEybO4TNWm6ygL+y8kbSFTmXxVqlAzdUTEubn7BnNlNM1gafPZ8/JM+CWAt14IaKKct2CENSzuzBaZrXCJvNZJO1wd8SEUdVv8+L2+BbrMU3VOQme8ZR4Q5hs6Y0aoOfLODHgGdJavJ7AJX2eAEREfu0sqAO+BnMHcJmUzLlTtaIKEXEPhGxd0TMSt9XPrc03G0G67UOYbMe4adJWmf1WoewWQ/x0ySts3qpQ9isx/gXnaw79EqHsFmX8S86Wfdr4xP2zGYKB7yZWUHlFvCSFkj6qaR7JN0laTCvdZmZ2c7y7GTdDvx9RNwiaW9gvaQfR8TdOa7TzMxSudXgI+LxiLglff80cA/wkrzWZ2ZmE7WlDV7SQuBI4KYa45ZLGpY0vGXLlnYUx8xsRsg94CXtBVwBnBsRT1WPj4iLImIgIgbmzZuXd3HMzGaMXANe0myScP9aRHw3z3WZmdlEeV5FI+AS4J6I+Exe6zEzs9ryrMEfB5wBvFbShvR1co7rMzOzjNwuk4yIG5n4gyFmZtZGvpPVzKygHPBmZgXlgDczKygHvJlZQTngzcwKygFvZlZQDngzs4JywJuZFZQD3sysoBzwVlv1j7F30Y+zm1lzHPC2s6EhWLFiR6hHJJ+HhjpZKjPbRQ54mygCRkZg9eodIb9iRfJ5ZMQ1ebOsLj/TzfM3Wa0XSbBqVfJ+9erkBTA4mAyXnx9nBiRntCMjO/5fVCpD/f1dc7brGrztLBvyFQ53sx165EzXAW87q/yxZmXb5M1mukolaHAwCfW+vuTfLjvTdcDbRNmayOAglMs7/ogd8mY79MCZrgPeJpKSNsRsTaRSU+nv76o/XrOO6oEzXXey2s6GhpI/0kqYV0Le4W6WqD7TXbVqx2fomv8vDnirrfqPswv+WM26Rr0zXeiqM11FF51ODAwMxPDwcKeLYWbWnOyZbq3PbSBpfUQM1BrnNngzs6nq8jNdB7yZWUE54Nupy29rNrNiccC3ix/gZWZt5oBvhx65rdnMisWXSbaDH+BlZh3gyyTbKSJ5ZkVFuexwN7Np8WWS3aBTtzW7Y9dsxnLAt0OnHuDljl2zGc1t8O3Qiduasx27MPFZGYODHbnjzszaK7c2eEmXAqcAmyPiiGbmmRFt8
O28rTl75lDhjl2zQulUG/yXgZNyXH7vafdtzT3wvGozy09uAR8RNwC/y2v51oQeeF61meXHnaxF5V9mMpvxOt7JKmk5sBzg4IMP7nBpCqRHnldtZvnJ9UYnSQuBq93J2kFd8LxqM8uPb3Saybr8edV5WrdpHef/7HzWbVrX6aKYdURuTTSSLgeWAgdIehT47xFxSV7rM8tat2kdr1vzOraNbWO30m5cv+x6lixY0ulimbVVbgEfEafntWxr3rpN61i7cS1LFy6dUQG3duNato1tYyzG2Da2jbUb186o7TeDLuhktfzM5Frs0oVL2a202/i2L124tNNFMms7B/wu6LXa8EyuxS5ZsITrl13fU/vLrNUc8E3qxdrwTK/FLlmwpOv3kVmeHPBNWrtxLaPbRylTZnT7aE/Uhl2LNZvZHPBNmrvnXMqUAShTZu6ecztcoua4Fms2c/k6+CZtfW4rfUq+rj71sfW5rR0ukZlZYw74Ji1duJQ5pTmUVGJOaU5L2rN9I46Z5clNNE1qdXt2Hp22vXaVj5nlywG/C1rZnt3qSxh78Sqfdh+QfAC0mcYBP4m8QqHVlzB2+pr3Xf2e2n1A6sUDoNl0OeAbyDMUWt3k08lr3qfyPbX7gNTpA6BZJxQi4POqZecdCq1s8unkNe9T+Z7afUCa6Td92czU8wGfZy2710KhXde8Vx9Qp/I9tfuA5Ju+bCbq+YDPs5bdraHQyc7CegfUqXxP7b4Jyzd92UzT8wGfdy2720Kh052F9Q6o3fY9mVkBAj6vWna3XlLX6c7CXmu2MpvJej7gofW17E7XkhvpdMB2a7OVme2sEAHfap2uJTcylYBt9dmIm2PMeoMDvoZO15InsysB28qzkWYOFN3atGU2EzngayhSM0SrzkaaOVB0c9OW2Uzkp0nWsWTBEla+emVPBFSjp1JWzkZKKk3rbKTWgaLWNKNjo4zFGKNjozWnaYafsmnWGq7B97jJas2tOhtpptlq7p5zKUf6oygxtR9F8VmAWes44HtEpW177p5z2frc1vGwbqYJplWdomcuOhOAZYuW1Vze1ue20kcfZcr0sWs/ilLZvl8/+euu7eA26zUO+By0uqOxUqsdHRulHEl4zpk1h+uXXd/yDuFaZa+uVS9btKzmvEsXLmXOrDm7XJbs8mf1zaLUV4IyXdnBbdZLHPAtlkcTQ6Vte7z5g/J47Xblq1fu1ASTDenK/M0cbNZtWsdrLnsNo2OjlFTiC2/8AsuPXt7wLKH6gDCV5qDs8inDXx31Vxy878E938Ft1mkO+BbL44c8fvnYL8fDvSJbu802wazbtI6lly3l+bHnKfWVKKnE9vL2pg42a25bw+jYKABjMcb7fvA+XvXCV9U9S2h0MKt0sDaz7dXLr24C8qWXZlPjgG9gKsHSyiaTi9ZfxPuveT/by9snDO9THxecdEHNMq25bQ3bxrYBsL28ne0k807lYFOOct2zBKh/Zc2unsE0qvm709Vs6hzwdUw1WJpppmj2hqFzrjlnp3AHiIi6HZh3P3H3hM996kOoqYPNskXLuPjWi8fXWe8soaLWwWyqZzD1OoK7+a5is27ngK9jOsHS6KqV6gPHBSddwNbntta8OqZcLtdcRp/6mLvnXM7/2fk7dYbe+OsbJ0x7/ILj2X3W7iw+cDFrN67ljs13cOvjtwI7Xw2zZMESbnjXDay5bU3N8bW2s9bBrJWdvt1+V7FZN1NEdLoM4wYGBmJ4eLjTxQDyaxo4/2fn89GffpSxGKNPffSpj3KUx6+OmV2azVmLz+LIA4/kA9d+gNGx0fFa+FiMAVBSiVJfibHyGKW+Eu9e/G6WLVrGp3/+aa6898oJ66ssv5ZZfbM46sVHcfZRZ7P86OXT3raKdZvWNX2QaHZ5boM3q03S+ogYqDnOAV9fHsGSPXD0qY+x8hhlJgZwpUllLMYYK48xuzSbY+cfyw2P3DA+HiCICdPvPWdvnnjuiSmV68JTLpwQ8lPd9
kq4f2nDl5ru3DUgAqT6n83q6FjASzoJWA2UgIsj4pONpu9kwLezlpi9aencH5474RLIWqoDfXbfbIKo2T4/VScccgLXnXHdePmmcvZSme/3238/XtaSSvzTa/6Jla9e2bKyFs7QEIyMwKpVSahHwIoV0N+fjDNroFHA59YGL6kEfB74M+BR4GZJV0XE3Y3nbL92X6mRbaN/1QtfxdqNaxkZHeFff/6v480wFULjYVn5fPaRZwPwxfVfrLuO6oPCZE49/NTx91Ptf6jMV31m4XbzBiKScF+9Ovm8alUS7qtXw+Cga/I2LXk+bOwY4IGIeCgitgHfAN6S4/qmrJkHaeWl8lCz/jn9O43rUx+H7nfoeFgDlPpKLFu0jGWLljGnNKfmMoXYfdbunHfceRPm3Wn59HHYfoft1Dwz1QeUVc/33qPf6+aZyUhJqA8OJqHe17cj3Cs1erMpyvMqmpcAmzKfHwX+uHoiScuB5QAHH3xwjsWprxuu1KiUYXT7aPIsF/UxpzSH8447b7wZp6QSnzv5c+OBedbis7hw/YU71dIP3e9Q1rxtDUsWLOHQ/Q7lnGvOYaw8NmG6yvIr02VN50e0i/KY5baqhHylFg8Od2uJ3NrgJb0dODEi3pN+PgM4JiL+tt48M6UNfrIyVF8yWa9stdq8oX6HaWW51cu3Dqu0uWcD3jV4a1JHOlklLQGGIuLE9PNKgIg4v9483XYVTS+ohPfI6AgbHt/AqYef2tJLHi1n2XCvhHr1Z4e8NdCRTlbgZuDlkl4G/AY4DfjLHNc3I/n3UXuclFwtkw3zVauScf39DnebltwCPiK2S3o/cB3JZZKXRsRdea3PrGcNDU28WqYS8g53m6ZcH1UQEdcA1+S5DrNCqA5zh7u1gH+T1cysoBzwZmYF5YA3MysoB7yZWUE54M3MCsoBb2ZWUA54M7OC6qof/JC0BXhkirMfAEzt1y66m7er9xR127xd3emlETGv1oiuCvjpkDRc73kMvczb1XuKum3ert7jJhozs4JywJuZFVSRAv6iThcgJ96u3lPUbfN29ZjCtMGbmdlERarBm5lZhgPezKygChHwkk6SdJ+kByR9uNPlAZC0QNJPJd0j6S5Jg+nw/SX9WNL96b/7pcMl6bPpNtwu6ajMss5Mp79f0pmZ4UdLuiOd57NS8hDxeuto8faVJN0q6er088sk3ZSu85uSdkuHz0k/P5COX5hZxsp0+H2STswMr7k/662jxdvVL+k7ku5N992SIuwzSSvSv8M7JV0uafde3GeSLpW0WdKdmWEd2z+N1tEVIqKnXyS/FvUgcAiwG3AbcHgXlOtA4Kj0/d7Ar4DDgU8DH06Hfxj4VPr+ZOBaQMCxwE3p8P2Bh9J/90vf75eO+yWwJJ3nWuAN6fCa62jx9v0d8HXg6vTzt4DT0vdfBP4mff8+4Ivp+9OAb6bvD0/31RzgZek+LDXan/XW0eLtugx4T/p+N6C/1/cZ8BLgYWCPzPf4rl7cZ8CfAkcBd2aGdWz/1FtHt7w6XoAW/PEuAa7LfF4JrOx0uWqU89+APwPuAw5Mhx0I3Je+vxA4PTP9fen404ELM8MvTIcdCNybGT4+Xb11tHBb5gPXA68Frk7/uJ8AZlXvE5KfbFySvp+VTqfq/VSZrt7+bLSOFm7XPiRBqKrhPb3PSAJ+Uxpos9J9dmKv7jNgIRMDvmP7p946Wvl3OZ1XEZpoKn+8FY+mw7pGeop7JHAT8KKIeBwg/feF6WT1tqPR8EdrDKfBOlrlAuBDQDn9PBcYiYjtNcoyXv50/JPp9Lu6vY3W0SqHAFuALylpfrpY0gvo8X0WEb8B/gX4NfA4yT5YTzH2GXR2/3R1/hQh4Gv9eGXXXPspaS/gCuDciHiq0aQ1hsUUhudK0inA5ohYnx3coCyt2q52bO8sktP//xURRwLPkpyO19ON27CTtL34LSTNKgcBLwDe0KAsvbTPGmlHeTu9jQ0VIeAfBRZkPs8HHutQW
SaQNJsk3L8WEd9NB/9W0oHp+AOBzenwetvRaPj8GsMbraMVjgPeLGkj8A2SZpoLgH5JlR9xz5ZlvPzp+H2B302yXbWGP9FgHa3yKPBoRNyUfv4OSeD3+j57PfBwRGyJiOeB7wJ/QjH2GXR2/3Rt/kAxAv5m4OVpb/1uJJ1CV3W4TKS975cA90TEZzKjrgIqvfZnkrTNV4YvS3vljwWeTE8FrwNOkLRfWhM7gaQd83HgaUnHputaVrWsWuuYtohYGRHzI2IhyXf9k4h4B/BT4M/rbFelLH+eTh/p8NPSKzZeBrycpIOr5v5M56m3jlZt238AmyS9Ih30OuBuenyfkTTNHCtpz3S9le3q+X1Wo7zt3j/11tEdOt0J0IoXSU/2r0h68j/S6fKkZTqe5FTtdmBD+jqZpF3yeuD+9N/90+kFfD7dhjuAgcyy3g08kL7OygwfAO5M5/kcO+5MrrmOHLZxKTuuojmE5D/7A8C3gTnp8N3Tzw+k4w/JzP+RtOz3kV6t0Gh/1ltHi7dpMTCc7rcrSa6y6Pl9BnwMuDdd91dIroTpuX0GXE7Sj/A8Se357E7un0br6IaXH1VgZlZQRWiiMTOzGhzwZmYF5YA3MysoB7yZWUE54M3MCmrW5JOYzSySxkgueat4K8k1358keZjWNuC8iPhJB4pn1jRfJmlWRdIzEbFX1bAjgd9GxGOSjiC5MaZrnjliVosD3qxKrYCvGl95SuJBETHavpKZ7Ro30ZjtbA9JG9L3D0fE26rGnwrc6nC3bucavFmVRjV4Sa8kef7ICRHxYHtLZrZrfBWNWZMkzQe+ByxzuFsvcMCbNUFSP/ADkl80+r+dLo9ZMxzwZs15P3AY8FFJG9JXq38py6yl3AZvZlZQrsGbmRWUA97MrKAc8GZmBeWANzMrKAe8mVlBOeDNzArKAW9mVlD/H/t761catttfAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "x_data = []\n", "for name in data_dict.keys():\n", " f1 = data_dict[name]['salary']\n", " f2 = data_dict[name]['total_stock_value']\n", " if f1 != \"NaN\" and f2 != \"NaN\":\n", " if f1 > 6e5 or f2 > 1e7:\n", " x_data.append([name,f1, f2])\n", " plt.scatter(f1,f2,marker=\"x\",c=\"r\")\n", " else:\n", " plt.scatter(f1,f2,marker=\".\",c=\"g\")\n", "plt.title(\"Features Visualization\")\n", "plt.ylabel(\"F1\")\n", "plt.xlabel(\"F2\")\n", "plt.show()\n", "# print(\"People who has taken the Highest amount:\")\n", "# for x in x_data:\n", "# print(\"\\nName: %s\\nSalary: %d\\nStock: %d\" % (x[0],x[1],x[2]))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2019-09-15T09:02:10.424615Z", "start_time": "2019-09-15T09:02:10.257014Z" } }, "outputs": [], "source": [ "# Task 3: Create new feature(s)\n", "\n", "# 这里想使用PCA对于财务方面的数据做一个分析,然后看看效果如何。\n", "# 所以可能要输入全部的特征给PCA才比较有效果\n", "from sklearn.decomposition import PCA\n", "# \n", "\n", "\n", "all_financial_features = ['poi', 'deferral_payments', 'total_payments', 'loan_advances',\n", " 'bonus', 'restricted_stock_deferred', 'deferred_income',\n", " 'total_stock_value', 'expenses', 'exercised_stock_options', \n", " 'other', 'long_term_incentive', 'restricted_stock', 'director_fees']\n", "\n", "\n", "\n", "\n", "finance_data = featureFormat(\n", " data_dict, all_financial_features, remove_NaN=True)\n", "poi_labels, finance_features = targetFeatureSplit(finance_data)\n", "\n", "features_train, features_test ,labels_train, labels_test = train_test_split(finance_features, poi_labels, test_size = 0.3, random_state = 42)\n", "features_train = np.array(features_train)\n", "features_test = np.array(features_test)\n", "finance_features = np.array(finance_features)\n", "\n", "all_features_pca = np.array(finance_features)\n", "\n", "finance_pca = PCA(n_components=3, whiten=True)\n", "all_finance_pca 
= PCA(n_components=2, whiten=True)\n", "all_features_pca = all_finance_pca.fit_transform(finance_features)\n", "finance_features_train = finance_pca.fit_transform(features_train)\n", "finance_features_test = finance_pca.transform(features_test)\n", "\n", "\n", "print(finance_pca.explained_variance_)\n", "print(finance_pca.explained_variance_ratio_)\n", "# Store to my_dataset for easy export below.\n", "#my_dataset = data_dict\n", "\n", "\n", "# Extract features and labels from dataset for local testing\n", "#data = featureFormat(my_dataset, features_list, sort_keys=True)\n", "#labels, features = targetFeatureSplit(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### See whether the email data holds anything valuable\n", "\n" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "ExecuteTime": { "end_time": "2019-09-15T09:49:12.665613Z", "start_time": "2019-09-15T09:49:12.649203Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[11270511.41941995 3322099.12922766]\n", "[0.7550185 0.22254946]\n" ] } ], "source": [ "from sklearn.decomposition import PCA\n", "import numpy as np\n", "\n", "all_email_features = ['poi','to_messages', 'from_poi_to_this_person', 'from_messages',\n", "                      'from_this_person_to_poi', 'shared_receipt_with_poi']\n", "\n", "email_data = featureFormat(\n", "    data_dict, all_email_features, remove_NaN=True)\n", "poi_labels_email, email_features = targetFeatureSplit(email_data)\n", "\n", "email_features_train, email_features_test ,email_labels_train, email_labels_test \\\n", "    = train_test_split(email_features, poi_labels_email, test_size = 0.3, random_state = 42)\n", "email_features_train = np.array(email_features_train)\n", "email_features_test = np.array(email_features_test)\n", "\n", "\n", "email_pca = PCA(n_components=2,whiten=True)\n", "train_email_features_pca = email_pca.fit_transform(email_features_train)\n", "print(email_pca.explained_variance_)\n", "print(email_pca.explained_variance_ratio_)\n", "test_email_features_pca = email_pca.transform(email_features_test)" 
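] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The PCA cells in this notebook transform the features by hand before fitting a classifier. As the Task 4 starter comment notes, multi-stage steps such as PCA are normally wrapped in a scikit-learn `Pipeline`, so that the projection is re-fit inside every train/test split. The cell below is only a minimal sketch of that idea, not the method used elsewhere in this notebook. It reuses `email_features` and `poi_labels_email` from the email PCA cell; the scaler, `class_weight=\"balanced\"` and `n_splits=100` are illustrative assumptions. It reports precision and recall, which are the metrics Task 5 asks about." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hedged sketch, not part of the original analysis.\n", "# Assumptions: StandardScaler, class_weight=\"balanced\" and n_splits=100 are\n", "# illustrative choices; email_features and poi_labels_email come from the\n", "# email PCA cell in this notebook.\n", "import numpy as np\n", "from sklearn.decomposition import PCA\n", "from sklearn.model_selection import StratifiedShuffleSplit, cross_validate\n", "from sklearn.pipeline import Pipeline\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.svm import SVC\n", "\n", "# Chain scaling, PCA and the SVM so every split re-fits all three steps.\n", "email_pipeline = Pipeline([\n", "    (\"scale\", StandardScaler()),\n", "    (\"pca\", PCA(n_components=2)),\n", "    (\"svm\", SVC(kernel=\"rbf\", gamma=\"auto\", class_weight=\"balanced\")),\n", "])\n", "\n", "# Stratified shuffle splits keep the POI ratio similar in each fold.\n", "cv = StratifiedShuffleSplit(n_splits=100, test_size=0.3, random_state=42)\n", "scores = cross_validate(email_pipeline, np.array(email_features),\n", "                        np.array(poi_labels_email), cv=cv,\n", "                        scoring=[\"precision\", \"recall\"])\n", "print(\"Mean precision:\", scores[\"test_precision\"].mean())\n", "print(\"Mean recall:\", scores[\"test_recall\"].mean())" 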
] }, { "cell_type": "code", "execution_count": 33, "metadata": { "ExecuteTime": { "end_time": "2019-09-15T09:49:14.636440Z", "start_time": "2019-09-15T09:49:14.622153Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before PCA: 0.9230769230769231\n", "After PCA 0.9230769230769231\n" ] } ], "source": [ "from sklearn.svm import SVC\n", "\n", "email_svm = SVC(kernel=\"rbf\",gamma=\"auto\")\n", "email_svm.fit(email_features_train, email_labels_train)\n", "print(\"Before PCA:\",email_svm.score(email_features_test, email_labels_test))\n", "\n", "email_svm_pca = SVC(kernel=\"rbf\",gamma=\"auto\")\n", "email_svm_pca.fit(train_email_features_pca, email_labels_train)\n", "print(\"After PCA\",email_svm_pca.score(test_email_features_pca, email_labels_test))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2019-09-15T09:19:53.592451Z", "start_time": "2019-09-15T09:19:53.578858Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before PCA: 0.22727272727272727\n", "After PCA: 0.8636363636363636\n" ] } ], "source": [ "# Task 4: Try a variety of classifiers\n", "# Please name your classifier clf for easy export below.\n", "# Note that if you want to do PCA or other multi-stage operations,\n", "# you'll need to use Pipelines. For more info:\n", "# http://scikit-learn.org/stable/modules/pipeline.html\n", "\n", "# Provided to give you a starting point. 
Try a variety of classifiers.\n", "# Fit Gaussian naive Bayes on the raw data (no PCA applied)\n", "clf = GaussianNB()\n", "clf.fit(features_train, labels_train)\n", "print(\"Before PCA:\",clf.score(features_test, labels_test))\n", "\n", "# Fit Gaussian naive Bayes on the PCA-transformed data\n", "clf_pca = GaussianNB()\n", "clf_pca.fit(finance_features_train, labels_train)\n", "print(\"After PCA:\",clf_pca.score(finance_features_test, labels_test))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**PCA is unbeatable!!!!** Wow: feeding PCA-transformed data to even the simplest classifier raised accuracy dramatically, from about 0.23 to about 0.86 in the run above. Because I used PCA from the start, Task 5's request to get the classifier above 0.3 seemed odd at first, since my accuracy was already far higher. Note, though, that Task 5's 0.3 threshold refers to precision and recall rather than accuracy, and when POIs are a small minority a high accuracy does not guarantee either.\n", "That is why I ran this comparison. Even keeping only the first few principal components improves accuracy a great deal." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2019-09-15T09:02:34.636528Z", "start_time": "2019-09-15T09:02:34.622874Z" } }, "outputs": [ { "data": { "text/plain": [ "0.8863636363636364" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Try a support vector machine\n", "from sklearn.svm import SVC\n", "\n", "svm = SVC(kernel=\"rbf\",gamma=\"auto\")\n", "svm.fit(finance_features_train, labels_train)\n", "svm.score(finance_features_test, labels_test)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2019-09-15T09:02:37.491152Z", "start_time": "2019-09-15T09:02:37.312285Z" } }, "outputs": [ { "data": { "text/plain": [ "0.8181818181818182" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Try AdaBoost\n", "from sklearn.ensemble import AdaBoostClassifier\n", "\n", "ada = AdaBoostClassifier(n_estimators=80)\n", "ada.fit(finance_features_train, labels_train)\n", "ada.score(finance_features_test, labels_test)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2019-09-15T09:02:40.718031Z", "start_time": "2019-09-15T09:02:40.213094Z" } }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAsEAAAFlCAYAAAAK1DURAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAedklEQVR4nO3df4xlaVkn8O/TPTQqYioZOkoYsqNZYjS4CluLXNgwBY0GVxZc3WUxUVbZZLLKaE3tJsisy1rumB2jxp5KMLoTfiwTUZaguEZRGZophaUGppof8mNQCcEwQTKNpCOgTs90vfvHqZqurunuqa57qu6tez6fpHLr3Dp1zsvNMPOt5z73eau1FgAAGJIjk14AAAAcNCEYAIDBEYIBABgcIRgAgMERggEAGBwhGACAwblmEjd90pOe1K6//vpJ3BoAgAE5ffr0F1trx3c+P5EQfP3112d9fX0StwYAYECq6q8v9bx2CAAABkcIBgBgcIRgAAAGRwgGAGBwhGAAAAZHCAYAYHCEYAAABkcIBgBgcIRgAAAGp5cQXFVzVfX2qvpUVd1XVaM+rgsAAPuhr22TV5L8cWvt31bVsSRf19N1AQCgd2OH4Kr6hiTPS/JjSdJaO5fk3LjXHaK1tWR1NVlYSEZq6QAA+6aPSvC3JDmT5E1V9Z1JTidZbK19tYdrD8baWnLiRHLuXHLsWHLqlCAMALBf+ugJvibJM5P8emvtGUm+muQ1O0+qqhurar2q1s+cOdPDbWfL6moXgM+f7x5XVye9IgCA2dVHCL4/yf2ttQ9sHr89XSi+SGvtjtbafGtt/vjx4z3cdrYsLHQV4KNHu8eFhUmvCABgdo3dDtFa+0JVfa6qvrW19hdJTiT55PhLG5bRqGuB0BMMALD/+poO8VNJ3rI5GeIzSX68p+sOymgk/AIAHIReQnBr7SNJ5vu4FgAA7Dc7xgEAMDhCMAAAgyMEAwAwOEIwAACDIwQDADA4QjAAAIMjBAMAMDhCMAAAgyMEAwAwOEIwAACDIwQDADA4QjAAAIMjBAMAMDhCMAAAgyMEAwAwOEIwAACDIwQDADA4QjDABKytJbfd1j0CcPCumfQCAIZmbS05cSI5dy45diw5dSoZjSa9KoBhUQkGOGCrq10APn++e1xdnfSKAIZHCAY4YAsLXQX46NHucWFh0isCGB7tEAAHbDTqWiBWV7sArBUC4OAJwQATMBoJvwCTpB0CAIDBEYIBABgcIRgAgMERggEAGBwhGACAwellOkRVfTbJl5OcT/Jwa22+j+sCAMB+6HNE2vNba1/s8XoAALAvtEMAADA4fYXgluRdVXW6qm681AlVdWNVrVfV+pkzZ3q6LQAAXL2+QvBzW2vPTPJ9SV5VVc/beUJr7Y7W2nxrbf748eM93RYAAK5eLyG4tfb5zccHkrwjybP6uC4AAOyHsUNwVT2hqp649X2S703y8XGvCwAA+6WP6RDfmOQdVbV1vd9qrf1xD9cFAIB9MXYIbq19Jsl39rAWAAA4EEakAQAwOEIwAACDIwQDADA4QjAAAIMjBAMAMDhCMAAAgyMEAwAwOEIwAACDIwQDADA4QjAAAIMjBAMAMDhCMAAAgyMEAwAwOEIwAACDIwQDADA4QjAAAIMjBAMAMDhCMAAAgyMEAwAwOEIwAACDIwQDADA4QjAAAIMjBAMAMDhCMAAAgyMEAwAwOEIwAACDIwQDADA4vYXgqjpaVR+uqj/o65oAALAf+qwELya5r8frAQDAvuglBFfVdUm+P8nr+7geAADsp74qwbcneXWSjZ6uBwAA+2bsEFxVL07yQGvt9GOcd2NVrVfV+pkzZ8a9LQAA7FkfleDnJnlJVX02yVuTvKCqfnPnSa21O1pr8621+ePHj/dwWwAA2JuxQ3Br7ZbW2nWtteuTvDzJe1prPzL2ygAAYJ+YEwwAwOBc0+fFWmurSVb7vCYAAPRNJRgAgMERggEAGBw
hGACAwRGCAQAYHCEYAIDBEYIBABgcIRgAgMERggEAGBwhGACAwRGCAQAYHCEYAIDBEYIBABgcIRgAgMERggEAGBwhGACAwRGCAQAYHCEYAIDBEYIBABgcIRgAgMERggEAGBwhGACAwRGCAQAYHCEYAIDBEYIBABgcIRgAgMERggEAGBwhGACAwRGCAQAYnLFDcFV9TVV9sKo+WlWfqKqf72NhAACwX67p4RoPJnlBa+0rVfW4JO+rqj9qrd3Tw7UBAKB3Y4fg1lpL8pXNw8dtfrVxrwsAAPull57gqjpaVR9J8kCSu1prH7jEOTdW1XpVrZ85c6aP2wIAwJ70EoJba+dba9+V5Lokz6qqp1/inDtaa/Ottfnjx4/3cVsAANiTXqdDtNbOJllN8qI+rwsAAH3qYzrE8aqa2/z+a5O8MMmnxr0uAADslz6mQzw5yZur6mi6UP221tof9HBdAADYF31Mh/jzJM/oYS0AAHAg7BgHAMDgCMEAAAyOEAwAwOAIwQAADI4QDADA4AjBAAAMjhAMAMDgCMEAAAyOEAwAwOAIwQAADI4QDADA4AjBAAAMjhAMAMDgCMEAAAyOEDxD1taS227rHgEAuLxrJr0A+rG2lpw4kZw7lxw7lpw6lYxGk14VAMB0UgmeEaurXQA+f757XF2d9IoAAKaXEDwjFha6CvDRo93jwsKkVwQAML20Q8yI0ahrgVhd7QKwVggAgMsTgmfIaCT8AgDshnYIAAAGRwgGAGBwhGAAAAZHCAYAYHCEYAAABkcIBgBgcIRgAAAGZ+wQXFVPraq7q+q+qvpEVS32sTAAANgvfWyW8XCS/9Ja+1BVPTHJ6aq6q7X2yR6uDQAAvRu7Etxa+5vW2oc2v/9ykvuSPGXc6wIAwH7ptSe4qq5P8owkH+jzugAA0KfeQnBVfX2S30lyc2vt7y7x8xurar2q1s+cOdPXbQEA4Kr1EoKr6nHpAvBbWmu/e6lzWmt3tNbmW2vzx48f7+O2AACwJ31Mh6gkb0hyX2vtV8dfEgAA7K8+KsHPTfKjSV5QVR/Z/PpXPVwXAAD2xdgj0lpr70tSPawFAAAOhB3jAAAYHCEYAIDBEYIBABgcIRgAgMERgg9Sa1c+BgDgQAjBB2V5OVlauhB8W+uOl5cnuSoAgEESgg9Ca8nZs8nKyoUgvLTUHZ89qyIMAHDAxp4TzC5UJSdPdt+vrHRfSbK42D1fxiwDABykahOoQs7Pz7f19fUDv+/EtZYc2VZ839gQgAEA9lFVnW6tze98XjvEQdlqgdhue48wAAAHRgg+CNt7gBcXuwrw4uLFPcIAABwYPcEHoSqZm7u4B3irR3huTksEAMAB0xN8kFq7OPDuPAYAoFd6gqfBzsArAAMATIQQDADA4AjBAAAMjhAMAMDgCMEAAAyOEAwAwOAIwVNubS257bbuEQCAftgsY4qtrSUnTiTnziXHjiWnTiWj0aRXBQBw+KkET7HV1S4Anz/fPa6uTnpFHHbeWQCAjkrwFFtY6CrAW5XghYVJr4jDzDsLAHCBEDzFRqMuqKyudgFYYGEcl3pnwT9TAAyVEDzlRiNBhX54ZwEALhCCYSC8swAAFwjBQ9daUnX5Y2aKdxYAoGM6xJAtLydLS13wTbrHpaXueQCAGdZLCK6qN1bVA1X18T6uxwFoLTl7NllZuRCEl5a647NnLwRjAIAZ1Fc7xP9O8rokd/Z0PfZbVXLyZPf9ykr3lSSLi93zWiIAgBnWSyW4tfZnSb7Ux7U4QNuD8BYBGAAYgAPrCa6qG6tqvarWz5w5c1C35Uq2WiC2294jDAAwow4sBLfW7mitzbfW5o8fP35Qt+VytvcALy4mGxvd4/YeYQCAGWVEWk/W1q5+/upefqc3Vcnc3MU9wFutEXNzWiIAgJkmBPdgbS05ceLCTlynTj12qN3L7/RuefniucBbQVgABgBmXF8j0n47yVqSb62q+6vqP/Zx3cNidbULs+fPd4+rq/vzO/t
iZ+AVgAGAAeilEtxa++E+rnNYLSx01dytqu7Cwv78DgAA/dAO0YPRqGtnuJr+3r38DgAA/ag2gSkA8/PzbX19/cDvCwDAsFTV6dba/M7nD2xE2mG1tpbcdlv3OMv3BAAYEu0QVzCJCQ5TMTUCAGDGqQRfwSQmOEzN1AgAgBkmBF/B1gSHo0f7meCwmzaHvu8JAMCjaYe4gj4nOOy2zcHUCACA/ScEP4bRqJ8geqk2h8tdt697AgBwadohDog2BwCA6aESfEC0OQAATA8heEtrSdXlj3d7zhVocwAAmA7aIZJkeTlZWupCbdI9Li11z1/NOdvt3InvKnbms1kGAMD+EoJbS86eTVZWkqWlrL2/5d5/udQdnz3b/XzHOY8E4O3nbHe1gXmbrSkSr31t9ygIAwD0TztEVXLyZPf9ykpGKytJktcdXcw/f9nJjLbaHbadk81zsriYnDyZtXvqQq/vs7cF5q3f2wrMi4uP2UJxNVMkAADYm2pX8TZ9X+bn59v6+vqB3/eKWkuOXCiMX3NkI7f+QuWWWy5/TjY2snZPPXr+77O3VYq3bAbmx+ohtm0yAEB/qup0a21+5/PaIZIL7Qrb3F5LWbihXfGcLC1l9e726G2Ot1eXt+wiACcXpkjceqsAfJD0YQPAsAjB2/t7Fxez9v82cu9zFnPT+ZWM3rZ0oSd4e0vDxkb3uLKShY+u5NixdvH838sE5t1+OG40Sm65RQA+KPqwAWB49ARXJXNzj7QrjKqS951MltI9v1W93XbO9krvaO5sTt28oyd4e2De3hOc7LoizMHRhw0Aw6MneEufc4KXl7sPx20F3q3K8NzcriZEcLD0YQPA7LpcT7AQvEdra4+x+9uYG2vst8dc/8B4PQBgNl0uBGuH2INdVQ53Bt4pC8Aqnxezmx8ADIsPxu3BpXpIr8rV7iY3xu5zlzL2+gEADrlhV4L32LKwsNBVUB98sDv92muv4p6b/cJrLzuZ1T+tLNzQuikUl+sX3of+4q31b1WCFxb2dBkAgENruJXgMbY2Ho2S229Pjh7tpqXdfPMux2ptbr+8tvKBnLjhobz2tS0nbngod6x8NbfdNZ+191+i4ns12zXvklnEAMDQDbMSvD1cJle9tXGS/O3fdgF4Y+MqxmptjlZbvfcPc+79R3I+lQdzJDcd+Y1sfOBIjr2wLg6lO7Z03rld89Ya9/KhLj2wAMCQDXc6xPaq6pZdbm2cjPfhsrX3t5x47j/kXB6XSsvGkcdlY6Ny9GhXnb1oq+atte7Yrnl7APYhNwCAS7Nt8k5jbG2cjNFS0Loe4FM5kVvz3/NreVUef+ShHD3aLt2f+xi7z03zh9xsRQwATKteQnBVvaiq/qKqPl1Vr+njmvvuMcLl2lryEz/RfV0qxO1pruy26vNo8btzy8b/zI2LT8iph2/Ird/9hzn17nbxta6wXfPWWrc+5HbRts1TwFbEAMA0G7snuKqOJvm1JN+T5P4k91bV77fWPjnutffNznC5Y2vjtZedzPNfUHnwwe70N77x4p7fPbcg7NiieasaPcpSRnPryXNevKvzkzyypfNWRXraNnqwFTEAMM36+GDcs5J8urX2mSSpqrcmeWmS6Q3BVfncl+fyhecs5uGXncxoR7hc/dPKuXMXTj93ruXOO+uRELd6d8u5c7W3gLe8fPEH77bufbk2jF2cP40fcjOGDQCYZn2E4Kck+dy24/uTfHcP1903a2vJid9ezrkH27aJDBfC5cLahTnA1+ShnM+RvOlNR/KKV1RGz275sY8uZaPm8nNHl68Y8C7bMnG1u8lN8e5zlzOtFWoAgKSfnuBLJbJHjZyoqhurar2q1s+cOdPDbffukbfqN7qWh+XlzZ7VzXA5GiV3350861ktGzmSlqN5+NxGVu/u2iie/LaVvPIHz+bW/9Eu2wqhJ7Z7XbYmXfiAHAAwTfqoBN+f5Knbjq9L8vmdJ7XW7khyR9KNSOvhvnu2fce3jY3k3e9O3vv
ernKZXKhe3n575cSJIzn3j+dzrD2YhZ89keSeZHExTz55MrdcoSKrJ7ZjhBsAMI36qATfm+RpVfXNVXUsycuT/H4P1903W2/Vv/CF3fjdrQ0v7rzz4uptkpw6Vbn1F47kVE5klHu6J3cxSm3XUxt2zmnedjwLI8bGGeE2C//7AYDpNHYluLX2cFXdlORPkhxN8sbW2ifGXtk+G42SH/qh5D3v6Y6PHesedwa2W17TMvo/S8lWAE6SpaWsvexkVv+0cu213e5xO/ted9UTu7zc7Vy3Fapby9rLV7L6xafn2n//wtx88+GvoO71A3IqyJOxp9F/AHAI9bJtcmvtnUne2ce1DsoddyQ33dQF3muuSW6/PfmO70je/OZtge2Gy49SO/265L9tnMxGqxw5kjz+8Y8Oalec2nCJrZvXXr6SE2+7Mefq8Tny3pbz5+vqtmWeQnv9gJx2koPnDw8AhqSXEHzYrK0lr3pV8vDD3fHDDycf/nBy4407A1sl73r0nN57702+tDaXjda1RGxsdP3FVxXUto9lW1lJVlaymtfkXD0+59vRtPNdq0bV4R8xtpcRbkasHTx/eAAwJIMMwaurXXDd0lq3IcYrXnGJwHaJOb0P//LJ/OILK0cevHCdjY3k2muvciFbQXizGryQ1Rz7miOPBL/bb+9aLa699kIv7VBCiRFrB88fHgAMySBD8MJC177wD/9w4bnz569Q+drxIbjRc7rZwsvLyV13dRn5yJEusO7GI32XN7SM3nZh6+ZR7smpf72S1e9czMLzu805DvIt6mnrB53GTUBmmT88ABiSQYbg0airsv7kT3bhN+mmOFxN5Ws06kLwe997dZWzR0Ltgy2311JG5y/uNx6tLGX05M8mzz6ZpA7sLWr9oCT+8ABgOAYZgpOLq7ZVyStfefX/8d9L5Wz7Rh1fqrnc+5zF/Itt/cZJkrm5R6rP40xX2NO69IMCAAMwyBB8xx3J7/1elzO35vi+4hUHc+/tofYXjy3ne3754n7jnTOI9xK091LV1Q8KAAzJ4ELwz/xM8ku/dOH4B34gefWr91b13EvYfHSo3bHpxiU24bjat6j3UtXVDwoADMmgQvDaWvIrv3Lxc3//93sPfHttIdjvvsu9VnX1gwIAQzGoEHznnRePRku6XeO2+mcvt/vb5UxrC8G0V3WnbQoFADA8gwrBX/jCxcdPe1q3ScZP/3Ty0ENdQL7c7m+XMs1hc1qruqZQAADTYFAh+LOfvfj4r/6q+9puY6ObH3zzzd0Ytd0EYSFu90yhAACmwaBC8Mc+tvtzP/jB5PnPT+6+O49sWrFV8U261oqtyvI3fdOF3ea2G/dt/1lsG5jWFhIAYFgGFYK3NsbYra1KZXLhLfxrrumqxQ89dPG5b3rThcCcdGPYbrqpu+du2yu2m9W2gWluIQEAhuPIpBdwkL7t2y7/s0tMJnukUrnzLfydATi5ODCvrSWvetWFPuMHH7zws62f33Zb93g5l2obmBWjUXLLLQIwADA5g6oEf/KTybd/e/KpT3XV2X/8x+75quSlL03e+c4uuFYlL3nJxfODt97Cv1wlePtb+6urF0+h2L4l824rvNoGAAD2z6BCcNIF4eTRYfTVr+6+LvU2/c638JMr9wQvLHQh+8EHu2kTr3vdhZ/t9oNh2gYAAPZPtdYO/Kbz8/NtfX39wO+7035+8Oxy157VXl8AgGlUVadba/OPen7IIXhSZnHqwzTwugIAO10uBA+uHWIamC3cv6upsAvLAIAQzEzYba+1dhQAIBnYiDRmw6VGzG1N0zh69MrTNGZ59BwAsHsqwRwql6vk7naahtFzAEAiBHPIXKntYTe91kbPjam1i3eW2XkMAIeEEMyh0kcld78/mDizH7xbXk7Onk1OnuyCb2vJ0lIyN9f9DAAOESGYQ2XaK7kz+8G71roAvLLSHZ882QXglZVkcVFFGIBDRwjm0JnmEXO7nVJx6FR1wTfpgu9WGF5cvFAZBoB
DxHQI6NFup1QcStuD8BYBGIBDSgiGHm21a9x66wy1QmzZ6gHebmmpex4ADpmxQnBV/buq+kRVbVTVo7ajgyEajZJbbpnRALzVA7yx0T2urAjCABxK4/YEfzzJDyb5Xz2sBZhWVd0UiO09wFutEXNzWiIAOHTGCsGttfuSpPwHEGbf8vLFUyC2grD//wNwCB1YT3BV3VhV61W1fubMmYO6LdCnnYFXAAbgkHrMSnBVvTvJN13iRz/bWvu/u71Ra+2OJHckyfz8vAZCAAAm5jFDcGvthQexEABgky3KYd8ZkQYA02R5+eKpK1vTWWxPDr0ad0Tav6mq+5OMkvxhVf1JP8sCgAHavkX5VhDeGk949qxxhNCjcadDvCPJO3paCwAMmy3K4cBUm8BflfPz8219ff3A7wsAh0JryZFtb9ZubAjAsEdVdbq19qhN3fQEA8A0sUU5HAghGACmhS3K4cCMu20yANAXW5TDgdETDADTxpxg6I2eYAA4LGxRDvtOCAYAYHCEYAAABkcIBgBgcIRgAAAGRwgGAGBwhGAAAAZHCAYAYHCEYAAABkcIBgBgcIRgAAAGRwgGAGBwhGAAAAZHCAYAYHCEYAAABkcIBgBgcIRgAAAGRwgGAGBwhGAAAAZHCAYAYHCEYAAABkcIBgBgcMYKwVX1y1X1qar686p6R1XN9bUwAADYL+NWgu9K8vTW2j9L8pdJbhl/SQAAsL/GCsGttXe11h7ePLwnyXXjLwkAgJnQ2pWPJ6jPnuBXJvmjHq8HAMBhtbycLC1dCL6tdcfLy5Nc1SMeMwRX1bur6uOX+HrptnN+NsnDSd5yhevcWFXrVbV+5syZflYPAMD0aS05ezZZWbkQhJeWuuOzZ6eiIlxtzEVU1X9I8p+SnGit/f1ufmd+fr6tr6+PdV8AAKbY9uC7ZXExOXkyqTqwZVTV6dba/KOeHycEV9WLkvxqkhtaa7su7wrBAAAD0FpyZFvjwcbGgQbg5PIheNye4NcleWKSu6rqI1X1G2NeDwCAWbBVCd5ue4/whF0zzi+31v5pXwsBAGBGbG+F2GqB2N4accAtEZcyVggGAIBHqUrm5i7uAT55svvZ3NzEA3DSwwfj9kJPMADAALR2ceDdeXwA9qsnGAAALm1n4J2CCvAWIRgAgMERggEAGBwhGACAwRGCAQAYHCEYAIDBEYIBABgcIRgAgMERggEAGBwhGACAwRGCAQAYHCEYAIDBqdbawd+06kySvz7wGx9+T0ryxUkvYgZ5XfvnNd0fXtf94XXdH17X/eF1vXr/pLV2fOeTEwnB7E1VrbfW5ie9jlnjde2f13R/eF33h9d1f3hd94fXtT/aIQAAGBwhGACAwRGCD5c7Jr2AGeV17Z/XdH94XfeH13V/eF33h9e1J3qCAQAYHJVgAAAGRwg+BKrqRVX1F1X16ap6zaTXMyuq6o1V9UBVfXzSa5kVVfXUqrq7qu6rqk9U1eKk1zQLquprquqDVfXRzdf15ye9pllRVUer6sNV9QeTXsusqKrPVtXHquojVbU+6fXMiqqaq6q3V9WnNv8dO5r0mg477RBTrqqOJvnLJN+T5P4k9yb54dbaJye6sBlQVc9L8pUkd7bWnj7p9cyCqnpykie31j5UVU9McjrJD/jndTxVVUme0Fr7SlU9Lsn7kiy21u6Z8NIOvar6z0nmk3xDa+3Fk17PLKiqzyaZb62ZZdujqnpzkve21l5fVceSfF1r7eyk13WYqQRPv2cl+XRr7TOttXNJ3prkpRNe00xorf1Zki9Neh2zpLX2N621D21+/+Uk9yV5ymRXdfi1zlc2Dx+3+aWCMaaqui7J9yd5/aTXAldSVd+Q5HlJ3pAkrbVzAvD4hODp95Qkn9t2fH+ECg6Bqro+yTOSfGCyK5kNm2/bfyTJA0nuaq15Xcd3e5JXJ9mY9EJmTEvyrqo6XVU3TnoxM+JbkpxJ8qbN9p3XV9UTJr2ow04Inn51iedUgJhqVfX1SX4
nyc2ttb+b9HpmQWvtfGvtu5Jcl+RZVaWFZwxV9eIkD7TWTk96LTPoua21Zyb5viSv2mw9YzzXJHlmkl9vrT0jyVeT+IzQmITg6Xd/kqduO74uyecntBZ4TJs9q7+T5C2ttd+d9HpmzeZboKtJXjThpRx2z03yks3+1bcmeUFV/eZklzQbWmuf33x8IMk70rX1MZ77k9y/7R2gt6cLxYxBCJ5+9yZ5WlV982Yj/MuT/P6E1wSXtPkBrjckua+19quTXs+sqKrjVTW3+f3XJnlhkk9NdlWHW2vtltbada2169P9e/U9rbUfmfCyDr2qesLmh2Kz+Xb99yYxgWdMrbUvJPlcVX3r5lMnkvjA8ZiumfQCuLLW2sNVdVOSP0lyNMkbW2ufmPCyZkJV/XaShSRPqqr7k/xca+0Nk13VoffcJD+a5GOb/atJ8l9ba++c4JpmwZOTvHlzWsyRJG9rrRnpxTT6xiTv6P4ezjVJfqu19seTXdLM+Kkkb9ksiH0myY9PeD2HnhFpAAAMjnYIAAAGRwgGAGBwhGAAAAZHCAYAYHCEYAAABkcIBgBgcIRgAAAGRwgGAGBw/j8MR7i+w2GmfQAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(12,6))\n", "for f in range(len(finance_features_train)):\n", " if labels_train[f]:\n", " plt.scatter(finance_features_train[f][0],finance_features_train[f][1],c = \"r\", marker=\"x\")\n", " else:\n", " plt.scatter(finance_features_train[f][0],finance_features_train[f][1],c = \"b\", marker=\".\")\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Task 5: Tune your classifier to achieve better than .3 precision and recall\n", "# using our testing script. Check the tester.py script in the final project\n", "# folder for details on the evaluation method, especially the test_classifier\n", "# function. Because of the small size of the dataset, the script uses\n", "# stratified shuffle split cross validation. For more info:\n", "# http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.StratifiedShuffleSplit.html\n", "\n", "# Example starting point. Try investigating other evaluation techniques!\n", "features_train, features_test, labels_train, labels_test = \\\n", " train_test_split(features, labels, test_size=0.3, random_state=42)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "ExecuteTime": { "end_time": "2019-09-15T09:50:20.596483Z", "start_time": "2019-09-15T09:50:20.588076Z" } }, "outputs": [], "source": [ "# Task 6: Dump your classifier, dataset, and features_list so anyone can\n", "# check your results. 
You do not need to change anything below, but make sure\n", "# that the version of poi_id.py that you submit can be run on its own and\n", "# generates the necessary .pkl files for validating your results.\n", "\n", "dump_classifier_and_data(email_svm_pca, data_dict, all_email_features)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "ExecuteTime": { "end_time": "2019-09-15T09:50:24.883604Z", "start_time": "2019-09-15T09:50:23.467614Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Got a divide by zero when trying out: SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,\n", " decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',\n", " max_iter=-1, probability=False, random_state=None, shrinking=True,\n", " tol=0.001, verbose=False)\n", "Precision or recall may be undefined due to a lack of true positive predicitons.\n" ] } ], "source": [ "#!/usr/bin/python\n", "\n", "\"\"\" a basic script for importing student's POI identifier,\n", "    and checking the results that they get from it\n", "\n", "    requires that the algorithm, dataset, and features list\n", "    be written to my_classifier.pkl, my_dataset.pkl, and\n", "    my_feature_list.pkl, respectively\n", "\n", "    that process should happen at the end of poi_id.py\n", "\"\"\"\n", "import pickle\n", "import sys\n", "\n", "from sklearn.model_selection import StratifiedShuffleSplit\n", "\n", "# sys must be imported before the tools directory is added to the path\n", "sys.path.append(\"../tools/\")\n", "from feature_format import featureFormat, targetFeatureSplit\n", "\n", "\n", "PERF_FORMAT_STRING = \"\\\n", "\\tAccuracy: {:>0.{display_precision}f}\\tPrecision: {:>0.{display_precision}f}\\t\\\n", "Recall: {:>0.{display_precision}f}\\tF1: {:>0.{display_precision}f}\\tF2: {:>0.{display_precision}f}\"\n", "RESULTS_FORMAT_STRING = \"\\tTotal predictions: {:4d}\\tTrue positives: {:4d}\\tFalse positives: {:4d}\\\n", "\\tFalse negatives: {:4d}\\tTrue negatives: {:4d}\"\n", "\n", "\n", "def test_classifier(clf, dataset, feature_list, folds=1000):\n", " 
data = featureFormat(dataset, feature_list, sort_keys=True)\n", "    labels, features = targetFeatureSplit(data)\n", "    # cv = StratifiedShuffleSplit(labels, folds, random_state=42)\n", "    cv = StratifiedShuffleSplit(n_splits=folds, random_state=42)\n", "    true_negatives = 0\n", "    false_negatives = 0\n", "    true_positives = 0\n", "    false_positives = 0\n", "    for train_idx, test_idx in cv.split(features, labels):\n", "        features_train = []\n", "        features_test = []\n", "        labels_train = []\n", "        labels_test = []\n", "        for ii in train_idx:\n", "            features_train.append(features[ii])\n", "            labels_train.append(labels[ii])\n", "        for jj in test_idx:\n", "            features_test.append(features[jj])\n", "            labels_test.append(labels[jj])\n", "\n", "        # fit the classifier using training set, and test on test set\n", "        clf.fit(features_train, labels_train)\n", "        predictions = clf.predict(features_test)\n", "        for prediction, truth in zip(predictions, labels_test):\n", "            if prediction == 0 and truth == 0:\n", "                true_negatives += 1\n", "            elif prediction == 0 and truth == 1:\n", "                false_negatives += 1\n", "            elif prediction == 1 and truth == 0:\n", "                false_positives += 1\n", "            elif prediction == 1 and truth == 1:\n", "                true_positives += 1\n", "            else:\n", "                print(\"Warning: Found a predicted label not == 0 or 1.\")\n", "                print(\"All predictions should take value 0 or 1.\")\n", "                print(\"Evaluating performance for processed predictions:\")\n", "                break\n", "    try:\n", "        total_predictions = true_negatives + \\\n", "            false_negatives + false_positives + true_positives\n", "        accuracy = 1.0*(true_positives + true_negatives)/total_predictions\n", "        precision = 1.0*true_positives/(true_positives+false_positives)\n", "        recall = 1.0*true_positives/(true_positives+false_negatives)\n", "        f1 = 2.0 * true_positives / \\\n", "            (2*true_positives + false_positives+false_negatives)\n", "        f2 = (1+2.0*2.0) * precision*recall/(4*precision + recall)\n", "        print(clf)\n", "        print(PERF_FORMAT_STRING.format(accuracy, precision,\n", 
" recall, f1, f2, display_precision=5))\n", " print(RESULTS_FORMAT_STRING.format(total_predictions,\n", " true_positives, false_positives, false_negatives, true_negatives))\n", " print(\"\")\n", " except:\n", " print(\"Got a divide by zero when trying out:\", clf)\n", " print(\"Precision or recall may be undefined due to a lack of true positive predicitons.\")\n", "\n", "\n", "CLF_PICKLE_FILENAME = \"my_classifier.pkl\"\n", "DATASET_PICKLE_FILENAME = \"my_dataset.pkl\"\n", "FEATURE_LIST_FILENAME = \"my_feature_list.pkl\"\n", "\n", "\n", "def dump_classifier_and_data(clf, dataset, feature_list):\n", " with open(CLF_PICKLE_FILENAME, \"wb\") as clf_outfile:\n", " pickle.dump(clf, clf_outfile)\n", " with open(DATASET_PICKLE_FILENAME, \"wb\") as dataset_outfile:\n", " pickle.dump(dataset, dataset_outfile)\n", " with open(FEATURE_LIST_FILENAME, \"wb\") as featurelist_outfile:\n", " pickle.dump(feature_list, featurelist_outfile)\n", "\n", "\n", "def load_classifier_and_data():\n", " with open(CLF_PICKLE_FILENAME, \"rb\") as clf_infile:\n", " clf = pickle.load(clf_infile)\n", " with open(DATASET_PICKLE_FILENAME, \"rb\") as dataset_infile:\n", " dataset = pickle.load(dataset_infile)\n", " with open(FEATURE_LIST_FILENAME, \"rb\") as featurelist_infile:\n", " feature_list = pickle.load(featurelist_infile)\n", " return clf, dataset, feature_list\n", "\n", "\n", "def main():\n", " # load up student's classifier, dataset, and feature_list\n", " clf, dataset, feature_list = load_classifier_and_data()\n", " # Run testing script\n", " test_classifier(clf, dataset, feature_list)\n", "\n", "\n", "main()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", 
"pygments_lexer": "ipython3", "version": "3.7.3" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }