====== Data Analysis for ENSIIE (second year) ======

**PROJECT DUE 10 JANUARY 2020, by email to christophe.ambroise@univ-evry.fr**

The dataset is available at https://www.kaggle.com/jboysen/state-firearms

===== Quiz solutions =====

  * {{:members:cambroise:teaching:doc-corrige.pdf| Quiz 1 solutions }}
  * {{:members:cambroise:teaching:doc-corrige2.pdf| Quiz 2 solutions }}

===== Lecture Notes =====

  * {{:members:cambroise:teaching:cours01-ensiee.pdf| Introduction, probability reminders and multivariate normal distribution}}
  * {{:members:cambroise:teaching:cours02-ensiee.pdf| Clustering: geometrical approaches}}
  * {{:members:cambroise:teaching:acp-2018.pdf| Principal Component Analysis}}
  * {{:members:cambroise:teaching:kpca.pdf| Kernel PCA (slides from Rita Osadchy)}}
  * {{:members:cambroise:teaching:spectral-clustering.pdf| Spectral Clustering (slides from David Sonntag)}}

===== Exercises =====

  * {{:members:cambroise:teaching:multivariate-normal-exercices.pdf|Exercises (multivariate normal distribution)}}
  * {{:members:cambroise:teaching:td-clustering.pdf| Clustering}}
  * {{:members:cambroise:teaching:td-mixture.pdf| Mixture Analysis}}
  * {{:members:cambroise:teaching:td-acp.pdf| PCA and PCoA}}
  * Globin Data

    # Symmetric dissimilarity matrix between 21 globin sequences.
    # The first column holds the sequence names and is used as row names.
    D <- read.table(text="MYG_PHYCA 0.0000 0.1806 0.2434 0.3964 0.5656 0.4987 1.9654 2.1040 2.1278 2.0965 2.2725 2.0807 1.9645 1.9928 1.9195 2.0944 1.9867 1.9486 1.8515 1.9880 2.6100
    MYG_HUMAN 0.1806 0.0000 0.1929 0.2997 0.4852 0.4271 1.9675 2.0689 2.2427 2.1483 2.2753 2.0387 2.0941 2.1273 1.9495 2.0628 2.1114 1.9951 1.9200 2.0044 2.5663
    MYG_MOUSE 0.2434 0.1929 0.0000 0.3432 0.5312 0.4635 1.8727 2.1478 2.1478 2.1092 2.2318 1.9386 2.0581 2.0567 1.9920 2.1235 2.1776 2.0310 1.9519 2.0735 2.6225
    MYG_CHICK 0.3964 0.2997 0.3432 0.0000 0.3657 0.3196 1.8520 2.0577 2.0649 1.8216 1.9345 2.0096 1.9935 2.0463 1.8520 1.9878 2.1320 1.9407 1.8823 2.0378 2.5424
    MYG_ALLMI 0.5656 0.4852 0.5312 0.3657 0.0000 0.2970 1.8912 2.0551 2.0572 1.7896 1.9478 1.9237 1.7647 1.9622 1.9429 1.9423 2.0500 1.9352 1.9823 2.0511 2.3154
    MYG_CHEMY 0.4987 0.4271 0.4635 0.3196 0.2970 0.0000 1.7142 1.9036 1.9751 1.6927 1.8907 1.8523 1.8770 1.8414 1.7849 1.8503 1.9604 1.9075 1.8643 1.7584 2.4536
    HBB_CHICK 1.9654 1.9675 1.8727 1.8520 1.8912 1.7142 0.0000 0.2561 0.3093 0.4523 0.4192 0.4873 0.5325 1.1029 1.0926 1.2118 1.1729 1.1009 1.1261 1.1767 2.0827
    HBB_CHRPI 2.1040 2.0689 2.1478 2.0577 2.0551 1.9036 0.2561 0.0000 0.3486 0.4529 0.4763 0.5700 0.5593 1.2466 1.1259 1.2788 1.2850 1.2104 1.2175 1.2384 2.0504
    HBB1_IGUIG 2.1278 2.2427 2.1478 2.0649 2.0572 1.9751 0.3093 0.3486 0.0000 0.4923 0.4896 0.5368 0.6719 1.1485 1.1610 1.2959 1.1969 1.1418 1.1076 1.1371 2.2368
    HBB_PHYCA 2.0965 2.1483 2.1092 1.8216 1.7896 1.6927 0.4523 0.4529 0.4923 0.0000 0.1716 0.3657 0.7177 1.1980 1.1738 1.2054 1.2110 1.2201 1.1237 1.3139 2.1681
    HBB_HUMAN 2.2725 2.2753 2.2318 1.9345 1.9478 1.8907 0.4192 0.4763 0.4896 0.1716 0.0000 0.2601 0.8439 1.2198 1.2138 1.2014 1.1185 1.0397 1.0545 1.2736 2.1545
    HBB1_MOUSE 2.0807 2.0387 1.9386 2.0096 1.9237 1.8523 0.4873 0.5700 0.5368 0.3657 0.2601 0.0000 0.8461 1.1385 1.1809 1.2038 1.1498 1.0818 1.1020 1.2044 2.0275
    HBB_ALLMI 1.9645 2.0941 2.0581 1.9935 1.7647 1.8770 0.5325 0.5593 0.6719 0.7177 0.8439 0.8461 0.0000 1.1711 1.2448 1.2727 1.2456 1.3315 1.2844 1.3254 2.0693
    HBA_CHICK 1.9928 2.1273 2.0567 2.0463 1.9622 1.8414 1.1029 1.2466 1.1485 1.1980 1.2198 1.1385 1.1711 0.0000 0.2987 0.3798 0.4657 0.3991 0.3995 0.6689 2.1705
    HBA_CHRPI 1.9195 1.9495 1.9920 1.8520 1.9429 1.7849 1.0926 1.1259 1.1610 1.1738 1.2138 1.1809 1.2448 0.2987 0.0000 0.3752 0.5381 0.4647 0.5060 0.7493 2.1054
    HBA_ALLMI 2.0944 2.0628 2.1235 1.9878 1.9423 1.8503 1.2118 1.2788 1.2959 1.2054 1.2014 1.2038 1.2727 0.3798 0.3752 0.0000 0.5438 0.4856 0.4472 0.7831 2.3011
    HBA_PHYCA 1.9867 2.1114 2.1776 2.1320 2.0500 1.9604 1.1729 1.2850 1.1969 1.2110 1.1185 1.1498 1.2456 0.4657 0.5381 0.5438 0.0000 0.1639 0.2272 0.6760 1.8766
    HBA_HUMAN 1.9486 1.9951 2.0310 1.9407 1.9352 1.9075 1.1009 1.2104 1.1418 1.2201 1.0397 1.0818 1.3315 0.3991 0.4647 0.4856 0.1639 0.0000 0.1675 0.6708 1.9058
    HBA_MOUSE 1.8515 1.9200 1.9519 1.8823 1.9823 1.8643 1.1261 1.2175 1.1076 1.1237 1.0545 1.1020 1.2844 0.3995 0.5060 0.4472 0.2272 0.1675 0.0000 0.7283 1.9680
    HBA1_IGUIG 1.9880 2.0044 2.0735 2.0378 2.0511 1.7584 1.1767 1.2384 1.1371 1.3139 1.2736 1.2044 1.3254 0.6689 0.7493 0.7831 0.6760 0.6708 0.7283 0.0000 2.1875
    GLB3_MYXGL 2.6100 2.5663 2.6225 2.5424 2.3154 2.4536 2.0827 2.0504 2.2368 2.1681 2.1545 2.0275 2.0693 2.1705 2.1054 2.3011 1.8766 1.9058 1.9680 2.1875 0.0000", row.names = 1)

  * {{:members:cambroise:teaching:td-kpca.pdf| Kernel PCA and Spectral Clustering}}

===== Project =====

  * {{:members:cambroise:teaching:projet-mad-2019.pdf| Multivariate Bernoulli mixture}}
  * {{https://www.dropbox.com/s/hfnw08e0v3tgjv7/Projet-2019.rmd?dl=0 | simulation}}

===== Quiz =====

  * {{:members:cambroise:teaching:quizz2-em-pca.pdf| Quiz 2 with solutions}}

====== References ======

  * [[https://www6.inra.fr/mia-paris/content/download/4587/42934/version/1/file/ModelsHiddenStruct-Biology.pdf| Lecture notes by Stéphane Robin on latent structures]]
  * [[ http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf | Pattern Recognition and Machine Learning]], C. M. Bishop's book (Chapters 1, 2, 9, 10 and 12).
  * [[http://wikistat.fr/pdf/st-m-explo-acp.pdf | Lecture notes by Philippe Besse on PCA]]
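
===== Exploring the Globin data =====

As a possible starting point for the PCA and PCoA exercise, the sketch below runs a principal coordinate analysis (classical multidimensional scaling) and an average-linkage hierarchical clustering on a 4 x 4 excerpt of the Globin dissimilarity matrix listed under the exercises. The excerpt itself, the names ''D_small'', ''pcoa'', ''coords'', ''hc'' and ''groups'', and the choices ''k = 2'' and ''method = "average"'' are assumptions made for illustration and are not taken from the exercise sheet. The functions ''cmdscale'', ''hclust'' and ''cutree'' come with base R (package stats).

```r
# Excerpt of the Globin dissimilarity matrix (4 of the 21 sequences),
# copied from the full table so that this sketch runs on its own.
D_small <- read.table(text = "
MYG_PHYCA 0.0000 0.1806 1.9654 2.1040
MYG_HUMAN 0.1806 0.0000 1.9675 2.0689
HBB_CHICK 1.9654 1.9675 0.0000 0.2561
HBB_CHRPI 2.1040 2.0689 0.2561 0.0000", row.names = 1)
colnames(D_small) <- rownames(D_small)

# Convert the data frame to a 'dist' object (lower triangle only).
d <- as.dist(as.matrix(D_small))

# Principal coordinate analysis (classical MDS) in two dimensions.
# eig = TRUE also returns the eigenvalues, which show how much of the
# structure each axis carries (negative values signal a non-Euclidean
# dissimilarity).
pcoa <- cmdscale(d, k = 2, eig = TRUE)
coords <- pcoa$points
print(round(coords, 3))
print(round(pcoa$eig, 3))

# Average-linkage hierarchical clustering, cut into two groups.
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
print(groups)
```

To work on all 21 sequences, replace ''D_small'' with the full matrix ''D'' loaded by the ''read.table'' call given with the exercises; ''plot(coords)'' and ''plot(hc)'' then give a first visual check of the myoglobin and haemoglobin families.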