info232 syllabus 2019-2020

[Notes for next year]

Info 232

Second-year undergraduate (Licence 2) computer science project

Introduction to data science with Python

Isabelle Guyon, professor

Lab instructors:

Victor Estrade, Adrien Pavao, Herilalaina Rakotoarison (Heri)

Questions : info232@chalearn.org

Website : http://saclay.chalearn.org/ 

Homework submission : https://chagrade.lri.fr/

 (Version 5.2, March 16, 2020)

Due to the Coronavirus epidemic, classes are now "virtual". Each team will meet weekly with one of the instructors.

The goal of this class is to practice programming in Python while having fun. As you may know, "data science" has been called the sexiest job of the 21st century. We will explore how to solve data science problems with Python. Our master's students have prepared mini data science competitions for you. This will give you the opportunity to participate and try to win!

Schedule:

Classes start on January 15, 2020 and end on May 1, 2020.

Classes take place on Wednesdays and Fridays in building 336, in several groups:

    Wednesday 13h30-15h30, Nautilus A, “Python support” and “re-grading”: Victor Estrade

    Friday AM 8h15-10h15, Room 309 Adrien Pavao (MEDICHAL) & Room 213 Herilalaina Rakotoarison (XPORTER)

    Friday PM 15h45-17h45, Nautilus A Adrien Pavao (XPORTER) & Room 209 Herilalaina Rakotoarison (GAIASAVERS)

The instructors assigned to each team are shown below:

AM groups MEDICHAL = AFRICA, MALARIA, MOSQUITO, PARASITE, SAHARA

AM groups XPORTER = AUTOBUS, AUTOCAR, AUTOMOBILE, TAXI, TRUCK

PM groups XPORTER = CARAVAN, MOTO, SCOOTER, VELO, SEGWAY

PM groups GAIASAVERS = ECOLO, OCEAN, PLANKTON, GREEN

Repartition: Isabelle, Herilalaina, Victor, Adrien

Homework deadlines are strict, but those who return homework on time can improve their grades by going to the Python support class the following Wednesday. Attendance in class is important. Absentees will not receive the points allocated to the group for the corresponding homework (unless an official excuse is presented to the secretariat).

Week | Class Dates | Homework | Due Date | Points
--- | --- | --- | --- | ---
0 | Jan 15-17 | 0. Intro&Tools [slides] [homework] | Jan 18 | 5
1 | Jan 22-24 | 1. Workflow [slides] [homework] | Jan 25 | 5
2 | Jan 29-31 | 2. Pandas [slides] [homework] | Feb 1 | 5
3 | Feb 5-7 | 3. SklearnClassif [slides] [homework] (Adrien) | Feb 8 | 5
4 | Feb 12-14 | 4. SklearnHPselect [slides] [homework] (Heri) | Feb 22 | 5
5 | Feb 26-28 | 5. SklearnPrepro [slides] [Improve TP4] (Victor) | Feb 29 | 5
6 | Mar 4-6 | 6. Proposal [grading] | Mar 14 | 10
7 | Mar 20 (telecon), proposal discussion | 7. CodeReview [TIPS FOR CODE SUBMISSION] | Mar 21 | 10
8 | Mar 27 (or by appointment, telecon) | 8. Video + second version of proposal | Mar 28 | 10
9 | Apr 3 (or by appointment, telecon) | 9. Report + second version of homework 7 (Code); grade improvement, cannot get lower. | Apr 4 | 20
10 | Apr 10 (or by appointment, telecon) | 10. CodeReview; new grade on code (different from hw7): up to 3 points for each binôme (clarity=1, originality=1, test=1) + 1 point for code efficiency (better than starting kit). | Apr 18 | 10
11 | Apr 24 (or by appointment, telecon) | 11. Graphs + final version of video+5min (include most interesting graphs; comment on them). See this notebook for inspiration. | Apr 25 | 10
12 | May 1 | 12. Re-grading final versions (Report+2pages, final code version) + final questionnaire => 5 bonus points + video contest (vote for best video - not from your project) | May 9 | NA (improvements)
Total | | | | 100



Homework and grades:

Students will be scored out of 100 points (then converted to a final grade out of 20). Students will be notified of any change in class or by email. For group work, team leaders are responsible for submitting homework on time on ChaGrade https://chagrade.lri.fr/. 
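
As a rough illustration of the arithmetic, the sketch below adds up the maximum points listed in the schedule and converts a total to a grade out of 20. The linear scaling and the names `MAX_POINTS` and `final_grade` are assumptions made for this example; the exact conversion rule and the handling of bonus points are not specified here.

```python
# Maximum points per graded item, copied from the schedule above.
MAX_POINTS = {
    "practical work (weeks 0-5)": 6 * 5,
    "6. Proposal": 10,
    "7. CodeReview": 10,
    "8. Video": 10,
    "9. Report": 20,
    "10. CodeReview": 10,
    "11. Graphs": 10,
}


def final_grade(earned, scale=20):
    """Convert earned points to a grade out of `scale`.

    Assumption: plain linear scaling of the 100-point total.
    """
    total_max = sum(MAX_POINTS.values())  # 100 points in total
    total = sum(earned.values())
    return scale * total / total_max


# Hypothetical team that earned 15 points on the report and 10 on the video.
print(final_grade({"9. Report": 15, "8. Video": 10}))
```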

Teams:

Students will work in teams of 6 people, grouped in pairs (binômes). Each binôme must produce the implementation of at least one algorithm and its unit test. The team will have a coordinator responsible for delivering the homework. The teams are determined by the teacher from the answers to the questionnaire. 
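
To give an idea of what "one algorithm and its unit test" can look like, here is a minimal sketch. The class `MajorityClassifier` is a hypothetical toy baseline invented for this example (it is not part of any starting kit); it follows the `fit`/`predict` convention used by scikit-learn and is tested with the standard `unittest` module.

```python
import unittest


class MajorityClassifier:
    """Toy baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        # Count how often each label occurs and remember the most common one.
        counts = {}
        for label in y:
            counts[label] = counts.get(label, 0) + 1
        self.majority_ = max(counts, key=counts.get)
        return self

    def predict(self, X):
        # One prediction per input sample, always the majority label.
        return [self.majority_ for _ in X]


class TestMajorityClassifier(unittest.TestCase):
    def test_predicts_most_frequent_label(self):
        clf = MajorityClassifier().fit([[0], [1], [2]], ["a", "b", "b"])
        self.assertEqual(clf.predict([[5], [6]]), ["b", "b"])

    def test_one_prediction_per_sample(self):
        clf = MajorityClassifier().fit([[0]], [1])
        self.assertEqual(len(clf.predict([[1], [2], [3]])), 3)
```

Saved in a file, these tests can be run with `python -m unittest <file name>`. A real binôme would replace the toy classifier with its own algorithm and keep the same pattern: small inputs with a known expected output.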

Project choices:

Each team of 6 is assigned one of the mini-challenges that the M2 AIC students have set up for this class, see http://saclay.chalearn.org/. The preferences expressed in the questionnaire were taken into account but, unfortunately, not all students could obtain their preferred project, because the teams had to be balanced. 

Practical work (5 points each):

In the first weeks, the homework will be simple practical work consisting of a Jupyter Notebook to complete.

Project proposal (10 points):

Follow THIS TEMPLATE. Here is an example of a SUCCESSFUL PROPOSAL.

Each team must write a 2-page proposal + appendices:

- One page of text and one page of figures explaining the chosen challenge and describing the algorithms that will be implemented (5 points for the text and 5 points for the figures).

- As many pages as necessary with the pseudo-code of the algorithms and the references.

Written report (20 points):

Hint: you can use Google Docs or Overleaf to write the report together remotely.

The written report should not exceed 6 pages + appendices including bibliography, code, supplemental material.

The report should include a header with:

1) the name of the administrative group (1, 1bis, 2, 2bis, 4, or 4),
2) the name of the team (which will also be their Codalab login),
3) the names of all team members,
4) the URL of the chosen Codalab challenge,
5) THE NUMBER of the last submission on Codalab,
6) the URL of the Youtube video (final version of the report only),
7) the URL of the Github team repository where all the team code will be deposited,
8) the URL of the presentation slides (upload the PDF).

The report should be based on the project proposal and develop its components, explain the successes and failures, and describe any changes in direction (different choices of algorithms, improvements). Use the following outline: (1) Introduction with a brief description of the challenge and the data; (2) Description of the algorithms studied and pseudo-code; (3) Results obtained in the challenge (include a table with the leaderboard, graphs and figures); (4) Discussion and conclusion (lessons learned, message to students for next year).

The absolute requirement is that the code works. There will be no points for the report if the code does not work (i.e. if the execution of the Jupyter notebook by the assistants or the professor fails, or if the code submitted on Codalab does not produce the results indicated in the report). There will also be no report points if one of the team members is found guilty of plagiarism. Note that this policy applies to the whole team: no points for the team report if even one of the Python classes provided does not work or a single piece of code is copied, so help each other.

Apart from these absolute requirements, the work will be evaluated according to the following criteria:

1) Originality (5 points): The proposed approach, analysis and / or algorithms contain original ideas emanating from the group.

2) Scientific and technical quality (10 points): The algorithms are implemented clearly and efficiently. The explanations provided in the text are good.

3) Presentation (5 points): The report is clearly written, well-presented with figures and graphs and a bibliography.


Video (10 points):

You will also prepare a 3-minute video describing your project [strict 3-minute limit]. The video should be self-contained: any student in the class should be able to understand the problem and the solutions you have provided. You can make a video as simple as voice-over slides, or as sophisticated as your imagination allows. But avoid PowerPoint presentations like http://norvig.com/Gettysburg/. A video recording of a person speaking (if the person is a good speaker) is sometimes engaging. You can also use animations. Content and clarity should not be sacrificed to special effects. For inspiration, you can watch YouTube videos on "theses in 180 seconds" and those of last year's students http://saclay.chalearn.org (last column of table).

You must upload the video to YouTube and provide the link on ChaGrade. Make sure the video is visible to the public. The video will be evaluated according to:

1) Completeness and balance (4 points): The project is fully explained and enough time is devoted to each subject.

2) Clarity and logic (3 points): The project is clearly presented with good figures and good text, and a logical sequence.

3) Communication skills and charisma (3 points): The presentation captures the attention and imagination of the audience, conveys conviction, and makes the audience want to learn more.

Challenge participation and code review (2 x 10 points):

A challenge is no fun if there are no participants! For this reason, you will earn points if you participate in the challenges organized by the master's students. Once your team is formed and your challenge chosen, you will have to stick to it. You will have to use a single account with the name of the team to make submissions on the Codalab platform where the challenges are implemented (follow the links at http://saclay.chalearn.org/). You can make as many submissions as you want, within the limit of 5 per day. For each code review deadline, you will need to submit your Github URL on ChaGrade. Your score will depend on the progress achieved:

1. First submission: Obtain preliminary results with a basic method.

- Clarity of the code: 3 points
- Originality: 3 points
- (Good) tests present with the code: 3 points
- Efficiency: 1 point (results better than the basic method)
- Preprocessing: -3 points if no preprocessing

2. Second submission: Obtain results with a “personal” method.

- Each binôme: up to 3 points (clarity=1, originality=1, test=1)
- Efficiency: 1 point (results better than the starting kit)
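
The scoring of the first code review can be sketched as follows. This is only an illustration of the arithmetic: the names `first_review_score`, `FIRST_REVIEW_MAX` and `NO_PREPROCESSING_PENALTY` are made up for this example, and flooring the result at 0 is an assumption, since the rubric does not say what happens when the penalty exceeds the points earned.

```python
# Maximum points per criterion for the first code review.
FIRST_REVIEW_MAX = {"clarity": 3, "originality": 3, "tests": 3, "efficiency": 1}
NO_PREPROCESSING_PENALTY = 3


def first_review_score(points, has_preprocessing):
    """Add up the criteria and apply the preprocessing penalty."""
    for criterion, value in points.items():
        if not 0 <= value <= FIRST_REVIEW_MAX[criterion]:
            raise ValueError(f"invalid score for {criterion}: {value}")
    score = sum(points.values())
    if not has_preprocessing:
        score -= NO_PREPROCESSING_PENALTY
    return max(score, 0)  # assumption: the score cannot go below 0


# Hypothetical team: good code and tests, but no preprocessing.
points = {"clarity": 3, "originality": 2, "tests": 3, "efficiency": 1}
print(first_review_score(points, has_preprocessing=True))   # 9
print(first_review_score(points, has_preprocessing=False))  # 6
```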

Slides and class presentation (10 points):

Your class presentation (and the slides) will be graded with 2 points for each of these criteria:
1) Flow: Organize the flow of the presentation in a logical way. End with a message to next year's students.
2) Charisma: Make your talk accessible to a diverse audience and convey your message with enthusiasm and humor.
3) Balance: Limit your talk to fewer than 10 not-too-busy slides, giving each topic a balanced share. Practice explaining every single thing you show on the slides, so you do not exceed your time limit.
4) Simplicity: Use as little text as possible in the slides, avoid complex notations. Avoid cluttering your slides (one idea per slide).
5) Illustrations: Avoid tables of numbers and rather use colorful graphs with labeled axes with large enough fonts.

Support:

In case of problems, ask the instructors first (preferably during class or otherwise by email at info232@chalearn.org). For urgent computer-related problems only, contact mounir.aatif@u-psud.fr.