What's the best way to assess programming?

Question

Over in England the current model of assessing GCSE-level programming (exams for 16-year-olds) is being debated. A system where students had to program and write up solutions to a given scenario has suffered from plagiarism to the extent that the exams might be withdrawn.

We are now asking ourselves what the best way to assess programming is, with three models emerging:

On-screen timed programming exam
Open-ended programming projects with a report and/or questions about the code produced
Entirely paper-based exam with code comprehension questions and writing of short code pieces

None of these models are without their problems and there are advocates for each of them. Does anyone have any good research that shows assessment model x is the best way to assess programming?

ctrl-alt-delor · Answer

Some related research

There has been some research into how to hire people: how do you select people who can do the job? The traditional methods reject too many “good” people and accept too many “bad” people. It turns out that the best thing to do is to test them doing the job, and/or to select for attitude.

Therefore to test programming you should test programming (I don't care how good an essayist my heart surgeon is).

Google is one of the companies that has done some research in this area. There will be others. I heard about this on BBC Radio 4, so there may be some info on their website (good luck finding it).

Some other suggestions

Open book: programming is not about memory (offline: access limited to web pages downloaded before the exam is released).

Automated tests for the coding section (tests written by the exam board). See test-first development and unit testing.

Look at what can be assessed automatically, and assess the rest manually (I think there is an article in Hello World issue 3).

To avoid plagiarism: Give them materials to work with, be open book, use exam conditions (multiple small exams).

One teacher suggested: “pupils are messing up in the early stages, and this is trickling through and causing them to lose marks in later stages as well.” Therefore divide the test into multiple sections, where the output of section one is not the input of section two (the exam board provides the inputs for each section, or sets a different problem, so the confusion does not propagate).

Check for working code.

Check big-O (run-time, and code size).

Check for good names (of methods and variables) in preference to comments.
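To make the "tests written by the exam board" idea concrete, here is a minimal sketch using Python's standard unittest module. The task (counting vowels), the function name count_vowels and the three test cases are all invented for illustration; a real board would supply its own.

```python
import unittest


# Hypothetical student submission for an invented exam task:
# "return the number of vowels in a string".
def count_vowels(text):
    return sum(1 for ch in text.lower() if ch in "aeiou")


# Tests of the kind an exam board might supply (names and cases are assumptions).
class CountVowelsTests(unittest.TestCase):
    def test_empty_string(self):
        self.assertEqual(count_vowels(""), 0)

    def test_mixed_case(self):
        self.assertEqual(count_vowels("Exam Board"), 4)

    def test_no_vowels(self):
        self.assertEqual(count_vowels("rhythm"), 0)


def run_board_tests():
    """Run the suite and return (tests_passed, tests_run)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountVowelsTests)
    result = unittest.TestResult()
    suite.run(result)
    failed = len(result.failures) + len(result.errors)
    return result.testsRun - failed, result.testsRun
```

Something like run_board_tests() gives a pass count that can be marked automatically, which leaves things like naming and readability for the manual part of the assessment.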

Ryan Nutt · Answer

I don't think there's a "best way." I do a bit of a shotgun approach. No research to back this up, but it works well for me.

Lab Assignments

Most of the assignments we do in class are small lab-style assignments. Average students can finish a few of these in a class period.

We do all of these online with an autograder. Students can submit their code as many times as they want by the due date to get a score they're happy with. I'd say until it's perfect, but some stop early with a "good enough" grade.

For these I encourage them to work together. They bounce ideas off of each other. They work through snags together. And, yeah, sometimes they straight copy each other. Fortunately, where I teach these are worth such a small part of their average that a little plagiarism here doesn't really help them much.

I think the biggest advantage here is that they're able to get instant feedback on their work and not wait for me to grade it. If they click the button and it didn't work they can go back and fix it.
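For anyone wondering what the resubmit-until-happy loop looks like, here is a rough sketch of the idea in Python. It is not the autograder I actually use; the names grade_submission and Gradebook are made up, and keeping the best score across attempts is an assumption about how such a tool might be set up.

```python
def grade_submission(student_func, cases):
    """Return the percentage of (args, expected) cases the function gets right."""
    passed = 0
    for args, expected in cases:
        try:
            if student_func(*args) == expected:
                passed += 1
        except Exception:
            # A crash counts as a failed case rather than stopping the grader.
            pass
    return round(100 * passed / len(cases))


class Gradebook:
    """Keeps the best score per student across any number of resubmissions."""

    def __init__(self):
        self.best = {}

    def submit(self, student, student_func, cases):
        # Score this attempt and give the result back straight away.
        score = grade_submission(student_func, cases)
        # Assumption: a later, worse attempt never lowers the recorded grade.
        self.best[student] = max(score, self.best.get(student, 0))
        return score


# Hypothetical lab: write an absolute-value function.
ABS_CASES = [((3,), 3), ((-3,), 3), ((0,), 0), ((-7,), 7)]
```

A student whose first attempt just returns its argument would see 50 from submit(), fix the bug, resubmit, and see 100, all without waiting on a teacher.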

Free Response

I teach CS AP-A, so the kids have a partially written test to prepare for at the end of the year. We do a lot of FRQs through the year on paper.

Some of these are test grades and I hand grade them. We'll do maybe 20 or so of these throughout the school year. The limit to this isn't the students, it's how many I can grade. If I had minions who could grade for me I'd probably do one of these a week. But it's not realistic when I'm the only one grading.

Most are done on paper before they're let loose to put them on the computer, at which point it looks like the lab assignments above. The main difference is that they have to start on paper. Some days I have them show me and walk me through their work before they get on the computer. Some days I'll put a timer up and when it's done they can get on the computer.

Projects

This one is pretty new for me. It's something I started doing last year.

Projects are really easy to cheat on. My first year teaching computer science I had 45 students turn in the exact same code.

What I started doing last year is splitting projects into two parts.

The first part is the actual project and they're graded on a rubric targeting specific methods. I count this as 70% of the grade.

The other 30% comes from an on-paper FRQ similar to the project. For example, we just finished a Blackjack project. One of the methods that they had to write for the project was a shuffle method. One of the FRQs was shuffling an array of integers. Should be really easy if they did the project.
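As an illustration of the kind of answer that FRQ is after, here is one possible solution, sketched in Python rather than the Java used in AP CS A. It uses the Fisher-Yates algorithm, which is my choice for the example and not necessarily what the rubric required; the optional rng parameter is an addition so the function can be tested with a seeded generator.

```python
import random


def shuffle(values, rng=random):
    """Shuffle a list of integers in place using the Fisher-Yates algorithm."""
    # Walk from the last index down to 1, swapping each element with a
    # randomly chosen element at or before it.
    for i in range(len(values) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive at both ends
        values[i], values[j] = values[j], values[i]
```

A student who wrote the project's shuffle method themselves should be able to reproduce something of this shape on paper in a few minutes.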

I've found that splitting projects like this does a really good job of separating those that copy from a friend and end up with 0 points from the FRQ and those that know what they're doing and get all 30 points.

It also allows them to work together on the project and still keeps them accountable for doing their own work.

Buffy · Answer

Unfortunately exams of any kind advantage some students and disadvantage others. They also favor those good at exams, not necessarily those good at the subject matter. I was always (over 40 years) a big fan of using projects to assess students. Let me describe how you might be able to make it work, provided that the scale isn't overwhelming.

The course I will describe had both undergraduate and graduate versions. The number of students was normally around 30, always fewer than 50. My other teaching duties let me spend the majority of my time on this course. The course grade was determined 70% by a major project, 10% by a minor (warm-up) project and 20% by a final exam, on which the grading was lenient for any tricky or deep questions. The main project took up all but 2 weeks of the semester.

To make it work, I did this. First I partitioned the overall task into subtasks and gave a point value to each task, with the sum being 700 (= 70% of the course). The tasks needed to be done in order for the project to be successful. They were dependent on successful "completion" but not necessarily "perfection" of earlier tasks. The students worked in self-selected pairs, which cut my "grading" task in half.

Every two weeks the students would submit their work in a folder, including the current version of the project and all earlier versions. The changes from their previous version had to be marked with a highlighter pen so that I could easily find the changes. I also kept an index card for each student so that I could easily make notes as I looked at the work.

I would spend about 2 hours every other week going through the project folders making notes on their pages. Sometimes the "note" was just a checkmark, indicating success at the level they were at. Students could work at their own pace and could target their own desired level of accomplishment. They didn't need to complete the entire project to achieve a grade with which they were comfortable. So some pairs were on item 10 of the work schedule, others on item 8, etc.

If my notes on the work were negative they could re-do that work to earn the "checkmark" and could work on the next feature as well.

At the end of the course I would need to evaluate everything they did and assign a grade. Normally it would be easy, as I could keep a running tally of their progress along the way.
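The arithmetic behind such a running tally can be sketched in a few lines of Python. The split into ten tasks of 70 points each is purely hypothetical (all that is fixed above is that the task points sum to 700), and the function names are invented for the example.

```python
# Hypothetical breakdown: ten tasks worth 70 points each, summing to 700.
TASK_POINTS = [70] * 10


def project_points(checkmarks):
    """Sum the points for every task that has earned its checkmark."""
    return sum(points for points, done in zip(TASK_POINTS, checkmarks) if done)


def course_percentage(checkmarks, minor_project, final_exam):
    """Combine the three components using the 70/10/20 weighting.

    minor_project and final_exam are fractions between 0 and 1.
    """
    major = project_points(checkmarks) / 10  # 700 points map onto 70%
    return major + 10 * minor_project + 20 * final_exam
```

Under these assumed numbers, a pair that stopped after task 8 would have 560 project points; with 90% on the minor project and 75% on the exam that comes to 56 + 9 + 15 = 80% for the course.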

I also used various electronic communications (mailing list primarily) so that students could ask questions at any time. Everyone saw every question. Students were encouraged to answer questions (but not submit code) asked by others. I monitored the list daily and answered questions as needed. Every student saw my answers as well.

The project was thought to be very challenging by the students. I gave them a lot of guidance, both about the subject the project covered and things like coding, design patterns, thinking like a programmer, etc. I provided a fairly comprehensive set of tests they could use to assure themselves of their progress. The periodic reports would include the test results from running their code. I never actually had to run their code. But if it was too sloppy I'd just note that on the work and quit reading.

Only a few students hated me. Many loved me. The course was transformational as well for many of them.

The trick here is (a) to be always available and (b) to evaluate (but not necessarily "grade") frequently, without it taking so much time that your life suffers.

You can be very severe with them when needed so long as they know that today's judgement won't necessarily haunt them and that they can come back tomorrow.

One caveat. The student (pair) that has to frequently repeat work to catch up will have some difficulty with the current work and in understanding the current lecture, which was paced to the project work. You need to watch out for that and give advice to "move on" leaving a bad spot in their work. But looking at their work every two weeks let me also determine if I needed to revisit an earlier topic in the next lecture. It was pretty obvious if groups of students didn't get it.

Since students worked in pairs I needed some way to grade individuals. The small project and the exam let me get a handle on that, but I also had students do peer assessments of the form: (a) what was your partner's chief contribution, and (b) what was your chief contribution. The peer assessment always tried to let people say good things when they could, rather than asking for the bad. Grading was never a problem.

Some students learned more, some less. Most were happy enough with the course. I always thought that I was giving something of value to every student.
