
Selecting conference session proposals: popular vote? selection committee?


I was on the "Ecosystem" track session selection team for Drupalcon London, which motivated me to finally do some more analysis on the traditional pre-selection session voting. Specifically, I wanted to compare the votes a session receives against the evaluations submitted after the conference.

By the way, if you have the opportunity, I highly suggest going to a Drupalcon; they are always great events.

Here are some conclusions based on analysis of the evaluation and voting data from DrupalCon Chicago:

  • Voting was not a useful predictor of high quality sessions!
  • The pre-selected sessions did not fare better in terms of evaluation than the other sessions (though they may have served a secondary goal of getting attendees to sign up earlier).
  • We should re-evaluate how we do panels. They tend to get lower scores in the evaluation.
  • The number of evaluations submitted increased 10% compared to San Francisco, which seems great (Larry Garfield theorizes it is related to the mobile app; I think there are a lot of factors involved).

Is voting a good way to judge conference session submissions?

Drupalcon has historically used a voting and committee system for session selection that is pretty common. This is also the default workflow for sites based on the Conference Organizing Distribution.

Typical system:

  1. Users register on the site
  2. They propose sessions (and usually there is a session submission cutoff date before voting)
  3. Voting begins: people (sometimes registered users, sometimes limited to attendees) can vote on their favorite sessions
  4. During steps 2 and 3, a session selection committee is encouraging submissions and contacting the session proposers to improve their session descriptions
  5. Selection begins: Voting closes and the session selection committee does their best to choose the right sessions based on factors like appropriateness of content to the audience, the number of votes, their knowledge of the presenter's skill, diversity of ideas
  6. ???
  7. Profit

Drupalcon Chicago (the event I'm basing this analysis on) had a few changes to that model. They pre-selected some sessions from people they knew would submit sessions and get accepted (see their blog post on that and the FAQ). This lets us see whether pre-selecting actually brought in sessions that were more valuable to attendees, which seems like a decent proxy for whether the committee's choices were right.

The pre-conference voting had 5 stars with the following labels:

  • I have no interest in this session
  • I would probably not attend this session
  • I might attend this session
  • I would probably attend this session
  • I totally want to see this session

The post-session evaluations had 5 stars with the following criteria:

  • Overall evaluation of this session
  • Speaker's ability to control discussions and keep session moving
  • Speaker's knowledge of topic
  • Speaker's presentation skills
  • Content of speaker's slides/visual aids

I've previously looked at the percentage of the attendee population that actually gets to vote and at the distribution of votes (1 to 5) to see whether the scale was used in a meaningful way in Chicago (that analysis is on groups.drupal.org). Given that the Chicago votes were spread across the entire 1 to 5 spectrum, I believe a 5 star system is a useful way to rate a session. However, I don't think the resulting value is directly useful to the session selection committee when they choose individual sessions (more on that later).

My analysis method was to create a spreadsheet with the average and count of pre-conference votes for each session (the votes that were used to help determine which sessions to include). Then I added in the post-conference evaluations (from 1 to 5 stars), which covered the five criteria listed above.

I graphed the pre-conference votes against the post-conference evaluations and used the spreadsheet's CORREL function to see how correlated the data was. I expected a straight-line relationship: the higher a session's average vote, the higher its post-conference evaluation scores.
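For anyone who would rather not do this in a spreadsheet, here is a minimal sketch of the same calculation in Python. The file name and column names are assumptions rather than the actual export format; the point is simply that the spreadsheet CORREL function is a Pearson correlation.

```python
# Minimal sketch of the vote-vs-evaluation correlation, assuming a hypothetical
# CSV export "chicago_sessions.csv" with one row per session and columns:
# avg_vote, overall, control, knowledge, presentation, visuals.
import pandas as pd
from scipy.stats import pearsonr

sessions = pd.read_csv("chicago_sessions.csv")

for axis in ["overall", "control", "knowledge", "presentation", "visuals"]:
    # pearsonr() gives the same r value as the spreadsheet CORREL() function,
    # plus a p-value as a bonus.
    r, p = pearsonr(sessions["avg_vote"], sessions[axis])
    print(f"{axis:>12}: r = {r:+.3f} (p = {p:.2f})")
```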

What I found was that there is basically no correlation between the pre-conference voting and the post-session evaluations. Here is a table showing each evaluation criterion (one of the five listed above) and its correlation with the pre-conference session votes.

Axis            Correlation (r)
overall             -0.006
control              0.053
knowledge            0.091
presentation         0.005
visuals             -0.022

As a graph, the overall data looks like:

[Scatter plot of average pre-conference votes vs. overall post-conference evaluation scores]

This data is not correlated. Just look at it, spaghetti soup!

I graphed it along with a random line that has a correlation value of 0.95, for comparison. As you can see, the pre-conference votes are not at all correlated with the post-conference evaluations.

It isn't surprising that votes don't correlate to session quality. Voting tends to be done by a minority of event attendees who are "insiders" to the event. They are likely to be swayed by friendships, employers, and social media campaigns.

Comparing pre-selected sessions to regular sessions

I also took the average of the evaluation scores across the non-pre-selected sessions and the pre-selected sessions. The average overall evaluation score for non-pre-selected sessions was 80.9 vs. 80.7 for pre-selected sessions. The other axes show similar results except for knowledge and visuals, though it's not clear whether those differences are statistically significant.

Axis            Pre-selected avg. evaluation score   Non-pre-selected avg. evaluation score
overall                        81                                   81
control                        83                                   83
knowledge                      93                                   91
presentation                   80                                   81
visuals                        78                                   75
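If someone with the raw data wants to check whether the knowledge and visuals gaps are meaningful or just noise, a Welch's t-test on the per-session scores would answer that. This is a sketch under the same assumed file and column names as above, plus a hypothetical pre_selected flag per session:

```python
# Sketch: compare pre-selected vs. regular sessions per evaluation criterion.
# Assumes the hypothetical "chicago_sessions.csv" from above plus a boolean
# "pre_selected" column marking the invited sessions.
import pandas as pd
from scipy.stats import ttest_ind

sessions = pd.read_csv("chicago_sessions.csv")
pre = sessions[sessions["pre_selected"]]
regular = sessions[~sessions["pre_selected"]]

for axis in ["overall", "control", "knowledge", "presentation", "visuals"]:
    # Welch's t-test (unequal variances); a large p-value means the gap
    # could easily be noise.
    t, p = ttest_ind(pre[axis], regular[axis], equal_var=False)
    print(f"{axis:>12}: pre={pre[axis].mean():.1f}  "
          f"regular={regular[axis].mean():.1f}  p={p:.2f}")
```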

So, we can see that regularly selected sessions got very similar scores to the pre-selected ones. I'm not suggesting that pre-selecting is flawed (it didn't produce lower results, anyway), but I do think we should carefully consider who we pre-select.

The third bit of analysis I did was to look at the overall score and the number of presenters for each session. Here's the average number of presenters per decile, where decile 1 is the 9 highest-rated sessions (see the sketch after the table). There is a pretty clear trend, from just over 1 presenter for the top-rated sessions to 2.5 presenters for the bottom-rated ones.

Decile    Average # of presenters
1                 1.11
2                 1.67
3                 1.89
4                 1.44
5                 1.67
6                 2.33
7                 2.00
8                 2.00
9                 2.44
10                2.50
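The decile grouping above is easy to reproduce. A sketch, again using the assumed CSV plus a hypothetical num_presenters column:

```python
# Sketch: average number of presenters per decile of overall evaluation score.
# Assumes the hypothetical "chicago_sessions.csv" plus a "num_presenters" column.
import pandas as pd

sessions = pd.read_csv("chicago_sessions.csv")
# Rank sessions by overall score (rank 1 = highest-rated) and cut into ten
# equal-sized groups, so decile 1 holds the best-rated sessions.
sessions["decile"] = pd.qcut(sessions["overall"].rank(ascending=False),
                             10, labels=range(1, 11))
print(sessions.groupby("decile")["num_presenters"].mean().round(2))
```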

I believe there are two big reasons for this. First, panel presentations are rarely well coordinated, and the panel members usually don't take the time to practice as a group (our distributed community makes that hard). Second, Drupalcon session selection committees often suggest that similar topics get merged into one panel. I think we should stop merging independent presenters. The result is often that people who may not have the same story to tell end up cramming 45 minutes of material into a half or a third of that time.

What can we do to improve session quality and session selection?

One of the great tools for session selection committee members at Drupalcon London was the availability of evaluation data from previous conferences. If a proposed session got a lot of votes (perhaps due to a campaign on Twitter or within a large company) but the presenter had horrible evaluations from a previous conference, then the committee member has an easy job: just say "no thanks".

The only problem with using previous conference evaluations to judge sessions is that it can lead to stagnation among the presenters. Part of the value of a conference is in hearing new ideas. That risk can be reduced by having free-for-all BOF sessions, but I think that in the Drupal world part of the solution is to use Drupalcamps as a ramp into Drupalcon: presenters should give their session at a camp and mention that (along with any evaluations or video from the camp) in their session proposal. With approval from the presenters, Drupalcamp Colorado published our evaluations - we hope this helps other camps and that they will do the same. It's no surprise that some feature requests for COD would make the process of gathering this information and getting it to the right people much easier.

See also a great discussion on groups.drupal.org: On popular voting and merit-based selection of sessions.

What else can improve session quality?

So far I've talked about identifying good sessions, but I think the problem is more complex. It's also about encouraging and inspiring the presenters to do great work on their sessions. We can tell them "please practice it 10 times," but nobody will do it if they aren't motivated. Sending presenters reminder emails like "we expect 3,000 attendees including key decision makers from companies like Humongo Inc." could help. There's also the possibility of compensating presenters: Drupalcon Chicago gave a mix of cash and non-cash benefits (massage chair, faster check-in line).

Scott Berkun gives some tips on how to improve the presenter experience at a conference in An open letter to conference organizers. He recommends a lot of things, including sharing the results of the evaluation data. I'm in favor of that as well (see: provide default terms of attendance).

Extra note: Want to see your evaluations from Chicago? Just needs more code

There were evaluations in Chicago, but the speakers have not seen this data. I got access to it as part of my role on the London session selection team and my work on the infrastructure team/Chicago sites.

However, the fact that presenters can't see it is a result of a bug in software that you can help fix. The organizers of Drupalcon want to share that information, but the code to do that isn't fully working. If you can help make it work then all session presenters will be able to see their evaluations.

Comments

Attendance?

I'm not at all surprised that there's no correlation between pre-conference votes and post-conference evaluations. They're measuring two entirely different things.

Pre-conference votes, in theory, are measuring interest in the subject and/or presenter.

Post-conference evaluations are measuring the quality of the presentation and/or presenter.

I wouldn't actually expect them to correlate.

What would be interesting to know is if pre-conference votes correlate to bodies-in-the-room attendance. There is certainly an element of "give the people what they want" in session selection, and it would be useful to know if pre-conference votes are a reasonable predictor of who will actually show up (and therefore, how big a room we need). I am not sure if we have that data available, but I would love to know how those relate if we do.

APPROPRIATENESS (demand) VS QUALITY

I have to agree with Larry. My first thought on reading this article was that you would not really expect pre-session demand to correlate with post-session evaluation.

Example:

session x : performance in drupal
session y : theming in drupal

I am a performance nut and pre-conference mark session x as my favourite.

But I find the session itself disappointing (at least relative to my expectations).

Does this mean that session y (which incidentally got better feedback) was better or even better for me? Of course not. Even though session x was somewhat disappointing, it is still more useful for me than session y (however well presented that session might be).

--

Session quality matters.

But so does session appropriateness.

e.g. a brilliantly presented session on Barbie Dolls is less useful than an abysmally presented session on Drupal.

DrupalCamps

Greg,

Going along with your thought of using DrupalCamps as an on-ramp for Drupalcon presentations, I would take it a step further and suggest that DrupalCon presenters should at the very least have some prior experience presenting at DrupalCamp (or similar) events. Ideally (in a perfect rainbows-and-unicorns world), I think that virtually all DrupalCon presentations should be required to be presented (practiced) at a DrupalCamp, Drupal meetup, or similar event prior to DrupalCon.

While this does raise the bar a bit for presenters, the goal is to have higher-quality DrupalCon sessions. Practice makes perfect.

-mike

Panel evaluations

Greg,

Totally fascinating and useful data -- thanks for writing it up!

One thing that I think affects the panel evaluations is that the evaluation form was structured in a way that allowed ONE evaluation for each quality of the ENTIRE panel. I forget exactly how it was worded but it was something like:

  • Rate the speaker's knowledge
  • Rate the speaker's ability and interesting-ness
  • etc.

So presented with a panel of 2-5 people, evaluators had to figure out how to translate that into one rating. Do they rate each speaker and then average? Rate the best speaker? Rate their impression of the session overall (could be more than the sum of its parts)?

I basically did #1 in my evaluations, but other people could have done it other ways, or been less likely to evaluate those sessions at all. So I think the evaluation form has to be...re-valuated (ha!) to be more intelligible when rating panels.

Again, great data and analysis here.

Yeah, this is a tricky thing.

Yeah, this is a tricky thing. There were several cases where a panel got rated 3 and the comments said "person X was great, person Y was not engaging" or something like that. I think that's probably the best way to handle it as a compromise of accuracy and taking too much of people's time.
