e-voting

An attempt to make a clear, concise (and therefore cheap) e-voting system
New World Order JCZD Dream-inspired

UPDATE: Now on Github with working client-server architecture

e-voting is usually seen as something that seems straightforward but yet quite tricky when one looks into the details.

Besides security (where e-voting is often lamer than traditional voting), cost is often argued to be a major setback. This approach aims to fix these problems, and more. Security, trust and cost are so closely related that we will not even try to categorize each issue.

It is interesting to note that the core of the code (the part that really matters) basically amounts to some 15-20 lines. So it’s very short. Security doesn’t only depend on the code though, there’s a substantial part of reverse social engineering that I believe makes this entire concept really strong. I would really appreciate comments, especially if I’m wrong.

Problems with voting mechanisms (traditional and electronic)

Main principle of operation, the paper analogy

To make an e-voting system acceptable by the public, it must be simple enough to be understood by the majority. A working paper analogy is needed:

How it would really work

Few people like to copy and go through pages of printed material. Printed material is also very prone to error, mostly when it comes to making hand copies.

The current approaches works with both methods : the main recommendation is that the serial numbers of paper ballots are computer-generated and the results made available through a machine-readable process (in fact, this strengthens the system by making statistical analysis easier, as well as making it easier for voters to very their votes have not been tampered with.

Cost

The code does not yet deal with the full chain of events required to have a practical voting process. However, it can be expected that a single RaspberryPi-type computer could serve as a master host for a few thousand people at a time (a rough estimate tells me I wouldn’t expect anything in the 100k users range to have decent performance). With a fairly conservative figure of 10k users, hardware, network and associated costs being at most some 500$, the cost of the infrastructure is around 5¢/voter for the base service4, which seems totally acceptable.

Compared to an estimate for Switzerland5 of around CHF 100.-/voter (hopefully for more than a single ballot) there is definitely a margin for error here.

Additional security

Since the name of a person is basically not a secret and no communication channel is 100% secure, the ballot instance (the one running the core process) may rely on a variety of sources to verify the identity of a voter when delivering the ballot “paper”. Using such a mechanism implies that a ballot paper can be delivered and used when and only when all available sources are compliant and no communication channel is compromised in a significant way. The same applies for the instance (aka. authority) that generates the ballot “paper” and of course to the voter (it makes no sense if authorities make use of mechanisms a voter cannot trust and verify). It is not required for a voter to be known by all Secret Providers ; if the voter is unknown, the provider should respond with an empty string.

Reading the source code makes sense at this point, so I will include it now.

Read the source, Luke!

Download source code here or read it below:

#!/usr/bin/python
# -*- coding: utf-8 -*-
# vim: ts=4 noet number nowrap


def voter_full_id( voter_name, providers ):
	"""
		a helper function known by everybody ; simply concatenate hashes
	"""
	return b''.join([ provider.hash( voter_name ) for provider in providers ])


class LocalBallot:
	"""
		each town/community SHOULD have its own process running for the duration of the ballot and a little longer 
		to allow people to verify their results and various instances to concatenate the data from each source

		this is less than 30 lines (comments and blank lines stripped) of very straightforward code, and basically
		just 10 lines of actual code
	"""
	def __init__( self, authority, question, voters, secret_providers = None, encoding = 'utf-8' ):
		self.authority = authority	# just a name/reference, used for data gathering
		self.question = question	# it's good to know what question we're answering
		self.encoding = encoding	# encoding everything to ensure consistency of the data

		self.total_possible_votes = len(voters)
		self.invites = { voter_full_id(v, secret_providers): id(v) for v in voters}	# id(v) is not used, it's just a placeholder that may be used for verification

		if len(self.invites) != self.total_possible_votes:	# should be unnecessary, just a safety measure
			raise Exception("INIT: Number of invites does not match number of electors ({len(self.invites)}/{len(self.total_possible_votes})")
		print(f'''Vote is open to {len(voters)} ; the question is : "{self.question}"''')	# cosmetic

		self.result = {}


	def request_auth( self, voter ):
		"""
			generate personalized ballot paper ("bulletin de vote")
		"""
		return hex(id(self.invites[voter]))

	def submit_vote( self, voter, secret, value ):
		"""
			the few CPU cycles (and associated network traffic) it takes this function to execute are the _only_ 
			places where voter/vote can be sniffed (matched)
		"""
		if id( self.invites[ voter ]) == secret:
			self.invites.pop( voter )	# in some rare cases, may also raise KeyError
			self.result[ secret ] = value
			return id(self.result[secret])
	
	def verify_vote( self, secret, index ):
		"""
			by feeding 'random' data in this function, counting the number of occurences allows for statistical 
			analysis of the integrity of the ballot (this is the so-called "brute-force" pseudo-attack)
		"""
		if id(self.result[secret]) == index:	return self.result[ secret ]
		else:									raise KeyError	# to make querying cleaner ; not strictly necessary
	
	def ballot_integrity_check( self ):
		"""
			SHOULD return zero
			MAY return a small negative integer for a very short while (borderline case due to concurrency issues)
			MUST NOT return a positive value (this means some votes have been added which shouldn't have)
		"""
		return self.total_possible_votes - ( len(self.invites) + len(self.result) )

	def concatenate_votes( self ):
		"""
			return a summary of the results. this data must be public at the very least when the ballot is closed
		"""
		from collections import Counter
		return Counter( self.result.values() )


"""
this is basically all for the functional code ; the rest is just interface stuff and is not critical
"""


class SecretInstance:
	"""
		really, that is just a fancy way to say "there is a function that will make a pseudo-secret 'hash' of some sort when given publicly known data"

		it is suggested to use regular cryptography with keys and certificates or another trust and validation mechanism to authenticate the parties ; 
		however there are a large variety of methods available and we will not discuss them here.

		SecretInstance objects have a hash() method, called three times, in order:
		- once by the ballot process when it is instantiated and it build the voters list
		- a second time when generating the balot "papers" (by the process generating these, either by print or through a web service or equivalent)
		- a third time by each voter upon ballot/vote submission

		this ensure a high level of anonymity and high security
	"""
	def hash( self, name ):
		try:				return self.table[name]
		except KeyError:	return b''



if __name__ == '__main__':
	"""
		very basic zero-privacy example interface
	"""
	# elements of this list must be unique, and are publicly known
	voters = {
		b'alice',
		b'bob',
		b'charles',
	}

	class NameProvider(SecretInstance):
		"""
			sort of a dummy thing.. not strictly necessary (we need at least _one_ provider and no collisions)

			adding the name provider as a SecretInstance makes code cleaner and easier to understand but in fact there's
			even more privacy if that's not included
		"""
		def hash( self, name ):
			return name

	class SecretProvider0(SecretInstance):
		from hashlib import md5	# this is intentionally insecure ;-)
		def hash( self, name ):
			return self.md5( name ).digest()

	class SecretProvider1(SecretInstance):
		table = {
				b'alice': b'InWonderland',
				b'bob': b'IsASponge',
			}

	class SecretProvider2(SecretInstance):
		table = {
				b'charles': b'Paranoid, has own auth service',
				b'alice': b"she's charle's friend, too",
			}
	
	# this list is public, known by the ballot instance/server and every voter
	# in real life, these will be accessible through TCP/IP or something equivalent
	secret_providers = (
			NameProvider(),
			SecretProvider0(),
			SecretProvider1(),
			SecretProvider2(),
			# etc.., as many as required
		)

	b = LocalBallot(
			authority = "Cryptoville",
			question = "What is the answer to Life, the Universe and Everything?",
			voters = voters,
			secret_providers = secret_providers,
		)
	

	print("Hint: press ctrl+d to exit loops")


	print("--- Generation of credentials (sent in batches per voter preference or on-demand) ---")
	for v in voters:
		try:
			print(f'''Ballot 'paper' ID: {b.request_auth( voter_full_id(v, secret_providers) )} for {v}''')
		except KeyError:
			# this is just here for the example
			print(f'''ERROR: cannot get ballot paper for {v}''')


	print("--- Loop: votes submission ---")
	while True:
		try:
			v = bytes( input("Voter name: "), b.encoding )
			r = b.submit_vote(
					voter = voter_full_id( v, secret_providers ),
					secret = int( input( "Ballot ID (hex): " ), base=0 ),
					value = bytes( input("Your answer: " ), b.encoding ),
				)
		except KeyError:
			print(f'''ERROR: could not process vote for "{v}" (already voted or not in list)''')
		except EOFError:
			print("\nBallot was ended!")
			break
		else:
			print(f'''Verification ID: {hex(r)} (voter: "{v}")''')

	print("--- Vote verification ---")
	while True:
		print( "Is ballot valid?", b.ballot_integrity_check(), "(should be zero)" )
		try:
			print( "You voted: ", b.verify_vote(
					secret = int(input("Ballot ID: "), base=0),
					index = int(input("Verification ID: "), base=0) ),
				"; if this is not correct, please shout out loud to complain."
				)
		except KeyError:
			print("ERROR: not in votes list")
		except EOFError:
			break
	
	print("--- Vote results ---")
	print(b.concatenate_votes())

Other attack vectors

Some attack vectors have not been mentioned above ; this is a good place to talk about them now.

Man-in-the-middle attacks (MITM), phishing, spoofing

To work, such attacks would require to be done successfully at least twice : first in the distribution phase, giving voters a dummy address for the polling office or its numerical equivalent (possibly altering or generating dummy ballot papers IDs), then again in the verification phase. Chances to succeed continuously - even for a limited number of voters - while avoiding detection during the entire duration of the ballot is highly unlikely.

Subtypes and related attacks such as whale-phishing or spear-phishing are just as unlikely.

DDoS attacks

The system would be very slow, people just couldn’t vote, so it would be very obvious. See also 1.

Ransomware

You’d really need to pay a lot of people, for a voting process this is not very realistic. Giving jobs to people is likely to be more efficient.

Password attack, URL interpretation

Just like MITM and phishing, but harder: while it is technically possible to brute-force ballot paper IDs and user credentials, it is basically impossible to successfully take-over a significant number of votes without being unnoticed.

SQL injection

Can only be done at the voters list level, in case it relies on such a database. See also the note about distribution1, as it would require to break in a variety of different, independent systems.

Session hi-jacking

There are basically no sessions to hi-jack : a voter sends his credentials and ballot ID along with his vote, receives a verification key, and the Session is closed. Any subsequent session is “send verification key, receive vote value” and ends there.

Trojan horses, malware attacks, XSS attacks, eavesdropping, drive-by and other web attacks

I mean, you get to compromise a very large number of systems of an entire community for a continuous duration ; although it doesn’t have to last for weeks on end it is very unlikely to avoid detection.

Birthday attack, brute-force attacks

At his stage, probably the most interesting one to mention : the main hashing algorithm is memory addresses (which people can share or collect “randomly”2 in order to ensure the validity of the data) that - by design - can only yield unique values. Furthermore, as many safety elements (in addition to the voters’ name which isn’t really safety anyway) can be added as desired and these can be gathered from a number of sources each with limited trust (or whose communication channels are potentially untrusted)


  1. Using distributive computing makes for cheaper systems, lessens the influence of corrupted systems or officials, all while leveraging the power of statistical analysis for software validation, installation validation, compiler trust, delivery validation. This also makes attacks on the data in systems memory very expensive and unpractical. ↩︎ ↩︎ ↩︎ ↩︎

  2. In the current paradigm, vote confidentiality can be attacked mainly with a brute-force approach ; however, due to the architecture of the system, the data gathered can only be used to gather data for statistical analysis. The current paradigm uses this as a feature, thus rendering other types of attacks less stealth and therefore less effective. ↩︎ ↩︎

  3. It’s not dramatic in the sense that is basically impossible to prevent it : however, keeping track of spoofing occurrences is important as it may indicate a flaw in the delivery process ; such statistics also allows communities to determine whether spoofing has a significant influence on the ballot. ↩︎

  4. Other costs have to be taken into account, such as the statistical analysis of the results and other “democratic” costs (writing laws etc.) but all this is distributed, sometimes made by benevolent benefactors, sometimes by highly-rewarded lobbyists, so overall it’s really hard to give a figure without accurately defining first what these cost should be include. ↩︎

  5. “Appraisals assume that the implementation of e-voting in the whole country would cost 400 to 600 million CHF.” Wikipedia ↩︎