Open Data Security: How we transparently enabled HTTPS for all our customers and users
Recent development efforts have made HTTPS activation completely automatic and effortless for Opendatasoft customers, enhancing security for everyone.
Making Open Data Security A Priority
- HTTPS protects everyone’s privacy
- All Opendatasoft portals have been HTTPS-protected since the end of 2016
- Opendatasoft customers using their own DNS are protected too, free of charge, thanks to Let’s Encrypt
In today’s world, security and privacy on the internet are under constant assault. Cryptography has become the best way to ensure communication integrity, security and confidentiality, and its adoption accelerated in 2016: 18.4% of the Alexa top million sites loaded via HTTPS in February 2017, compared to only 8.4% in February 2016. Security has become a top priority for most Internet actors, to the point where Google has begun to favor HTTPS sites by ranking them higher in search results, and to flag non-HTTPS pages as non-secure in its Chrome browser. The entire Opendatasoft platform has supported HTTPS access ever since its creation, and has loaded content via HTTPS by default since late 2016.
But isn't encryption useless with open data?
Some say Open Data portals don’t require HTTPS. After all, Open Data is open: why hide anything? Well, the data itself is indeed open and available to anyone. However, your usage of it is not. Encrypting your access to Open Data protects your privacy. In some countries, looking for certain kinds of information can get you flagged as suspicious; this is when privacy is really needed. And it is better to encrypt all communications, private or not, so that the ones that need privacy don’t stand out simply by being encrypted. Encryption works best when everyone uses it and it becomes the norm. Moreover, HTTPS also ensures data integrity. Without HTTPS, it is easy for an attacker to impersonate your correspondent or modify the information you access: for example, attackers can insert propaganda or erase inconvenient truths. HTTPS lets you verify that the content you access is really what its author published.
So how does Opendatasoft use HTTPS?
Since November 2016, all Opendatasoft portals have been accessible only via HTTPS. They still answer unsecured HTTP requests, but only to redirect them to secure pages. This applies to all portals on the opendatasoft.com domain, and also to portals with custom aliases! For example, the Paris Open Data portal, which we proudly host, has a valid HTTPS certificate too. Our APIs are also available over HTTPS with valid certificates, though HTTP requests to them are not yet automatically redirected to HTTPS, as a backwards compatibility measure. Developers will need to migrate to HTTPS requests: Opendatasoft’s next API version will deprecate plain HTTP and require HTTPS access. To stay informed about our HTTPS API transition, please subscribe to our newsletter!
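The "answer over HTTP, but only to redirect" behaviour can be sketched with Python's standard library. This is a minimal illustration under our own assumptions (a permanent 301 redirect, GET and HEAD only, and the hypothetical helper name `https_url`); the article does not describe Opendatasoft's actual front-end configuration.

```python
from http.server import BaseHTTPRequestHandler


def https_url(host: str, path: str) -> str:
    """Build the HTTPS URL for a request received over plain HTTP."""
    if host.startswith("["):
        # IPv6 literal such as "[::1]:8080": keep the bracketed address only.
        hostname = host.split("]", 1)[0] + "]"
    else:
        # Drop any explicit port: HTTPS uses its default port, 443.
        hostname = host.split(":", 1)[0]
    return f"https://{hostname}{path}"


class RedirectToHTTPS(BaseHTTPRequestHandler):
    """Answer plain-HTTP GET and HEAD requests with a redirect to HTTPS."""

    def _redirect(self) -> None:
        host = self.headers.get("Host", "")
        if not host:
            # Without a Host header there is no HTTPS URL to redirect to.
            self.send_error(400, "Missing Host header")
            return
        self.send_response(301)  # assumption: a permanent redirect
        self.send_header("Location", https_url(host, self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    # Every supported method gets the same treatment: no content over HTTP.
    do_GET = _redirect
    do_HEAD = _redirect
```

To try it locally, one could run `http.server.HTTPServer(("", 8080), RedirectToHTTPS).serve_forever()` (port 8080 is an arbitrary choice) and request any page with a plain-HTTP client. For the API, where HTTP is kept for backwards compatibility, client code would instead simply switch its base URL to `https://`.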
What does HTTPS protect?
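HTTPS is simply HTTP protected by SSL. HTTP is the standard protocol for accessing information on the World Wide Web. SSL (Secure Sockets Layer) wraps every HTTPS request in an encrypted layer. Modern HTTPS actually uses its successor, TLS (Transport Layer Security), but this article follows common usage and calls both “SSL”.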
HTTPS Benefit #1 - Privacy
HTTPS protects the communication between the client (your browser) and the web site (the server) against passive eavesdropping. Breaking strong SSL encryption at scale is still out of reach, even for the most powerful governments.
HTTPS Benefit #2 - Security
HTTPS makes sure you are talking to the real site server, and not to someone impersonating it. Impersonation of this kind is known as a ‘man-in-the-middle’ attack, and is routinely performed by governments and corporations.
HTTPS Benefit #3 - Integrity
HTTPS makes sure you receive the information exactly as the server sent it, and that it has not been tampered with on the way. Tampering in transit is also a ‘man-in-the-middle’ attack.
In an unencrypted communication, anyone eavesdropping on the communication channel can see that Alice sends a message to Bob.
Example of man-in-the-middle attack: Mallory has access to the communication channel between Alice and Bob. Mallory is able to watch and replace the content of the messages while staying undetected.
When the communication is protected by an SSL channel, Mallory can neither read nor tamper with the exchanged messages: Alice’s and Bob’s conversation is protected against man-in-the-middle attacks. However, HTTPS cannot protect data at rest, whether it is stored on the server or on your device. Obviously, unauthorized access to either end of the transmission can completely defeat the encrypted channel. Securing servers and devices requires other measures that are out of the scope of this article.
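On the client side, the security and integrity benefits only hold if the client actually checks the server's certificate and hostname. As a hedged illustration using Python's standard `ssl` module (TLS, the successor of SSL, is what it negotiates), the sketch below builds a client context with those checks enabled. The function names and the TLS 1.2 floor are our own assumptions, not something the article specifies.

```python
import socket
import ssl
from typing import Optional


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context that verifies the server's identity."""
    # create_default_context() loads the system's trusted root certificates
    # and turns on certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Assumption for this sketch: refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def negotiated_version(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Connect to host over TLS and return the negotiated protocol name.

    If the server cannot present a valid certificate for `host` (for example
    because someone in the middle is impersonating it), the handshake fails
    with ssl.SSLCertVerificationError before any application data is sent.
    """
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # server_hostname drives both SNI and the hostname check.
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_version("data.opendatasoft.com")` needs network access and returns the protocol name as a string, such as `'TLSv1.2'` or `'TLSv1.3'`, depending on what client and server agree on.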
How does SSL work then?
Cryptographic key pair generation
The main concept in SSL is public-key encryption, also called asymmetric encryption. Everyone gets a pair of keys: one public, one private. The public key is given to everyone, while the private key is kept secret. For example, in the popular RSA cryptosystem, the private key is derived from two large prime numbers, while the public key contains their product (together with a public exponent); recovering the primes from their product is believed to be infeasible when they are large enough. Public-key cryptosystems have two main applications.
- Encryption: anyone can encrypt a message using my public key, and only I can decrypt it, using my secret private key.
- Authentication: I can publish a message transformed with my private key, and anyone can use my public key to check that I’m the only one who could have produced it.
This last property makes digital signatures possible. A signature generated with my private key cannot be forged, since my private key is required to produce it.
The public key can be used to encrypt a message. Then the associated private key is required to decrypt it.
Sending a private message using asymmetric cryptography. Alice encrypts a message with Bob’s public key, so that only Bob can decrypt it using his private key. Anyone else only sees unintelligible data.
Asymmetric cryptography can be used to digitally sign messages, and authenticate their author. Alice uses her own private key to generate a digital signature attached to her message. Bob verifies the signature using Alice’s public key. Bob is certain that only Alice could have sent this message, and it cannot have been modified by a third party.
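Both applications can be made concrete with "textbook" RSA on toy numbers. This is a teaching sketch only: the primes are absurdly small, and real systems never apply RSA directly to a message; they use padding schemes (such as OAEP for encryption and PSS for signatures) and sign a hash of the message.

```python
from math import gcd
from typing import Tuple

Key = Tuple[int, int]  # (modulus n, exponent)


def make_keypair(p: int, q: int, e: int) -> Tuple[Key, Key]:
    """Return (public_key, private_key) built from two primes p and q."""
    n = p * q                 # published as part of the public key
    phi = (p - 1) * (q - 1)   # easy to compute only if you know p and q
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime with (p - 1) * (q - 1)")
    d = pow(e, -1, phi)       # private exponent: inverse of e modulo phi
    return (n, e), (n, d)


def encrypt(message: int, public_key: Key) -> int:
    n, e = public_key
    if not 0 <= message < n:
        raise ValueError("message must be an integer in [0, n)")
    return pow(message, e, n)


def decrypt(ciphertext: int, private_key: Key) -> int:
    n, d = private_key
    return pow(ciphertext, d, n)


def sign(message: int, private_key: Key) -> int:
    # Textbook signature: apply the private exponent to the message itself.
    n, d = private_key
    return pow(message, d, n)


def verify(message: int, signature: int, public_key: Key) -> bool:
    # Anyone holding the public key can undo the private operation and compare.
    n, e = public_key
    return pow(signature, e, n) == message % n


# Toy primes (hypothetical, absurdly small): real RSA primes have hundreds of digits.
public, private = make_keypair(61, 53, 17)
```

With these numbers the public key is (3233, 17) and the private exponent is 2753: `encrypt(65, public)` gives 2790, and only `decrypt(2790, private)` turns it back into 65. Verifying `sign(65, private)` against any other message in [0, 3233) returns False, which is the integrity guarantee described in the caption above.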
Signatures also make it possible to build chains of trust. A cryptographically signed message is considered to be authenticated by the owner of the key. That way, a trusted party can sign another party’s public key to vouch that this second party can be trusted too. HTTPS certificates work this way. Every browser has a list of root certificates (public keys) that are trusted by default, managed by the software vendor (Microsoft, Google, Mozilla, Apple…). Certificates signed by one of these root certificates can then be trusted too. The whole internet relies on these chains of trust to ensure that SSL communications are not being intercepted.
SSL Certificates provide a way to establish trust: root certificate authorities are trusted by default, which themselves trust intermediate and then final server certificates.
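The root → intermediate → server chain can be modelled with the same textbook RSA idea. This is a toy model under loudly stated assumptions: the tiny primes, the `name:n:e` encoding of a key, and all names (`Root CA`, `Intermediate CA`, `data.example.org`) are hypothetical. Real X.509 certificates also carry validity dates and extensions, use standardized encodings and padding, and are checked by a TLS library, not by code like this.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (modulus n, exponent)


def toy_keypair(p: int, q: int, e: int = 17) -> Tuple[Key, Key]:
    """Textbook RSA key pair from two small primes (insecure, illustration only)."""
    n, phi = p * q, (p - 1) * (q - 1)
    return (n, e), (n, pow(e, -1, phi))


def digest(data: bytes, n: int) -> int:
    # Hash-then-sign: reduce a SHA-256 digest modulo the signer's modulus.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def encode(subject: str, key: Key) -> bytes:
    # What gets signed: the binding between a name and its public key.
    return f"{subject}:{key[0]}:{key[1]}".encode("utf-8")


@dataclass(frozen=True)
class Certificate:
    subject: str
    public_key: Key
    issuer: str
    signature: int


def issue(subject: str, subject_key: Key, issuer: str, issuer_private: Key) -> Certificate:
    """The issuer vouches for subject_key by signing it with its private key."""
    n, d = issuer_private
    signature = pow(digest(encode(subject, subject_key), n), d, n)
    return Certificate(subject, subject_key, issuer, signature)


def signed_by(cert: Certificate, issuer_public: Key) -> bool:
    n, e = issuer_public
    return pow(cert.signature, e, n) == digest(encode(cert.subject, cert.public_key), n)


def verify_chain(chain: List[Certificate], trusted_roots: Dict[str, Key]) -> bool:
    """chain[0] is the server certificate; each following entry certifies its issuer."""
    if not chain:
        return False
    for cert, parent in zip(chain, chain[1:]):
        if cert.issuer != parent.subject or not signed_by(cert, parent.public_key):
            return False
    # The last certificate must be signed by a root the client already trusts.
    root_key = trusted_roots.get(chain[-1].issuer)
    return root_key is not None and signed_by(chain[-1], root_key)


# Hypothetical toy hierarchy: a root CA, an intermediate CA and one server.
root_pub, root_priv = toy_keypair(61, 53)
inter_pub, inter_priv = toy_keypair(89, 97)
server_pub, server_priv = toy_keypair(107, 109)

trust_store = {"Root CA": root_pub}  # plays the role of the browser's root list
chain = [
    issue("data.example.org", server_pub, "Intermediate CA", inter_priv),
    issue("Intermediate CA", inter_pub, "Root CA", root_priv),
]
```

`verify_chain(chain, trust_store)` succeeds, while removing the root from `trust_store`, or altering a single signature, makes it return False: trust flows only from keys the client already holds.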
These SSL certificates, the basis of the web of trust on the Internet, are also what makes managing customer domains difficult. Previously, we had to ask our clients for SSL certificates for their custom domains. Either they sent us the certificates along with the private keys, which is rather bad from a security point of view, or we had to exchange lots of information with them so that their SSL provider would sign our Certificate Signing Requests. This meant a lot of work, plus additional costs and delays. We needed a better, automated, and scalable solution. Let’s Encrypt allowed us to deploy SSL certificates for all our customer domains, including custom DNS aliases, in minutes, and for free!
Enter Let’s Encrypt: free, automated SSL certificates
Let’s Encrypt is an initiative by the Internet Security Research Group (ISRG), a public benefit corporation, whose goal is to help secure the internet by providing SSL certificates to anyone, for free. It is currently sponsored by the EFF, the Mozilla Foundation, OVH, Akamai and Cisco Systems. Traditionally, HTTPS certificates were issued by certificate authorities for a fee, after a manual validation process. Certificates were issued for a validity period of 1 to 3 years, and renewal required the same manual process. By contrast, Let’s Encrypt provides HTTPS certificates completely free of charge, through a fully automatic process. Certificates are currently issued for only 3 months, but renewal can be fully automated using Let’s Encrypt’s APIs and client tools. Moreover, the domain validation required by Let’s Encrypt can be performed by someone who does not own the domain: having the domain name point to a server you control is enough to get the certificate. This property greatly simplifies the validation process. As our service uses high-availability, redundant front servers, we still had one non-trivial issue to solve before deploying certificates at scale: deploying certificates behind a load balancer.
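Domain validation through a server you control corresponds to the ACME HTTP-01 challenge (RFC 8555): the certificate authority fetches a URL under the domain, over plain HTTP, and expects a "key authorization" built from a token it issued and a thumbprint (RFC 7638) of the client's account key. The sketch below computes both with Python's standard library. The domain, token and key numbers are hypothetical, and in practice a client such as certbot does all of this; it illustrates the protocol, not Opendatasoft's setup.

```python
import base64
import hashlib
import json
from typing import Tuple


def b64url(data: bytes) -> str:
    """Unpadded base64url, as used throughout ACME and JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def int_to_bytes(x: int) -> bytes:
    return x.to_bytes((x.bit_length() + 7) // 8, "big")


def jwk_thumbprint(n: int, e: int) -> str:
    """RFC 7638 SHA-256 thumbprint of an RSA account public key (n, e)."""
    # Only the required members, keys in lexicographic order, no whitespace.
    jwk = {"e": b64url(int_to_bytes(e)), "kty": "RSA", "n": b64url(int_to_bytes(n))}
    canonical = json.dumps(jwk, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def http01_challenge(domain: str, token: str, n: int, e: int) -> Tuple[str, str]:
    """Return (URL the CA will fetch, body the server must answer with)."""
    url = f"http://{domain}/.well-known/acme-challenge/{token}"
    key_authorization = f"{token}.{jwk_thumbprint(n, e)}"
    return url, key_authorization
```

Only a client that knows both the token and the account key can produce the expected body, which is why whichever machine answers the challenge must be able to obtain it from the Let’s Encrypt client.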
Let’s Encrypt automation behind a load balancer
The next paragraphs take a deep dive into the technical details of how we use Let’s Encrypt tools, and are written for a more technical audience. Let’s Encrypt’s validation process requires proof that whoever asks for the certificate controls the domain name. This is achieved by asking the server to fulfill a challenge at the very address for which the certificate is requested, using the ACME protocol. The EFF provides an open-source client, certbot, that uses Let’s Encrypt’s API automatically from a web server. However, like most high-availability (HA) deployments, our sites are served by a cluster of servers behind a load balancer. This means that the server asking for the certificate will not necessarily be the one receiving the challenge request. We could retry until the challenge happens to reach the right server, but clearly, this scales badly.
A Let’s Encrypt client on a front server behind a load balancer does not work reliably, because the challenge verification request is not necessarily answered by the server that asked for the certificate.
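To see why retrying scales badly, assume (our simplification, not a measured property of any particular load balancer) that each validation request is routed to one of N front servers uniformly at random, independently of earlier attempts. The number of attempts until the challenge reaches the requesting server then follows a geometric distribution:

```latex
P(\text{success per attempt}) = \frac{1}{N}, \qquad
\mathbb{E}[\text{attempts}] = \sum_{k \ge 1} k \left(1 - \frac{1}{N}\right)^{k-1} \frac{1}{N} = N, \qquad
P(\text{all } m \text{ attempts fail}) = \left(1 - \frac{1}{N}\right)^{m}
```

With a hypothetical N = 4 front servers, 10 attempts still all fail with probability (3/4)^10 ≈ 5.6%, and every front server added to the cluster makes the expected number of attempts grow with it.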
To solve this, we delegated certificate generation to a dedicated server, and configured all our front servers to reverse proxy challenge requests to this server. As the path of an ACME HTTP challenge always begins with /.well-known/acme-challenge/, matching it is easy.
Now the Let’s Encrypt client runs on a separate, dedicated server, behind the front servers. All the front servers are configured to reverse proxy the challenges to the server running the Let’s Encrypt client, so the challenge succeeds reliably.
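The routing rule every front server applies can be sketched as a pure function. This is an illustration, not our production configuration: in practice the rule lives in the reverse proxy's own configuration, and `CERT_SERVER` is a hypothetical address for the dedicated certificate-generation server.

```python
from typing import Optional

# Path prefix used by the ACME HTTP-01 challenge (RFC 8555, section 8.3).
ACME_CHALLENGE_PREFIX = "/.well-known/acme-challenge/"

# Hypothetical address of the dedicated server running the Let's Encrypt client.
CERT_SERVER = "http://certgen.internal:8080"


def upstream_for(path: str) -> Optional[str]:
    """Return the URL to proxy a request to, or None to serve it locally.

    Every front server applies the same rule, so whichever one the load
    balancer picks, the challenge ends up on the server running the client.
    """
    if path.startswith(ACME_CHALLENGE_PREFIX):
        return CERT_SERVER + path
    return None
```

Because the decision depends only on the request path, and not on which front server received it, the load balancer's choice no longer matters for validation.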
So now Let’s Encrypt can give us certificates for the sites we serve, but we still have to deploy them to the actual front servers. We achieved this with our configuration management tool, SaltStack, and some cron jobs. When a new certificate is added, Salt automatically pushes it to all relevant front servers and reloads them. For renewals, cron jobs automate the renewal itself, the deployment, and the server reloads.
Certificate expiration monitoring
We need alerts in case some part of the process unexpectedly fails. In the end, we used monit to check the certificate expiration date of every site we serve. Certificates are valid for 3 months. Renewal is triggered automatically when 30 days remain, and the deployment cron jobs run daily, so we configured monit to send an alert when any certificate expires in less than 28 days. This way, we still have 4 weeks to deploy a fix if something breaks. Since the certificate generation server is an obvious single point of failure, several weeks of leeway seems reasonable.
Let’s Encrypt certificate lifecycle. We renew automatically 30 days before expiration, and deploy updated certificates every day. An expiration alert is sent whenever a deployed certificate’s remaining validity becomes less than 28 days.
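The two thresholds can be sketched as a small decision function. This is our own illustration, not the actual monit or cron configuration: it assumes the expiry date has already been read as the `notAfter` string that Python's `ssl.SSLSocket.getpeercert()` returns, and it only decides what should happen.

```python
import ssl
from typing import Tuple

DAY = 86_400                # seconds per day
RENEW_AT_DAYS_LEFT = 30     # renewal threshold described in the article
ALERT_BELOW_DAYS_LEFT = 28  # alerting threshold described in the article


def days_left(not_after: str, now: float) -> float:
    """Days until expiry; not_after uses the 'notAfter' format of getpeercert()."""
    return (ssl.cert_time_to_seconds(not_after) - now) / DAY


def lifecycle(not_after: str, now: float) -> Tuple[bool, bool]:
    """Return (should_renew, should_alert) for one certificate at time `now`."""
    remaining = days_left(not_after, now)
    return remaining <= RENEW_AT_DAYS_LEFT, remaining < ALERT_BELOW_DAYS_LEFT
```

A 90-day certificate is therefore renewed on day 60 of its life; if renewal or deployment silently fails, the alert fires roughly two days later, leaving about four weeks to react.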
Strong SSL is hard, how can I get a good grade?
SSL itself is under constant attack. Over time, cryptographic algorithms get broken and have to be replaced with stronger ones, and privacy-defeating implementation bugs can stay hidden for years. This makes configuring an SSL-enabled server a hard task. To help, SSLLabs by Qualys provides an invaluable SSL assessment tool, which grades each SSL server according to the level of security it provides.
SSL Report of data.opendatasoft.com by SSLLabs.com
Today, all of Opendatasoft’s servers have at least an A grade, which means:
- we support only the strongest encryption algorithms;
- we support Perfect Forward Secrecy: even if the server’s private key leaks some day, past communications still cannot be decrypted.
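Perfect Forward Secrecy comes from ephemeral key exchange: each session derives its keys from Diffie-Hellman secrets that are thrown away afterwards, so the certificate's long-term key is never what protects the traffic. The toy sketch below assumes a deliberately insecure group (the prime 2^127 - 1 and base 3 are our own illustrative choices); real servers use ECDHE over standardized curves, and this is not how any particular TLS library implements it.

```python
import secrets
from typing import Tuple

# Toy group (hypothetical and NOT secure): the Mersenne prime 2**127 - 1 with
# base 3. Real TLS uses ECDHE over standardized curves or vetted DH groups.
P = 2**127 - 1
G = 3


def ephemeral_keypair() -> Tuple[int, int]:
    """A fresh (secret, public share) pair, used for a single session only."""
    secret = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    return secret, pow(G, secret, P)


def shared_secret(own_secret: int, peer_share: int) -> int:
    return pow(peer_share, own_secret, P)


def handshake() -> int:
    """Simulate both ends of one key exchange and return the session secret."""
    client_secret, client_share = ephemeral_keypair()
    server_secret, server_share = ephemeral_keypair()
    # Only the two public shares travel over the network. In TLS the server
    # signs the handshake with its certificate key to authenticate itself,
    # but that long-term key is not used to derive the session secret.
    k_client = shared_secret(client_secret, server_share)
    k_server = shared_secret(server_secret, client_share)
    if k_client != k_server:
        raise RuntimeError("key agreement failed")
    # Both ephemeral secrets are discarded when this function returns, so a
    # later leak of the certificate key does not reveal the session secret.
    # An eavesdropper would have to solve a discrete logarithm instead (this
    # toy group offers no real protection; properly sized groups do).
    return k_client
```

Each call to `handshake()` produces an independent session secret; nothing stored on the server after the call could be used to recompute it.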
An A+ grade additionally requires HTTP Strict Transport Security (HSTS). We still need time to test that it won’t break anything on our platform, but it’s definitely on our roadmap! Thanks for reading, and don’t forget to subscribe to our newsletter for more updates from the Opendatasoft engineering team.