WPTavern: WordPress Telemetry Proposal Addresses Long-Standing Privacy Concerns as GDPR Compliance Deadline Looms
At the end of October 2016, Morten Rand-Hendriksen created a proposal on WordPress trac for adding telemetry to core, an opt-in feature that would collect anonymized data on how people are using the software. He proposed that the new feature be displayed on first install or update, disabled by default in the admin with a control available under Settings->General. One option he suggests is shipping it as a plugin that auto-installs on opt-in and auto-uninstalls on opt-out. He also identified a few examples of core data that could be tracked, including number of themes and plugins installed, frequency of use of specific views (Settings, Customizer, etc), current version, update status, locale, and language.
“WordPress prides itself on being an application built by the user for the user,” Rand-Hendriksen said. “The problem is with the popularity and reach of WordPress today, the distance between the WordPress 1% (or even .1%) and the average user is becoming so vast we (the people who contribute to WordPress core) know almost nothing about the actual people who use WordPress or how they use the application.”
During the WordPress 4.7 development cycle, Rand-Hendriksen said he was involved in several conversations where participants assumed the use of features without any data to back up their opinions. He contends that WordPress contributors do not have the necessary data to know how users are interacting with the application and its features.
“The general argument was that based on the 80/20 rule, certain features should be added while others should be removed,” Rand-Hendriksen said. “I kept brining up the well known fact we don’t have a clue what features 80%, or even 20%, of WordPress users actually use so any claim of validity in the 80/20 rule is guesswork at best.”
His proposal states that all the data collected should be public for transparency and also made available to end-users in the admin and on WordPress.org.
The idea has had a few months to marinate and has generated some discussion about what a prototype would entail. Core committer Ella Van Dorpe created an experimental wp-data standalone plugin for tracking a few simple interactions with the editor. Participants in the discussion recommended creating an Elasticsearch/Logstash setup for storing the data, technologies that the WordPress.org systems team have deployed before.
“I think a good summary is that there are a lot of hurdles in the way and currently no one has time to work on it,” Greg Brown, a Data Wrangler at Automattic, said in a followup discussion on the ticket three weeks ago. “Ultimately, I think the biggest blocker is getting someone with the time, inclination, and persistence to work on this. Getting it deployed onto .org is the right thing to do eventually, but I suspect it will take quite a while.”
WordPress lead developer Dion Hulse confirmed that WordPress is already tracking many of these stats and that creating a prototype on WordPress.org infrastructure would be the best option forward.
“It would also be valuable to see how our existing stats system can compliment or be replaced by the proposal here though,” Hulse said. “I mention this as most of the stats from the original description are already tracked, just not exposed in any form. The only new thing mentioned here is the Frequency of use of specific views (Settings, Customizer, etc) and transparency part (which would still probably only be anonymized summaries, not exact data).”
WordPress Telemetry Project Provides a Solution to Long-Standing Privacy Concerns
Moving WordPress’ current data tracking into a more transparent opt-in feature would also provide a solution to some long-standing privacy concerns raised by contributors in a six-year-old trac ticket. WordPress tracks the number of blogs and users in a given installation, along with the installation URL in the headers, in order to facilitate update requests that may become problematic, particularly in the case of large multisite installations.
“Even if a user knows that some data needs to be passed for a version check of core, plugins, or themes, the amount of data passed to remote is obviously more than needed to do the version check,” one contributor commented on the ticket. “But users should be made aware upfront so they can freely decide on their own if they want to instead of being forced to support the project with their usage-data. They could be offered an opt-in to do so.”
“The number of registered users I have on my site tied to the URL that is sent with tracking request gives out vital information on how well my business could be doing – information that is mine and mine only,” WordPress plugin developer Danny van Kooten said. “At the very least we could make it very clear that WordPress is tracking this information and what exactly it is doing with it. I really do not think there is any excuse for that.”
Developers can filter the data to satisfy their privacy concerns but it is somewhat inextricable from the update process for larger multisite installations. It’s also too big of a technical hurdle for most regular users who would be better served by a simple UI allowing them to opt out of data collection.
Rand-Hendriksen’s WordPress telemetry proposal gives the project an opportunity to formalize what data is being collected, state the purpose behind it, and allow users to choose if they want to be included.
Europe’s General Data Protection Regulation (GDPR) May Push WordPress Towards More Transparent Data Collection
Progress on both the Telemetry project and the ticket regarding privacy concerns has been slow. Neither seem to be a priority among contributors, but Europe’s General Data Protection Regulation (GDPR) may provide the impetus needed to push WordPress towards more transparent and responsible data collection.
The GDPR is an overhaul of data protection law in Europe with far more stringent requirements than the previous laws. It requires full disclosure for any data collection and standardized privacy notices to help users understand where and how the data is being used. Consent to have data collected must be confirmed and users have the right to access their own data. It also includes the right of erasure or “the right to be forgotten,” which allows users to remove their data from the web. The GDPR goes into effect in May 2018.
Heather Burns, a digital law specialist who consults and speaks extensively on internet laws and policies, encouraged WordPress contributors to frame the discussion regarding privacy concerns in terms of working towards compliance with a specific framework.
“For the purposes of this discussion, core should work to the GDPR standard for two reasons,” Burns said. “The first reason lies in cultural differences. The US does not have a single overarching data protection and privacy regulation, unlike Europe, where we have this data protection regime which applies to all personal data regardless of use, format, or sector. So GDPR gives developers – even those outside the EU – a robust, healthy, and very tough set of standards to follow. Given what we have seen coming out of the White House in the past week, GDPR also provides as good a starting point as any for defensive user protection.
“The second is that GDPR is extraterritorial. It applies to the personal data of anyone in Europe regardless of where the online service is located. If your business is in the US or Australia or Israel but you have European users, you have to protect their data to European GDPR standards.”
Pricewaterhouse Coopers recently surveyed 200 US-based multinational companies with more than 500 employees and found that 77% plan to spend $1 million or more on GDPR compliance. More than half of those surveyed cited GDPR readiness as the highest priority on their data-privacy and security agendas.
The hefty penalties of noncompliance are one of the driving factors behind American companies spending millions of dollars on satisfying the requirements of this new European regulation.
“GDPR is a complete overhaul of its dialup-era (1995) predecessor and one of the areas that has been beefed up is its teeth,” Burns said. “Businesses which are found to be in noncompliance by a European member state’s data protection regulator, whether that is your small app studio all the way up to Automattic, could face penalties of up to 4% of the business’s global annual turnover. Now there’s some solid context for the philosophical discussion.”
However, not everyone is convinced that the GDPR will be beneficial to consumers. Kitty Kolding, CEO and president of Infocore Inc, an international company that specializes in sourcing market data, told ExchangeWire that she believes the GDPR will undermine “the sanctity of consumers’ data privacy and security” and hobble marketing and advertising worldwide.
She contends that provisions like the “right to be forgotten,” which require customer data to be retained beyond the time that it’s in active use, will make that data more susceptible to hacking. Additionally, the enforcement body for the new legislation claims authority over companies, with the right to search and seize records, without any oversight or appeals.
“Every company everywhere that handles data on EU citizens is also automatically subject to this group’s absolute power – though it’s anybody’s guess how the EU believes they can enforce such a broad mandate outside its own borders,” Kolding said.
Currently, only two trac tickets mention the GDPR so it’s not yet clear how WordPress core will respond to the requirements of the new legislation. Burns recommends that WordPress core contributors go through the process of conducting a privacy impact assessment to determine the right way forward.
Regardless of WordPress’ response, companies and organizations that depend on the software will need to assume the responsibility of their own compliance, as these requirements extend far beyond core. The GDPR applies to anything added into a website or app that collects users’ data. For example, many contact form plugins store submissions inside the WordPress database and site owners will want to re-examine how users are notified of this.
“One of the main changes with GDPR is called the accountability principle,” Burns said. “Businesses collecting personal data must be completely transparent and accountable over what data they are collecting, how they are storing it and where, who it is being passed to (such as third parties), who has access to it, and how long it is retained. Users also have the right to request that any data collected about them must be deleted.”
There’s no WordPress plugin that will instantly make a site GDPR compatible. Drupal has a GDPR module that aims to make sure the site follows the guidelines and legislation set by the EU, but it doesn’t cover all requirements. Automating an assessment of privacy impact for a site using a CMS and potentially dozens of third-party extensions is a complex endeavor. This is one regulation that will require business owners to educate themselves and implement privacy practices that put users’ interests first.
With the deadline for compliance closing in, WordPress has an opportunity to re-evaluate how the project handles user privacy and make steps towards greater transparency. If contributors are looking into collecting more data to assist decision-making on features, as outlined in Rand-Hendriksen’s telemetry proposal, this project provides an avenue for working towards GDPR compliance. These privacy concerns are especially important to address when considering WordPress for government, healthcare, educational institutes, and other data sensitive websites.
Burns views the GDPR’s compliance deadline as a fresh opportunity for WordPress to build better privacy structures and legal certainty using the regulation as a healthy baseline for all users.
“Everyone needs to be working in implementations for their own businesses and sites in any case ahead of deadline day, in addition to any changes that need to be made in the WP code,” Burns said. “It’s important to remember that GDPR compliance is not a tick box you can squeeze in next April. This is about your processes, your workflows, and your systems of accountability. Start now.”