How to support the GDPR requirements in your Sitecore solution

Over the past months GDPR has been a hot topic at my clients, and we have had to implement the GDPR requirements for several of them. This involved not only educating everyone about what GDPR is, what it means and how it should be interpreted, but also working out how to support it in Sitecore implementations.

A good starting point, which we also used, is this nice Stack Exchange post. It gives an overview of the different touch points in Sitecore and an approach to fulfilling the requirements. This blog post tries to go one step further with some implementation examples, mainly targeted at Sitecore 8 versions prior to 8.2 update 7. It does not cover all GDPR topics, as some of them are more business related than implementation related, nor does it give full working code examples, but it gives a good indication of the major topics that need to be considered.
This post may be updated in the coming period if I receive feedback, to make it a more complete source of information on this topic.

Data

To understand what your implementation requires, it is good to have an overview of where Sitecore potentially stores private (PII-sensitive) data and other data that is tied to a contact.

  • xDB: all behavioral data is stored here, as well as private data in facets, identifiers, interactions, states and formData
  • Core database: this may contain user profiles if you use membership tables
  • Indexes: the analytics index may contain email addresses, (custom) facet data etc. to provide quick access to lists, the Experience Profile and reports
  • Reporting database: identified contacts have their primary identifier in the 'ExternalUser' column of the Contacts table
To make it a little easier to identify sensitive data, and to erase or anonymize it, data/facets can be marked with a PIISensitive attribute in Sitecore 9 and Sitecore 8.2 update 7. This data is then in scope of the 'ExecuteRightToBeForgotten' pipeline and is excluded from indexing. Keep in mind that you still have to call and integrate these attributes and pipelines yourself in your solution.
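
For versions that lack this attribute, the same idea can be imitated in custom code. The sketch below is only an illustration of the principle: PiiMarkerAttribute, NewsletterProfile and PiiEraser are hypothetical names made up for this example and are not Sitecore types. It marks properties as sensitive and clears every marked property through reflection.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for a "this is PII" marker; NOT the Sitecore attribute.
[AttributeUsage(AttributeTargets.Property)]
public sealed class PiiMarkerAttribute : Attribute { }

// Hypothetical data class, used only to demonstrate the marker.
public class NewsletterProfile
{
    [PiiMarker]
    public string EmailAddress { get; set; }

    [PiiMarker]
    public string FullName { get; set; }

    public string PreferredLanguage { get; set; }
}

public static class PiiEraser
{
    // Resets every writable public property that carries the marker and
    // returns the number of properties that were reset.
    public static int ClearMarkedProperties(object target)
    {
        if (target == null) throw new ArgumentNullException(nameof(target));

        var marked = target.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanWrite && p.IsDefined(typeof(PiiMarkerAttribute), true));

        int cleared = 0;
        foreach (var property in marked)
        {
            // null for reference types, the default value for value types
            object empty = property.PropertyType.IsValueType
                ? Activator.CreateInstance(property.PropertyType)
                : null;
            property.SetValue(target, empty);
            cleared++;
        }
        return cleared;
    }
}
```

Calling PiiEraser.ClearMarkedProperties on a NewsletterProfile empties EmailAddress and FullName and leaves PreferredLanguage untouched. In a real solution the cleared object would still have to be saved and reindexed.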

Actions

With GDPR in place, users get a set of rights, of which the following rights can and should be supported.

Consent

Your website (in a Sitecore context, assuming you use the tracker) should inform visitors of how their (inter)actions on the site are tracked and what that information is used for. In a lot of implementations this is done with a cookie 'wall' where the user is asked for consent before he/she can continue to use the site's functionality. A cookie wall is not part of GDPR but rather of ePrivacy; even so, getting consent to store data that may later be linked to PII data is very important and therefore highly recommended.
The consent that is given should be stored, together with the date of consent and the version of the privacy policy that the user agreed to.
We implemented this with a custom facet collection of consents in xDB. Each contact on the site has this facet collection, which stores the policy versions and the dates on which consent was given.
With this custom facet based implementation we also let the content editor manage the cookie 'wall' content and store a 'policy date' with it. If the editor changes the privacy notice to such an extent that visitors have to agree to it again, he can also change the policy date value, and visitors will then be asked for consent again. This policy date is therefore stored with the consent. In this way a Sitecore contact can have multiple consents stored in his profile.
Here is what such a consent collection looks like in facet code:
Consent collection:
    [Serializable]
    public class ConsentCollection: Facet, IConsentCollection, IFacet, IElement, IValidatable
    {
        public const string CONSENT_COLLECTION_FACET_NAME = "ConsentCollection";

        public ConsentCollection()
        {
            // Register the dictionary that holds the individual consents.
            base.EnsureDictionary<ICookieConsent>("Entries");
        }

        public IElementDictionary<ICookieConsent> Entries
        {
            get
            {
                return base.GetDictionary<ICookieConsent>("Entries");
            }
        }
    }

Consent:
    [Serializable]
    public class CookieConsent : Element, ICookieConsent, IElement, IValidatable, IFacet
    {
        public const string CONSENT_COLLECTION_FACET_NAME = "CookieConsent";

        public CookieConsent()
        {
            base.EnsureAttribute<bool>(nameof(ConsentGiven));
            base.EnsureAttribute<DateTime>(nameof(ConsentDate));
            base.EnsureAttribute<string>(nameof(CookieValue));
            base.EnsureAttribute<DateTime>(nameof(PolicyDate));
        }

        public bool ConsentGiven
        {
            get
            {
                return base.GetAttribute<bool>(nameof(ConsentGiven));
            }
            set
            {
                base.SetAttribute(nameof(ConsentGiven), value);
            }
        }

        public DateTime ConsentDate
        {
            get
            {
                return base.GetAttribute<DateTime>(nameof(ConsentDate));
            }
            set
            {
                base.SetAttribute(nameof(ConsentDate), value);
            }
        }

        public string CookieValue
        {
            get
            {
                return base.GetAttribute<string>(nameof(CookieValue));
            }
            set
            {
                base.SetAttribute(nameof(CookieValue), value);
            }
        }

        public DateTime PolicyDate
        {
            get
            {
                return base.GetAttribute<DateTime>(nameof(PolicyDate));
            }
            set
            {
                base.SetAttribute(nameof(PolicyDate), value);
            }
        }
    }


Before consent is given, the site does not render any Sitecore identification controls, so Sitecore analytics is not started. Third-party cookies and scripts are not loaded either; these parts are disabled as long as no consent is present.
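
The decision whether a visitor still has to give consent could be sketched as follows. This is a simplified illustration that does not depend on Sitecore: StoredConsent and ConsentChecker are hypothetical names, and the rule that the most recent entry wins is an assumption of this sketch, not necessarily how our implementation decides.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one stored consent entry; not a Sitecore type.
public class StoredConsent
{
    public bool ConsentGiven { get; set; }
    public DateTime ConsentDate { get; set; }
    public DateTime PolicyDate { get; set; }
}

public static class ConsentChecker
{
    // Returns true when the most recent consent entry is positive and was
    // given for a policy that is at least as new as the current policy date.
    public static bool HasValidConsent(IEnumerable<StoredConsent> consents, DateTime currentPolicyDate)
    {
        if (consents == null)
        {
            return false;
        }

        // The latest entry wins, so a later withdrawal overrides an earlier consent.
        var latest = consents
            .OrderByDescending(c => c.ConsentDate)
            .FirstOrDefault();

        return latest != null
            && latest.ConsentGiven
            && latest.PolicyDate >= currentPolicyDate;
    }
}
```

When HasValidConsent returns false, the cookie 'wall' is shown and tracking and third-party scripts stay disabled.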

In Sitecore 9 you can store such information in the OOTB 'ConsentInformation' facet.

Transparency and modalities & Information and access to personal data & Right to be informed

Your content editor should describe very clearly what kind of personal data is collected and used within your site, for example to personalize ads, and also which third-party cookies are added and what implications this has.
In addition, content needs to be created that describes how visitors can exercise their rights on your website, or how to trigger this through other channels.

Right to data rectification

If you gather private data such as email addresses, names etc., you also have to offer the visitor a way to change this data. This can be supported with online forms or through offline channels. In either case, the data needs to be updated in xDB if it is stored there. For xDB data this can be done by loading the contact's data and changing the facets in which this information is stored. This can be implemented using the ContactManager and ContactRepository in Sitecore 8, or using XConnect in Sitecore 9.
Code for this in Sitecore 8 could look like this:

     var contactEmail = Sitecore.Analytics.Tracker.Current.Contact.GetFacet<IContactEmailAddresses>("Emails");
     if (!contactEmail.Entries.Contains(CONTACT_HOME_EMAIL_KEY))
     {
        contactEmail.Entries.Create(CONTACT_HOME_EMAIL_KEY);
     }
     var emailObject = contactEmail.Entries[CONTACT_HOME_EMAIL_KEY];
     emailObject.SmtpAddress = emailAddress;
     contactEmail.Preferred = CONTACT_HOME_EMAIL_KEY;

In Sitecore 9 it could look like this:

using (XConnectClient xConnectClient = Sitecore.XConnect.Client.Configuration.SitecoreXConnectClientConfiguration.GetClient())
{
    try
    {
        var source = Sitecore.Analytics.XConnect.DataAccess.Constants.IdentifierSource;
        var contactRef = new IdentifiedContactReference(source, identity);
        var xConnectContact = xConnectClient.Get<Contact>(contactRef,
            new ExpandOptions(EmailAddressList.DefaultFacetKey, CustomCustomerFacets.DefaultFacetKey));

        // Create the contact, including its identifier, when it does not exist yet.
        if (xConnectContact == null)
        {
            xConnectContact = new Contact(new ContactIdentifier(source, identity, ContactIdentifierType.Known));
            xConnectClient.AddContact(xConnectContact);
        }

        // Add the email facet when the contact does not have one yet.
        if (xConnectContact.GetFacet<EmailAddressList>(EmailAddressList.DefaultFacetKey) == null)
        {
            EmailAddressList emails = new EmailAddressList(new EmailAddress(identity, true), "Home");
            xConnectClient.SetFacet(xConnectContact, EmailAddressList.DefaultFacetKey, emails);
        }

        // Retrieve the custom facet; this returns null if the contact does not have this facet set.
        var customFacets = xConnectContact.GetFacet<CustomCustomerFacets>(CustomCustomerFacets.DefaultFacetKey)
            ?? new CustomCustomerFacets();
        customFacets.xxx = 123;

        // Changes are only sent to xConnect when the facet is registered with SetFacet.
        xConnectClient.SetFacet(xConnectContact, CustomCustomerFacets.DefaultFacetKey, customFacets);

        xConnectClient.Submit();
    }
    catch (XdbExecutionException ex)
    {
        Sitecore.Diagnostics.Log.Error("Updating contact failed", ex, this);
    }
}

Keep in mind that before executing the right to rectification (or the right to be forgotten) you have to be really sure that the user requesting it is who they claim to be. So only automate it when you are 100% sure of your checks. If not, set up a working process in which an employee handles these requests. This employee can in turn use a solution based on the samples above.

To extract, update or 'forget' a customer's data you first have to identify the customer inside Sitecore. There are two ways to do this.
The first one is by cookie id. Sitecore creates an analytics cookie in the customer's browser, and this cookie id is the actual contact id of the contact in Sitecore. Using this id you can extract, update or anonymize all the data. To get the cookie id, the user either has to inspect the cookie with the browser's developer tools, which of course is not very friendly, or you write a few lines of code that read and print this id, exposed for example on a separate URL. This id is however a GUID, and therefore quite long to read. So you might want to consider a button that does not just display the id, but directly submits it to a chat or mail address of the employee who assists the customer in exercising his rights.
The second one is for known customers that you have identified by, for example, their email address. In this case you can use this identifier to execute the rights.
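
Reading the contact id from the cookie could be sketched like this. The sketch assumes that the analytics cookie is named SC_ANALYTICS_GLOBAL_COOKIE and that its value has the form "<contact id>|<True or False>"; verify both against your own Sitecore version. AnalyticsCookieParser is a hypothetical helper name.

```csharp
using System;

public static class AnalyticsCookieParser
{
    // Assumed name of the Sitecore analytics cookie.
    public const string CookieName = "SC_ANALYTICS_GLOBAL_COOKIE";

    // Extracts the contact id from a cookie value of the assumed form
    // "<guid>|<flag>". Returns false when no valid GUID is found.
    public static bool TryGetContactId(string cookieValue, out Guid contactId)
    {
        contactId = Guid.Empty;
        if (string.IsNullOrWhiteSpace(cookieValue))
        {
            return false;
        }

        // Everything before the first '|' is expected to be the contact id.
        string idPart = cookieValue.Split('|')[0];
        return Guid.TryParse(idPart, out contactId);
    }
}
```

In a page or controller the raw value would typically come from the request, for example Request.Cookies[AnalyticsCookieParser.CookieName], after which the parsed GUID can be displayed or submitted to the employee handling the request.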

Right to data portability

For this right, you have to support visitors in extracting the data you store about them and about their interactions with the site.
In Sitecore 8.2 update 7 you can do this by the following code:

 
var contactRepository = Sitecore.Configuration.Factory.CreateObject("contactRepository", true) as Sitecore.Analytics.Data.ContactRepositoryBase;
var history = contactRepository.GetInteractionCursor(contactId, visitToLoadPerBatch, maximumSaveDate);

In Sitecore 9 you can use the XConnect API. In earlier Sitecore 8 versions you can build it yourself like this:

 
private ContactManager _manager = Sitecore.Configuration.Factory.CreateObject("tracking/contactManager", true) as ContactManager;
private ContactRepository _repository = Sitecore.Configuration.Factory.CreateObject("tracking/contactRepository", true) as ContactRepository;

public ContactManager XdbContactManager
{
 get { return _manager; }
}

public ContactRepository XdbContactRepository
{
 get { return _repository; }
}

public string Extract(string identifierOrContactId)
{
 try
 {
  // A value containing '@' is treated as an email identifier, anything else as a contact id.
  Contact contact;
  if (!string.IsNullOrEmpty(identifierOrContactId) && identifierOrContactId.Contains("@"))
  {
   contact = XdbContactRepository.LoadContactReadOnly(identifierOrContactId);
  }
  else
  {
   contact = GetXdbContact(Sitecore.Data.ID.Parse(identifierOrContactId).Guid);
  }
  if (contact == null)
  {
   return null;
  }

  // Load the interactions of the past year.
  var allInteractions = _repository.LoadHistoricalInteractions(contact.ContactId, int.MaxValue, DateTime.Now.AddYears(-1), DateTime.Now);
  var consentCollection = contact.GetFacet<IConsentCollection>(ConsentCollection.CONSENT_COLLECTION_FACET_NAME);

  var contactData = new ContactData
  {
   Contact = contact,
   AllInteractions = allInteractions,
   Consents = new List<CookieConsent>()
  };

  // Copy the stored consents into the export object.
  foreach (var consentKey in consentCollection.Entries.Keys)
  {
   var consent = consentCollection.Entries[consentKey];
   contactData.Consents.Add(
    new CookieConsent
    {
     ConsentDate = consent.ConsentDate,
     ConsentGiven = consent.ConsentGiven,
     CookieValue = consent.CookieValue,
     PolicyDate = consent.PolicyDate
    });
  }

  // Return the collected data as JSON.
  return JsonConvert.SerializeObject(contactData);
 }
 catch (Exception ex)
 {
  Log.Error("Error - Retrieving data from xDB.", ex);
  return null;
 }
}


Right to be forgotten

To support this, it is recommended not to actually delete the data from Sitecore but to anonymize it, in order to prevent errors when reindexing.
For Sitecore 8.2 update 7 you can execute:

 
var args = new Sitecore.Analytics.Pipelines.RemoveContactPiiSensitiveData.RemoveContactPiiSensitiveDataArgs(contactId);
Sitecore.Pipelines.CorePipeline.Run("removeContactPiiSensitiveData", args);

For Sitecore 9 you can execute the 'ExecuteRightToBeForgotten' method on the XConnectClient.
Custom code for versions prior to Sitecore 8.2 update 7 is:

 
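/// <summary>
/// Anonymize contact in xDB and related (reporting) databases and index files to comply with GDPR regulations
/// </summary>
/// <param name="identifier">Email identifier or contact id of the contact to forget</param>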
public void ForgetContact(string identifier)
{
 try
 {
  ContactRepositoryBase contactRepository = Factory.CreateObject("contactRepository", true) as ContactRepositoryBase;
  LeaseOwner leaseOwner = new LeaseOwner("SomeUniqueString", LeaseOwnerType.OutOfRequestWorker);
  string contactIdentifier;
  LockAttemptResult<Contact> lockResult;
  

  if (!string.IsNullOrEmpty(identifier) && identifier.Contains("@"))
  {
   lockResult = contactRepository.TryLoadContact(identifier, leaseOwner, TimeSpan.FromMinutes(1));
  }
  else
  {
   lockResult = contactRepository.TryLoadContact(Sitecore.Data.ID.Parse(identifier).Guid, leaseOwner, TimeSpan.FromMinutes(1));
  }

  if (lockResult.Object == null)
  {
   string msg = "No contact was found with identifier: ";
   if (lockResult.Status == LockAttemptStatus.AlreadyLocked)
   {
    msg = "Please, try again later. There is an active session for user: ";
   }

   var ex = new NullReferenceException(msg + identifier)
   {
    Source = this.GetType().ToString()
   };
   throw ex;
  }

  Contact contact = lockResult.Object;
  var contactId = contact.ContactId;
  // Remember the identifier before it is cleared; it is needed to anonymize the Reporting database.
  contactIdentifier = contact.Identifiers.Identifier;
  bool isIdentified = !string.IsNullOrEmpty(contactIdentifier);

  if (lockResult.Status == LockAttemptStatus.Success)
  {
   // Make the contact anonymous.
   contact.Identifiers.Identifier = null;
   contact.Identifiers.IdentificationLevel = ContactIdentificationLevel.Anonymous;

   // Get the email facet.
   var contactEmail = contact.GetFacet<IContactEmailAddresses>("Emails");
   if (contactEmail != null && contactEmail.Entries != null && contactEmail.Entries.Contains("Home"))
   {
    // Reset the email.
    var emailObject = contactEmail.Entries["Home"];
    emailObject.SmtpAddress = null;
   }

   // Save and release the contact.
   var options = new ContactSaveOptions(release: true, owner: leaseOwner);
   contactRepository.SaveContact(contact, options);
  }

  // Reindex the contact so that the private data also disappears from the analytics index.
  var analyticsIndexBuilder = (IAnalyticsIndexBuilder)Factory.CreateObject("helpfulcore/analytics.index.builder/analyticsIndexBuilder", true);
  analyticsIndexBuilder.RebuildContactIndexables(new[] { contactId });

  if (isIdentified)
  {
   this.AnonymiseContactInReportingDB(contactIdentifier);
  }

  //var poolPath = "aggregationProcessing/processingPools/contact";
  //var pool = Factory.CreateObject(poolPath, true) as ProcessingPool;
  //if (pool != null)
  //{
  //    var poolItem = new ProcessingPoolItem(contactId.ToByteArray());
  //    poolItem.Properties.Add("Reason", "Updated");
  //    pool.Add(poolItem);
  //}
 }
 catch (Exception ex)
 {
  Log.Error("Error forgetting a contact: ", ex);

  throw;
 }
}

public Contact GetXdbContact(Guid contactId)
{
 // Returns null when no contact with this id exists.
 return XdbContactRepository.LoadContactReadOnly(contactId);
}

private const string AnonymisationPrefix = "anonymous";

private void AnonymiseContactInReportingDB(string identifier)
{
 string constr = ConfigurationManager.ConnectionStrings["reporting"].ConnectionString;
 const string query = "UPDATE [dbo].[Contacts] SET [ExternalUser] = @Anonymizer WHERE [ExternalUser] = @Identifier";

 // The using blocks dispose the command and close the connection, also when an exception is thrown.
 using (SqlConnection connection = new SqlConnection(constr))
 using (SqlCommand cmd = new SqlCommand(query, connection))
 {
  cmd.Parameters.AddWithValue("@Anonymizer", AnonymisationPrefix);
  cmd.Parameters.AddWithValue("@Identifier", identifier);
  connection.Open();
  int result = cmd.ExecuteNonQuery();

  // Exactly one row is expected to be updated.
  if (result != 1)
  {
   throw new System.InvalidOperationException($"Entry {identifier} was not anonymized in the Reporting database.");
  }
 }
}

Note here that the custom code also anonymizes data in the Reporting database. When contacts are identified, an aggregation process stores the user's identifier in the Reporting database, in the [ExternalUser] column of the Contacts table. This is an omission in the right-to-be-forgotten support of Sitecore 8.2 update 7 and Sitecore 9.0.1. The official recommendation from Sitecore is to rebuild your reporting database, but doing that for each right-to-be-forgotten request seems like massive overkill, hence this customization.
Note 2: This code uses the external NuGet package 'helpfulcore' to reindex the contact, so that any private data is also removed from the index files.


Security by design

With GDPR, organizations are obliged not only to support the user's rights but also to follow security by design principles. Besides following the Sitecore security hardening guidelines, we should be aware of all the areas where data is stored and consider the GDPR implications for each. Consider who has access to this data and make sure you have a compliance officer who guards this.
So think of:

  • Experience profile: all custom and OOTB facets where you store data.
  • User manager if you use membership solutions
  • Forms/WFFM and access to their reports and data exports
  • Lists and list manager
  • EXM
  • Marketing automation (show identifiers in supervisor view, next to process state information)

Data retention

You need a retention policy for all data you store. This can be implemented in any way you like, ranging from custom pipelines that select date ranges and delete or anonymize the data in them, to xDB queries that do the same.
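
The selection step of such a retention job could be sketched as follows. This is a Sitecore-independent illustration: StoredRecord and RetentionPolicy are hypothetical names, and loading the records and anonymizing the selected contacts are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a stored interaction; not a Sitecore type.
public class StoredRecord
{
    public Guid ContactId { get; set; }
    public DateTime LastActivityUtc { get; set; }
}

public static class RetentionPolicy
{
    // Returns the ids of contacts whose most recent activity is older than
    // the retention period, measured back from nowUtc.
    public static IList<Guid> SelectExpiredContacts(
        IEnumerable<StoredRecord> records, TimeSpan retentionPeriod, DateTime nowUtc)
    {
        DateTime cutoff = nowUtc - retentionPeriod;

        return records
            .GroupBy(r => r.ContactId)
            // A contact only expires when even its newest record is too old.
            .Where(g => g.Max(r => r.LastActivityUtc) < cutoff)
            .Select(g => g.Key)
            .ToList();
    }
}
```

The resulting ids can then be passed to whatever anonymization or deletion routine your solution uses, for example from a scheduled task.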
