The Data Centric Security Model

Oct 16, 2014

Originally Posted on May 10, 2012:

There is no debate regarding the purpose of Information Services security. IS Security exists to safeguard data. This seems like a very blunt and presumptuous statement, but it is absolute in the opinion of this author. Data exists in a variety of forms and is housed in and utilized by a myriad of systems. Data has many different levels of importance, ranging from highly confidential to publicly available. Data saturates every aspect of a modern business enterprise. As such, in all situations and in every form or level, data must be protected and secured. Based on this absolute truth, a data centric IS security model would seem to be the most direct mechanism to achieve the goal of IS security.

What does Information Security really mean?

Before we can begin to understand and apply a data centric security model, we must first define and understand what it means to keep data secure. As stated in the previous paragraph, data exists in a variety of forms and has many different levels of importance. One of those levels is data which is considered publicly available and free for general consumption. This raises the question “how do you secure something that is available freely to everyone?” To properly answer that question, you first must define all of the aspects of information security. The definition is best represented in that tried and true Information Security methodology, the AIC Triad. The acronym AIC represents the 3 core security focuses for IS systems and/or data: Availability, Integrity, and Confidentiality. Availability refers to the ability of a client to access a system, or at its most basic, the ability to access data when necessary. Integrity refers to the reliability of a system and/or its data. Data integrity, specifically, means that the data has not been improperly altered. Confidentiality refers to the access or viewing of data or systems only by those with valid and specific privileges. From these 3 core focuses, an IS security professional can extrapolate the necessary access controls to protect and secure data within an IS environment.

Let’s go back to the example of a piece of data that is publicly available for the entire world to see and apply the principles of the AIC triad. Clearly confidentiality is not the primary focus. Everyone has access by design. What about integrity and availability? IS Security needs to take those focuses into account. Who is the author or owner of the data? How do you validate that information? How consistent is the availability of the data? Can everyone who has access get to the data in a timely manner? Can you recover the data in the event it is lost or damaged? These are all questions IS Security must answer in the form of access controls governing data at every level within an enterprise.

What are Access Controls?

We have defined what information security means and what focuses or principles a security model should take into account. We have also determined that access controls are necessary to properly align data and systems with the focuses of an IS security model. Now we need to better define what an access control really is. Access controls are mechanisms designed to manage resources and data based on the defined intentions of an organization or enterprise. The mechanisms are implemented to grant or revoke access to systems or data, define policies and procedures associated with systems or data, and/or to monitor systems or data for change. There are three fundamental types of access controls: Administrative, Technical, and Physical.

Administrative controls are documented controls or personnel-oriented actions that are established to provide an acceptable level of protection for IS resources. Examples include training, education, separation of duties, policies and procedures, supervision, contingency and recovery plans, organizational ethics statements, etc.

Technical or logical controls involve the use of hardware or software mechanisms to protect IS resources and to ensure that unauthorized access is prevented and/or detected. Examples include antivirus/anti-spam/anti-spyware software, encryption, audit logs, intrusion detection/prevention systems, etc.

Physical controls are manual, structural, or environmental controls that protect facilities and IS resources from unauthorized access or natural threats. Examples include locks, doors, fences, guards, alarms, badges, CCTV, motion detectors, sensors, etc.

Within each of these types of controls are multiple different categories including preventative, detective, corrective, directive, deterrent, recovery, and compensating controls. Clearly, the process can become quite granular when considering the type, category and need of an access control for a specific system or type of data within an IS security model.

The 4 W’s of Data Centric Security

To this point, we have defined the key aspects of Information Security and identified the three core focuses of a sound IS Security methodology. We have also identified the types and roles of access controls and how they control and protect data. We can now apply this information to that absolute truth we discovered at the beginning of this document – data must be protected and secured – in the form of a data centric IS security model.

At the heart of a data centric IS security model are the 4 W’s of data centric security: 1) Where is the data? 2) What is the data? 3) Who has access to the data? 4) Why do they need access to the data? The answers to these 4 questions, when applied to any data set in any environment, either physical or logical, should provide you all of the necessary information to formulate a plan of access controls to properly secure and control the data in question.
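As a rough illustration, the answers to the 4 W’s can be captured as one record per data set, with a check that flags any question still left open. This is only a sketch: the class name, field names, and the rule that every user needs a documented reason are assumptions made for the example, not part of any standard.

```python
from dataclasses import dataclass, field


@dataclass
class DataAssetAssessment:
    """Answers to the 4 W's for one data set (all field names are hypothetical)."""
    name: str
    where_physical: str = ""       # e.g. "data center SAN", "end-user laptop"
    where_logical: str = ""        # e.g. host role or system tier
    what_classification: str = ""  # e.g. "public", "confidential"
    who: list = field(default_factory=list)  # users or job functions with access
    why: dict = field(default_factory=dict)  # user -> documented reason for access

    def gaps(self):
        """Return the questions that are still unanswered or incomplete."""
        missing = []
        if not (self.where_physical and self.where_logical):
            missing.append("where")
        if not self.what_classification:
            missing.append("what")
        if not self.who:
            missing.append("who")
        # Assumption: every user listed under "who" needs a documented reason.
        if not self.who or any(not self.why.get(user) for user in self.who):
            missing.append("why")
        return missing
```

An assessment with two users but only one documented reason would report "why" as its only gap, which mirrors the requirement that the answers be complete before controls are planned.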

A few assumptions underlie these 4 questions and must be addressed before formulating your plan. First, the answers to the questions must be honest. Second, the answers must be complete and accurate. Third, the answers must be supported and understood by all of the parties involved in designing and supporting the access controls. At the core of these assumptions is the concept that you must perform your due diligence in the analysis of the nature of the data, where it resides, who has access to it, and, most importantly, why. The research necessary to properly answer these 4 questions must be exhaustive in order for the planned controls to be effective. Also, buy-in and support of these answers must be consistent, steadfast and rooted in the data owners themselves and openly endorsed by upper management.

Now that we have defined the structures, boundaries and assumptions surrounding these 4 questions, we need to take a closer look at each question individually and how the answers to each question relate to the access controls necessary to protect the data being evaluated.

Question #1 – Where is the data?

This question seems quite straightforward initially, but it is important to realize that there are two aspects to the answer, one logical and one physical. Physically, data exists on specific media (hard drives, optical disks, solid state media, etc.) and that media resides in physical locations, storage area networks, servers, PCs, and other devices. In some cases, data resides in portable devices such as thumb drives, smartphones and other mechanisms under the control of end users. Physical security is a cornerstone on which all other security controls rely. All of the aforementioned media, servers, devices and mechanisms must be protected via physical controls to ensure that they remain free from compromise. In fact, as we discuss the other 3 W’s of data centric security, it will become clear that physical security controls play a role in every question and its answers. This is the reason why you have to ask the “where” question even before you ask the “what” question. Regardless of what type of data you have, where it resides must always be the starting point for its security.

Where data resides dictates the type of physical controls that may be necessary to isolate the data from threats and provide a forensic path in the event data is compromised. If data resides in media or systems maintained in a physical data center, then the controls of that facility come into play. Fire suppression must be present. Locks must be in place. Video surveillance must be utilized. Consistent and redundant power must be provided. Backup and recovery tools must be deployed and tested. What if the data resides with an end user? That means the data is leaving the safe confines of your protected and hardened facility and is effectively “in the wild”. In this situation, end user security awareness is vitally important. Users must be trained on the best practices surrounding the security and maintenance of those portable media devices. Policies and procedures must be in place to ensure those defined best practices are followed. Encryption may be necessary to protect the data in the event of loss or theft. Based on just the few physical controls discussed so far, it becomes clear that the physical location of data is of paramount importance in its overall security.

To this point we have only discussed the physical aspect of the “where” question. As mentioned earlier, there is also an important logical factor to consider. Logically, data can reside in a variety of environments. Those environments may be based on the operating system hosting the data or they may be based on the nature of the host itself. The proliferation of virtual machines has changed the nature of the “where” question and caused the physical and logical aspects of this question to intersect, if not merge. Part of the logical aspect of this discussion is the role or nature of the server or system hosting the data. In certain situations, data resides on a system with other data of higher importance. As such, controls to protect one set of data may not be compatible with the controls necessary to protect the other set of data. It is at this point that system tiering becomes a valuable control to deploy. System tiering is an administrative control in which all systems and their corresponding data are tiered based on their importance to the organization. This ordering of systems helps to prevent shared system conflicts and ensures that physical controls are in place to maintain the proper availability and confidentiality of key organizational resources.

Question #2 – What is the data?

Asking what the data in question is amounts to an exercise in determining its nature and classification. This process is a necessary and fundamental procedure that every organization should deploy and maintain. At its heart, data classification is an administrative control used to categorize data based on its sensitivity, value, or criticality to an organization. This is usually accomplished by reviewing the composition of the data and placing it in one of a set of scaled categories. Categories typically range from publicly available data to strictly confidential, with varying levels in between. Each of these categories would then have corresponding physical, technical and administrative controls to protect the defined data based on importance and sensitivity.
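The mapping from scaled categories to controls can be sketched as a small lookup. The four levels and the controls attached to each are hypothetical and chosen only to illustrate the idea that each category inherits the controls of the levels beneath it; a real scheme would come from the organization's own classification policy.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical scaled categories, lowest sensitivity first."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    STRICTLY_CONFIDENTIAL = 3


# Assumed baseline: each level lists the controls it adds to the levels below it.
CONTROLS_ADDED_AT = {
    Classification.PUBLIC: ["backups", "integrity monitoring"],
    Classification.INTERNAL: ["authenticated access"],
    Classification.CONFIDENTIAL: ["encryption at rest", "audit logging"],
    Classification.STRICTLY_CONFIDENTIAL: ["restricted physical access", "DLP rules"],
}


def required_controls(level):
    """Collect the controls for a level, including those of every lower level."""
    controls = []
    for tier in Classification:  # iterates in definition order, lowest first
        if tier <= level:
            controls.extend(CONTROLS_ADDED_AT[tier])
    return controls
```

Note that even the public level carries controls here, which reflects the earlier point that publicly available data still needs integrity and availability protection.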

As initially stated, data classification is a control every organization should deploy and maintain. Unfortunately, that is not always the case. Many organizations skip this step or make the claim that all data is considered sensitive or confidential. This attitude can result in a very costly and ineffective security posture. Usually, one of two scenarios develops. In the first scenario, if an organization does follow through with the attitude that all data is sensitive and confidential, then that organization finds itself significantly overspending on certain security controls and wasting precious man-hours implementing solutions on unimportant data. Other necessary controls are sacrificed and the overall security structure of the organization is weakened. In the other scenario, the organization becomes desensitized to the sensitivity of its data and it fails to implement and/or maintain the proper security controls from the outset. As with the first scenario, the overall security structure of the organization is weakened. Proper classification of data generally leads to balanced security control deployment and maintenance and a stronger, more efficient overall security approach.

Another aspect of the “what” question is how you should treat data based on its definitional nature. One applicable administrative control is the concept of data retention. It is not cost effective to maintain data forever on certain systems. There are substantial costs associated with disks, backup and recovery and general maintenance. It therefore becomes imperative for an organization to decide how long it will archive data and how long it will keep data available on a production system. These durations often cannot be universal or arbitrary values. Data retention is governed in many circumstances by legal and regulatory requirements. Therefore, when asking the “what” question, one must include thoughts like “Is this data related to a particular law or regulation?” and “How long do I need this data to be available for a particular process?” All of these questions help to define the nature of the data, which will springboard the security model into the next two questions: Who has access and Why?
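A retention decision of this kind reduces to comparing the age of the data against two durations. The sketch below assumes a simple two-stage policy (production, then archive); the names and the day counts in the usage note are illustrative, and real values would be driven by the legal and regulatory requirements that apply to the data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical two-stage retention policy, expressed in days."""
    production_days: int  # how long data stays on the production system
    archive_days: int     # how long it is kept in archive after that


def retention_stage(created, today, policy):
    """Return where data created on `created` should live as of `today`."""
    age = (today - created).days
    if age <= policy.production_days:
        return "production"
    if age <= policy.production_days + policy.archive_days:
        return "archive"
    return "eligible for disposal"
```

For example, with an assumed policy of `RetentionPolicy(production_days=365, archive_days=6 * 365)`, a record created four years ago would fall into the archive stage.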

Question #3 – Who has access to the data?

As with Question #1 (Where), there is both a logical and a physical aspect to the question – who has access to the data? Access can mean many things. It may be related to who can open a door or possess a key. It may be based on having a username and password to a computer. It may even be as granular as having the ability to alter a file versus only the ability to read it. Generally, security efforts surrounding the answer to the “who” question are the heart and soul of any IS security model. Even in situations where questions 1, 2 and 4 are never asked, question 3 is almost universally considered and answered to one degree or another. And in most situations, the answer to the “who” question begins and ends with the concept of authentication and a username and password. Username/Password authentication is a security control familiar to everyone, so we will not focus on it except to say that it should be considered the initial baby step of a security structure designed to answer the “who” question, and never the end of that process.

There are several physical and technical controls beyond basic username/password solutions that can enhance and support a security model designed to manage who has access to data. Network segmentation is one such control. By managing network traffic at the department and data center level, we can provide barriers to protect against intentional and unintentional harm to systems and data. Segmentation also augments solutions focused on separation of duties strategies and the prevention of permissions creep. If a user in an organization should only have access to certain servers or specific sets of data, network segmentation can help enforce that structure. If a user changes departments or roles within a department, but his/her permissions are not adjusted properly, segmentation can provide an additional layer of security against the access of unauthorized resources.
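To make the segmentation idea concrete, here is a minimal sketch of the decision a segmented network enforces: traffic is allowed only between segments that have been explicitly paired. The segment names, address ranges, and allowed flows are invented for the example; in practice this logic lives in firewall rules, VLANs, or router ACLs rather than in application code.

```python
import ipaddress

# Hypothetical segments; the names and address ranges are assumptions.
SEGMENTS = {
    "finance": ipaddress.ip_network("10.10.0.0/24"),
    "engineering": ipaddress.ip_network("10.20.0.0/24"),
    "datacenter-finance": ipaddress.ip_network("10.100.10.0/24"),
    "datacenter-build": ipaddress.ip_network("10.100.20.0/24"),
}

# Hypothetical policy: (source segment, destination segment) pairs that may talk.
ALLOWED_FLOWS = {
    ("finance", "datacenter-finance"),
    ("engineering", "datacenter-build"),
}


def segment_of(address):
    """Return the name of the segment containing the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in SEGMENTS.items():
        if ip in network:
            return name
    return None


def is_flow_allowed(source, destination):
    """Deny by default: both ends must be in known, explicitly paired segments."""
    src, dst = segment_of(source), segment_of(destination)
    return src is not None and dst is not None and (src, dst) in ALLOWED_FLOWS
```

Under these assumed rules, a user who moved from finance to engineering would lose reachability to the finance servers as soon as their machine sat on the engineering segment, even if their file permissions had not yet been cleaned up.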

Network access control or NAC is another technical control that complements the efforts surrounding answers to the “who” question. At its most basic level, this control is designed to monitor and govern who and/or what has access to the network and under what conditions. Network access control can be as advanced as an enterprise class system of network and client tools monitoring MAC addresses, PCs, and logical network segments. It can also be as straightforward as a process to control which network jacks are hot and whether A/V software is required before a PC can talk to a network resource. Monitoring is a key component, which aids in overall asset management. Knowing what assets an organization has and how those assets should be authorized to access data is a fundamental security need and a key resource necessary in answering the “who” question.
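The simpler end of that spectrum can be sketched as an admission check that combines an asset inventory with a posture requirement. Everything here is an assumption for illustration: the inventory contents, the `Device` fields, and the "quarantine" outcome, which is a common NAC behavior but is not described above.

```python
from dataclasses import dataclass

# Hypothetical asset inventory: MAC addresses of devices the organization knows.
KNOWN_DEVICES = {"00:1a:2b:3c:4d:5e", "00:1a:2b:3c:4d:5f"}


@dataclass
class Device:
    mac: str
    antivirus_running: bool


def admission_decision(device):
    """Decide whether a device may join the network.

    Assumed outcomes: "deny" for unknown assets, "quarantine" for known
    assets that fail the A/V requirement, "allow" otherwise.
    """
    if device.mac.lower() not in KNOWN_DEVICES:
        return "deny"
    if not device.antivirus_running:
        return "quarantine"
    return "allow"
```

A MAC address alone is easy to spoof, so a check like this should be read as one layer of asset management rather than as proof of identity.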

Question #4 – Why do they need access to the data?

The question of why someone needs access to data is a question that is familiar to all security professionals, and in fact, most individuals working in information services. All good professionals quickly become accustomed to asking the “why” question. Asking why exists far beyond the realm of security and often takes on a very negative tone. Why did this system fail? Why did our redundancies not protect us? Why did our procedures not work? Asking why in this context is a function of Root Cause Analysis. Something has failed or broken and we must determine why. Utilizing Question 4 from a security perspective can be quite the opposite scenario. If done properly, you are answering this question well before anything has happened and well before any forensic analysis. Answering Question 4 should be a preventative exercise to protect data and ensure proper access is being provided and for the correct reasons.

Answering Question 4 must be a very specific process. Far too often the question of “why” is answered in very generic terms. Why does a department need access to data? Why does a C-level associate need access to data? This group approach is not granular enough. In answering Question 4, it is important to consider specific users and specific job functions. By doing so, we can properly define the nature and use cases for data and how those scenarios should be authorized. Grouping those scenarios and functions into roles after Question 4 has been answered then makes sense because all relevant information has been collected and efficiency can be taken into account from an administrative perspective. If you took the opposite approach and worked your way down from the group to the individual, details would be easily lost.

There is another dimension to the “why” question. When answering Question 4, you must take into account what granting access to data means. A user may effectively define why he or she needs access to a particular file, but from a security perspective, that definition must include the levels of access needed. This scenario builds upon the argument that answering Question 4 must be a very granular process. Access levels must always be taken into consideration, including read, write, execute, delete, copy, modify, and others. Several technical and physical access controls exist to help in managing the answers to Question 4, including the technical control DLP (Data Loss Prevention). DLP is a system of client, server, and network tools designed to enforce predefined rules for the usage of data in an organization. These rules are based on the data classifications set forth by the organization and roles defined for its users. At its most basic, DLP prevents a user from doing something with data that they are not supposed to do.

Let us consider an example. The CIO of a company is authorized to view extremely sensitive sales data for an organization. The CIO needs this access to prepare for departmental meetings and understand the organization’s goals. The organization classifies this sales data as highly confidential and, using DLP, assigns certain rules to govern the data’s use. The data can be read by certain C-level executives, but it cannot be altered or transmitted outside the organization. If the CIO attempted to alter the sales data spreadsheet on his local PC and then copy the data back to the server, DLP agents on the server and/or PC would enforce the organization’s rules, preventing the copy. If the CIO attempted to email the spreadsheet to his home computer, DLP agents on the organization’s firewall would identify the content of the message and prevent its delivery. In both scenarios, the DLP solution would alert system administrators that someone was attempting to misuse confidential data. As you can see from this example, DLP can be an effective tool in enforcing the answers to the “why” question.
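The rule evaluation in this example can be sketched in a few lines. This is not how any particular DLP product works; the rule table, role names, and action names are hypothetical, and a real deployment would inspect content on endpoints and at the network edge rather than trust a declared classification.

```python
from dataclasses import dataclass

# Hypothetical rule table keyed by data classification.
RULES = {
    "highly confidential": {
        "allowed_roles": {"c-level"},
        "allowed_actions": {"read"},
    },
    "public": {
        "allowed_roles": {"c-level", "staff"},
        "allowed_actions": {"read", "copy", "transmit"},
    },
}


@dataclass
class Request:
    user: str
    role: str
    action: str          # e.g. "read", "modify", "copy", "transmit"
    classification: str


def evaluate(request, alerts):
    """Return True if the action is permitted; otherwise record an alert.

    Unknown classifications are denied by default (an assumption).
    """
    rule = RULES.get(request.classification)
    allowed = (
        rule is not None
        and request.role in rule["allowed_roles"]
        and request.action in rule["allowed_actions"]
    )
    if not allowed:
        alerts.append(
            f"blocked: {request.user} attempted {request.action} "
            f"on {request.classification} data"
        )
    return allowed
```

With these assumed rules, the CIO's read request succeeds, while a request to transmit the same data is blocked and an alert is queued for the administrators.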

Conclusion

Aside from its people, data is the most important asset of any organization, and IS Security exists to protect and safeguard it. Therefore, any security model deployed by IS Security must be data centric and take into account the data needs of the organization. Defining the data needs of the organization means answering the 4 W’s: Where, What, Who, and Why. This sounds like a very straightforward process, but as this document has hopefully demonstrated, there are a variety of scenarios that must be considered, a wealth of information that must be gathered, and a number of controls that must be implemented before any data centric security model can be considered complete. Despite the time-intensive work that must be performed and the complications that inevitably arise, the deployment of this security model is a worthwhile endeavor that every organization should consider.