Before I as a user of Organization A send data about me to organization B, I read the privacy policies enforced by organization B
If I agree to the privacy policies of organization B, then I will send data about me to organization B
If I do not agree with the policies of organization B, then I can negotiate with organization B
Even if the web site states that it will not share private information with others, do I trust the web site?
Note: while confidentiality is enforced by the organization, privacy is determined by the user. Therefore, for confidentiality, the organization will determine whether a user can have the data. If so, then the organization can further determine whether the user can be trusted
What is Privacy
Privacy is about a patient determining what patient/medical information the doctor should release about him/her
A bank customer determines what financial information the bank should release about him/her
FBI would collect information about US citizens. However FBI determines what information about a US citizen it can release to say the CIA
Some Privacy concerns
Medical and Healthcare
Employers, marketers, or others knowing of private medical concerns
Allowing access to individual’s travel and spending data
Introduce random values into the data and/or results
Challenge is to introduce random values without significantly affecting the data mining results
Give range of values for results instead of exact values
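The two ideas above (random perturbation and reporting ranges) can be sketched as follows. This is a minimal illustration, not a specific published PPDM algorithm: the uniform noise, the `perturb` and `as_range` helper names, and the sample ages are all assumptions made for the example.

```python
import random
import statistics

def perturb(values, scale, seed=0):
    """Return a copy of values with zero-mean uniform noise in [-scale, scale] added."""
    rng = random.Random(seed)
    return [v + rng.uniform(-scale, scale) for v in values]

def as_range(value, width):
    """Report the bucket [low, low + width) that contains value instead of the exact value."""
    low = (value // width) * width
    return (low, low + width)

# Made-up data for illustration only.
ages = [23, 35, 41, 29, 52, 47, 38, 31, 60, 44]
noisy = perturb(ages, scale=5)

# Individual values are hidden, but an aggregate such as the mean stays close.
print(statistics.mean(ages), statistics.mean(noisy))

# A result can also be released as a range rather than an exact number.
print(as_range(statistics.mean(ages), 10))
```

Because the noise is zero-mean, aggregates drift only slightly, which is the sense in which the mining results are not significantly affected.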
Secure Multi-party Computation
Each party knows its own inputs; encryption techniques used to compute final results
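As a rough sketch of the idea, the example below computes a sum over private inputs using additive secret sharing. This is a simpler stand-in for the encryption techniques mentioned above, not the method the slides necessarily have in mind. `MODULUS`, `make_shares` and `secure_sum` are hypothetical names, and the seeded `random.Random` is for repeatability only; a real protocol would need a cryptographic random source and a network between the parties.

```python
import random

MODULUS = 2**61 - 1  # assumed public modulus, larger than any possible sum

def make_shares(secret, n_parties, rng):
    """Split secret into n_parties random shares that add up to secret mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_inputs, seed=0):
    """Simulate all parties locally; each party only ever sees random-looking shares."""
    rng = random.Random(seed)  # illustration only, not cryptographically secure
    n = len(private_inputs)
    # distributed[i][j] is the share that party i sends to party j.
    distributed = [make_shares(x, n, rng) for x in private_inputs]
    # Each party j adds the shares it received and publishes only that partial sum.
    partial_sums = [sum(distributed[i][j] for i in range(n)) % MODULUS
                    for j in range(n)]
    # Combining the published partial sums yields the total and nothing else.
    return sum(partial_sums) % MODULUS

print(secure_sum([120, 75, 310]))
```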
Rules, predictive functions
Approach: Only make a sample of data available
Limits ability to learn good classifier
Platform for Privacy Preferences (P3P): What is it?
P3P is an emerging industry standard that enables web sites to express their privacy practices in a standard format
The format of the policies can be automatically retrieved and understood by user agents
It is a product of the W3C, the World Wide Web Consortium
When a user enters a web site, the privacy policies of the web site are conveyed to the user; if the privacy policies are different from user preferences, the user is notified; the user can then decide how to proceed
Platform for Privacy Preferences (P3P): Organizations
Several major corporations are working on P3P standards including:
Web sites have also implemented P3P
Semantic web group has adopted P3P
Platform for Privacy Preferences (P3P): Specifications
Initial version of P3P used RDF to specify policies; Recent version has migrated to XML
P3P Policies use XML with namespaces for encoding policies
P3P has its own statements and data types expressed in XML; P3P schemas utilize XML schemas
The P3P specification released in January 2005 uses a catalog shopping example to explain concepts; P3P is an international standard and an ongoing project
Example: Catalog shopping
Your name will not be given to a third party but your purchases will be given to a third party
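The catalog shopping policy above, and the user-agent check described earlier (notify the user when the site policy differs from the user's preferences), can be sketched as follows. The dictionaries are a hypothetical simplification and do not use the real P3P XML vocabulary; `site_policy`, `user_prefs` and `find_conflicts` are names invented for the example.

```python
# Hypothetical, simplified policy: data item -> is it shared with third parties?
site_policy = {"name": False, "purchases": True}

# Hypothetical user preferences: data item -> does the user allow third-party sharing?
user_prefs = {"name": False, "purchases": False}

def find_conflicts(policy, prefs):
    """Return the data items the site shares but the user does not want shared."""
    return sorted(item for item, shared in policy.items()
                  if shared and not prefs.get(item, False))

conflicts = find_conflicts(site_policy, user_prefs)
if conflicts:
    # The user agent notifies the user, who then decides how to proceed.
    print("Notify user: policy conflicts with preferences for", conflicts)
else:
    print("Policy matches preferences")
```

Items missing from the preferences are treated as "do not share", which is an assumed conservative default.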
P3P and Legal Issues
P3P does not replace laws
P3P works together with the law
What happens if the web sites do not honor their P3P policies?
Then appropriate legal actions will have to be taken
XML is the technology to specify P3P policies
Policy experts will have to specify the policies
Technologists will have to develop the specifications
Legal experts will have to take actions if the policies are violated
Does privacy work for Defense and Intelligence applications?
Is it meaningful to have privacy for surveillance and geospatial applications?
Once the image of my house is on Google Earth, then how much privacy can I have?
I may want my location to be private, but does it make sense if a camera can capture a picture of me?
If there are sensors all over the place, is it meaningful to have privacy preserving surveillance?
This suggests that we need application-specific privacy
It is not meaningful to examine PPDM for every data mining algorithm and for every application
Data Mining and Privacy: Friends or Foes?
They are neither friends nor foes
Need advances in both data mining and privacy
Need to design flexible systems
For some applications one may have to focus entirely on “pure” data mining while for some others there may be a need for “privacy-preserving” data mining
Need flexible data mining techniques that can adapt to the changing environments
Technologists, legal specialists, social scientists, policy makers and privacy advocates MUST work together
Popular Social Networks
Facebook - A social networking website. Initially, membership was restricted to students of Harvard University. It was originally based on the “face book” given to first-year students, which was a way to get to know other students on campus. As of July 2007, there were over 34 million active members worldwide. From September 2006 to September 2007 it rose from the 60th to the 6th most visited web site, and was the number one site for photos in the United States.
Twitter - A free social networking and micro-blogging service that allows users to send “updates” (text-based posts, up to 140 characters long) via SMS, instant messaging, email, the Twitter website, or an application/widget within a space of your choice, such as MySpace, Facebook, a blog, or an RSS aggregator/reader.
MySpace - A popular social networking website offering an interactive, user-submitted network of friends, personal profiles, blogs, groups, photos, music and videos internationally. According to Alexa Internet, MySpace is currently the world’s sixth most popular English-language website, the sixth most popular website in any language, and the third most popular website in the United States, though it has topped the chart in various weeks. As of September 7, 2007, there were over 200 million accounts.
Social Networks: More formal definition
A structural approach to understanding social interaction.
Networks consist of Actors and the Ties between them.
We represent social networks as graphs whose vertices are the actors and whose edges are the ties.
Edges are usually weighted to show the strength of the tie.
In the simplest networks, an Actor is an individual person.
A tie might be “is acquainted with”. Or it might represent the amount of email exchanged between persons A and B.
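A minimal sketch of this representation: actors are vertices, ties are undirected weighted edges, and the weight here stands for the number of emails exchanged. The `SocialNetwork` class, its method names and the sample counts are assumptions made for illustration.

```python
from collections import defaultdict

class SocialNetwork:
    """Undirected weighted graph: vertices are actors, edges are ties."""

    def __init__(self):
        # ties[a][b] holds the strength of the tie between actors a and b.
        self.ties = defaultdict(dict)

    def add_tie(self, a, b, weight=1):
        """Record a tie in both directions, since the tie is symmetric."""
        self.ties[a][b] = weight
        self.ties[b][a] = weight

    def strength(self, a, b):
        """Return the tie strength, or 0 if the two actors are not tied."""
        return self.ties.get(a, {}).get(b, 0)

    def neighbors(self, a):
        """Return the actors directly tied to a, in sorted order."""
        return sorted(self.ties.get(a, {}))

net = SocialNetwork()
net.add_tie("A", "B", weight=12)  # e.g. 12 emails exchanged between A and B
net.add_tie("B", "C", weight=3)
print(net.neighbors("B"), net.strength("A", "B"), net.strength("A", "C"))
```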
In a sociogram, the actors are represented as points in a two-dimensional space. The location of each actor is significant. E.g. a “central actor” is plotted in the center, and others are placed in concentric rings according to “distance” from this actor.
Actors are joined with lines representing ties, as in a social network. In other words a social network is a graph, and a sociogram is a particular 2D embedding of it.
These days, sociograms are rarely used (most examples on the web are not sociograms at all, but networks). But methods like MDS (Multi-Dimensional Scaling) can be used to lay out Actors, given a vector of attributes about them.
Social Networks were studied early by researchers in graph theory (Harary et al. 1950s). Some social network properties can be computed directly from the graph.
Others depend on an adjacency matrix representation (Actors index rows and columns of a matrix, matrix elements represent the tie strength between them).
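For illustration, the sketch below stores a small made-up network as an adjacency matrix and reads two simple properties off it: the number of ties per actor (degree) and the total tie strength per actor. The actor labels, the weights and the helper names are assumptions for the example.

```python
# Actors index both the rows and the columns of the matrix.
actors = ["A", "B", "C", "D"]

# matrix[i][j] is the tie strength between actors[i] and actors[j]; 0 means no tie.
matrix = [
    [0, 12, 0, 1],
    [12, 0, 3, 0],
    [0, 3, 0, 0],
    [1, 0, 0, 0],
]

def degree(i):
    """Number of actors that actor i has a tie with."""
    return sum(1 for w in matrix[i] if w != 0)

def total_strength(i):
    """Sum of the strengths of all ties of actor i."""
    return sum(matrix[i])

def is_symmetric(m):
    """An undirected network has a symmetric adjacency matrix."""
    return all(m[i][j] == m[j][i] for i in range(len(m)) for j in range(len(m)))

for i, name in enumerate(actors):
    print(name, degree(i), total_strength(i))
```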
Social Network Analysis of 9/11 Terrorists (www.orgnet.com)
Early in 2000, the CIA was informed of two terrorist suspects linked to al-Qaeda.
Nawaf Alhazmi and Khalid Almihdhar were photographed attending a meeting of known terrorists in Malaysia. After the meeting they returned to Los Angeles,
What do you do with these suspects? Arrest or deport them immediately? No, we need to use them to discover more of the al-Qaeda network.
Once suspects have been discovered, we can use their daily activities to uncloak their network. Just like they used our technology against us, we can use their planning process against them. Watch them, and listen to their conversations to see...
who they call / email
who visits with them locally and in other cities
where their money comes from
The structure of their extended network begins to emerge as data is discovered via surveillance.
Social Network Analysis of 9/11 Terrorists
A suspect being monitored may have many contacts -- both accidental and intentional. We must always be wary of 'guilt by association'. Accidental contacts, like the mail delivery person, the grocery store clerk, and neighbor may not be viewed with investigative interest.
Intentional contacts are like the late afternoon visitor, whose car license plate is traced back to a rental company at the airport, where we discover he arrived from Toronto (got to notify the Canadians) and his name matches a cell phone number (with a Buffalo, NY area code) that our suspect calls regularly. This intentional contact is added to our map and we start tracking his interactions -- where do they lead? As data comes in, a picture of the terrorist organization slowly comes into focus.
How do investigators know whether they are on to something big? Often they don't. Yet in this case there was another strong clue that Alhazmi and Almihdhar were up to no good -- the attack on the USS Cole in October of 2000. One of the chief suspects in the Cole bombing [Khallad] was also present [along with Alhazmi and Almihdhar] at the terrorist meeting in Malaysia in January 2000.
Once we have their direct links, the next step is to find their indirect ties -- the 'connections of their connections'. Discovering the nodes and links within two steps of the suspects usually starts to reveal much about their network. Key individuals in the local network begin to stand out. In viewing the network map in Figure 2, most of us will focus on Mohammed Atta because we now know his history. The investigator uncloaking this network would not be aware of Atta's eventual importance. At this point he is just another node to be investigated.
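Finding everyone within two steps of the known suspects is a bounded breadth-first search. The sketch below assumes the network is stored as an adjacency list; the node labels (`S1`, `S2`, `X1`, ...) are placeholders rather than data from the case, and `within_steps` is a hypothetical helper name.

```python
from collections import deque

def within_steps(graph, sources, max_steps):
    """Return {node: distance} for every node at most max_steps links from any source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if dist[node] == max_steps:
            continue  # do not expand beyond the step limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Placeholder network: S1 and S2 are the two starting nodes.
graph = {
    "S1": ["S2", "X1"],
    "S2": ["S1", "X2"],
    "X1": ["S1", "X3"],
    "X2": ["S2"],
    "X3": ["X1", "X4"],
    "X4": ["X3"],
}

print(within_steps(graph, ["S1", "S2"], max_steps=2))
```

In this toy network `X4` is three links away, so it is left out of the two-step map.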
Social Network Analysis of 9/11 Terrorists
Figure 2 shows the two suspects and
Social Network Analysis of 9/11 Terrorists
We now have enough data for two key conclusions:
All 19 hijackers were within 2 steps of the two original suspects uncovered in 2000!
Social network metrics reveal Mohammed Atta emerging as the local leader
With hindsight, we have now mapped enough of the 9-11 conspiracy to stop it. Again, the investigators are never sure they have uncovered enough information while they are in the process of uncloaking the covert organization. They also have to contend with superfluous data. This data was gathered after the event, so the investigators knew exactly what to look for. Before an event it is not so easy.
As the network structure emerges, a key dynamic that needs to be closely monitored is the activity within the network. Network activity spikes when a planned event approaches. Is there an increase of flow across known links? Are new links rapidly emerging between known nodes? Are money flows suddenly going in the opposite direction? When activity reaches a certain pattern and threshold, it is time to stop monitoring the network, and time to start removing nodes.
The author argues that this bottom-up approach of uncloaking a network is more effective than a top down search for the terrorist needle in the public haystack -- and it is less invasive of the general population, resulting in far fewer "false positives".
Social Network Analysis of Steroid Usage in Baseball (www.orgnet.com)
When the Mitchell Report on steroid use in Major League Baseball [MLB] was published, people were surprised at who and how many players were mentioned. The diagram below shows a human network created from data found in the Mitchell Report. Baseball players are shown as green nodes. Those who were found to be providers of steroids and other illegal performance enhancing substances appear as red nodes. The links reveal the flow of chemicals -- from provider to player.
Knowledge Sharing in Organizations: Finding Experts
Organizational leaders are preparing for the potential loss of expertise and knowledge flow due to turnover, downsizing, outsourcing, and the coming retirements of the baby boom generation. The model network (previous chart) is used to illustrate the knowledge continuity analysis process.
Each node in this sample network (previous chart) represents a person that works in a knowledge domain. Some people have more / different knowledge than others. Employees who will retire in 2 years or less have their nodes colored red. Those who will retire in 3-4 years are colored yellow. Those retiring in 5 years or later are colored green.
A gray, directed line is drawn from the seeker of knowledge to the source of expertise. A-->B indicates that A seeks expertise / advice from B. Those with many arrows pointing to them are sought often for assistance.
The top subject matter experts -- SMEs -- in this group are nodes 29, 46, 100, 41, 36 and 55.
The SMEs were discovered using a network metric in InFlow that is similar to how the Google search engine ranks web pages -- using both direct and indirect links.
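The exact InFlow metric is not specified here, so the sketch below uses a plain PageRank-style power iteration as a stand-in. An edge A --> B means A seeks advice from B, and a person's score grows with how many people seek them and how highly scored those seekers are. The `rank_experts` function, the damping value and the sample data are assumptions for illustration.

```python
def rank_experts(seeks, damping=0.85, iterations=50):
    """PageRank-style scores for a directed 'seeks advice from' network.

    seeks maps each person to the list of people they ask for expertise.
    """
    nodes = sorted(set(seeks) | {t for targets in seeks.values() for t in targets})
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            targets = seeks.get(v, [])
            if targets:
                # A seeker passes its score on to the people it asks (direct links);
                # repeating the iteration propagates this along indirect links.
                share = damping * score[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Someone who asks nobody spreads their score evenly over everyone.
                share = damping * score[v] / n
                for t in nodes:
                    new[t] += share
        score = new
    return score

# Made-up example: a, b and d all ask e; c asks d.
seeks = {"a": ["e"], "b": ["e"], "c": ["d"], "d": ["e"]}
scores = rank_experts(seeks)
print(sorted(scores, key=scores.get, reverse=True))
```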
Of the top six SMEs in this group, half are colored red or yellow [46, 55]. The loss of person 46 has the greatest potential for knowledge loss. 90% of the network is within 3 steps of accessing this key knowledge source.
Social Networks: Security and Privacy Issues: European Network and Information Security Agency
The European Network and Information Security Agency (ENISA) has released its first issue paper “Security Issues and Recommendations for Online Social Networks”.
Four groups of threats: privacy related threats, variants of traditional network and information security threats, identity related threats, social threats.
Recommendations are given for governments (oversight and adaptation of existing data protection legislation), companies that run such networks, technology developers, and research and standardisation bodies.
Some concerns: the recommendation to use automated filters against "offensive, litigious or illegal content" raises potential freedom of speech issues. European Digital Rights has started a campaign against a similar recommendation by the Council of Europe. The portability of profiles and social graphs is also addressed. However, what is missing is that “Information about social links is not about only one user, but also the others which he is linked to. They have to agree if this information is moved to different platforms”.
Social Networks: Security and Privacy Issues: Microsoft Recommendations http://www.microsoft.com/protect/yourself/personal/communities.mspx
Online communities require you to provide personal information. Profiles are public. Comments you post are permanently recorded on the community site. You might even mention when you plan to be out of town.
E-mail and phishing scammers count on the appealing sense of trust that is often fostered in online communities to steal your personal information. The more you reveal in profiles and posts, the more vulnerable you are to scams, spam, and identity theft.
Here are some features to look for when you're considering joining an online community:
• Privacy policies that explain exactly what information the service will collect and how it might be used.
• User guidelines that outline a basic code of conduct for users on their sites. Sites have the option to penalize reported violators with account suspension or termination.
• Special provisions for children and their parents, such as family-friendly options geared towards protecting children under a certain age.
• Password protection to help keep your account secure.
• E-mail address hiding, which lets you display only part of your e-mail address on the site's membership lists.
• Filtering options: offered on blogging sites, these tools let you choose which subscribers can see what you've written.
FOAF (an acronym of Friend of a Friend) is a machine-readable ontology describing persons, their activities and their relations to other people and objects. Anyone can use FOAF to describe him or herself. FOAF allows groups of people to describe social networks without the need for a centralised database.
FOAF's descriptive vocabulary is expressed using RDF (Resource Description Framework) and OWL (Web Ontology Language).
Computers may use these FOAF profiles to find, for example, all people living in Europe, or to list all people both you and a friend of yours know. This is accomplished by defining relationships between people. Each profile has a unique identifier (such as the person's e-mail address, or a URI of the person's homepage or weblog), which is used when defining these relationships.
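The two queries mentioned above can be sketched on plain Python dictionaries standing in for parsed FOAF profiles. This is a simplification: real FOAF data is RDF and would normally be read with an RDF library. The keys loosely mirror the FOAF terms `name`, `based_near` and `knows`, and the people, addresses and helper names are made up for the example.

```python
# Each profile is keyed by a unique identifier (here a mailto: URI).
profiles = {
    "mailto:alice@example.org": {
        "name": "Alice", "based_near": "Europe",
        "knows": {"mailto:carol@example.org", "mailto:dave@example.org"},
    },
    "mailto:bob@example.org": {
        "name": "Bob", "based_near": "North America",
        "knows": {"mailto:carol@example.org", "mailto:erin@example.org"},
    },
    "mailto:carol@example.org": {
        "name": "Carol", "based_near": "Europe", "knows": set(),
    },
}

def people_in(profiles, region):
    """Names of all described people whose profile places them in region."""
    return sorted(p["name"] for p in profiles.values() if p["based_near"] == region)

def mutual_contacts(profiles, a, b):
    """Identifiers of the people that both a and b state they know."""
    return sorted(profiles[a]["knows"] & profiles[b]["knows"])

print(people_in(profiles, "Europe"))
print(mutual_contacts(profiles, "mailto:alice@example.org", "mailto:bob@example.org"))
```

Because relationships refer to identifiers rather than names, profiles published on different sites can be merged without a central database.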
The FOAF project, which defines and extends the vocabulary of a FOAF profile, was started in 2000 by and . It can be considered the first Social Semantic Web application, in that it combines RDF technology with 'Social Web' concerns.
Tim Berners-Lee, in a recent essay, redefined the Semantic Web concept into something he calls the Giant Global Graph (GGG), where relationships transcend networks/documents. He considers the GGG to be on equal grounds with the Internet and the World Wide Web, stating that "I express my network in a FOAF file, and that is a start of the revolution."
The following FOAF profile (written in XML format) states that Jimmy Wales is the name of the person described here. His e-mail address, homepage and depiction are resources, which means that each of them can be described using RDF as well. He has Wikipedia as an interest, and knows Angela Beesley (which is the name of a 'Person' resource).