# Chapter 13. Probabilistic Contagion and Models of Influence

## Probabilistic Contagion and Models of Influence

### Epidemics vs Cascade Spreading

* In decision-based models, nodes make decisions based on the cost of adopting a strategy.
* In epidemic spreading:
  * Lack of decision making
  * Process of contagion is complex and unobservable
    * In some cases it involves (or can be modeled as) randomness

### Example with k=3

* When the infection probability is high, the epidemic grows; when it is low, the disease dies out.

![](/files/-M7MJe4indi9IyYmIDHY)

### Probabilistic Spreading Models

* Epidemic Model based on Random Trees
  * a variant of branching processes
  * A patient meets d new people
  * With probability q > 0 she infects each of them
* Q : For which values of d and q does the epidemic run forever?

![](/files/-M7MK_Hyzhu-ufZU9xmQ)

![](/files/-M7MKewhRHL6OrHR9emc)

### Probabilistic Spreading Models

* We need to compute, in terms of q and d, the infection probability as the depth of the tree grows. This value can be defined iteratively through the relation between a parent node and its children.
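
One way to write this iterative definition is sketched below. The notation is an assumption, not taken from the text: $p_h$ stands for the probability that some node at depth $h$ is infected, and the root is infected, so $p_0 = 1$. The infection reaches depth $h$ through a given child only if that child is infected (probability $q$) and the infection then travels $h-1$ further levels below it (probability $p_{h-1}$), and the $d$ children are independent.

$$
p_h = 1 - \left(1 - q\,p_{h-1}\right)^{d}, \qquad p_0 = 1
$$

Under this reading, the epidemic runs forever with positive probability exactly when $\lim_{h \to \infty} p_h > 0$.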

![](/files/-M7MLWKlbpo4Kyr9LRPC)

### Fixed Point : f(x) = 1 - (1-qx)^d

![](/files/-M7MLjI9vVdqt-OMPbZP)

If we want the epidemic to die out, then iterating f(x) must go to zero. So f(x) must lie below the line y = x.

* What do we know about the shape of f(x)?

![](/files/-M7MNF_jXQc7CIwy7oc4)

### Fixed Point: When is it zero?

![](/files/-M7MNNzNHCndO327r4s-)
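
The shape of f can be worked out directly from the definition f(x) = 1 - (1-qx)^d. The following is a short derivation sketch on the interval [0, 1], not a transcription of the figures:

$$
\begin{aligned}
f(0) &= 0, \qquad f(1) = 1 - (1-q)^{d} \le 1,\\
f'(x) &= qd\,(1-qx)^{d-1} \ge 0,\\
f''(x) &= -q^{2}\,d(d-1)\,(1-qx)^{d-2} \le 0,\\
f'(0) &= qd.
\end{aligned}
$$

So f is increasing and concave, and concavity gives $f(x) \le f'(0)\,x = qd \cdot x$. If $qd < 1$, then $f(x) < x$ for every $x > 0$ and the iteration converges to zero. If $qd > 1$, then $f(x) > x$ for small $x > 0$ while $f(1) \le 1$, so f crosses y = x at a positive fixed point and the iteration started at x = 1 stays at or above it.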

### Important Points

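* Reproductive number R0 = q\*d:
  * It determines if the disease will spread or die out.
* There is an epidemic if R0 > 1
  * At exactly R0 = 1 (with d >= 2), f(x) still lies below y = x for x > 0, so the iteration converges to zero, only slowly.
* Only R0 matters:
  * R0 > 1 : the epidemic persists with positive probability and the expected number of infected people grows exponentially with depth
  * R0 < 1 : the epidemic dies out exponentially quickly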
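
The role of R0 can be checked numerically. The sketch below iterates f(x) = 1 - (1-qx)^d starting from x = 1. The function name `infection_probability` and the sample values of q and d are illustrative choices, not from the source.

```python
def infection_probability(q: float, d: int, depth: int) -> float:
    """Iterate x -> 1 - (1 - q*x)**d for `depth` steps, starting from x = 1."""
    x = 1.0  # the root of the tree is infected
    for _ in range(depth):
        x = 1.0 - (1.0 - q * x) ** d
    return x


if __name__ == "__main__":
    # Illustrative parameter pairs (assumed): one with R0 < 1, one with R0 > 1.
    for q, d in [(0.1, 5), (0.5, 4)]:
        p = infection_probability(q, d, depth=200)
        print(f"q={q}, d={d}, R0={q * d:.2f}, value after 200 steps={p:.6f}")
```

With R0 = 0.5 the iterate collapses towards zero, while with R0 = 2 it settles at a positive fixed point.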

### Measures to Limit the Spreading

* When R0 is close to 1, slightly changing q or d can decide whether the epidemic dies out or takes off
  * Quarantining people / nodes \[reducing d]
  * Encouraging better sanitary practices reduces the spread of germs \[reducing q]
* Typical R0 values:
  * HIV has an R0 between 2 and 5
  * Measles has an R0 between 12 and 18
  * Ebola has an R0 between 1.5 and 2

## Application : Social cascades on Flickr and estimating R0 from real data

### Dataset

* Flickr social network
  * Users are connected to other users via friend links
  * A user can like/favorite a photo
* Data:
  * 100 days of photo likes
  * Number of users: 2 million
  * 34,734,221 likes on 11,267,320 photos

### Cascades on Flickr

* Users can be exposed to a photo via social influence (cascade) or external links
* Did a particular like spread through social links?
  * No, if a user likes a photo and none of their friends have previously liked it
  * Yes, if a user likes a photo after at least one of their friends liked it -> social cascade
* Example social cascade: A->B and A->C->E
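
The yes/no rule above can be sketched in a few lines. The data layout is an assumption made for illustration: `likes` is a list of hypothetical `(user, timestamp)` pairs for one photo, with distinct timestamps and one like per user, and `friends` maps each user to a set of friends. The user D, who has no friends among the earlier likers, is added to the example from the text to show the "No" case.

```python
from typing import Dict, List, Set, Tuple


def label_likes(
    likes: List[Tuple[str, float]],
    friends: Dict[str, Set[str]],
) -> Dict[str, bool]:
    """Return user -> True if the like followed a friend's earlier like."""
    liked_so_far: Set[str] = set()
    labels: Dict[str, bool] = {}
    # Process likes in time order so "previously liked" is well defined.
    for user, _timestamp in sorted(likes, key=lambda like: like[1]):
        labels[user] = bool(friends.get(user, set()) & liked_so_far)
        liked_so_far.add(user)
    return labels


# Friend links matching the example cascade A->B and A->C->E, plus isolated D.
example_friends = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "E"}, "E": {"C"}, "D": set()}
example_likes = [("A", 1.0), ("B", 2.0), ("C", 3.0), ("D", 4.0), ("E", 5.0)]
example_labels = label_likes(example_likes, example_friends)
```

Here A and D are labeled as non-social likes, while B, C, and E are part of the social cascade.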

![](/files/-M7MUe-WHzSUL7x5omR-)

### How to estimate R0 from real data?

![](/files/-M7MUo5FVTTyxGl-Y5oG)

### R0 correlation across all photos

* Data from the top 1,000 photo cascades
* Each `+` is one cascade

![](/files/-M7MV2bXohgB32UiDGNq)

### Discussion

* The basic reproduction number of popular photos is between 1 and 190
* This is much higher than very infectious diseases like measles, indicating that social networks are efficient transmission media and online content can be very infectious.

## Epidemic models

### Spreading Models of Viruses

Virus Propagation : 2 Parameters:

* Virus birth rate β (beta):
  * Probability that an infected neighbor attacks
* Virus death rate δ (delta):
  * Probability that an infected node heals

![](/files/-M7MXWr5fYgPlD3ryrTV)

### More Generally : S+E+I+R Models

* General scheme for epidemic models
  * Each node can go through phases
    * Transition probabilities are governed by the model parameters

![](/files/-M7MXopz-fdyO0h09Xoo)

* Phases: Susceptible, Exposed, Infected, Recovered, Immune

### SIR Model

* SIR model: a node goes through the phases Susceptible, Infected, Recovered
  * Models chickenpox or plague:
    * Once you heal, you can never get infected again

![](/files/-M7MYE4nvfG5mXaig55-)

* Assuming perfect mixing (The network is a complete graph) the model dynamics are:

![](/files/-M7MY_QUCBkp4Fg5SCeR)

![](/files/-M7MYeemJXBbQvO6Efbr)
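
A minimal sketch of SIR dynamics under perfect mixing follows. It assumes the equations in the figures are the standard ones, dS/dt = -beta\*S\*I, dI/dt = beta\*S\*I - delta\*I, dR/dt = delta\*I, with S, I, R as fractions of the population. The forward-Euler stepping, the step size `dt`, and the function name `simulate_sir` are choices made for this illustration.

```python
from typing import List, Tuple


def simulate_sir(
    beta: float, delta: float, s0: float, i0: float, steps: int, dt: float = 0.01
) -> List[Tuple[float, float, float]]:
    """Forward-Euler integration of the SIR equations on population fractions."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt  # flow S -> I
        recoveries = delta * i * dt         # flow I -> R
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        history.append((s, i, r))
    return history


# Illustrative parameters (assumed): beta = 0.5, delta = 0.1, 1% initially infected.
sir_history = simulate_sir(beta=0.5, delta=0.1, s0=0.99, i0=0.01, steps=20000)
```

Because recovered nodes never return to S, the infected fraction eventually decays to zero and most of the population ends up in R for these parameters.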

### SIS Model

* Susceptible-Infective-Susceptible (SIS) model
* Cured nodes immediately become susceptible
* Virus "strength": s = β/δ (beta over delta, i.e. virus birth rate over death rate)
* Node state transition diagram:

![](/files/-M7MZ3JGkBIDyRD27eyi)

### SIS Model

* Models flu:
  * A susceptible node becomes infected
  * The node then heals and becomes susceptible again
* Assuming perfect mixing (a complete graph):

![](/files/-M7MZjGTvp3Q3V8LebF_)

![](/files/-M7MZASve-4WO_vqHm1B)
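
A minimal sketch of SIS dynamics under perfect mixing follows. It assumes the standard equation dI/dt = beta\*S\*I - delta\*I with S = 1 - I, where S and I are population fractions. Under that assumption the infected fraction settles at 1 - delta/beta when beta > delta and at zero otherwise. The function name `simulate_sis` and the Euler step size are illustrative.

```python
def simulate_sis(beta: float, delta: float, i0: float, steps: int, dt: float = 0.01) -> float:
    """Forward-Euler integration of dI/dt = beta*(1 - I)*I - delta*I; returns final I."""
    i = i0
    for _ in range(steps):
        s = 1.0 - i  # healed nodes go straight back to the susceptible pool
        i += (beta * s * i - delta * i) * dt
    return i


# Illustrative parameters (assumed): strength beta/delta = 5 versus 0.2.
endemic_level = simulate_sis(beta=0.5, delta=0.1, i0=0.01, steps=20000)
died_out_level = simulate_sis(beta=0.1, delta=0.5, i0=0.5, steps=20000)
```

Unlike SIR, the infection can persist indefinitely here because healed nodes can be infected again.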

### Question : Epidemic threshold τ (tau)

![](/files/-M7M_2aILQnVXKIpWxdG)

### Epidemic Threshold in SIS Model

![](/files/-M7M_9rzMvpyvw9wBeGP)

### Experiments (AS graph)

![](/files/-M7M_LVaTldQNwN9mESv)

### Experiments

* Does it matter how many people are initially infected?

![](/files/-M7M_VoYtNWWuLtRBvqP)

### Modelling Ebola with SEIR

![](/files/-M7M_c5LygD6yx7_j0TW)

### Example : Ebola

![](/files/-M7M_jQt4zlGQTsPTr5l)

### Example : Ebola, R0 = 1.5-2.0

![](/files/-M7Ma7zXyrjF_i4e8pUz)

## Application : Rumor spread modeling using SEIZ model

### SEIZ model : Extension of SIS model

![](/files/-M7MaR0SjvLm2t5gd-Aa)

### Recap: SIS model

![](/files/-M7MacKdZ0wBKAbwGNI9)

### Details of the SEIZ model

![](/files/-M7MarFSPuMWYNtogmJM)

### Dataset

![](/files/-M7MaztvVHF9bVK5Psnl)

### Method : Fitting SEIZ model to data

* SEIZ model is fit to each cascade to minimize the difference |I(t) - tweets(t)|:
  * tweets(t) = number of rumor tweets
  * I(t) = the estimated number of rumor tweets by the model
* Use grid-search and find the parameters with minimum error
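
The fitting procedure above can be sketched as a plain grid search that minimizes the summed absolute error between a model curve and the observed counts. The SEIZ equations are given only in the figures, so the sketch uses a hypothetical stand-in, `toy_model` (discrete logistic growth), which is NOT the SEIZ model. Any function with the same signature could be plugged in. All names and grid values are assumptions for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]


def grid_search_fit(
    simulate: Callable[[Params, int], List[float]],
    observed: Sequence[float],
    grid: Dict[str, Sequence[float]],
) -> Tuple[Params, float]:
    """Try every parameter combination; keep the one minimizing sum |I(t) - tweets(t)|."""
    names = list(grid)
    best_params: Params = {}
    best_error = float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        predicted = simulate(params, len(observed))
        error = sum(abs(p - o) for p, o in zip(predicted, observed))
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


def toy_model(params: Params, n_steps: int) -> List[float]:
    """Stand-in for SEIZ (assumption): discrete logistic growth of I(t)."""
    i, series = params["i0"], []
    for _ in range(n_steps):
        series.append(i)
        i += params["rate"] * i * (1.0 - i / params["cap"])
    return series


# Synthetic "tweets(t)" generated from known parameters, then recovered by the search.
true_params = {"i0": 1.0, "rate": 0.3, "cap": 100.0}
tweets = toy_model(true_params, 30)
search_grid = {"i0": [1.0], "rate": [0.1, 0.2, 0.3, 0.4], "cap": [50.0, 100.0, 200.0]}
fitted_params, fitted_error = grid_search_fit(toy_model, tweets, search_grid)
```

Grid search is exhaustive, so its cost grows with the product of the grid sizes; that is workable when the model has only a handful of parameters.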

![](/files/-M7MbOkCU_34aO-B_VyB)

### Fitting to "Boston Marathon Bombing"

![](/files/-M7MbYjaDO16iRf22Iaz)

### Fitting to "Pope resignation" data

![](/files/-M7MbhR0zV6yCtqEQsNG)

### Rumor detection with SEIZ model

![](/files/-M7Mc1kigCd7k42f_WOd)

### Rumor detection by R_SI

![](/files/-M7Mc9R-UArf6uL4x3E7)

## Independent Cascade Model

* Initially some nodes S are active
* Each edge (u, v) has a probability (weight) p_uv

![](/files/-M7McVo8JB2KBRlCtfyq)

* When node u becomes active / infected
  * It activates each out-neighbor v with probability p_uv
* Activations spread through the network!
* The independent cascade model is simple but requires many parameters!
  * Estimating them from data is very hard
* Solution: give all edges the same probability (which brings us back to the SIR model)
  * Simple, but too simple
* Can we do something better?
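
The independent cascade process described above can be simulated as follows. This is a sketch under assumed names: `edge_prob` maps a directed edge `(u, v)` to its activation probability, and each newly activated node gets exactly one chance to activate each inactive out-neighbor.

```python
import random
from typing import Dict, Iterable, List, Set, Tuple


def independent_cascade(
    edge_prob: Dict[Tuple[str, str], float],
    seeds: Iterable[str],
    rng: random.Random,
) -> Set[str]:
    """Run one cascade from the seed set and return all nodes that became active."""
    out_neighbors: Dict[str, List[Tuple[str, float]]] = {}
    for (u, v), p in edge_prob.items():
        out_neighbors.setdefault(u, []).append((v, p))

    active: Set[str] = set(seeds)
    frontier = sorted(active)  # nodes activated in the previous round
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in out_neighbors.get(u, []):
                # One independent coin flip per edge out of a newly active node.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# Tiny example with deterministic probabilities so the outcome is fixed.
example_edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "d"): 0.0}
reached = independent_cascade(example_edges, seeds=["a"], rng=random.Random(0))
```

Setting every entry of `edge_prob` to the same value gives the single-parameter simplification mentioned in the list above.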

### Exposures and Adoptions

* From exposures to adoptions
  * Exposure : Node's neighbor exposes the node to the contagion
  * Adoption : The node acts on the contagion

![](/files/-M7MdGeTGqYZ0ktN_AWJ)

### Exposure Curves

* Exposure curve:
  * Probability of adopting new behavior depends on the total number of friends who have already adopted
* What's the dependence?
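
An empirical exposure curve can be estimated by grouping observations by the number of adopting friends. The sketch below assumes a hypothetical input format: one `(k, adopted)` pair per observed node, where `k` is the number of friends who had already adopted and `adopted` records whether the node then adopted. The exact counting convention varies between studies, so treat this as one possible definition.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def exposure_curve(observations: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Return k -> fraction of nodes with k adopting friends that adopted."""
    exposed = defaultdict(int)
    adopted = defaultdict(int)
    for k, did_adopt in observations:
        exposed[k] += 1
        if did_adopt:
            adopted[k] += 1
    return {k: adopted[k] / exposed[k] for k in sorted(exposed)}


# Made-up observations for illustration only.
sample_observations = [
    (1, False), (1, False), (1, True), (1, False),
    (2, True), (2, False),
    (3, True),
]
sample_curve = exposure_curve(sample_observations)
```

Plotting the resulting values against k gives the kind of curve whose shape the question above asks about.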

![](/files/-M7MeSyZGbEeJj2aaKpn)

![](/files/-M7MeP9k0Sk02Ihs7J_l)

* From exposures to adoptions
  * Exposure : Node's neighbor exposes the node to information
  * Adoption : The node acts on the information
* Examples of different adoption curves:

![](/files/-M7MelNRojW_82gqqQld)

### Diffusion in Viral Marketing

* Senders and followers of recommendations receive discounts on products

![](/files/-M7Mf6weY_9vwlmgxXZ3)

* Data: Incentivized Viral Marketing program
  * 16 million recommendations
  * 4 million people, 500k products

### Exposure Curve : Validation

![](/files/-M7MfMOy01BY51lnZAAf)

### Exposure Curve: LiveJournal

* Group memberships spread over the network:
  * Red circles represent existing group members
  * Yellow squares may join
* Question:
  * How does prob. of joining a group depend on the number of friends already in the group?

![](/files/-M7MfvoGI1fTZh_-Mc-q)

### Exposure Curve : LiveJournal

* LiveJournal group membership

![](/files/-M7Mg6BsEbbibLLtv3xa)

### Exposure Curve : Information

* Twitter
  * Aug 09 to Jan 10, 3B tweets, 60M users

![](/files/-M7MhaM4_HrdBiFjNTMP)

* Avg. exposure curve for the top 500 hashtags
* What are the most important aspects of the shape of exposure curves?
* Curve reaches peak fast, decreases after!

### Modeling the Shape of the Curve

* Persistence of P is the ratio of the area under the curve P to the area of the rectangle of height max(P) and width max(D(P))
  * D(P) is the domain of P
  * Persistence measures the decay of exposure curves
* Stickiness of P is max(P)
  * Stickiness is the probability of usage at the most effective exposure
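
Both measures can be computed directly from a discrete exposure curve. The sketch assumes the curve is a dict mapping each exposure count k = 1..K to a probability, and it approximates the area under the curve with unit-width bars, so the rectangle has width K = max(D(P)). That area approximation is an assumption of this example; other integration rules would give slightly different values.

```python
from typing import Dict


def stickiness(curve: Dict[int, float]) -> float:
    """Stickiness: the peak value max(P) of the exposure curve."""
    return max(curve.values())


def persistence(curve: Dict[int, float]) -> float:
    """Persistence: area under P divided by the max(P) x max(D(P)) rectangle."""
    peak = max(curve.values())
    if peak <= 0:
        raise ValueError("persistence is undefined for an all-zero curve")
    area = sum(curve.values())        # unit-width bars at k = 1..K (assumed)
    rectangle = peak * max(curve)     # max(curve) is the largest k in the domain
    return area / rectangle


# Made-up curves for illustration: one peaked, one flat.
peaked_curve = {1: 0.01, 2: 0.02, 3: 0.01, 4: 0.01}
flat_curve = {1: 0.5, 2: 0.5}
```

A flat curve has persistence 1, while a curve that peaks early and then decays quickly has persistence well below 1.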

![](/files/-M7MiwD34YZN5weMv9bo)

![](/files/-M7Mj3BFEY1C6kwTnzmd)

### Exposure Curve : Persistence

* Manually identify 8 broad categories with at least 20 hashtags (HTs) in each

![](/files/-M7MjJnqJePtPh43_kCf)

![](/files/-M7MjPsBeNCWSy4OBk6k)

### Exposure Curve : Stickiness

![](/files/-M7MjY23ZIRM3sjpJX8x)

* Technology and Movies have lower stickiness than a random subset of hashtags
* Music has higher stickiness than a random subset of hashtags (of the same size)
