Projects

From June 26th to July 8th I will be at two conferences: Sunbelt 2018 (https://sunbelt.sites.uu.nl/) and NetGlow 2018 (http://ngw.spbu.ru/). During this period I will not have time to reply to your e-mails.

For each project, write a report (in Word, LaTeX, or some other text formatting program) containing at least: the project task, the main results, your comments on and interpretation of the results and, as appendices, a description of the steps and decisions in your analysis. Include pictures in the report. ZIP the report and the supporting files.

Read the requirements of each project carefully.

Project 1

Select a set S = { S₁, S₂, …, S_n } of sequences of events from the set E. The number of sequences n has to be greater than 200, the average length of the sequences greater than 5, and the size of the set E at least 10.

For example: S are the sentences from some book (Project Gutenberg) and the events are the letters in the sentences; E is the alphabet extended with punctuation marks and the space.

Instead of sentences from a book, you can take as sequences the names (first and last together) of a group of people. For example, the questions and replies on some mailing list such as SOCNET. (Skip the questions with fewer than 4 replies; reduce the size of the network by shrinking the less frequent names into a single node OTHERS.)

Another source of events can be the KEDS / WEIS / CAMEO datasets about international events (B, I, dic). To each active pair of actors we assign a sequence that consists of the codes of the actions between them.

You can also use some other set of sequences of events.

Construct the transition matrix of a Markov chain summarizing the selected set S. Consider two options for the initial transition counts, that is, you start counting with

all entries set to 0
all entries set to 1
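
The two counting options can be sketched as follows. This is only a minimal illustration in Python, not a prescribed implementation: the function name transition_matrix, the parameter initial_count and the three toy sequences are assumptions made for the example. Starting from 1 gives every transition a nonzero probability, while starting from 0 leaves unobserved transitions at probability 0.

```python
def transition_matrix(sequences, alphabet, initial_count=0):
    """Estimate transition probabilities p[a][b] from a set of sequences.

    initial_count is the value every counter starts with (0 or 1 in the task).
    """
    counts = {a: {b: initial_count for b in alphabet} for a in alphabet}
    for seq in sequences:
        # count every pair of consecutive events in the sequence
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        # a row with no observed transitions (possible only when starting from 0)
        # is left as all zeros instead of dividing by zero
        probs[a] = {b: (c / total if total else 0.0) for b, c in row.items()}
    return probs


# toy data, made up for the illustration only
sequences = ["abab", "abba", "baab"]
alphabet = sorted(set("".join(sequences)))
p0 = transition_matrix(sequences, alphabet, initial_count=0)
p1 = transition_matrix(sequences, alphabet, initial_count=1)
print(p0["a"]["b"], p1["a"]["b"])
```

In a real project the sequences would be read from the selected source, and the alphabet E would be fixed before counting.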

Define the value v(S) of a sequence S = e₁ e₂ … e_k as the geometric mean of the probabilities of the corresponding transitions

v(S) = ( p(e₁,e₂) · p(e₂,e₃) · p(e₃,e₄) · … · p(e_{k-1},e_k) )^{1/(k-1)}
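
A possible way to compute v(S) is sketched below. The function name sequence_value and the small matrix p are made up for the illustration; p[a][b] is assumed to hold the transition probability from event a to event b. The product is accumulated as a sum of logarithms, which avoids numerical underflow for long sequences. A transition that is missing from the matrix is treated as having probability 0, which is an assumption of this sketch.

```python
import math


def sequence_value(seq, p):
    """Geometric mean of the transition probabilities along seq."""
    k = len(seq)
    if k < 2:
        raise ValueError("a sequence needs at least two events")
    log_sum = 0.0
    for a, b in zip(seq, seq[1:]):
        # transitions absent from p are assumed to have probability 0
        prob = p.get(a, {}).get(b, 0.0)
        if prob == 0.0:
            return 0.0  # one impossible transition makes the whole value 0
        log_sum += math.log(prob)
    return math.exp(log_sum / (k - 1))


# hypothetical transition probabilities, for illustration only
p = {"a": {"a": 0.2, "b": 0.8}, "b": {"a": 0.75, "b": 0.25}}
print(sequence_value("aba", p))
```

With the counts started at 0, any sequence that contains a transition not seen in S gets the value 0; this is one reason to also look at the variant started at 1.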

Compute the distribution of the values v(S) for all sequences from S. Compare it with the distribution of the values of some other sequences, for example the sentences of another book in some other language.
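
One simple way to compare two distributions of values is to look at summary statistics and at the largest gap between the two empirical distribution functions (the two-sample Kolmogorov-Smirnov statistic). The sketch below is only one option among many; the function names and the two short lists of values are invented for the illustration and are not results.

```python
from bisect import bisect_right
from statistics import mean, median


def ks_statistic(xs, ys):
    """Largest absolute difference between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    # the maximum gap is attained at one of the observed values
    for t in xs + ys:
        fx = bisect_right(xs, t) / len(xs)
        fy = bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def summary(values):
    """Minimum, median, mean and maximum of a list of values."""
    return min(values), median(values), mean(values), max(values)


# invented values, standing in for v(S) of the two sets of sequences
values_own = [0.21, 0.25, 0.19, 0.30, 0.27]
values_other = [0.08, 0.12, 0.10, 0.15, 0.22]
print(summary(values_own))
print(summary(values_other))
print(ks_statistic(values_own, values_other))
```

Histograms or box plots of the two sets of values are a natural choice for the pictures in the report.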

Project 2

Select an undirected labeled network with at least 50 nodes and an average degree of at least 4. Determine the graphlet spectra of its nodes. On their basis, construct a dissimilarity between nodes and use it to cluster the nodes.
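
To illustrate the idea, the sketch below computes a reduced graphlet spectrum that uses only the graphlets on 2 and 3 nodes, that is, four orbit counts per node: degree, end of an induced path on 3 nodes, center of such a path, and triangle. Graphlet spectra are usually computed for graphlets on up to 5 nodes (73 orbits), so this is a simplification. The function names, the toy graph and the choice of a Euclidean distance on log-scaled counts are assumptions of the example, not requirements of the project.

```python
import math
from itertools import combinations


def graphlet_spectrum(adj, v):
    """Orbit counts of node v for graphlets on 2 and 3 nodes.

    Returns (degree, path_end, path_center, triangle).
    """
    nbrs = adj[v]
    degree = len(nbrs)
    # pairs of neighbours of v that are linked to each other close a triangle
    triangle = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    # the remaining pairs of neighbours form an induced path with v in the middle
    path_center = degree * (degree - 1) // 2 - triangle
    # v is the end of an induced path v-u-w when w is not linked to v
    path_end = sum(1 for u in nbrs for w in adj[u] if w != v and w not in nbrs)
    return (degree, path_end, path_center, triangle)


def dissimilarity(s, t):
    """Euclidean distance between log-scaled spectra (an assumed choice)."""
    return math.sqrt(sum((math.log1p(a) - math.log1p(b)) ** 2 for a, b in zip(s, t)))


# toy graph: triangle 1-2-3 with a pendant node 4 attached to node 3
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

spectra = {v: graphlet_spectrum(adj, v) for v in adj}
nodes = sorted(adj)
dist = [[dissimilarity(spectra[a], spectra[b]) for b in nodes] for a in nodes]
print(spectra)
```

The resulting matrix dist can then be passed to a clustering method of your choice, for example hierarchical clustering.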

Project 3

Perform a CUG (conditional uniform graph) test on a network of your choice.
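
A CUG test compares an observed network statistic with its distribution over graphs drawn uniformly at random from all graphs that share some property with the observed network. The sketch below conditions on the number of nodes and the number of edges and uses the number of triangles as the statistic; both choices, the function names and the toy network are assumptions made for the illustration. The task leaves the choice of statistic and of conditioning to you.

```python
import random
from itertools import combinations


def triangle_count(n, edges):
    """Number of triangles in an undirected simple graph on nodes 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # every triangle is found once from each of its three edges
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def cug_test(n, edges, statistic, reps=1000, seed=0):
    """CUG test conditioned on the number of nodes and the number of edges."""
    rng = random.Random(seed)
    all_pairs = list(combinations(range(n), 2))
    observed = statistic(n, edges)
    # each replicate is a uniformly random graph with the same n and edge count
    sims = [statistic(n, rng.sample(all_pairs, len(edges))) for _ in range(reps)]
    p_ge = sum(s >= observed for s in sims) / reps  # share of replicates >= observed
    p_le = sum(s <= observed for s in sims) / reps  # share of replicates <= observed
    return observed, p_ge, p_le


# toy network: two disjoint triangles on 6 nodes
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (4, 5), (3, 5)]
observed, p_ge, p_le = cug_test(6, edges, triangle_count, reps=500, seed=1)
print(observed, p_ge, p_le)
```

A small p_ge suggests that the observed value is unusually high for a graph with this number of nodes and edges, and a small p_le that it is unusually low.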