Term paper and Master thesis ideas

US counties

It is a very rich dataset available at

http://www.census.gov/support/USACdataDownloads.html

We used it in

http://vladowiki.fmf.uni-lj.si/lib/exe/fetch.php?media=pajek:slides:9nationssunbelt32.pdf

and an improved version appears on pp. 369-381 of

https://www.wiley.com/en-us/Understanding+Large+Temporal+Networks+and+Spatial+Networks%3A+Exploration%2C+Pattern+Searching%2C+Visualization+and+Network+Evolution-p-9780470714522

Unfortunately, access to https://data.census.gov/ appears to be restricted for users outside the US. I get the message:

Access Denied
You don't have permission to access "http://www.census.gov/data" on this server.

Subsets of variables can be obtained from other sources. For example:

Try the above links and some others (search for “US counties data”). If you can obtain some data, we can proceed to the details.

Analysis of flight data

For this topic, Софья Кошовец s.koshovets@gmail.com already expressed her interest, but she didn't inform me about her decision. Here is my answer to her.

The idea comes from the Viszards session at Sunbelt XXXI (2011)

http://vladowiki.fmf.uni-lj.si/lib/exe/fetch.php?media=pajek:slides:visz31.pdf

The original data set was borrowed from the DataExpo 2009 flights contest:

http://stat-computing.org/dataexpo/2009/

but you can also collect (download) recent data from

https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp

Keyword descriptions from the complete text

Can the keywords for a paper be determined with a program?

In the term paper, you can give an overview of existing approaches to this problem and search for available programs (and start collecting a test corpus of papers). The keywords can be “free” or selected from a given list. Present the results of applying some of the programs to a few example papers.
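As a point of reference for “free” keywords, the simplest possible baseline is to rank the words of a paper by frequency after discarding stop words. The sketch below is only an illustration of that baseline, not one of the existing programs to be surveyed; the function name `suggest_keywords` and the tiny stop-word list are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {
    "the", "a", "an", "of", "and", "in", "to", "is", "are",
    "for", "on", "with", "by", "we", "this", "that",
}

def suggest_keywords(text, k=5):
    """Return the k most frequent words of `text` that are not stop words.

    Words with equal counts keep the order in which they first occur.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(k)]

sample = (
    "Networks of counties. Temporal networks and spatial networks "
    "are analysed; counties are clustered."
)
print(suggest_keywords(sample, k=2))
```

A baseline like this ignores multi-word terms and word forms, which is one reason the more elaborate approaches are worth surveying.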

In the master thesis, you can evaluate the available programs by comparing the keywords they suggest with the keywords selected by the authors. You can also try to develop your own procedure: for list-based keywords, you can “learn” from existing papers.
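One possible way to compare suggested keywords with author keywords is to compute precision, recall, and F1 over exact matches. This is a minimal sketch under that assumption; the function name `evaluate` is hypothetical, and exact matching after lowercasing is a deliberately crude choice (a real evaluation would probably also match lemmatized or partially overlapping keywords).

```python
def evaluate(suggested, author):
    """Compare two keyword lists by exact match after lowercasing.

    Returns (precision, recall, f1).
    """
    s = {k.strip().lower() for k in suggested}
    a = {k.strip().lower() for k in author}
    hits = len(s & a)
    precision = hits / len(s) if s else 0.0
    recall = hits / len(a) if a else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Hypothetical example: 2 of the 4 suggested keywords match the 3 author keywords.
p, r, f = evaluate(
    ["Networks", "clustering", "graphs", "counties"],
    ["networks", "counties", "visualization"],
)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Averaging these scores over the test corpus gives one number per program, which makes the programs directly comparable.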

Stemming and lemmatization (in named-entity recognition)

Stemming and lemmatization are important in named-entity recognition.

In the term paper, you can give an overview of existing approaches to this problem and search for available programs (with special attention to the Russian language).

In the master thesis, you can develop a program for XML tagging of plain text, based on a dictionary of “interesting” terms and on lemmatization, and illustrate its use by applying it to selected data. I collected some links at

http://vladowiki.fmf.uni-lj.si/doku.php?id=notes:text:lem&#named-entity_recognition
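To make the thesis idea concrete, here is a minimal sketch of dictionary-based XML tagging. Everything in it is an assumption for illustration: the `<term>` element and its `lemma` attribute, the names `LEMMAS`, `TERMS`, `lemmatize`, and `tag_text`, and above all the toy lookup table that stands in for a real lemmatizer (for Russian, a proper morphological analyser would be needed).

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Toy lemmatizer: a lookup table from word form to lemma (an assumption;
# a real program would call a proper lemmatizer here).
LEMMAS = {"networks": "network", "counties": "county", "flights": "flight"}

# Dictionary of "interesting" terms, stored as lemmas (hypothetical content).
TERMS = {"network", "county"}

def lemmatize(word):
    """Return the lemma of `word`, or the lowercased word if it is unknown."""
    w = word.lower()
    return LEMMAS.get(w, w)

def tag_text(text):
    """Wrap every word whose lemma is in TERMS in a <term> element."""
    parts = []
    # Alternate runs of word and non-word characters, so that joining
    # the pieces reproduces the original text exactly.
    for token in re.findall(r"\w+|\W+", text):
        lemma = lemmatize(token)
        if lemma in TERMS:
            parts.append(f"<term lemma={quoteattr(lemma)}>{escape(token)}</term>")
        else:
            parts.append(escape(token))
    return "".join(parts)

tagged = tag_text("Networks of US counties & flights")
print(tagged)
```

Matching on lemmas rather than on surface forms is what lets a single dictionary entry cover all inflected forms of a term, which matters most for highly inflected languages.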