1. Informatik 5 (DBIS), RWTH Aachen University
   TeLLNet, GALA
   The MediaBase
   Ralf Klamma
   Webinar, December 16, 2010
   Lehrstuhl Informatik 5 (Informationssysteme), Prof. Dr. M. Jarke
   I5-KL-111010-1

2. The Overall Approach

3. What is unique about the MediaBase?
   - Interdisciplinary multidimensional model of digital networks
     - Social network analysis (SNA) defines measures for social relations
     - Actor network theory (ANT) connects human and media agents
     - The i* framework defines strategic goals and dependencies
     - The theory of media transcriptions studies cross-media knowledge
   [Diagram: Media Networks. Social software (Wiki, Blog, Podcast, IM, Chat, Email, Newsgroup, Chat …); network of artifacts (Microcontent, Blog entry, Message, Burst, Thread, Comment, Conversation, Feedback (Rating)) with i* dependencies (structural, cross-media); network of members (Social Network Analysis: Centrality, Efficiency); Community; Communities of practice]

4. Modeling Dependencies Using the i* Framework
   [Diagram: node labels Coordination, Itinerant Broker, Coordinator, Gatekeeper, Hub, Member, Artifact, URL, Communication, Network; edges labeled isA. Legend: Agent, Goal, Resource, Task]
   Source: Eric S. K. Yu, Towards Modeling and Reasoning Support for Early-Phase Requirements Engineering, RE 1997

5. What can you do with the Mediabase
   - Community interface (Firefox plug-in) for
     - Adding media for crawling, searching & viewing
     - Observing social networks over time
     - Retrieving structural patterns of media
     - Applying Web 2.0 operations (tagging, etc.)
       on media
   - Writing your own crawlers
   - Applying all kinds of social network measures
     - Centrality measures
     - Finding influential & powerful persons
     - Network statistics
     - Understanding networks at large
   - Advanced queries in the RDF store on concepts and relations
     - Who is the owner of company x?
     - Structured input for conceptual mapping tools

6. What is the MediaBase?
   Collection of social software artifacts:
   - Mailing lists (>200 k)
   - Wikipedias
   - Blogs (>300 k)
   - RSS feeds
   - Websites
   - Forums
   - Newsletters
   - …
   The MediaBase
   - IBM DB2 data store
   - 24/7 Perl crawlers for media artifacts
   - Community-oriented Commander interface
   - Social network analysis & visualization tools
   - PALADIN: a pattern language for automatic behavior detection
   - Automatic extraction of concepts and relations in RDF

7. The Data Model

8. MediaBase Model
   A Mediabase is a six-tuple graph
     M = (A, R, µ, ν, η, L)
     R ⊆ A × A
     µ: A → L
     ν: R → L
     η: R → {0, 1}

9. Simplified Meta Model
   [Diagram: node labels Attribute, Actor, Medium, Artifact, Process, Agent, Community; edge labels has, isA, stores, creates, is affected by, belongs to, represents, consumes, performs, ranks; process examples Browse, Address, Transcribe, …, Localize]
   Source: Latour: On Recalling ANT, 1999

10.
Actors in the Mediabase
   A ⊆ {Medium, Artifact, Process, Agent, Network}
   Medium ⊆ Mailing list, Newsletter, Newsgroup, Feed, Web site, Blog, Podcast, Chat room, Wiki, Forum, Social bookmarking site, Folksonomy
   Artifact ⊆ Message, E-mail, Index, Comment, RSS Entry, Transaction, Host, Feedback, Conversation, Burst, Blog entry, Thread, Executions, Tag, Trackback, Review, URL, Rating, Multimedia, Ranking, Reference
   Process ⊆ Acquisition, Search, Monitoring, Retrieval, Transcription, Addressing
   Agent ⊆ Administrator, Member, Lurker, Reviewer, Dead, Answering person, Questioner, Troll, Spammer, Conversationalist, Expert

11. Medium - Artifact Compatibility
   (+ = the medium carries the artifact type, - = it does not; column order reconstructed from the flattened slide)

   Artifact       Email  Mailing List  Blog  Transaction-based Website  Wiki  Chat Room  URL  Forum
   Message          +         +         -               -                -        -       -     +
   Thread           -         +         -               -                +        -       -     +
   Burst            +         +         +               +                +        -       -     +
   Conversation     -         -         -               -                -        +       -     +
   Blog Entry       -         -         +               -                -        -       -     -
   Comment          -         -         +               +                +        -       -     +
   Web Page         -         -         -               -                +        -       +     -
   Transaction      -         -         -               +                -        -       -     -
   Feedback         -         -         -               +                -        -       -     +

12. The Crawlers

13. Crawling Technologies
   Mix of dumps (wikis) and special-purpose crawlers:
     W = Media ∪ Artifact
     I = Media ∪ Artifact ∪ Process ∪ Agent
     G = Media ∪ Artifact ∪ Process ∪ Agent ∪ Network
     MW = Mailing list ∪ Message ∪ Thread ∪ Index
     BW = Blog ∪ Blogroll ∪ Blogentry ∪ Comment ∪ Index

14. Crawler Overview

15. Website Crawler

16. Feed Crawler

17. Mailinglist Crawler

18.
News Crawler

19. Podcast Crawler

20. The MediaBase Commander

21. Media Base Web 2.0 Commander
   - Personalization (each user annotates resources with tags and has a personal page)
   - Community awareness (resources and annotations of others are open)
   - User-friendly interface (Firefox plug-in, easy insertion of resources, tags, tracking of recent changes)

22. Application Programmer Interfaces
   Under development:
   - GraphService: Visualization and PALADIN
     http://dbis.rwth-aachen.de/~atlas/module_build/JavaDoc//atlas_las_services_graph-service/HEAD/javadoc/index.html
   - TargETLy Service: RDF Data Generator
     http://dbis.rwth-aachen.de/~atlas/module_build/JavaDoc/atlas_theses_da_krenge_TargETLy2/HEAD/javadoc/index.html

23. GraphService
   - AbstractDigitalNetwork: representation of the MetaModel
   - Classes for networks: blogs, mailing lists, etc.
   - Classes for basic SNA
   - Classes for pattern analysis
   - Classes for GraphLayout

24. TargETLy Service
   - Connection to RDF store
   - OpenCalais Service: RDF generator
   - Pattern analysis
   - IntentAnalysis
   - Collection of predefined RDF queries
     - e.g. companyCompetitor, companyEmployeeNumber
     - e.g. patentFiling, patentIssuance
     - e.g. personEmailAddress, creditRating

25. PALADIN: Pattern Analysis

26.
PALADIN: Disturbances in Cross-media Social Networks
   - What is a disturbance?
     - Sensing an incompatibility between espoused theories and theories-in-use
   - Disturbances are starting points of learning processes
     - Disturbances disturb, prevent … but they also trigger reflection
   - Disturbances are hard to detect or to forecast

27. Pattern Language for PALADIN: Example Troll
   Troll Pattern: This pattern tries to discover cases in which a troll exists in a digital social network. A troll in the network is considered a disturbance.
   Disturbance:
     (EXISTS [medium | medium.affordance = threadArtefact]) &
     (EXISTS [troll |
       (EXISTS [thread | (thread.author = troll) &
         (COUNT [message | (message.author = troll) & (message.posted = thread)]) > minPosts]) &
       (~EXISTS [thread1, message1 | (thread1.author != troll) &
         (message1.author = troll) & (message1.posted = thread1)])])
   Forces: medium; troll; network; member; thread; message; url
   Force Relations: neighbour(troll, member); own thread(troll, thread)
   Solution: No attention must be paid to the discussions started by the troll.
   Rationale: The troll needs attention to continue its activities. If no attention is paid, he/she will stop participating in the discussions.
   Pattern Relations: Associates Spammer pattern.

28. Pattern Discovery Process
   [Diagram: a Pattern Template (Disturbance, Variables, Parameters, Forces, Force Relations, Description, Solution, Rationale, Dependencies, Pattern Relations) is turned into a Pattern Instance and applied to a Digital Social Network, yielding Disturbance Instances and a Pattern Solution. Labeled steps: 1. Set pattern parameters; 2. Instantiate disturbances; 3. Evaluate disturbances; 4a. Change; 4b. Apply]

29.
PALADIN Case Study
   10 patterns of disturbance over 119 social network instances, 17,359 individuals, 215,345 mails

   Pattern | Occurrences | Remarks
   Burst | 22 | Finds topics that were very important for a certain period of time. Scalability is necessary.
   No Conversationalist | 76 | The existence implies little communication in the network.
   No Questioner | 67 | The existence implies that the network is not popular.
   No Answering Person | 61 | Occurs in small networks. The effects of the lack of an answering person must be further checked with content analysis.
   Troll | 2 | Trolls occur very rarely in cultural communities. True negatives exist.
   Spammer | 86 | Spammers can often be found in discussion groups. False positives exist.
   Leader | 37 | Occurs in networks centered around a member.
   No Leader | 40 | Occurs in big networks where the members are distributed in different clusters.
   Structural Hole | 67 | Occurs for members having neighbors with only one contact.
   Independent Discussions | 13 | Occurs in large networks where disconnected subnetworks exist. Scalability is necessary.

30. Visualization & Analysis

31. Social Network Analysis of Open Source Communities
   - Eclipse components network based on analysis of the source code repository (Software Architecture)
   - Eclipse components network based on analysis of mailing list communication (Social Structure)

32. Community Reflection about Development Process
   - Social platform: Eclipse forum eclipsezone
   - Forum: Eclipse communication framework (ECF)
   - Measure: degree centrality
   - Statistics: 225 nodes, 283 edges

33.
Conversationalist Pattern
   - Social platform: Eclipse mailing list
   - Forum: Device debugging developer discussion

34. Questioner Pattern
   - Social platform: Eclipse mailing list
   - Forum: Device debugging developer discussion

35. Identification of End-Users and Developers in OSS Communities
   [Figure: Community Clustering]

36. Textual Analysis of Postings from Community Experts
   - Postings from experts of one of the identified communities

37. Computer Science Knowledge Network: the Visualization

38. Computer Science Knowledge Network: Clustering

39. Interdisciplinary Venues: Top Betweenness Centrality

40. High Prestige Series: Top PageRank

41. Data Sets
   - DBLP (http://www.informatik.uni-trier.de/~ley/db/)
     - 788,259 author names
     - 1,226,412 publications
     - 3,490 venues (conferences, workshops, journals)
   - CiteSeerX (http://citeseerx.ist.psu.edu/)
     - 7,385,652 publications (including publications in reference lists)
     - 22,735,240 citations
     - Over 4 million author names
   - Combination
     - Canopy clustering [McCallum 2000]
     - Result: 864,097 matched pairs
     - On average, venues cite 2,306 times and are cited 2,037 times

42.
WikiWatcher: System Design
   [Diagram: Stage 1: SAX-based parser in Perl. Wiki network data (authors, e.g. Joe, Liz, Tim, 123.45.67.89; article pages, URLs, revisions; link forms such as [[Article]], [[requested]], [http://…], [[Article2]], [[never exists]]) is obtained by generating XML dump/export files and parsing wiki data, then transferred to a relational database (RDB). Stage 2: Dynamic analysis and visualization: generating networks, measurement, metadata, visualization, network analysis]

43. Network Heterogeneity
   - Author networks
     - Author nodes (anonymous/registered users)
     - Edges represent collaboration between authors during a period t
   - Article networks
     - Article nodes (incl. wiki namespaces)
     - Directed edges (links) between articles
   - As expected, both kinds of networks stay heterogeneous

44. Importance of Network Actors
   - Articles: high betweenness centrality controls the flow of information within a wiki
   - Betweenness values grow or stay nearly constant during the evolution process
   - Determines
     - Important actors
     - Important articles
     - Vandalism

45. Evolution of Shortest Paths
   - Densification Power Law: complex networks may become denser during their growth
   - In general, this could not be verified for wiki author networks!
   - The average distances stagnate at nearly 2 for all considered author networks

46. Evolution of Author Networks
   - Strongly connected components merged by the collaboration of two wiki authors
   [Figures: Author Network of German Wikia in July 2007; Author Network of German Wikia in August 2007]

47. Visualization & Analysis

48.
What you cannot do with the Mediabase (at the moment)
   - Creating a new Mediabase in a new environment
     - Maintenance with databases, scripts and interfaces is tedious
     - Interfaces integrated into Zope/Plone
   - Not all media are equally supported
     - Very good support for mailing lists, forums, web sites and blogs
     - Less support for wikis, podcasts, social bookmarks
   - Lacking support for
     - Conceptual navigation interface (Conzilla!)
     - Discourse management tools
     - Weak signal analysis tools
     - Topic & sentiment & opinion mining tools
     - Automatic generation of recommendations

49. The Future of the Mediabase: CommunityBase
   [Diagram: theoretical foundations Activity Theory [Enge87], Actor Network Theory [Lato05], Community of Practice [Weng98]; phases Self-modeling, Community experience repository, Self-reflection; arrows labeled + disturbance, +/- disturbance, - disturbance [PeKl08]]
   - The self-modeling phase contributes to the self-reflection phase and vice versa
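
Backup sketch: the Troll pattern from slide 27 in code

The disturbance expression on slide 27 can be read as a query over threads and messages. The following is a minimal sketch of that reading in Python, not the PALADIN implementation: the `Thread` and `Message` records, the `find_trolls` function and the `medium_supports_threads` flag are hypothetical names introduced here for illustration. It flags a member as a troll if the member authored a thread and posted more than `min_posts` messages in it, and never posted in a thread authored by someone else.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    """Hypothetical record: a thread and the member who started it."""
    thread_id: str
    author: str


@dataclass(frozen=True)
class Message:
    """Hypothetical record: a message, its author, and the thread it was posted in."""
    author: str
    thread_id: str


def find_trolls(threads, messages, min_posts, medium_supports_threads=True):
    """Return the set of members matching the Troll disturbance.

    Assumed reading of the slide-27 expression:
      1. the medium offers threads (first EXISTS clause),
      2. the member posted more than min_posts messages in a thread
         the member authored (second EXISTS clause with COUNT),
      3. the member never posted in a thread authored by someone else
         (the negated EXISTS clause).
    """
    if not medium_supports_threads:
        return set()

    thread_author = {t.thread_id: t.author for t in threads}

    # Number of messages per (member, thread) pair.
    posts = Counter((m.author, m.thread_id) for m in messages)

    # Clause 2: heavy posters inside their own threads.
    candidates = {
        author
        for (author, tid), count in posts.items()
        if thread_author.get(tid) == author and count > min_posts
    }

    # Clause 3: members who posted in at least one foreign thread.
    posted_elsewhere = {
        author
        for (author, tid) in posts
        if tid in thread_author and thread_author[tid] != author
    }

    return candidates - posted_elsewhere


if __name__ == "__main__":
    demo_threads = [Thread("t1", "eve"), Thread("t2", "bob")]
    demo_messages = [Message("eve", "t1")] * 4 + [
        Message("bob", "t2"),
        Message("bob", "t1"),
        Message("alice", "t1"),
    ]
    print(sorted(find_trolls(demo_threads, demo_messages, min_posts=3)))  # ['eve']
```

The threshold `min_posts` corresponds to the pattern parameter `minPosts`; choosing it is part of step 1 of the discovery process on slide 28 (set pattern parameters), and the case-study remarks on slide 29 suggest the result should still be checked by hand.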