WikiTracer : WantedData


Wanted Data


Please add your suggestions here for new variables not yet included in the draft specs.

Some data useful to developers should be collected, but should only be presented in aggregated, anonymous form.



Some more page data might be useful


Namespaces

Andi writes:

Edits per user and edits per page distribution


Camille
distributions of edits per page and edits per user (that's an activity profile, so it should be related to the overall activity perspectives) would already be good

computing the distributions of edits per user/page shouldn't be hard *provided that* the counts are updated on every edit somewhere in the page history and user account

Dario
Andi notes: "Counting total numbers (= of edits) might be very resource intensive, we could provide edits per day"

Camille
when you ask for the revision history in MediaWiki, the list of revisions is immediately available, so its length should be too, right? (same for users & contributions)

Dario
well, the problem is getting the whole distribution *per page* or *per user* in cases with large userbases or huge amounts of content

Felipe Ortega (WikiXRay) suggested that it would be great to have less crude indicators, but those would need to be precomputed and stored in a DB to which WT should have access

the problem is that this is a major barrier to adoption (compared to the "plugin-drop-in-and-register" approach)

Camille
mmm I agree, the external DB model would be way too tough for adoption.
let's assume the whole distribution per page/per user is too large, OK, so what about just sending updates:

each day, counters are incremented for each user who is making modifications and each page that is being modified, and this is sent to WT in a compressed format: the wiki platform would append edit information to a separate file and send that file to WT once a day in compressed form, after which the file can be erased

the file would look like this: each edit is one double word, i.e. 4 bytes (a user ID or a page ID), and what is transmitted to WT is simply the list of all edits

so for 1 million edits, that's about 4MB before compression
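
A minimal sketch of that file format, assuming each ID fits in an unsigned 32-bit integer and that gzip is an acceptable compression format (neither is specified in the discussion). The function names `pack_edit_ids` and `unpack_edit_ids` are hypothetical, chosen only for illustration:

```python
import gzip
import struct


def pack_edit_ids(ids):
    """Pack each ID as a little-endian unsigned 32-bit integer (4 bytes)."""
    return struct.pack(f"<{len(ids)}I", *ids)


def unpack_edit_ids(payload):
    """Inverse of pack_edit_ids: recover the list of IDs from raw bytes."""
    count = len(payload) // 4
    return list(struct.unpack(f"<{count}I", payload))


# Hypothetical day of activity: one entry per edit, here identified by user ID.
edit_ids = [17, 42, 17, 8, 42]

raw = pack_edit_ids(edit_ids)          # 4 bytes per edit
compressed = gzip.compress(raw)        # what would be sent to WT
restored = unpack_edit_ids(gzip.decompress(compressed))

print(len(raw), "bytes before compression")
```

With this layout the uncompressed size is always 4 bytes times the number of edits, which is where the 4MB per million edits comes from. How well it compresses depends on how repetitive the IDs are, so the compressed size is not predictable from the edit count alone.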

Dario

yeah that sounds sensible, but again, it requires something more than just a plugin that can generate content on the fly, right? this would need structural changes

Camille

in 6 yrs on Wikipedia there have been 250M edits
let's assume the present yearly rate is approximately half of this figure: that's about 125M edits per yr, that is, roughly 300k edits per day.

that makes a 1.2MB file, let's say 1MB compressed, for the English Wikipedia: sounds OK!
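
A back-of-the-envelope check of this estimate, as a sketch: it assumes 4 bytes per edit (one double word) and a 365-day year, and the constant names are made up for illustration. The exact quotient lands a little above the rounded figures quoted, which does not change the conclusion:

```python
BYTES_PER_EDIT = 4            # one double word per edit (assumption from the file format)
TOTAL_EDITS = 250_000_000     # edits over 6 years, as quoted in the discussion

# Assumed current yearly rate: half of the 6-year total.
edits_per_year = TOTAL_EDITS // 2
edits_per_day = edits_per_year / 365
daily_bytes = edits_per_day * BYTES_PER_EDIT

print(f"{edits_per_day:,.0f} edits per day")
print(f"{daily_bytes / 1e6:.2f} MB per day before compression")
```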

Dario
yes, I'm not concerned about size so much as about the structural changes needed to generate that
but maybe I'm overestimating the problem

Camille
I agree with your fears, but I think even having these distributions requires no more than adding a line to the edit-saving process that dumps the ID of the user + page into a single file, which is then transmitted to WT

computationally, this can't be heavy.
-- and we'll do the distribution computation part, as it is now with FT
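
The distribution computation on the WT side could look roughly like the sketch below. It assumes WT has already decoded the transmitted file into a plain list of IDs, one per edit; the function name `edit_distribution` is hypothetical:

```python
from collections import Counter


def edit_distribution(ids):
    """Map 'number of edits' -> 'how many users (or pages) have that many edits'."""
    edits_per_id = Counter(ids)                  # edits made by each ID
    histogram = Counter(edits_per_id.values())   # how many IDs share each count
    return dict(sorted(histogram.items()))


# Hypothetical decoded log: one user ID per edit.
user_ids = [17, 42, 17, 8, 42, 17, 99]

dist = edit_distribution(user_ids)
print(dist)
```

The same function works unchanged for page IDs, so the wiki engine only has to log raw IDs and never needs to maintain the distributions itself.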

Dario
certainly not, but for most engines you would need to change the edit form to do this, and that is very likely core functionality, not a plugin


CategorySpecs