Knowledge Sharing and Yahoo Answers:
Everyone Knows Something
Lada A. Adamic¹, Jun Zhang¹, Eytan Bakshy¹, Mark S. Ackerman¹,²
¹School of Information, ²Department of EECS
University of Michigan
Ann Arbor, MI
{ladamic,junzh,ebakshy,ackerm}@umich.edu
●The goal of these studies was to design better systems and online spaces to support people in sharing knowledge and expertise in the Internet age.
==============================================================
Why did I choose this paper?
Because I sometimes look for information on the Yahoo Answers forum,
I was curious about what the authors studied there.
Hence, I chose this paper for my final report.
Below are my comments and insights on some of the statistical analyses and conclusions in this paper:
◎ Basic characteristics
※ Average thread length (the number of replies per post)
※ Average post length (how verbose the answers are)
1.) Technical subjects tend to attract few replies, but those replies are relatively lengthy.
2.) The Jokes and Riddles categories attract many short replies.
3.) The discussion categories attract many replies of moderate length.
From these statistics, we can recognize the characteristics of each category.
If a question concerns a specialized subject, fewer people reply. Conversely, if a question is very popular or very easy, many more people reply. I think post length is not very important; we should focus on whether the replies are useful.
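The two statistics defined above can be computed with a few lines of code. The following is only a minimal sketch: the input format (a list of `(category, question_id, reply_text)` tuples) and the function name `category_stats` are my own assumptions for illustration, not something taken from the paper.

```python
from collections import defaultdict


def category_stats(replies):
    """Compute average thread length and average post length per category.

    `replies` is an iterable of (category, question_id, reply_text) tuples.
    This input format is a hypothetical one chosen for the example.
    """
    # category -> question_id -> list of reply texts
    threads = defaultdict(lambda: defaultdict(list))
    for category, question_id, text in replies:
        threads[category][question_id].append(text)

    stats = {}
    for category, questions in threads.items():
        n_replies = sum(len(texts) for texts in questions.values())
        n_chars = sum(len(t) for texts in questions.values() for t in texts)
        stats[category] = {
            # replies per answered question
            "avg_thread_length": n_replies / len(questions),
            # characters per reply
            "avg_post_length": n_chars / n_replies,
        }
    return stats


if __name__ == "__main__":
    sample = [
        ("Programming", "q1", "x" * 100),
        ("Jokes", "q2", "ha"),
        ("Jokes", "q2", "lol!"),
        ("Jokes", "q2", "hm"),
    ]
    for name, values in category_stats(sample).items():
        print(name, values)
```

In this toy sample the technical category has one long reply, while the joke category has three short ones, which mirrors the pattern described in points 1.) and 2.).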
◎ Cluster analysis of categories & Degree distributions
(Figure 2.)
※The thread length for a category is given by the average number of responses for each answered question.
※Content length is given by the average number of characters in all responses within a category.
※The asker/replier overlap is the cosine similarity between the asking and replying frequency for each user.
(Red circle) Interaction within the specialized categories is very low, perhaps because those questions have factual answers.
(Blue and green circles) Because these questions have no single factual answer, everyone can discuss freely, for example by sharing experiences and ideas. Hence, the asker/replier overlap and the average thread length are high.
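To make the asker/replier overlap defined above more concrete, here is a minimal sketch of the cosine similarity between each user's asking frequency and replying frequency within one category. The function name `asker_replier_overlap` and the input format (two lists of user names, one entry per question asked or reply posted) are assumptions for illustration only.

```python
import math
from collections import Counter


def asker_replier_overlap(askers, repliers):
    """Cosine similarity between per-user asking and replying counts.

    `askers` holds one user name per question asked, and `repliers`
    holds one user name per reply posted (hypothetical input format).
    """
    asked = Counter(askers)
    replied = Counter(repliers)
    # Only users who both ask and reply contribute to the dot product;
    # a Counter returns 0 for users it has never seen.
    dot = sum(count * replied[user] for user, count in asked.items())
    norm_asked = math.sqrt(sum(c * c for c in asked.values()))
    norm_replied = math.sqrt(sum(c * c for c in replied.values()))
    if norm_asked == 0 or norm_replied == 0:
        return 0.0
    return dot / (norm_asked * norm_replied)


if __name__ == "__main__":
    # Separate groups of askers and repliers: no overlap.
    print(asker_replier_overlap(["ann", "bob"], ["expert", "expert"]))
    # The same people ask and reply equally often: overlap close to 1.
    print(asker_replier_overlap(["ann", "bob"], ["ann", "bob"]))
```

A value near 0 corresponds to the specialized categories, where askers and repliers are different people; a value near 1 corresponds to the discussion categories, where the same users do both.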
(Figure 3.)
※ indegree (number of users one has received answers from)
※ outdegree (number of users one has answered)
In the Programming category, the degree distribution reflects a few highly active individuals who consistently help others with their tasks and problems, but do not necessarily ask for help themselves.
Some users answer only a question or two. At the other extreme, there were users who asked or answered dozens of questions.
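The indegree and outdegree defined above can be counted from a list of question/answer interactions. This is only a sketch under my own assumptions: the input is a list of `(asker, replier)` pairs, one per reply, and the name `degree_counts` is hypothetical.

```python
from collections import defaultdict


def degree_counts(pairs):
    """Return (indegree, outdegree) dictionaries keyed by user.

    `pairs` is an iterable of (asker, replier) tuples, one per reply
    (hypothetical input format). Following the definitions above:
    indegree  = number of distinct users one has received answers from
    outdegree = number of distinct users one has answered
    """
    received_from = defaultdict(set)
    has_answered = defaultdict(set)
    for asker, replier in pairs:
        if asker == replier:
            continue  # ignore users replying to their own question
        received_from[asker].add(replier)
        has_answered[replier].add(asker)

    users = set(received_from) | set(has_answered)
    indegree = {u: len(received_from.get(u, ())) for u in users}
    outdegree = {u: len(has_answered.get(u, ())) for u in users}
    return indegree, outdegree


if __name__ == "__main__":
    sample = [
        ("ann", "helper"),
        ("bob", "helper"),
        ("cat", "helper"),
        ("ann", "bob"),
        ("ann", "helper"),  # repeated pair, counted once
    ]
    indeg, outdeg = degree_counts(sample)
    print("indegree:", indeg)
    print("outdegree:", outdeg)
```

In the sample, `helper` has a high outdegree and an indegree of zero, which is the profile of someone who helps others but does not ask for help.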
◎ Network structure analysis
From this figure, we can see that the neighbors of some of the highly active users in Wrestling are themselves highly connected, which indicates that these users are likely to be "discussion persons". In contrast, in the Programming category, the most active users are "answer people", because most of their neighbors, the people they are helping, are not connected to each other.
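One simple way to put a number on "are a user's neighbors connected to each other" is the fraction of neighbor pairs that are directly linked. This is my own illustration and not necessarily the measure used in the paper; the function name `neighbor_connectivity` and the edge-list input are assumptions, and edges are treated as undirected for simplicity.

```python
from itertools import combinations


def neighbor_connectivity(edges, user):
    """Fraction of pairs of `user`'s neighbors that are linked to each other.

    `edges` is an iterable of (user_a, user_b) pairs, treated as
    undirected (an assumption made for this sketch).
    """
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    neighbors = adjacency.get(user, set())
    pairs = list(combinations(sorted(neighbors), 2))
    if not pairs:
        return 0.0  # fewer than two neighbors: nothing to compare
    linked = sum(1 for a, b in pairs if b in adjacency[a])
    return linked / len(pairs)


if __name__ == "__main__":
    # "Answer person": helps three users who never interact.
    star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
    # "Discussion person": the neighbors also talk to each other.
    dense = star + [("a", "b"), ("b", "c"), ("a", "c")]
    print(neighbor_connectivity(star, "hub"))
    print(neighbor_connectivity(dense, "hub"))
```

A low value matches the "answer people" pattern in Programming, and a high value matches the "discussion persons" pattern in Wrestling.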
◎ User entropy
Entropy measures how focused a user is: the more concentrated a person's answers, the lower the entropy, and the higher the focus.
A user who answers in a variety of subcategories of the same top level category would have a lower entropy than someone who answered in the same number of subcategories, but with each falling into different top level categories.
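The behaviour described above can be reproduced with a small sketch. I assume here that the user's entropy is the sum of the Shannon entropies computed at each level of the category hierarchy (top level plus subcategory); this is my reading, and the exact formula in the paper may differ. The function names and the `(top_level_category, subcategory)` input format are also hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a collection of counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def user_entropy(answers):
    """Sum of the entropies at the top level and the subcategory level.

    `answers` is a list of (top_level_category, subcategory) tuples,
    one per answer. Summing over the two levels is an assumption made
    for this sketch.
    """
    top_level = Counter(top for top, _ in answers)
    sub_level = Counter(answers)
    return shannon_entropy(top_level.values()) + shannon_entropy(sub_level.values())


if __name__ == "__main__":
    focused = [("Computers", "Programming")] * 4
    same_top = [("Computers", "Programming"), ("Computers", "Hardware")]
    different_top = [("Computers", "Programming"), ("Sports", "Wrestling")]
    print(user_entropy(focused))
    print(user_entropy(same_top))
    print(user_entropy(different_top))
```

With this definition, two subcategories under the same top-level category give a lower entropy than two subcategories under different top-level categories, as stated above.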
Figure 8(a) shows the entropy distribution of all users who posted 40 or more questions.
Figure 8(b) shows the proportion of best answers by users.
Table 4 summarizes the prediction results for three categories from the category clusters.
For all categories, the length of the reply and the number of other answers the asker had to choose from were the two most significant features.
Figure 9 shows the difference in length distribution for best answers and non-best answers in the Programming category.
Conclusion:
In the conclusion of the paper, the authors note that it remains unclear whether depth was sacrificed for breadth. I think it would be good to encourage top-level experts to participate in YA by sharing their experience and answering questions. Furthermore, I think an Answer-Rank, similar to Google's PageRank, could help people judge whether an answer is useful. By counting the hyperlinks to an answer, or the number of people who resolved their questions after reading it, such an analysis could tell us whether the answer is useful.
Whatever the results turn out to be, the YA forum gives us a platform to discuss and share what we want to know.
==============================================================