Popular Tags

|

This is the list of all tags I've used in my blog to those point (this includes or will include tags used in content aggregated from other sites as well).

Categories

$container_id = 181;
$sql =
"SELECT d.nid, d.title, c.description, ".
" MAX(n.created) AS updated, ".
" COUNT(*) AS count, ".
" SUM(2.5/LOG(0.25*((UNIX_TIMESTAMP()-n.created)/2592000)+1.5)-1) AS score ".
"FROM {node} d ".
" INNER JOIN {category} c ON c.cid = d.nid ".
" INNER JOIN {category_node} cn ON cn.cid = c.cid ".
" INNER JOIN {node} n ON n.nid = cn.nid ".
"WHERE c.cnid = %d AND n.status = 1 ".
"GROUP BY d.nid, d.title, c.description ".
"ORDER BY score DESC";

$count_sql = "SELECT COUNT(*) FROM {category} c WHERE c.cnid = $container_id";

$result = pager_query($sql, 30, 0, $count_sql, $container_id);

while ($category = db_fetch_object($result)) {
$items[] = array(
l($category->title, 'node/'. $category->nid,
array('title' => $category->description)),
$category->count,
round($category->score, 1),
t('%time ago', array('%time' => format_interval(time() - $category->updated, 3))),
);
}

print theme('table', array('tag', 'count', 'score', 'last update'), $items);
print theme('pager');
?>

How it works

The list is sorted in order of popularity using an inverse-logarithm scoring method I developed myself. Basically, the most recently used terms get a very high score that tapers off to a relatively low score once the post is a year old or older. Each use of a term is cumulative so the more commonly used terms will appear higher on the list even if they haven't been used as recently as others. This list will sort itself as time goes on according to whatever my latest posts are about.

Originally, when I used the built-in taxonomy module of Drupal, I used this SQL query to do the calculation:

SELECT d.tid, d.name, d.description, 
    MAX(n.created) AS updated, 
    COUNT(*) AS count, 
    SUM(2.5/LOG(0.25*((UNIX_TIMESTAMP() 
        - n.created)/2592000)+1.5)-1) AS score 
FROM {term_data} d 
    INNER JOIN term_node USING (tid) 
    INNER JOIN node n USING (nid) 
WHERE d.vid = ? 
    AND n.status = 1 
GROUP BY d.tid, d.name, d.description 
ORDER BY score DESC

However, since I recently switched to the Category module, I updated the SQL to reflect this with:

SELECT d.nid, d.title, c.description,
       MAX(n.created) AS updated,
       COUNT(*) AS count,
       SUM(2.5/LOG(0.25*((UNIX_TIMESTAMP()-n.created)/2592000)+1.5)-1) AS score
FROM {node} d
     INNER JOIN {category} c ON c.cid = d.nid
     INNER JOIN {category_node} cn ON cn.cid = c.cid
     INNER JOIN {node} n ON n.nid = cn.nid
WHERE c.cnid = ? AND n.status = 1
GROUP BY d.nid, d.title, c.description
ORDER BY score DESC

Here's the function describing the math involved. I developed this function for calculating the score by arbitrarily modifying the log curve to suit my needs.

s equals the sum iterating over eye from one to en of two-point-five over one-point-five plus em log zero-point-two-five quantity minus one

In the equation, s represents the final score, n is the number of terms and i is the iterator over the terms. The mi represents the number of months since the creation of the post the ith use of the term belongs to. Since Drupal stores time in seconds since the epoch (i.e., January 1, 1970), the current time is calculated by subtracting the node's creation time from the current time and then dividing by 2,592,000, which is the number of seconds in a month.

If you view a curve plot for an individual iteration, you would note that when the delta (mi) is 0, the score for that term will be around 5.2. When the delta is 12, the score is about 1.

About this Entry

This page contains a single entry by Andrew Sterling Hanenkamp published on August 6, 2006 7:59 PM.

Psalm 19 was the previous entry in this blog.

Drupal: The glory days is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.