Printout from http://www.systasis.com on Friday, 13 December 2024
Copyright: 2003-2024 Symeon Charalabides. All rights reserved.

Information representation: Systasis comments, part 2
by Symeon Charalabides (symeon@systasis.com) on Sunday, 6 April 2003

In the first installment of this article, I tried to analyse the considerations and concerns we faced regarding the development of the Systasis commenting system and how we went about addressing them. We were quite pleased to have finally come up with a way of presenting the content of one or more comments in a pop-up window while keeping the overall thread structure visible in the main window, because, after all, proper presentation is what matters most, right?
Well... not really; we try to maintain high quality standards throughout the mechanics of our websites (you should see the administration panel of this website, man it's sweet!). It's just that the code to create the tree view of the comments was already there and waiting.

Allow me to elaborate:

A little over a year ago, I read the book My SQL/PHP4 Database Applications by Jay Greenspan and Brad Bulger, and if I'm shamelessly advertising it here, it's because I found it to be excellent. I will not elaborate on that (you can read my review on Amazon.co.uk if you follow the link) more than mentioning that they have written the sole beginner's PHP book I know that dares to introduce the concept of recursion. However, I will try to discuss around it.

The book displays a threaded discussion application that produces a multi-topic, multi-threaded disussion board. There are thousands of them on the internet, but I'm not sure if any use a recursive algorithm. Recursion is an elegant process and worthy of admiration when properly applied. In this case, it came up as a solution to the problem that the structure of the database which is to hold the comments cannot be normalized without a potentially apalling loss of eficiency.

Ideally, the comments table would have an ID column, a user_name or user_email or user_ID (the latter if linked to a users table) column, some kind of date column, a title column, and message. This pretty much records all that is needed. However, there is the need to connect every posting to the one it replies to (or indicate that it's a "base" posting). The 3rd normal form way would be to create a new table for the replies to every comment, and use some kind of naming convention to identify it as such. This, however, yields the problem of a potential great number of tables in the same database. There would be one for every level of every thread of every article, and no upper limit to restrict them. The MySQL manual, §5.4.8 says:

Drawbacks to Creating Large Numbers of Tables in the Same Database

If you have many files in a directory, open, close, and create operations will be slow. If you execute SELECT statements on many different tables, there will be a little overhead when the table cache is full, because for every table that has to be opened, another must be closed.


The solution to hold all comments on a certain topic (article in our case) in the same table demands that we add another column, parent_ID, which points to the posting the current row holds a reply to. Obviously, there is some data duplication here, and hence the table is not properly normalized. If, however, MySQL tables have any kind of headers (such "internal" facts are unclear to me), then we are definitely gainining disk space using this technique. Even if not, at least we don't end up with a zillion tables in our database.

So, did we just copy the structure and code presented? Actually, we made some changes of our own: in the example, a third column, root_ID is used, which is supposed to point to the fixed topic the thread in question is about. Though we could have tied such a column to each of our articles, we decided to use one table for each article. Yes, with a naming convention to connect the two. We do not expect to have zillions of articles (although we wouldn't mind), and it should be mentioned that recursion (the way in which PHP is going to dig into the database and present the threads) is harsh on the CPU and shouldn't be applied to very large tables. In essence, we have adopted a middle solution, trying to keep a balance between table rows and number of tables.

Also, studying the recursive algorithm in a Frankfurt flat without the utility of a PHP interpreter for a while, it dawned on me that there was a simple way to present the threaded discussion in a way similar to the way Windows Explorer presents directories and subdirectories. In fact, only 4 simple graphics were needed, and the recursive algorithm to present them could slot in the main algorithm perfectly. Although the process is probably best understood by examining the code, I will try to summarize it below.

Every posting is assigned a number n (possibly 0), expressing the number of postings preceeding it in the thread, its "depth". We define two graphics, trail1.gif (trail1.gif) and trail0.gif (trail0.gif), which we stack horizontally i=0...n -1times, according to whether the i-th posting has another reply after this or not, respectively. Similarly, we define the graphics end1.gif (end1.gif) and end0.gif (end0.gif), which we use at the n-th position according to the above criteria again. I'm not sure whether I'm following this at this stage, or whether I am getting it right even, so here's the PHP code that will hopefullly clear things up:

#If table exists, start:

function show_thread($table, $parent, $trail=array('')) {

#Retrieve all comments with a certain parent ID, newest first

$comment_query= mysql_query('SELECT ID, date, user, title FROM '.$table.' WHERE parent='.$parent.' ORDER BY date DESC');

$i= 1;

while ($com= mysql_fetch_array($comment_query)) {

#The tricky part. The last element of the array $trail is
#assigned 1 if there are more remaining rows in the last
#query, or 0 if it is the last row

$trail[count($trail)-1]= ($i==mysql_num_rows($comquery))? 0: 1;

#Each of the first n-1 elements of $trail is printed out as
#the relevant graphic, trail0.gif or trail1.gif respevtively.

for ($x= 0; $x<count($trail)-1; $x++)
print '<IMG SRC=trail'.$trail[$x].'.gif WIDTH=16 HEIGHT=18 ALIGN=top>';

#Print the n-th element of $trail as the relevant graphic,
#end0.gif or end.gif, then the comment link

print '<IMG SRC=end'.$trail[$x].'.gif WIDTH=16 HEIGHT=18 ALIGN=top>

print '<A HREF=display_comment.php?id=ARTICLE_ID&cid=COMMENT_ID>'.htmlspecialchars($com['title']).'</A>

print '('.$com['date'].', '.$com['user'].')<BR>';

#Add a new element to the end of the array $trail

$trail[]= '';

#Call this function again with the above result as the parent

show_thread($table, $com[0], $trail);

#Subtract the last element of the array $trail

array_pop($trail);

$i++;

}

}

#Call the function with 0 as the parent comment

thread(TABLE_NAME, 0);


In the presentation of the above script, we have used a few conventions to enhance its legibility. The A tag has been stripped of the DHTML events we normally use to hide the long ugly link content in the bottom bar. The variable $table doesn't have to be passed to the function, as it doesn't change. It is used inside the function, though, so the alternative is to declare it as a global. Adding and deleting array elements is handled easily by PHP whatever the size of the array: this algorithm would have been hell to write in a language like C.

The above is the only non-standard part of the whole process associated with entering and retrieving records. Writing a comment to the database is done in a straightforward and intuitive way. In fact, it only relies on the structure of the individual comments table, so there is no point going through it here (we readily assume that if you've followed this far, you are easily able to develop the rest of the code yourself). If, like us, you decide not to create a comments table for every article offhand, but allow it to be created by the first posting, the discreet MySQL command

CREATE TABLE IF NOT EXISTS...

can save you a lot of hassle.

Ah, yes! We've stated our intention of deleting comments that we don't feel have a right to remain in the website. Unless this is clearly stated and mercilessly practiced, you can find your website being the pushover of the internet. Unfortunately, the mechanics by which the posting tree is presented ensures that if an entry is deleted, none of the replies to it will be visible either. Morally, we have no problem with that: if somebody's behaving badly, they should be ignored. Technically, this has the potential to load our database with idle entries, and we don't like waste. Obviously, a similarly recursive algorithm that deletes a posting and all replies to it is due. Here it is:

function kill_branch($table, $parent) {

#Kill the given posting

mysql_query('DELETE FROM '.$table.' WHERE ID="'.$parent.'"');

$findquery= mysql_query('SELECT ID FROM '.$table.' WHERE parent="'.$parent.'"');

#Call the same function for all its "children"

while ($find= mysql_fetch_array($findquery))
kill_branch($table, $find['ID']);

}

kill_branch(TABLE_NAME, COMMENT_ID);


The above routine is quite straightforward, and perhaps the best introduction to recursion possible. Again, it has been simplified for the purpose of this presentation (for example, the full code we use scans the users table for every DELETE in the comments table and subtracts 1 from the total of postings of every user). We also include extensive error-checking and reporting safeguards in all our administration scripts. In fact, it is a late addition to the website. Initially, we though we could just issue the MySQL command

DELETE FROM comments WHERE ID=COMMENT_ID OR parent_ID=COMMENT_ID

and be reasonably safe. Eventually, we realised that
[a] "reasonably" is not good enough for our grandiose plans and
[b] even we take the occasional vacation.

This pretty much concludes the second and final installment of this article. We have tried to present the non-trivial features of the Systasis commenting system, both in terms of information presentation (how the user interacts with it) and representation (how the machine interacts with it), as well as the reasons behind the development of the particular mechanisms. We hope to attract some healthy dialogue over this issue, quite wide in most web-development environments. Mostly, though, we hope that we have demonstrated how integrated development can lead to design terseness, and, rarely, although it would be foolhardly to lay such claim (and only during our frequent feats of arrogance do we ever), the wondrous moment of inspiration.

[ Back to journal | Expand comments | Print this page ]