
Search term highlighting

I am trying out a Google search term highlighter on my page. I noticed a lot of people coming to my server and landing on category or date-based index pages for their searches. They then open Google's cached copy of the page, which highlights their search terms, to find the specific bit they are interested in.
To help those poor people find things more easily, I figured I would try highlighting the search terms for all search referrals. I first tried a PHP script that can do this for many search engines, but I think it conflicted with my existing PHP code on the site, so I am now using a JavaScript tool called Google Highlighter. It only handles Google searches, but that is fine since Google is the most used search engine these days.
To see how it works, use the search box on my site to search for something, like “devco”, and click on one of the results; you will see your terms highlighted in colors similar to those used by Google.
I think it may actually be best to only put it on pages where confusion is most likely: the front page and the date-based pages. The assumption is that if you land on an individual entry page, it will quickly be obvious whether it applies to you or not.
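The Google Highlighter's own code is not reproduced here, but the basic idea can be sketched in Python, assuming the page can see the referring URL and that Google puts the search terms in the `q` query parameter. The function names and the `hl-N` CSS class names are made up for illustration, and the sketch works on plain text rather than on live HTML.

```python
import html
import re
from urllib.parse import parse_qs, urlparse


def search_terms_from_referrer(referrer: str) -> list[str]:
    """Return the words of a Google search from its referrer URL, else []."""
    parts = urlparse(referrer)
    # Only Google referrals are handled, like the tool described above.
    if "google." not in parts.netloc:
        return []
    query = parse_qs(parts.query).get("q", [""])[0]
    return query.split()


def highlight(text: str, terms: list[str], colours: int = 5) -> str:
    """HTML-escape plain text and wrap every search term in a coloured <span>."""
    if not terms:
        return html.escape(text)
    # Each distinct term gets its own (hypothetical) class: hl-0, hl-1, ... cycling after `colours`.
    classes = {t.lower(): f"hl-{i % colours}" for i, t in enumerate(terms)}
    # Longest terms first so a longer term wins over a shorter overlapping one.
    alternatives = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in alternatives), re.IGNORECASE)
    pieces, last = [], 0
    for match in pattern.finditer(text):
        pieces.append(html.escape(text[last:match.start()]))
        cls = classes.get(match.group(0).lower(), "hl-0")
        pieces.append(f'<span class="{cls}">{html.escape(match.group(0))}</span>')
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)
```

For a visitor arriving from `http://www.google.com/search?q=devco+blog`, calling `highlight("The DevCo blog", terms)` wraps "DevCo" in `hl-0` and "blog" in `hl-1`, and the page's stylesheet decides the actual colours. A real page would need to apply this to text nodes only, not to raw HTML, so that tags and attributes are left alone.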

a9.com

Slashdot carries the news that a9.com has finally launched. A9 is Amazon’s search engine; it uses Google for web and image search results, IMDb for movie information and GuruNet for reference lookups.
It features a nice DHTML interface complete with drag-and-drop facilities. The whole idea is to combine useful search-related tools into one page. It remembers all your searches, tracks everything you click on, and provides a Diary facility that lets you annotate your searches and the sites you visit. Using all this information it can recommend sites that might interest you. It links all of this to your amazon.com account and will no doubt use the gathered information to suggest books for you to buy.
You can edit the history of sites you visited and searches you ran. I tested this, and deleting entries does stop them from appearing in your history. However, I also noticed that deleting a specific topic from your search history does not undo what A9 learnt from you: it still knows you are interested in what you searched for, even after you delete the record of those searches!
Some tasty bits from their T&Cs:

Use of Third Party Service Providers: We may, from time-to-time, employ other companies and individuals to perform functions on our behalf. Examples include sending e-mail and analyzing data. They have access to personal information needed to perform their functions, but may not use it for other purposes.

We work closely with some third parties. In some cases, we will include offerings from these businesses on A9.com. In other cases, we may include joint offerings from A9.com and these businesses on A9.com. Click here for examples of co-branded and joint offerings. You can tell when a third party is involved in the offering, and we share customer information related to those transactions with that third party.

For reasons such as improving personalization of our service, we might receive information about you from other sources and add it to our information.

On the subject of what information you provide to them, they say the following:

You provide most such information when you use A9.com to search or otherwise communicate with us. For example, you provide information when you enter search terms; set bookmarks; download and use our toolbar; communicate with us by phone, e-mail, or otherwise; and employ our other services. As a result of those actions, you might supply us with personally identifiable information or information about things that interest you.

So if you are still considering using a9.com at this point, they also offer the following advice:

If you would prefer not to be recognized on our site, we recommend that you use our alternate service located at generic.A9.com. On generic.A9.com, we will not recognize your A9.com or Amazon.com cookie. Information we gather on generic.A9.com will not be used in our data analysis (other than to detect abuse) and will not be used to personalize the services we offer you.

While I like what they have done, I think I will stick to generic.a9.com for now.

IOL RSS Feeds

Via MJ, whose blog I discovered through a referrer entry in my logs, I notice that IOL has RSS feeds for their news. This is excellent, since I have always liked their news but could never be bothered to use an actual browser to read their site regularly.
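Reading a feed without a browser mostly comes down to pulling the `<item>` titles and links out of the RSS XML. A minimal sketch using Python's standard library follows; the sample feed is made up and no IOL feed URL is given here, so `SAMPLE_FEED` simply stands in for whatever you download.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document standing in for a downloaded news feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example news</title>
    <item><title>First headline</title><link>http://example.com/1</link></item>
    <item><title>Second headline</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def feed_items(xml_text: str) -> list[tuple[str, str]]:
    """Return (title, link) for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    # iter() searches the whole tree, so items nested under <channel> are found.
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in feed_items(SAMPLE_FEED):
    print(f"{title}: {link}")
```

To read a live feed, the text would come from something like `urllib.request.urlopen(url).read()` instead. Note that RSS 1.0 (RDF) feeds put their items in an XML namespace, so this plain `iter("item")` lookup only suits RSS 2.0 style feeds.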

TCP Header Analysis

I have been spending a lot of time looking at network dumps of SMPP traffic to help debug some network issues. I was a bit rusty on some of the finer details of the various TCP header fields, and my reference was at home. Google found an amazing resource on firewall.cx titled Analysing the TCP Header.
The document spans 7 sections covering the following:

Section 1: Source & Destination Port Number
Section 2: Sequence & Acknowledgement Numbers
Section 3: Header Length
Section 4: TCP Flag Options
Section 5: Window Size, Checksum & Urgent Pointer
Section 6: TCP Options
Section 7: Data

It is beautifully colorful and well written: something you can easily pass on to someone who does not know much about networking, or use yourself as a quick refresher on forgotten details.
Firewall.cx has a large amount of very good documentation and is well worth poking around in if you work with networks.
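To tie the sections above to real bytes, here is a minimal Python sketch that decodes a raw TCP segment (everything after the IP header) into the fields the firewall.cx article walks through. It assumes you already have the segment bytes, for example cut out of a capture by hand, and it does not verify the checksum, which would also need the IP pseudo-header. The function names are my own, not from the article.

```python
import struct

# The eight flag bits in the low byte of the offset/flags field,
# listed from bit 0 upwards (FIN is the least significant bit).
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"]


def parse_options(raw: bytes) -> list[tuple[int, bytes]]:
    """Split the options area into (kind, value) pairs."""
    options, i = [], 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:  # End of Option List
            break
        if kind == 1:  # No-Operation, a single padding byte
            i += 1
            continue
        if i + 1 >= len(raw) or raw[i + 1] < 2:
            break  # truncated or malformed option; stop rather than guess
        length = raw[i + 1]  # length covers the kind, length and value bytes
        options.append((kind, raw[i + 2:i + length]))
        i += length
    return options


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header, its options and the payload."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    # Network byte order: ports, sequence, acknowledgement, offset/flags,
    # window size, checksum and urgent pointer.
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4  # data offset counts 32-bit words
    if header_len < 20 or header_len > len(segment):
        raise ValueError(f"bad data offset: {header_len} bytes")
    flags = [name for bit, name in enumerate(FLAG_NAMES) if offset_flags & (1 << bit)]
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": flags,
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
        "options": parse_options(segment[20:header_len]),
        "data": segment[header_len:],
    }
```

For example, a SYN carrying only an MSS option (kind 2, length 4) has a data offset of 6 and decodes to a 24-byte header with `flags == ["SYN"]` and no payload.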

Fine-tuning SpamAssassin

Via RootPrompt I found a nice article titled Fine-Tuning SpamAssassin. It covers quite a bit of detail about SpamAssassin and is well worth a read.

Over time, however, many of the spammers have figured out how to fine tune their spam and bypass the default ruleset. I find the default setup still picks up at least half the spam, maybe two thirds on a good day, but too much leaks through. If the spammers are tuning their messages, I guess the only thing to do is to tune my scoring. There are at least 8 possible ways of improving SpamAssassin’s hit rate.
1. Blacklisting known offenders
2. DNS Blocklists
3. Enable Bayesian filtering
4. Reduce the point threshold for spam
5. Increase the scores on existing rulesets
6. Upgrade SpamAssassin to the latest version
7. Install more rulesets
8. Write your own rulesets
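Several of the items above (1, 4 and 5 in particular) are really knobs on one mechanism: SpamAssassin adds up the scores of every rule a message triggers and flags it as spam once the total reaches the threshold (`required_score`, 5.0 by default). The Python sketch below models only that arithmetic; the rule names, tests and scores are invented for illustration and are not SpamAssassin's real rules.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each rule is a test on the raw message text plus the score it adds when it fires.
# These rules and scores are hypothetical, for illustration only.
Rule = Tuple[Callable[[str], bool], float]

RULES: Dict[str, Rule] = {
    "SUBJ_FREE": (lambda msg: "Subject: FREE" in msg, 1.5),
    "MENTIONS_VIAGRA": (lambda msg: "viagra" in msg.lower(), 2.5),
    # A blacklist entry works by adding a score far above any threshold.
    "FROM_BLACKLISTED": (lambda msg: "From: spammer@example.com" in msg, 100.0),
}


def score_message(msg: str, rules: Dict[str, Rule],
                  overrides: Optional[Dict[str, float]] = None) -> Tuple[float, List[str]]:
    """Return the total score and the names of the rules that fired."""
    overrides = overrides or {}
    hits = [name for name, (test, _) in rules.items() if test(msg)]
    # A score override (item 5 above) replaces the rule's default score.
    total = sum(overrides.get(name, rules[name][1]) for name in hits)
    return total, hits


def is_spam(msg: str, rules: Dict[str, Rule], required_score: float = 5.0,
            overrides: Optional[Dict[str, float]] = None) -> bool:
    """A message is spam when its total reaches the threshold (item 4 lowers it)."""
    total, _ = score_message(msg, rules, overrides)
    return total >= required_score


message = "Subject: FREE offer\n\nCheap viagra"
print(score_message(message, RULES))                                    # (4.0, ['SUBJ_FREE', 'MENTIONS_VIAGRA'])
print(is_spam(message, RULES))                                          # False
print(is_spam(message, RULES, required_score=4.0))                      # True
print(is_spam(message, RULES, overrides={"MENTIONS_VIAGRA": 4.0}))      # True
```

With the sample message scoring 4.0, the default threshold lets it through, while either lowering the threshold to 4.0 or raising `MENTIONS_VIAGRA` to 4.0 catches it. In a real installation the corresponding settings are `required_score`, `score` and `blacklist_from` in `local.cf`.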