Archive for: January 2008
January 30, 2008
Ya. I know. Boring. But, I’m laying the groundwork for something interesting. Well, I think it’s interesting.
In the last episode I talked about what content is, where to find it, and what it is made up of. In this edition I want to explore the way words come together to make our content.
There are a couple of terms that should be defined before I get much further into this mess.
In a dictionary, the lemma “go” represents the inflected forms “go”, “goes”, “going”, “went”, and “gone”. The relationship between an inflected form and its lemma is usually denoted by an angle bracket, e.g. “went” < “go”…
Lemmas are used often in corpus linguistics for determining word frequency. In such usage the specific definition of “lemma” is flexible depending on the task it is being used for.
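Here is a minimal Python sketch of that idea, assuming a toy setup: before counting, collapse each inflected form onto its lemma, so that “went” and “goes” both count toward “go”. The `LEMMAS` table and the `lemmatize` and `lemma_counts` functions are hypothetical illustrations with a deliberately tiny word list. A real corpus study would use a proper lemmatizer and tokenizer.

```python
from collections import Counter

# Hypothetical toy lemma table: inflected form -> lemma (e.g. "went" < "go").
# Any word not listed here is treated as its own lemma.
LEMMAS = {
    "goes": "go", "going": "go", "went": "go", "gone": "go",
    "is": "be", "are": "be", "was": "be", "were": "be", "been": "be",
    "has": "have", "had": "have", "having": "have",
}

def lemmatize(word: str) -> str:
    """Return the lemma for a word, or the lowercased word if it is not in the table."""
    w = word.lower()
    return LEMMAS.get(w, w)

def lemma_counts(text: str) -> Counter:
    """Count lemma frequencies in a text, using naive whitespace tokenization."""
    # Strip common punctuation from each whitespace-separated token.
    tokens = [t.strip(".,;:!?\"'()") for t in text.split()]
    return Counter(lemmatize(t) for t in tokens if t)
```

For example, `lemma_counts("She went home. He goes home. They are going.")` counts “go” three times, because “went”, “goes”, and “going” all fold into the same lemma.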
From Apple Developer.
A collection of one or more documents, typically related, and available to an information retrieval system. Plural: corpora.
From Macmillan English Dictionary.
a collection of written and/or spoken language stored on a computer and used for language research and writing dictionaries
Yep, pretty heavy stuff. There’s a point to all of this, so stay with me.
I made a reference to the Oxford site in the last post, and it’s time to revisit that page. The link will open a new window and you may want to leave it open as we study a few things on that page. AskOxford: Language Facts
In the fifth paragraph, the author cites some interesting facts. For instance, “Just ten different lemmas (the, be, to, of, and, a, in, that, have, and I) account for a remarkable 25% of all the one billion words used in the Oxford English Corpus. If you were to read through the corpus, one word in four would be an example of one of these ten lemmas. Similarly, the 100 most common lemmas account for 50% of the corpus, and the 1,000 most common lemmas account for 75%. But to account for 90% of the corpus you would need a vocabulary of 7,000 lemmas, and to get to 95% the figure would be around 50,000 lemmas.” The remaining 5% might only show up once in several million.
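Those Oxford figures are cumulative coverage numbers. You sort the lemmas by frequency and then ask what share of all the words the top N lemmas account for. Below is a small Python sketch of that calculation. It assumes you already have lemma counts in a `collections.Counter`. The `coverage` and `lemmas_needed` helpers are hypothetical names, and whatever numbers you feed them come from your own text, not from the Oxford corpus.

```python
from collections import Counter

def coverage(counts: Counter, top_n: int) -> float:
    """Fraction of all tokens accounted for by the top_n most frequent lemmas."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(freq for _, freq in counts.most_common(top_n))
    return covered / total

def lemmas_needed(counts: Counter, target: float) -> int:
    """Smallest number of top lemmas whose combined share reaches target (0..1)."""
    total = sum(counts.values())
    running = 0
    # most_common() with no argument yields every lemma, most frequent first.
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        running += freq
        if running / total >= target:
            return rank
    return len(counts)
```

With toy counts of `Counter({"the": 50, "be": 25, "cat": 15, "dog": 10})`, `coverage(counts, 2)` is 0.75 and `lemmas_needed(counts, 0.75)` is 2. The Oxford quote describes the same kind of curve, measured over about a billion words.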
My takeaway on this is the statement that the 1k most common lemmas will cover 75% of the corpus. The author states that the Oxford corpus is in the range of one billion words!
If my ‘old’ math is working, that means that about one-thousand base words would cover 750 million words (in their corpus). Impressed the hell outa me.
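Here is the arithmetic on the roughly one-billion-word figure, written out:

```latex
\begin{aligned}
0.25 \times 10^{9} &= 2.5 \times 10^{8} \quad \text{(top 10 lemmas: about 250 million words, one word in four)}\\
0.75 \times 10^{9} &= 7.5 \times 10^{8} \quad \text{(top 1{,}000 lemmas: about 750 million words)}
\end{aligned}
```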
Further, the author states that 25k lemmas will represent the set of most significant words in English: those which occur reasonably frequently and which account for all but a small part of everything we may encounter in speech or writing. It includes all the words that we actively use in general everyday life.
Now, go take a gander at the table the author has under the heading ‘What is the commonest word?’. That’s the 100 most common English words. Are you beginning to see some light in this deep dark cave I’ve built for you?
In the next table down he shows us the most used ‘content words’. And he shows them by nouns, verbs, and adjectives. Are there any gears spinning at your place?
What if we could build our own corpus? Our corpus would be a collection of text articles related to a website subject. Within our corpus would be the common words most used by others relative to the subject, and our chosen keywords. Each webmaster would have a different corpus and would very likely have several different corpora.
Most of you do have your own corpus on your hard drive. That is, a collection of articles (scraped or written, doesn’t matter) which you use to generate content on demand. You very likely have several corpora devoted to different niches. Some of you will leave the corpus on the web and retrieve it as/when you need it.
You will use your corpus depending on how you want your content to read. If you are only interested in having some content for spider food, you may leave your corpus intact. Perhaps injecting a keyword here and a link there, but basically intact.
If you are after more than keeping a spider busy, you may elect to massage the corpus a little. Here is where the fun begins. And it’s a good enough place to end this edition.
Stay tuned. Next time I’ll give you my most famous recipe. Frickasie corpora served on crepe Suzieanne with a vintage (Nov.) tokay. Mmmmmm. Won’t want to miss that one. lol
~dink
January 29, 2008
I’ve preached the need to have a good selection of keywords often enough for you to be sick of it by now. I’ve mentioned content a time or two. Now it’s time to get down and dirty with content.
Content is KING . . . yaay!
From a strict point of view, a web page is made up entirely of content. That is, everything that is visible from the ‘view source’ tab is content.
From a webmaster’s point of view, content includes the meta tags, the title and description, and the portion between the <body> </body> tags. There could be content in the footer, but we’ll pretend that there isn’t.
Generally tho, the content we refer to is that which is between the body tags. That is the content we will be discussing here today. To relieve some of the pressure, we will dismiss the possibility of images and javascript being part of our content, even though those are important pieces of some puzzles. I’ll also disregard the navigation portion of the page and the linkage and anchor text.
Since I’m a resident of the U.S., I will be using only English for my discussion. The Brits will be quick to tell you that I don’t have a clue what real English is. I think I agree with that. I only know what I have learned. You are likely in the same boat.
Our content is made up of words. Words strung together to express concepts, ask questions, and so forth. Our English words are made up of letters of the alphabet. There are only 26 of them. Five vowels and 21 consonants. Except that it might be six vowels and 20 consonants depending on how you treat the ‘y’.
Our words are grouped together to form phrases. Phrases are grouped together to form sentences, and sentences are grouped to form paragraphs. A few paragraphs together make up a page.
Our mission as blackhat marketers is to produce content that will fool the search engine algos into believing that we actually know something. Produce same quickly and (preferably) automatically. Bonus points given if it actually reads well enough to pass a cursory manual inspection.
The very best content is hand written by someone who knows a subject very well indeed. Very best content is difficult to scale, and takes a lot of time.
Good content comes in several flavors and is found in several places on the web. Good content can include hand written content by someone who does not know the subject so well, but has researched enough to make some sense.
Good content includes articles by others, entries in a wiki, entries in an encyclopedia or dictionary, books, magazines, and other publications. Good content could be transcriptions from audio media, translations of foreign language items, and many others.
Good content can be scaled. Good content can be discovered, stored, manipulated, transformed, and republished in many different forms. Good content, and an off-shoot that I call ‘good enough’ content, is what works best for me. Well, in a blackhat sense. (I do a little mango-tango in the wh market now and then.)
The KING is dead . . . long live words
A quick and dirty search on the web indicates that there are about 995,112 words in the English language{1}. That number is suspect for many reasons, but it gives us some place to start.
Here is a little something to think about{2}:
New words are constantly being invented, developed from existing words, or adopted from other languages. Most will be used rarely, or only by a small group of people. Hence an unlimited number of words may occur in speech and writing which will never be recorded in even the largest dictionary.
And, this little item may require head scratching{3}:
…..what exactly is a word? Clearly we should include single units such as cat and dog. But are the plurals cats and dogs separate words? Should we include compounds such as walking stick, which are made up of two existing words? What about abbreviations like BBC and Dr, which may be freely formed in limitless combinations: are they words? What about proper names?
So we have a whole bunch of potential words to work with. Some of the words are very common, some are so rare that they prolly wouldn’t be recognized as English words anyhow. Some are only a single letter, some are like alphabet soup. What shall we do with them?
I know, let’s put them on our web pages and make money. Part two coming soon.
~dink
- - - - - - - - - -
Notes:
{1}Language Monitor
{2}Oxford English Corpus
{3}Sketchengine
January 27, 2008
I’ve been approached by a group to write a review of their service here on the blog. The offer is for cash. Which is just as good as money. Heh.
“So, what’s the problem with that?”, you asked. “You have posted about a whole bunch of programs on this blog.”
Yes, I have promoted several programs here. But the difference is that I have tried, and use, the programs that I tell you about.
The service that the correspondent asked me to write about is not one that I would use. I wouldn’t even use it just so I could write about it. It doesn’t fit into any of my game plans.
Does that mean one of you wouldn’t use it? No, it doesn’t. What it does mean is that I can’t suggest to you that it is worthwhile. I can’t say if it is or isn’t a good deal. I’ve only investigated it enough to know that I won’t use it.
I have bought a lot of worthless shit. I have been given a bunch of worthless shit. I’m an affiliate for a bunch of worthless shit. I haven’t pointed any of those to you. And I won’t. Now, that doesn’t mean that this service is a piece of shit.
This is my place. I built it. I maintain it. You have chosen to come by here, from time to time, and see what I’m raving about.
Many of you are good friends. All of you are welcome here. I’m not about to jeopardize my friendship (or any future friendships) with you for a few hundred bucks. Note…if we were talking about millions of bucks, you’d be history. LOL
So, why make this post at all? Because I didn’t want to just email the folks and say “no” without an explanation. And, because it really may be of benefit to some of you.
To solve the dilemma, I’m going to post a couple of the links the folks gave me, at no charge. You may go see what it’s all about, or not. Completely up to you. That way you have some knowledge that such a service exists, the correspondent gets something, and I leave feeling like I have done my duty.
Here’s a link to the home page of tnx.net. Here is the link to their post over at DP.
Stay outa jail.
~dink
January 26, 2008
I’m disappointed. Not disappointed in you. Disappointed in me. I’ve failed you. I have let you down.
No emails. No one wanted to participate in this stunt. So, why not? What was it that I did wrong? Let’s see what I can think of.
- Nobody needs inbound links. Doesn’t seem likely. Links are the lifeblood of our business. So, maybe the advanced blackhats can create their backlinks fast enough not to need something like this. That doesn’t explain the beginning bh’s not needing links though.
- The whole idea was stupid. Ya, maybe so.
- Contests are even more stupid than the idea. True. May be the prime reason.
- They went ahead and made blogs but didn’t tell you. Possible. Even probable. I hope at least some of them did.
- They don’t trust me. Also very likely. I mean, do I trust myself? Only sometimes, and certainly not my judgment on this thing.
Well, any and all of those things could be true and could be the reason no one wished to participate.
It was meant to be a learning experience. And a fun competition. Didn’t turn out that way.
Perhaps the whole thing will be remembered, someday, by those who need to get backlinks. Perhaps they’ll think back and say to themselves “hey, wasn’t that what ole Dink was talking about?” Yeah, right.
I’m terribly sorry that I let you down. It’s not my first failure, and it certainly won’t be my last. However, I do learn from my follies and don’t repeat them. Often.
**the lil red demon on my right shoulder just whispered in my ear….aw quit yer snivellin and whinin, biatch. Get back to work.
Demons are always right. See you around.
~dink
January 24, 2008
The contest is in this post.
I’ve said this before. In different ways. In different media. My view is that all SEOs are spammers, and the two main differences between traditional SEOs and Blackhats are the speed of deployment and depth of penetration by the BH.
With those thoughts in mind, this contest is open to anyone. Doesn’t matter what color your hat is. Or, even if you don’t wear a hat. You can participate if you wish.
Alrighty then. So you want/need inbound links for indexing and ranking purposes. You are getting sick of being blamed for all of the forum spam in the world. You’re tired of sending your trackbacks into a spam trap. You don’t like working your tail off to get a link with nice anchor text, only to see it disappear from where you put it.
I made a post about a way that I use to keep most of those things from ruining my day here. The key to making it work for me is to use free blogs. Some of our newer friends don’t know how to find or use these important resources. So, the contest.
I’ve found a good place to put up some blogs for ranking and indexing. I want to show you where it is and how I use it. Then I want you to show me that you can do this. The folks who can do it the fastest (within reason) will be the official winners. Of course, even if you don’t finish in the top ten, you will still be a winner because you have a new resource.
The reasons I chose the site (below) are: first, it has no captcha; second, it is very new. The no captcha should be obvious. The new part means that it hasn’t had time to be spammed to death yet. First few thousand blogs stand a better chance of being in it for the long run.
The site: Google The sign-up page: Fish Google
The rules: Create blogs. Not too difficult, huh?
The Prize: Bet you thought you’d win a new Mercedes. Not.
First part. 10 points for each blog created in the first 36 hrs. Plus 10 points for each post created on each blog. The top five entries will get a link with their choice of anchor text for one month on my blog side bar. After a month it will be moved to a permanent page and stay there.
Second part. Lasts 7 days. 5 points for each blog created during the week. The entrant with the most blogs created (with at least one post on each) will get 20 bonus points. Second most gets 15, third 10, fourth 5. At the end of the week, the top 5 in points will get a permanent link on my side bar with the anchor text of their choice. And a permanent link on the permanent page too.
The points collected for the first 36 hours will be included in the 7 day total. So, the early adopters will have a distinct advantage.
The winners will be selected by an impartial jury. That’d be me. Here’s what you do: At the end of the 36 hour period, email me a list of your domains addressed to contest, care of this place. I’ll tabulate the winners and post them. The 36 hour period ends at 12 midnight Friday, Jan 25, 2008. My time is GMT -6, so plan accordingly.
At the end of the 7 day period (12 noon Thursday, Jan 31, 2008) do it the same way. Be sure to include any blogs you created in the first phase.
Following is an example of my way of doing this. Note that I have violated one of my practices in order to show you this. Namely, I put up links on the very first post. Normally I wouldn’t have a page like this until the third post. Here it is in all of its radiant glory: Enhance your sexual experience
No, I don’t give a rat’s ass what you think of my content. No, I don’t give a rat’s ass if you don’t like my choice of aff programs. No, I don’t give a rat’s ass how you do it either.
No, I don’t give a rat’s ass if you don’t go and get your free blog. It matters not to me. I don’t have any connection to the site. I won’t benefit from any activities you may, or may not, engage in. The point is to get links that the spiders will follow to index your pages.
Newbies, listen up. You’ll very likely see some huge numbers of blogs created on the free site. Don’t be discouraged because you can’t create them that quickly. That will come later in your career development. For now, concentrate on getting up some blogs and making posts on them. Then put up links in your sidebar.
I have the distinct feeling that I have forgotten something, but I don’t have a clue what it might be.
Edit..Crap….I remember now. You should use a different email addy for each blog. Keeps the admin from bagging you right off.
Well, that’s it. Go make some blogs. Let’s have some fun. May the best spammer win.
~dink
January 23, 2008
Before I begin, I want to thank Nick (where in the hell did your url go?) for rattlin’ my cage; and Perk for reminding me that my pursuit of world domination is not what everyone else thinks is important.
<!-- begin pre-launch hype -->
Huge contest coming soon
- Thousands of prizes
- Hundreds of winners
- Gallons of ‘Secret Sauce’
- Link juice spilling over
- Fun for the whole family
- Easy to enter
- Easier to win….Big
<!-- end hype -->
I don’t want to give away all of it yet, but this idea sounds like it will be fun. It will be useful to some of our newer members, and allow the more advanced spammers to show off their skills.
I’m hammering out the final details (read: the fookin server is down) on this little plot, even as I type this. So, oil up your mousepad, slam in another stick of memory, warm up your fav spam machine, and keep your feed reader open. This is gonna be fun.
———————
Interesting tidbit: My friend XMCP over at slightlyshadyseo wrote an excellent article that was published on YOUmoz. (Nah, I don’t need to link to them.) If you ever wondered how to increase your profits, generate more traffic, and have more sex in your life, go read it.
Note to XMCP : Don’t hang with them too much or you’ll turn from slightly shady to mostly whitey. rofl
~dink
January 10, 2008
It just struck me that all of my blathering about keywords and their importance might not be very clear to some folks.
I posted a rather long story about keywords and the way to select the right keywords for your niche over on the syndk8 blog site.
The story is aimed at those who may be rather new to the keyword generation subject. It covers a lot of the basics and a few advanced keyword selection tactics.
If you want to read up on how to select keywords, or if you want a little refresher on keyword generation, go read Good keywords are the key to marketing success.
~dink
January 9, 2008
Yep. Free. As in no charge, no strings, no obligation.
L3vi is celebrating his birthday by offering you a month of keyword heaven at no charge. The thing is…the deal will only last until next Monday. If you have wanted to give Wordze a spin, now’s your chance.
There is also a little known, and awesome, script that L3vi has written that can make you some serious cash. It uses his api for Wordze and looks up the freshest entries on Google Trends. The script then polls the Wordze program for the long-tail keywords that are associated with the trends.
Since Gooooogle updates the trends every hour, you should be able to hook up a cron and grab the freshest of the fresh. Work a little magic with your favorite content generator and….hey presto! the latest long-tail keywords to make pages for your site(s). How many pages can you generate per day?
Do yourself a favor and take the best keyword generator on the market for a free one month test drive. You won’t be sorry you did.
~dink