levo

5:17 pm on Mar 1, 2013 (gmt 0)
Google has just launched an interactive guide that "follow[s] the entire life of a search query, from the web, to crawling and indexing, to algorithmic ranking and serving, to fighting webspam."
[google.com...]
The best part is the live spam screenshots: you can see example pages that Google has just removed from its index.
Here's the official announcement of the launch of Google's How Search Works:
Today we're releasing a similar website called How Search Works.
Here you can follow the entire life of a search query, from the web, to crawling and indexing, to algorithmic ranking and serving, to fighting webspam. The site complements existing resources, including this blog, the help center, user forums, Webmaster Tools, and in-depth research papers.
A few things you'll find:
An interactive, graphical explanation of Google Search
A view into major search algorithms and features
A 43-page document explaining how we evaluate our results
A live slideshow of spam as we remove it
Graphs illustrating the spam problem and how we fight it
A list of policies that explain when we'll remove content
How Search Works Launched [insidesearch.blogspot.co.uk]
9:39 pm on Mar 1, 2013 (gmt 0)
I'm loving the live spam removal screenshots. Actual page title: BUY CHEAP LOWER BEST PRICED WHERE
I've seen several examples of thin affiliates. Plenty of content, but all duped from affiliate sponsors' sites. And boatloads of telephone number lookup sites.
It's kind of fascinating, a new guilty pleasure.
I'm loving the live spam removal screenshots.
I'm waiting for a modern-day reenactment of the scene where Walter Cronkite is reading out the hot-off-the-press Nixon Enemies List ... and comes to his own name.
:)
I think my favorite title so far is
Intitle Network Camera Inurl Cgistart Page Single, Bach Cello Suite ...
Now, is that the actual title of the actual page on the actual site ... or yet another example of g###s much-commented-on Renaming To Match The Search?
The eye-opener for me is that they still have to manually remove some of this stuff. A grammar check combined with topic detection could certainly flag most of it. Still, it's nice to know human spambusters aren't completely obsolete for the time being.
1:09 am on Mar 2, 2013 (gmt 0)
The eye-opener for me is that they still have to manually remove some of this stuff. A grammar check combined with topic detection could certainly flag most of it. Still, it's nice to know human spambusters aren't completely obsolete for the time being.
There are inherent limits in the math (the algorithm): you simply cannot catch every last piece of spam, a 0% miss rate, without also sweeping up a montage of false positives.
Getting rid of spam is a high priority, but avoiding accidentally getting rid of quality content because it emulates your math in some way is an even higher priority.
Getting rid of spam is a high priority, but avoiding accidentally getting rid of quality content because it emulates your math in some way is an even higher priority.
Well said! Google was looking at positive signals long before they added a spam detection team. Returning good sites is their #1 priority, even if they do struggle with it.
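To make that threshold trade-off concrete, here is a minimal Python sketch. The spam scores and labels are made-up toy values, not Google data, and using a single score cutoff is an assumed simplification of whatever Google actually runs. It counts how much spam slips through and how many good pages get removed at each cutoff.

```python
from typing import List, Tuple

# Hypothetical toy data: (spam_score, is_actually_spam).
# These numbers are invented for illustration only; they are not Google data.
PAGES: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, True), (0.80, True), (0.55, True), (0.30, True),
    (0.60, False), (0.40, False), (0.20, False), (0.10, False), (0.05, False),
]


def confusion_at(threshold: float, pages: List[Tuple[float, bool]]) -> Tuple[int, int]:
    """Remove every page scoring at or above `threshold`.

    Returns (missed_spam, false_positives): spam left in the index, and
    good pages wrongly removed.
    """
    missed = sum(1 for score, spam in pages if spam and score < threshold)
    false_pos = sum(1 for score, spam in pages if not spam and score >= threshold)
    return missed, false_pos


if __name__ == "__main__":
    # Lowering the cutoff catches more spam but starts removing good pages.
    for t in (0.9, 0.7, 0.5, 0.3):
        missed, fp = confusion_at(t, PAGES)
        print(f"threshold={t:.1f}  missed spam={missed}  good pages removed={fp}")
```

With these toy numbers, driving missed spam down to zero (cutoff 0.3) also removes two good pages, which is the false-positive cost described above.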
8:38 am on Mar 2, 2013 (gmt 0)
< moved from another location >
Amazing information from Google here...
Fighting Spam
http://www.webmasterworld.com/r.cgi?f=30&d=4550176&url=http://www.google.com/insidesearch/howsearchworks/fighting-spam.html [google.com]:
1. Live screenshots of spam sites that have been removed from Google entirely.
2. Types of Spam: We SEOs call it Panda, but I think Google calls it "PURE SPAM". They have defined all the related terms.
3. Spam stats: A graphical representation of manual actions taken against spam.
4. A timeline of spam-related updates, from February 2005 up to Penguin in April 2012.
5. A graph of notifications sent to website owners each month, from May 2007 to February 2012.
6. Feedback on reconsideration requests, from December 2006 to June 2012.
[edited by: Robert_Charlton at 10:04 am (utc) on Mar 2, 2013]
[edit reason] moved from another location, added section title [/edit]
:: pause for belated "D'oh!" as I realize why LinkedIn thinks I know some guy named Charlton ::
Has anyone else noticed this exasperating detail? Once you're in the How Search Works area it's but a short step to the Google Playground, leading to the Demo Slam, leading to an illustration of posterizing {basketball player whom I've no idea whether I'm supposed to have heard of or not} with an in-your-face demonstration of using G### Image Search to grab a picture straight off the internet without the tiresome formality of visiting the page it lives on...
Sigh.
If it was just removed 37 minutes ago (this number changes, as does the total number of examples) why is the first page always the same?
We've removed some pornographic content and malware from this demo, but otherwise this is an unfiltered stream of fresh English examples of "pure spam" removals.
Mmm well, for a given definition of "English" anyway. Dang! The one about how to beat a mouth-swab test isn't there any more. Was going to bookmark that ;)
If you do not currently use polysyllabic point loans you are making your existence many ticklish than it needs to be.
We need a sister thread in Foo where people can record their favorites.
In the section that shows live spam screenshots, it says that the first page shown was "Removed from search results 38 minutes ago".
My question is: If this page is such obvious spam, then why was it ever allowed to get into the search results in the first place?
Why doesn't the Google algorithm screen newly-discovered pages and immediately discard all the obvious spam, so that it never gets into the index at all?
Wow, downgrading dynamic DNS is something I always suspected. Now they've confirmed it, and we use dynamic DNS for serving PDF downloads and have been hit badly.
Downgrading all users of a specific dynamic DNS provider shows me that g* will also downgrade entire IP ranges and "nearby" hosted pages.
2:49 pm on Mar 4, 2013 (gmt 0)
Excerpt from the Raters Guide:
Utility: The utility of the landing page is a measure of how helpful the page is for the user intent. Pages with good utility are helpful for users. Pages with no utility are useless. Utility is the most important aspect of search engine quality, and is therefore the most important thing for you to think about when evaluating webpages.
Perhaps an insight for Panda sufferers into what quality means....
A fuller description of what Martin Ice Web is talking about: "dynamic DNS provider that has a significant fraction of spammy content." Know your neighbors seems to be the motto here.
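Here is a rough Python sketch of that "know your neighbors" logic as the quote describes it. The cutoff value, the host and provider records, and the function names are all hypothetical; Google has not published how it measures "a significant fraction". The sketch flags a provider when its share of spammy hosts reaches the cutoff, then downgrades every host on that provider.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical cutoff standing in for "a significant fraction";
# the real value, if there is a single one, is not public.
SPAM_FRACTION_CUTOFF = 0.5

# (hostname, dynamic DNS provider, is_spam)
Host = Tuple[str, str, bool]

# Made-up example records for illustration only.
SAMPLE: List[Host] = [
    ("pdfs.dyn-a.example", "dyn-a.example", False),
    ("x1.dyn-a.example", "dyn-a.example", True),
    ("x2.dyn-a.example", "dyn-a.example", True),
    ("blog.dyn-b.example", "dyn-b.example", False),
    ("y1.dyn-b.example", "dyn-b.example", True),
    ("shop.dyn-b.example", "dyn-b.example", False),
]


def flag_providers(hosts: List[Host], cutoff: float = SPAM_FRACTION_CUTOFF) -> Set[str]:
    """Return providers whose share of spammy hosts is at or above `cutoff`."""
    totals: Dict[str, int] = defaultdict(int)
    spammy: Dict[str, int] = defaultdict(int)
    for _host, provider, is_spam in hosts:
        totals[provider] += 1
        if is_spam:
            spammy[provider] += 1
    return {p for p, n in totals.items() if spammy[p] / n >= cutoff}


def downgraded_hosts(hosts: List[Host], cutoff: float = SPAM_FRACTION_CUTOFF) -> Set[str]:
    """Downgrade every host on a flagged provider, whether or not it is spam itself."""
    flagged = flag_providers(hosts, cutoff)
    return {host for host, provider, _ in hosts if provider in flagged}


if __name__ == "__main__":
    print(sorted(flag_providers(SAMPLE)))
    print(sorted(downgraded_hosts(SAMPLE)))
```

In the toy sample, pdfs.dyn-a.example is not spam itself but still gets downgraded, because two of the three hosts on its provider are spammy. That is the kind of collateral damage Martin Ice Web describes.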
8:36 am on Mar 5, 2013 (gmt 0)
tedster, thanks, that's what I meant. Still, I don't understand why g* downgrades a whole dynamic DNS provider. Don't they have the resources to judge each site, or are most of the pages there really spammy and harmful? Too many pages to compile? Too expensive relative to the quality gained? I know that many security/virus scanners do it the same way.
In December 2012 I stopped linking my PDF files directly to a dynamic host and started serving them through a PHP rewrite instead. I saw a significant jump at the next Panda update. But I don't know how far back g* looks at page history to evaluate trust. And will g* be able to trace back through the PHP rewrite?
@claaarky, now the point is to know how g* measures utility: through the user stats you favour, like page views and bounce rate? Or are they able to calculate a page's usability by comparing it with templates?
I don't understand why g* downgrades a whole dynamic DNS provider
Truthfully, I don't either. I always thought the idea should be to rank by domain, not IP. Still, this kind of bad neighborhood problem is not something new with Google.