The Technologies Behind Google Ranking

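Source: http://googleblog.blogspot.com/2008/07/technologies-behind-google-ranking.html

This article, posted on the official Google Blog by Google engineer Amit Singhal, describes some of the technologies behind Google's search ranking, covering how Google understands web pages, meaning, and user intent.

The core of Google's search ranking comes from information retrieval (IR), an academic field with a 50-year history. IR uses statistical methods to study properties of text, such as how often words occur, and to rank results. Google search is built on IR theory and combines it with signals such as links and page structure to form its own distinctive search technology.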

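Understanding pages:

Google has invested heavily in its crawling and indexing systems over the years, so it has a very large and up-to-date index of the web. Beyond that, Google uses newer techniques to improve index quality. For example, it has developed technology that understands the important concepts on a page beyond its literal words. Someone searching in Italian for “galleria sprovieri londra” will find the Sprovieri Gallery in London, even though the gallery's home page mentions neither London nor Londra. In the US, someone searching for “cool tech pc vancouver, wa” will find www.cooltechpc.com, even though nothing on that site's home page says that the business is in Vancouver. Other techniques include distinguishing important from unimportant text on a page and judging the freshness of page content.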

Understanding meaning:

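From just a few search keywords, Google can work out what the user really means. It is a leader in spelling correction, word meaning, and concept analysis. Many people have experienced Google's spelling correction in one form or another. Search for “kofee annan” and Google asks whether you meant “kofi annan”; search for “kofee beans” and it corrects the query to “coffee beans”. (Translator's note: Google is in fact already experimenting with semantic technology.)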
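The “kofee” example shows that the right correction depends on the neighboring word. The following is a minimal sketch of that idea, not Google's method: the vocabulary, the table of known phrases, and the function name `correct_query` are all hypothetical, and string similarity from Python's standard `difflib` stands in for whatever Google actually uses.

```python
from difflib import get_close_matches

# Hypothetical toy data; a real system would learn this from large query logs.
VOCABULARY = ["kofi", "annan", "coffee", "beans"]
KNOWN_PHRASES = {("kofi", "annan"), ("coffee", "beans")}


def correct_query(query):
    """Correct unknown words, preferring candidates that form a known phrase."""
    words = query.lower().split()
    corrected = []
    for i, word in enumerate(words):
        if word in VOCABULARY:
            corrected.append(word)
            continue
        # Candidate spellings ranked by string similarity alone.
        candidates = get_close_matches(word, VOCABULARY, n=3, cutoff=0.5)
        if not candidates:
            corrected.append(word)
            continue
        following = words[i + 1] if i + 1 < len(words) else None
        # Context decides: prefer a candidate that pairs with the next word,
        # and fall back to the most similar candidate otherwise.
        choice = next(
            (c for c in candidates if (c, following) in KNOWN_PHRASES),
            candidates[0],
        )
        corrected.append(choice)
    return " ".join(corrected)


print(correct_query("kofee annan"))
print(correct_query("kofee beans"))
```

Similarity alone cannot separate the two cases, because “kofee” is close to both “kofi” and “coffee”; the phrase table is what breaks the tie.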

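Word meaning is the foundation of Google's attempt to understand the semantics of a query, and also the hardest problem Google faces. Things that are obvious to a person are hard for a machine to handle automatically. Users do not want to agonize over which words to use in a query, and sometimes they have no idea which words to use at all. This is where Google's word-meaning system comes in: it can make very sophisticated adjustments to a query. For example, for the query “Dr Zhivago” Google knows that Dr stands for Doctor, while in “Rodeo Dr” it stands for Drive. A search for “back bumper repair” returns results for rear bumper repair; in “Ramstein ab” Google understands ab as Air Base, while “b&b ab” is understood as bed and breakfasts in Alberta. Google has extended this word-meaning system to more than a hundred languages.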
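As a rough illustration of the “Dr” example, the sketch below expands an abbreviation differently depending on where it sits in the query. This is a toy positional heuristic chosen for brevity, not a description of Google's system; the `EXPANSIONS` table and the function name `expand_abbreviations` are hypothetical.

```python
# Hypothetical table: how to expand an abbreviation when it is the first
# word of the query (a title) or the last word (a street suffix).
EXPANSIONS = {
    "dr": {"leading": "doctor", "trailing": "drive"},
}


def expand_abbreviations(query):
    """Expand known abbreviations based on their position in the query."""
    words = query.lower().split()
    expanded = []
    for i, word in enumerate(words):
        rule = EXPANSIONS.get(word)
        if rule is None or len(words) == 1:
            expanded.append(word)          # unknown word, or no context at all
        elif i == 0:
            expanded.append(rule["leading"])
        elif i == len(words) - 1:
            expanded.append(rule["trailing"])
        else:
            expanded.append(word)          # ambiguous position: leave as typed
    return " ".join(expanded)


print(expand_abbreviations("Dr Zhivago"))
print(expand_abbreviations("Rodeo Dr"))
```

Position is only one possible signal. The “ab” example above depends on the neighboring word rather than on position, which this sketch does not attempt to model.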

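Another technology Google uses in ranking is concept identification, which identifies the concepts in a query. For example, for the query “new york times square church”, Google knows we are looking for the well-known church in New York's Times Square, not an article in the New York Times. Concept identification goes further than that: Google also strengthens it so that meaning is identified correctly. A search for “PC and its impact on people”, for instance, is really a search about the impact of computers on society. Techniques like these are everywhere in Google's query analysis algorithms, and they cover almost all languages.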

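Understanding users:

Google tries to understand users so that it can return the results they actually need, not merely what they typed in the query. This work rests on a world-class localization system, plus advanced personalization technology and a variety of techniques for identifying user intent.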

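Google's emphasis on local results shows in its localization work. The same query returns different results in different countries. For example, the query “bank” returns banks in the US, while in the UK it may return the Bank Fashion clothing chain as well as British banks; in other English-speaking countries, such as Australia, Canada, New Zealand, and South Africa, it should again return local banks. Searching for this word in non-English-speaking countries such as Egypt, Israel, Japan, Russia, Saudi Arabia, and Switzerland gives even more interesting results. Just as “football” means different sports in the US and the UK, the same word can return very different results in different countries.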

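Personalized search is another of Google's advanced search technologies. For a signed-in user who has enabled Web History, Google automatically adjusts the results based on the search history that builds up over time. Someone who often searches for football-related topics, for example, will gradually get more football-related results from Google. If you favor results from a particular shopping site, later searches will return more results from that site.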

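Another example of returning what users really want: if you search for “chevrolet magnum”, we know that the Magnum is made by Dodge, not Chevrolet, and Google automatically returns results for dodge magnum. In another example, a search for “bangalore” returns not only the home page of the city of Bangalore but also a map of Bangalore and videos about its cityscape and traffic, which make you feel as if you were there.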

How to Join Microsoft's Live Mesh Service

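Microsoft's Live Mesh is a file and device synchronization service that keeps files on PCs, phones, and other devices in sync through “cloud computing” or “mesh storage”. The Tech Preview that has just been released supports file sharing between PCs only; support for Macs, phones (possibly only Windows Mobile phones), Xbox, digital photo frames, and more is planned. In the more distant future, data sharing at the level of desktop apps and web apps may also become possible.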

Live Mesh Tech Preview

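Some people are already dreaming: take a photo with your phone and have it sync straight to the digital photo frame at home, with no copying; control the TV or the microwave at home from a PC or a phone...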

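How to try it:

1. Go to https://account.live.com and change the Location of your Windows Live ID to United States, then sign in at www.mesh.com.

2. On each of your PCs, download the Live Mesh program that matches the OS version. Only 32-bit Windows XP/Vista and 64-bit Windows Vista are currently supported. To install the Live Mesh client, the Formats setting of the local PC must be set to English (United States).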

Add Devices to Live Mesh

3. After installing the client software, add the local PC to Live Mesh.

Add Device via Live Mesh Client

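4. Start Live Desktop and you will see a simulated desktop. Folders and files created on it are synchronized to the desktops of all your devices. Live Desktop currently provides 5 GB of storage.

5. Change the region of your Windows Live ID back to China and carry on using Live Mesh.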

FAQ: How to clear the ARP cache?

netsh interface ip delete arpcache
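For scripting, a minimal sketch of selecting the command per operating system follows. The Windows entry is the command above; the Linux entry (`ip neigh flush all` from iproute2) is an addition for comparison and is not part of the original FAQ. The names `ARP_FLUSH_COMMANDS` and `arp_flush_command` are hypothetical.

```python
import platform

# Commands keyed by the value returned by platform.system().
# "Windows" is the command from this FAQ; "Linux" is an assumption added
# for comparison and is not covered by the original post.
ARP_FLUSH_COMMANDS = {
    "Windows": ["netsh", "interface", "ip", "delete", "arpcache"],
    "Linux": ["ip", "neigh", "flush", "all"],
}


def arp_flush_command(system=None):
    """Return the argument list that clears the ARP cache on `system`."""
    system = system or platform.system()
    try:
        return ARP_FLUSH_COMMANDS[system]
    except KeyError:
        raise ValueError(f"no ARP flush command known for {system!r}")


print(" ".join(arp_flush_command("Windows")))
```

To actually run it, pass the list to `subprocess.run(arp_flush_command(), check=True)`. Both commands typically need administrator or root rights.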

Top 5 Helpful Tips for Your Firefox

The thing I like best about Firefox is that just when you think you know everything there is to know about the browser, something new comes along and surprises you. The following top 5 tips work on Firefox 2.x (I’ll update the list if there are new ones):

1. Quick search – without going to a search engine first.

Have you ever been reading a website and come across a word or phrase that you want to put into a search engine?

Just highlight the word or phrase with your mouse’s left-click button. Then right-click the text and choose “Search Google/Yahoo/etc. for …”, or drag the highlighted text into the address bar in the browser. You don’t need to press “enter”; Firefox will bring the Google/Yahoo/etc. search results to you.

2. Delete visited URLs

When you drop down the box underneath the address bar, you can see your recent browsing history. But what if you want to remove one URL from that list?

Just drop down the URL box, highlight the URL you want to zap, then press the “delete” key on your keyboard. The URL will then be removed from the list.

3. Navigate to browser tabs using the keyboard

Instead of using the mouse to click on a tab, why not use the keyboard? Pressing CTRL + TAB or CTRL + SHIFT + TAB will bounce you from tab to tab, starting from the first or last opened tab and working its way along one tab at a time, so this is not the recommended way. It’s better to go to a specific tab straight away: CTRL + 2 will take you directly to the second tab from the left, and CTRL + 5 will take you to the fifth tab from the left.

4. Keyword your bookmarks!

A neat bookmark-related feature of Firefox is its ability to assign keywords to bookmarks. For example, you might assign the keyword howto to your www.myhow-to.com bookmark.

It’s a simple process. Click on the Bookmarks menu, right-click on a bookmark, and select Properties. In the Keyword field, enter the word of your choice. You can now use that word in place of the site’s full URL in the address bar, and you’ll be taken right to the page you saved. Just remember to assign a unique keyword to every bookmark.
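To make the idea concrete, here is a minimal sketch of how a keyword table could map address-bar input to a saved URL, and why each keyword has to be unique. It models the behavior described in this tip and is not Firefox’s actual implementation; the names `add_keyword` and `resolve_address_bar` are hypothetical.

```python
def add_keyword(keywords, keyword, url):
    """Register a keyword, refusing to reuse one that is already taken."""
    if keyword in keywords:
        raise ValueError(f"keyword {keyword!r} is already assigned")
    keywords[keyword] = url


def resolve_address_bar(keywords, text):
    """Return the bookmarked URL for a keyword, or the text unchanged."""
    entry = text.strip()
    return keywords.get(entry, entry)


# Hypothetical table mirroring the example in this tip.
bookmarks = {}
add_keyword(bookmarks, "howto", "www.myhow-to.com")

print(resolve_address_bar(bookmarks, "howto"))
print(resolve_address_bar(bookmarks, "www.example.com"))
```

If two bookmarks shared a keyword, the lookup could only return one of them, which is why the sketch rejects duplicates.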

5. Grab files off webpages, even protected webpages

Have you ever wanted a picture, file or video off a webpage but couldn’t get it, because it’s been protected? Just right-click on the page, choose “View Page Info”, then the “Media” tab. Find the file you’re looking for in the list and click on “Save”. (Note: this doesn’t work for everything, but I have still had a pretty high success rate.)

Recommended: Fanpop.com

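Fanpop is a community portal that brings together services such as social tagging and social networking. On Fanpop, communities of fans can discover and share content and hold discussions around the topics that interest them.

Users no longer need to visit multiple sites and forums to find the information they want; Fanpop gathers the best content in one place. Fans can submit, organize, and rate all kinds of content, including videos, articles, sites, and blogs. Fanpop's basic belief is that the most passionate fans are often the best and most trustworthy source of information in the field they love.

The site combines many forms: voting, social tagging, social networking, news, and forums. It has multiple entry points and offers different content to different age groups.

Fanpop works by letting users share links on a particular topic (a Spot). As all the related links are gathered under one topic, users get the chance to meet friends with the same interests and so form a circle of their own. With good page navigation and UI design, Fanpop makes finding, joining, and sharing Spots easier and more fun. Although Fanpop's service resembles the earlier Squidoo, the greater customization and flexibility it gives users won it similarly wide attention in the blogosphere within just one week of launch.

Here are a few good Spots on the site (the ones I happen to be interested in, of course):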

  1. Coldplay Spot
  2. Google Spot
  3. Startups and Entrepreneurship Spot
  4. Web 2.0 Spot
  5. Geeks Spot
  6. Smart phones and mobile devices
  7. Apple Spot
  8. Blogging Spot

Speaking of Spots, perhaps some other Google fans will open a new “G Spot” topic area...

5 Things You'll Love About Firefox 3

December 27, 2007 (Computerworld) — New versions of favorite applications are always a little tricky; you want to keep up with the times without fixing what ain’t broke. With that in mind, I took a look at the newly released Firefox 3 Beta 2 to see what we can look forward to when the final version ships in 2008.

Although the basic look of the browser hasn’t changed, there are actually quite a few new features coming. (For a complete list, you can check out Mozilla’s release notes.)

Some of the new features in Firefox 3 are not immediately obvious, at least not to the casual user. Among other things, Mozilla is incorporating new graphics- and text-rendering architectures in its browser layout engine (Gecko 1.9) to offer rendering improvements in CSS and SVG; adding a number of security features, including malware protection and version checks of its add-ons; and adding offline support for suitably coded Web applications.

Other new features — some of which are listed here — are more visible to end users, such as the menu bar that now appears asking if you want to save a just-entered password. Indeed, I’ve wondered if the browser will become top-heavy with built-in features that were already available as add-ons.

For example, Firefox 3 Beta 2 adds the ability to save your existing tabs when you close the app down, and it has enhanced the browser’s ability to magnify Web pages from just affecting text to taking in the entire page — features that are already available via the Tab Mix Plus and Image Zoom extensions. Right now, it looks like the new version will escape that particular criticism — its memory footprint is, if anything, smaller than that of Version 2 — but time alone will tell.

Incidentally, if you’d like to try out the new beta, feel free, but be aware that this is a beta version in the traditional sense, not the sort of eternal beta you get with, say, Google Docs. As a result, there is as yet no support for current add-ons, and there are still a few serious glitches; for example, you’re going to get an error message if you try to use Yahoo’s newer e-mail format.

All that being said, here are the five new and/or enhanced features in Firefox 3 Beta 2 that most caught my attention:

1. Easier downloads. While the older Download Manager was quite serviceable, Mozilla has made some nice tweaks in the new version. It now lists not only the file name but also the URL it was downloaded from, and includes an icon that leads to information about when and where you downloaded it. (The Remove link has been, well, removed from the Download Manager; you now have to right-click to delete a listing.)

Download Manager
The new Download Manager tracks when you downloaded files and from where.

Address Bar
The address bar is now more readable and searches both your history and your bookmark lists.

But the new feature I really approve of is the ability to resume a download that may have been abruptly stopped because Firefox, or your system, crashed. I tried it out by using the Task Manager to end firefox.exe during a download; when I brought Firefox up again, the Download Manager resumed the download as if nothing had happened. Since I’ve wasted a lot of time over the years having to deal with repeatedly failing downloads, this is something I appreciate.

2. An enhanced address bar. Mozilla has also made improvements in the autocomplete function of its address bar (which Mozilla calls a “location bar”), and I have to say I find it both impressive and useful. In Firefox 3 Beta 2, the autocomplete doesn’t just offer a list of URLs that you’ve been to, but includes sites that are in your bookmark list.

It then gives you a nice, clear listing of the URLs and site names in large, easy-to-read text, with the typed-in phrase underlined. It makes it really simple to find and return to that semi-remembered Web site you visited a few days ago.

3. A workable bookmark organizer. Speaking of bookmarks, the separate history/bookmarks sidebars and managers have been replaced — or, rather, augmented — by a single Places Organizer, which uses Windows Explorer’s familiar tree-on-the-left/list-on-the-right format. It offers a simple, quick way to read and manage your history and bookmarks — including the ability to immediately edit a bookmark’s name, location and tags rather than having to go into the Properties box (something that I was really sick of in Version 2).

Places Organizer
The new Places Organizer vastly improves Firefox’s management of bookmark lists.

4. Easier bookmarking. There are, in fact, quite a few new features involving bookmarking, some of which are small but highly useful. For example, you can now quickly create a bookmark by double-clicking on a star that appears in the right side of the address bar. You can also add tags to your bookmarks, which could work nicely as an organizational tool.

starred bookmark
You can quickly create a bookmark by double-clicking on the star in the address bar.

There is also a new folder called Smart Bookmarks in the toolbar. It offers three categories of bookmarks (Most Visited, Recently Bookmarked and Recent Tags) and is automatically populated during the course of your Web sessions. Since, like most people, I have a series of sites that I tend to visit regularly, I can see how something like the Most Visited list could prove handy as a one-click resource for my daily surfing. (I could, of course, create my own folder for these sites, but it’s a lot easier to let Firefox do it.) My only quibble: A Recently Visited list would also be handy, more handy, I think, than a list of sites that were recently bookmarked.

5. Better memory management. I’m a great fan of Firefox, but there have been times when I’ve considered going back to Internet Explorer because of issues I was having with memory. After a couple of hours of adding and dropping tabs, Firefox could commandeer nearly 200MB of memory, at which point I’d usually have to shut it down to prevent my other apps from grinding to a halt. It was very frustrating — especially when the folks at Mozilla denied that it was really a problem.

Windows Task Manager showing Firefox memory usage
The new version of Firefox appears to have a smaller memory footprint than its predecessor.

It now looks like that may have finally been taken care of. Mozilla has announced that the new version handles memory usage better, so I decided to put it through a modest test. I opened the Firefox 3 beta and my current copy of Firefox 2.0.0.11 on different systems; initial memory usage for the current version (with add-ons disabled) was 25,740KB, about 100KB less than the new beta’s usage of 25,848KB. I then opened five tabs in both versions, ran a two-minute YouTube video, and shut everything down but the initial home page. At that point, Firefox 3 Beta 2 was using 46,296KB of memory, more than 2,500KB less than the 48,968KB that Firefox 2 was using. This is admittedly not comprehensive or conclusive testing, but if that trend extends to long-term usage, I can see the latest version of Firefox taking up a lot less memory than its predecessor.
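The arithmetic behind that comparison can be checked with a short sketch. The figures are the ones quoted in the test above; the variable names and the percentage calculation are additions for illustration, and a single run like this says nothing about variance.

```python
# Memory figures in KB, copied from the test described above.
OLD, NEW = "Firefox 2.0.0.11", "Firefox 3 Beta 2"
initial_kb = {OLD: 25_740, NEW: 25_848}
after_kb = {OLD: 48_968, NEW: 46_296}


def saving_kb(usage):
    """KB that the beta uses less than Firefox 2 (negative if it uses more)."""
    return usage[OLD] - usage[NEW]


initial_saving = saving_kb(initial_kb)   # at startup the beta is slightly larger
after_saving = saving_kb(after_kb)       # after tabs and video are closed
saving_percent = 100 * after_saving / after_kb[OLD]

print(f"At startup the beta uses {-initial_saving} KB more")
print(f"After the session it uses {after_saving} KB less ({saving_percent:.1f}%)")
```

So the beta starts out slightly heavier but ends the session lighter, which is the trend the paragraph above describes.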

According to Mozilla’s Firefox 3 Beta 2 release notes, “There’s still more to come.” That could be good — or bad. Firefox became popular because it was lean, mean and user-tweakable, and I’d hate to see Mozilla lose that focus.

Certainly, if Firefox 3 Beta 2 is anything to go by, the Mozilla team is doing a fine job in balancing new features with a basic philosophy of “don’t fix what ain’t broke.” We can only hope that they will continue following this adage as time goes on and the final release grows closer.