<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-8205906155075223532</id><updated>2011-10-14T03:13:40.516-07:00</updated><category term='blog crawl'/><category term='www08'/><category term='social media'/><category term='bio'/><category term='algorithms'/><category term='conference'/><category term='sql'/><category term='personal'/><category term='icwsm2008'/><category term='icwsm'/><category term='life'/><title type='text'>Alexey Maykov</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>28</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-5021995745077866639</id><published>2009-02-16T13:24:00.001-08:00</published><updated>2009-02-16T13:24:30.537-08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='sql'/><title type='text'>bulk insert access denied problem</title><content type='html'>&lt;p&gt;You may get this error when running the BULK INSERT statement in SQL Server 2005:&lt;/p&gt;  &lt;p&gt;Cannot bulk load because the file &amp;quot;\\maykov100\public\blogcatalog.txt&amp;quot; could not be opened. Operating system error code 5(Access is denied.).&lt;/p&gt;  &lt;p&gt;As it turns out, SQL Server tries to open the file with your credentials (impersonation). When you connect to SQL Server remotely (e.g. from SQL Server Management Studio running on your own machine), your credentials can't be passed on from the server to the file share. Possible ways to solve this problem:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Use bcp from your local machine&lt;/li&gt;    &lt;li&gt;Run SQL Server Management Studio directly on the SQL Server machine&lt;/li&gt;    &lt;li&gt;Configure your company's Active Directory to allow your credentials to be passed on (delegation)&lt;/li&gt; &lt;/ul&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-5021995745077866639?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/5021995745077866639/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=5021995745077866639' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/5021995745077866639'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/5021995745077866639'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2009/02/bulk-insert-access-denied-problem.html' title='bulk insert access denied problem'/><author><name>Alexey 
Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-6595196945491921786</id><published>2009-02-04T18:13:00.001-08:00</published><updated>2009-02-04T18:13:17.703-08:00</updated><title type='text'>Visual Studio, Remote Debugging, DOM problems, etc</title><content type='html'>&lt;p&gt;I think this is familiar to any Windows developer: you start the remote debugging monitor, point Visual Studio at it, and all you get is a network error message box. Today I finally solved this mystery. Make sure that the machine where the program is running can reach the machine running Visual Studio by its DNS name. In my case, I had just been assigned a new IP address by DHCP, and the remote machine had the wrong IP address for my machine. 
Once I fixed this, remote debugging started working again.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-6595196945491921786?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/6595196945491921786/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=6595196945491921786' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/6595196945491921786'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/6595196945491921786'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2009/02/visual-studio-remote-debugging-dom.html' title='Visual Studio, Remote Debugging, DOM problems, etc'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-4849067769789256706</id><published>2009-02-04T18:09:00.001-08:00</published><updated>2009-02-04T18:09:20.029-08:00</updated><title type='text'>C# HttpListener: HttpListenerException: The I/O operation has been aborted because of either a thread exit or an application request</title><content type='html'>&lt;p&gt;In this blog, I'm going to start focusing more on the small details of C# programming. With that said:&lt;/p&gt;  &lt;p&gt;HttpListener is a great .NET 2.0 class. 
It allows you to create web servers without having IIS installed on your machine. This became possible because, a while ago, IIS was split into two parts: HTTP.sys and IIS proper. HTTP.sys ships with any machine running XP SP2 or Server 2003. HttpListener allows .NET programs to use HTTP.sys.&lt;/p&gt;  &lt;p&gt;My program creates an HttpListener and calls its Start and BeginGetContext methods. Then that thread terminates. Whenever an HTTP request is received, a delegate is called on a thread pool thread; it calls EndGetContext and generates the HTTP response. This worked fine on Vista, but started throwing the exception from the title on Server 2003. After I made sure that the thread calling HttpListener.Start didn't terminate, the exception went away.&lt;/p&gt;  &lt;p&gt;I hope this will be useful to some poor soul trying to debug the same problem.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-4849067769789256706?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/4849067769789256706/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=4849067769789256706' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/4849067769789256706'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/4849067769789256706'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2009/02/c-httplistener-httplistenerexception-io.html' title='C# HttpListener: HttpListenerException: The I/O operation has been aborted because of either a thread exit or an application request'/><author><name>Alexey 
Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-3403058178108495613</id><published>2008-12-26T15:20:00.001-08:00</published><updated>2008-12-26T15:20:57.835-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='personal'/><title type='text'>Snow</title><content type='html'>&lt;p&gt;The Seattle area is not a snowy place. Snow falls a few times a year, and it usually doesn't stick for longer than a day or two. This year, we're having an anomalously long period of snow. In the last week, we went through all the phases of a typical Russian winter. It started with a heavy December snowfall on Thursday, followed by January-like low temperatures and sunshine on Friday and Saturday. A February blizzard followed. Now we're going through the early spring snowmelt. We and other people in my neighborhood took full advantage of the snow. We went sledding and cross-country skiing and built lots of snowmen. I even saw downhill skiers and snowboarders. Traffic, of course, was a mess. The lack of snowplows in Seattle showed in the fact that they closed the steep road near my neighborhood. Some people kept using it since it was the only way to get to their houses. 
Instead of plowing it, they posted a policeman there to give out warnings and tickets for driving on that road.&lt;/p&gt;  &lt;p&gt;Here are some photos I took:&lt;/p&gt;  &lt;p&gt;Sledding hill:&lt;/p&gt;  &lt;p&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="184" alt="IMG_2722" src="http://lh3.ggpht.com/_xAF3Lelh2gI/SVVmz242DPI/AAAAAAAAAEY/4r7xOVNb7-Q/IMG_2722_thumb.jpg" width="244" border="0" /&gt;&lt;/p&gt;  &lt;p&gt;A US Mail truck making it through the snow:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/_xAF3Lelh2gI/SVVmz46l7oI/AAAAAAAAAEc/jANO_boSC4c/IMG_27272.jpg"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="184" alt="IMG_2727" src="http://lh3.ggpht.com/_xAF3Lelh2gI/SVVm0WSov6I/AAAAAAAAAEg/waQB0cestK4/IMG_2727_thumb.jpg" width="244" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The UPS truck got stuck and was abandoned for the night. 
&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/_xAF3Lelh2gI/SVVm1Pa5sPI/AAAAAAAAAEk/CERburyUg68/IMG_27362.jpg"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="184" alt="IMG_2736" src="http://lh6.ggpht.com/_xAF3Lelh2gI/SVVm1eczTzI/AAAAAAAAAEo/Rdy84Zu2cI4/IMG_2736_thumb.jpg" width="244" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;An Igloo house which we built on our lawn.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/_xAF3Lelh2gI/SVVm1q-jsvI/AAAAAAAAAEs/36keMJJ3Onc/IMG_27312.jpg"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="184" alt="IMG_2731" src="http://lh4.ggpht.com/_xAF3Lelh2gI/SVVm2UoFtQI/AAAAAAAAAEw/bd7gHKODdt8/IMG_2731_thumb.jpg" width="244" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-3403058178108495613?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/3403058178108495613/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=3403058178108495613' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/3403058178108495613'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/3403058178108495613'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/12/snow.html' title='Snow'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_xAF3Lelh2gI/SVVmz242DPI/AAAAAAAAAEY/4r7xOVNb7-Q/s72-c/IMG_2722_thumb.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-5190646924241569492</id><published>2008-12-01T21:47:00.001-08:00</published><updated>2008-12-01T21:47:12.492-08:00</updated><title type='text'>Magnificent!</title><content type='html'>&lt;div&gt;&lt;embed src="http://www.dailymotion.com/swf/k3RaxqZBbRGaPCNsxm&amp;amp;related=0&amp;amp;canvas=medium" width="480" height="405" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" /&gt;    &lt;br /&gt;&lt;b&gt;&lt;a href="http://www.dailymotion.com/video/x70ni4_video-projection-monumentale-2009_creation"&gt;VIdeo projection Monumentale 2009&lt;/a&gt;&lt;/b&gt;    &lt;br /&gt;&lt;i&gt;Uploaded by &lt;a href="http://www.dailymotion.com/uruk"&gt;uruk&lt;/a&gt;&lt;/i&gt;&lt;/div&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-5190646924241569492?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/5190646924241569492/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=5190646924241569492' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/5190646924241569492'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/5190646924241569492'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/12/magnificent.html' 
title='Magnificent!'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-921727008982374430</id><published>2008-11-10T20:44:00.001-08:00</published><updated>2008-11-10T20:44:46.118-08:00</updated><title type='text'>Very nice LiveJournal visualization</title><content type='html'>&lt;p&gt;&lt;a href="http://aqua.livejournal.ru/"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="373" alt="image" src="http://lh6.ggpht.com/_xAF3Lelh2gI/SRkNshpWAvI/AAAAAAAAAEU/KZNzZsh3WbE/image%5B8%5D.png" width="644" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;a title="http://aqua.livejournal.ru/" href="http://aqua.livejournal.ru/"&gt;http://aqua.livejournal.ru/&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;BTW, all that &amp;amp;#1076; nonsense in the background is porn spam in Russian.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-921727008982374430?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/921727008982374430/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=921727008982374430' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/921727008982374430'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/8205906155075223532/posts/default/921727008982374430'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/11/very-nice-livejournal-visualisation.html' title='Very nice LiveJournal visualization'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_xAF3Lelh2gI/SRkNshpWAvI/AAAAAAAAAEU/KZNzZsh3WbE/s72-c/image%5B8%5D.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-8095745398611114196</id><published>2008-11-03T23:02:00.001-08:00</published><updated>2008-11-03T23:03:07.576-08:00</updated><title type='text'>Politics</title><content type='html'>&lt;p&gt;As we are getting closer to the unwinding of the current political conundrum, here is a video which may help you to make your decision. 
Here we go: GOP maverick&lt;/p&gt; &lt;object height="330" width="400" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000"&gt;&lt;param name="_cx" value="10583"&gt;&lt;param name="_cy" value="8731"&gt;&lt;param name="FlashVars" value=""&gt;&lt;param name="Movie" value="http://pics.smotri.com/scrubber_custom8.swf?file=v110899519c&amp;amp;bufferTime=3&amp;amp;autoStart=false&amp;amp;str_lang=eng&amp;amp;xmlsource=http%3A%2F%2Fpics.smotri.com%2Fcskins%2Fblue%2Fskin_color_lightaqua.xml&amp;amp;xmldatasource=http%3A%2F%2Fpics.smotri.com%2Fskin_ng.xml"&gt;&lt;param name="Src" value="http://pics.smotri.com/scrubber_custom8.swf?file=v110899519c&amp;amp;bufferTime=3&amp;amp;autoStart=false&amp;amp;str_lang=eng&amp;amp;xmlsource=http%3A%2F%2Fpics.smotri.com%2Fcskins%2Fblue%2Fskin_color_lightaqua.xml&amp;amp;xmldatasource=http%3A%2F%2Fpics.smotri.com%2Fskin_ng.xml"&gt;&lt;param name="WMode" value="Window"&gt;&lt;param name="Play" value="0"&gt;&lt;param name="Loop" value="-1"&gt;&lt;param name="Quality" value="High"&gt;&lt;param name="SAlign" value=""&gt;&lt;param name="Menu" value="-1"&gt;&lt;param name="Base" value=""&gt;&lt;param name="AllowScriptAccess" value="always"&gt;&lt;param name="Scale" value="ShowAll"&gt;&lt;param name="DeviceFont" value="0"&gt;&lt;param name="EmbedMovie" value="0"&gt;&lt;param name="BGColor" value="FFFFFF"&gt;&lt;param name="SWRemote" value=""&gt;&lt;param name="MovieData" value=""&gt;&lt;param name="SeamlessTabbing" value="1"&gt;&lt;param name="Profile" value="0"&gt;&lt;param name="ProfileAddress" value=""&gt;&lt;param name="ProfilePort" value="0"&gt;&lt;param name="AllowNetworking" value="all"&gt;&lt;param name="AllowFullScreen" value="true"&gt; &lt;embed src="http://pics.smotri.com/scrubber_custom8.swf?file=v110899519c&amp;amp;bufferTime=3&amp;amp;autoStart=false&amp;amp;str_lang=eng&amp;amp;xmlsource=http%3A%2F%2Fpics.smotri.com%2Fcskins%2Fblue%2Fskin_color_lightaqua.xml&amp;amp;xmldatasource=http%3A%2F%2Fpics.smotri.com%2Fskin_ng.xml" 
quality="high" allowscriptaccess="always" allowfullscreen="true" wmode="window" width="400" height="330" type="application/x-shockwave-flash"&gt;&lt;/embed&gt;&lt;/object&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-8095745398611114196?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/8095745398611114196/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=8095745398611114196' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/8095745398611114196'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/8095745398611114196'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/11/politics.html' title='Politics'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-7327536912678766622</id><published>2008-10-14T22:09:00.001-07:00</published><updated>2008-10-14T22:09:11.374-07:00</updated><title type='text'>PPTPlex</title><content type='html'>&lt;p&gt;A very cool technology from Microsoft Office Labs. 
It allows presentations in a style close to the visualization in my &lt;a href="http://maykov.blogspot.com/2008/09/why-we-drink.html"&gt;previous post&lt;/a&gt;. &lt;/p&gt;  &lt;embed src="http://images.video.msn.com/flash/soapbox1_1.swf" width="432" height="364" id="vrckkucu" type="application/x-shockwave-flash" allowFullScreen="true" allowScriptAccess="always" pluginspage="http://macromedia.com/go/getflashplayer" flashvars="c=v&amp;v=f362631f-c86c-4547-a544-9b8eda9975e3&amp;ifs=true&amp;fr=shared&amp;vc=catalog.video.msn.com&amp;d=video.msn.com"&gt;&lt;/embed&gt;&lt;noembed&gt;&lt;a href="http://video.msn.com/?playlist=videoByUuids:uuids:f362631f-c86c-4547-a544-9b8eda9975e3&amp;amp;showPlaylist=true&amp;amp;from=shared" target="_new" title="An Overview of pptPlex"&gt;Video: An Overview of pptPlex&lt;/a&gt;&lt;/noembed&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-7327536912678766622?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/7327536912678766622/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=7327536912678766622' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/7327536912678766622'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/7327536912678766622'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/10/pptplex.html' title='PPTPlex'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-3111710588913463356</id><published>2008-10-09T10:11:00.001-07:00</published><updated>2008-10-09T10:11:07.530-07:00</updated><title type='text'>Political Streams</title><content type='html'>&lt;p&gt;Check out the product which we just released: &lt;a title="http://socialstreams.livelabs.com/politics/" href="http://socialstreams.livelabs.com/politics/"&gt;http://socialstreams.livelabs.com/politics/&lt;/a&gt; .&amp;#160; From &lt;a href="http://livelabs.com/social-streams/faq/"&gt;FAQ&lt;/a&gt;: Political Streams is an application which mines social media content in real time for political discussion. It surfaces the news articles and documents that are being discussed as well as the people and places that appear in those articles.&lt;/p&gt;  &lt;p&gt;A number of great people worked on it. &lt;a href="http://datamining.typepad.com/"&gt;Matthew Hurst&lt;/a&gt; and &lt;a href="http://justinrudd.wordpress.com/"&gt;Justin Rudd&lt;/a&gt; are among them and have blogs.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-3111710588913463356?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/3111710588913463356/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=3111710588913463356' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/3111710588913463356'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/3111710588913463356'/><link rel='alternate' 
type='text/html' href='http://maykov.blogspot.com/2008/10/political-streams.html' title='Political Streams'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-7355517750622371043</id><published>2008-09-19T15:02:00.001-07:00</published><updated>2008-09-19T15:02:52.380-07:00</updated><title type='text'>Why we drink</title><content type='html'>&lt;p&gt;Amazingly cool visualization:&lt;/p&gt;  &lt;p&gt;&lt;object height="400" width="400" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"&gt;&lt;param name="_cx" value="10583"&gt;&lt;param name="_cy" value="10583"&gt;&lt;param name="FlashVars" value=""&gt;&lt;param name="Movie" value="http://current.com/e/89048684/en_US"&gt;&lt;param name="Src" value="http://current.com/e/89048684/en_US"&gt;&lt;param name="WMode" value="Transparent"&gt;&lt;param name="Play" value="0"&gt;&lt;param name="Loop" value="-1"&gt;&lt;param name="Quality" value="High"&gt;&lt;param name="SAlign" value="LT"&gt;&lt;param name="Menu" value="-1"&gt;&lt;param name="Base" value=""&gt;&lt;param name="AllowScriptAccess" value="always"&gt;&lt;param name="Scale" value="NoScale"&gt;&lt;param name="DeviceFont" value="0"&gt;&lt;param name="EmbedMovie" value="0"&gt;&lt;param name="BGColor" value=""&gt;&lt;param name="SWRemote" value=""&gt;&lt;param name="MovieData" value=""&gt;&lt;param name="SeamlessTabbing" value="1"&gt;&lt;param name="Profile" value="0"&gt;&lt;param name="ProfileAddress" value=""&gt;&lt;param name="ProfilePort" value="0"&gt;&lt;param name="AllowNetworking" value="all"&gt;&lt;param name="AllowFullScreen" value="true"&gt; &lt;embed type="application/x-shockwave-flash" 
src="http://current.com/e/89048684/en_US" width="400" height="400" wmode="transparent" allowfullscreen="true" allowscriptaccess="always"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt; Here is a link to the author's site with even more cool stuff: http://www.clemenskogler.net  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-7355517750622371043?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/7355517750622371043/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=7355517750622371043' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/7355517750622371043'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/7355517750622371043'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/09/why-we-drink.html' title='Why we drink'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-739690915289121063</id><published>2008-09-15T20:55:00.001-07:00</published><updated>2008-09-15T20:55:04.193-07:00</updated><title type='text'>My Readership</title><content type='html'>&lt;p&gt;Interestingly, the search engine query which brings the most visitors to my site is &amp;quot;japan[ese] pornsite&amp;quot;. 
(See the related post &lt;a href="http://maykov.blogspot.com/2008/06/japan-780000-porn-site-hits-earn.html"&gt;here&lt;/a&gt;). Maybe I should change the topic of my blog to something more mundane and timeless...&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-739690915289121063?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/739690915289121063/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=739690915289121063' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/739690915289121063'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/739690915289121063'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/09/my-readership.html' title='My Readership'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-610617377759473508</id><published>2008-09-08T21:26:00.001-07:00</published><updated>2008-09-08T21:26:03.200-07:00</updated><title type='text'>Relational Databases are Considered Harmful for IR Systems</title><content type='html'>&lt;p&gt;When building a new system, most developers would throw in a relational database without thinking too hard about it. After all, RDBMSs are designed to deal with data. 
However, they may not be a good fit for some kinds of systems. I'm talking about systems which deal with terabytes of (text) data. I'm referring to such systems as IR (Information Retrieval) systems, but I'm not limiting this discussion only to IR. Of course, there are tons of systems where SQL server is still useful. Let's talk about concrete examples of both. We will use a web search engine as an example of an IR system. We will use banking software (i.e. the software which a bank runs) as an example of a traditional system. I'm going to briefly compare the two along several dimensions. Later on, I will go into details on each.&lt;/p&gt;  &lt;table cellspacing="0" cellpadding="2" width="399" border="1"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="134"&gt;&amp;#160;&lt;/td&gt;        &lt;td valign="top" width="131"&gt;Web Search Engine&lt;/td&gt;        &lt;td valign="top" width="132"&gt;Bank System&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="135"&gt;Number of simultaneous writers&lt;/td&gt;        &lt;td valign="top" width="131"&gt;one&lt;/td&gt;        &lt;td valign="top" width="131"&gt;thousands&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="136"&gt;Number of simultaneous readers&lt;/td&gt;        &lt;td valign="top" width="131"&gt;millions&lt;/td&gt;        &lt;td valign="top" width="131"&gt;thousands&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="136"&gt;Cost of a Data Loss (per GB of data)&lt;/td&gt;        &lt;td valign="top" width="131"&gt;low&lt;/td&gt;        &lt;td valign="top" width="131"&gt;extremely high&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="136"&gt;Data Consistency&lt;/td&gt;        &lt;td valign="top" width="131"&gt;not required&lt;/td&gt;        &lt;td valign="top" width="131"&gt;required&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="136"&gt;Data Volume&lt;/td&gt;        &lt;td 
valign="top" width="131"&gt;Terabytes&lt;/td&gt;        &lt;td valign="top" width="131"&gt;Gigabytes&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="136"&gt;Transactions&lt;/td&gt;        &lt;td valign="top" width="131"&gt;Few long-running&lt;/td&gt;        &lt;td valign="top" width="132"&gt;Multiple short-running&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;In a bank system, thousands of users make deposits, transfers and so on. At the same time, in a web search engine there is only one process (or a handful of processes) updating the index. &lt;/p&gt;  &lt;p&gt;A web search engine has an extremely high number of simultaneous users. A banking system has a moderate number.&lt;/p&gt;  &lt;p&gt;A loss of any data may be catastrophic for a bank and may potentially disrupt the business. A search engine can afford losing some of its index. At the very worst, it will recrawl the affected pages.&lt;/p&gt;  &lt;p&gt;It is important that data is consistent across all bank databases. A canonical transaction processing example involves a money transfer from one account to another. Transactions like this should either complete fully or not happen at all. Money should be withdrawn from one account and deposited into another. Cases when the money is only withdrawn or only deposited are not allowed. For the web search engine, on the contrary, it is quite alright when the index doesn't reflect the latest changes on the web. The update may lag by several days. Different copies of the index may have different versions of the same page and this is fine.&lt;/p&gt;  &lt;p&gt;In a bank there are lots of simultaneous short-running transactions such as money transfers, withdrawals, deposits. A web search engine does transactions on big batches of data. 
I must say, this item is more of an implementation detail.&lt;/p&gt;  &lt;p&gt;Finally, the volume of data in a search engine is much higher.&lt;/p&gt;  &lt;p&gt;Clearly, an RDBMS supports the banking scenario well. A wealth of thought has been put into allowing multiple simultaneous transactions. Under no circumstances will these transactions put a database into an inconsistent state. A typical RDBMS allows setups with high protection against data loss. These properties of an RDBMS are not needed for a search engine. They also come with a high price both in terms of money and performance. Paying this price when not needed is hardly justified. For instance, a typical SQL server configuration is several times more expensive than a typical commodity server. SQL server overhead in terms of disk storage may be an order of magnitude (SQL overhead, RAID mirroring, dump space, log space, etc.). &lt;/p&gt;  &lt;p&gt; What can be used as an alternative to an RDBMS for building large IR systems? I'm not sure, as I'm only starting in this field. Feel free to suggest. One alternative would be to build your own custom data structures. I'm going to demonstrate this with several examples in coming posts. Right now, I'm thinking about describing an inverted index and a web crawler. 
Feel free to suggest your own.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-610617377759473508?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/610617377759473508/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=610617377759473508' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/610617377759473508'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/610617377759473508'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/09/relational-databases-are-considered.html' title='Relational Databases are Considered Harmful for IR Systems'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-6324866209629467907</id><published>2008-08-06T11:48:00.001-07:00</published><updated>2008-08-06T11:48:37.096-07:00</updated><title type='text'>Google's street view flashmob</title><content type='html'>&lt;p&gt;&lt;a href="http://lh6.ggpht.com/maykov/SJnyAxYxtcI/AAAAAAAAACw/K5hVgRAqVso/s1600-h/image%5B3%5D.png"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="133" alt="image" src="http://lh6.ggpht.com/maykov/SJnyBE58aUI/AAAAAAAAAC0/zZU4qoNsUFY/image_thumb%5B1%5D.png?imgmax=800" width="235" border="0" 
/&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Somehow, Google employees found out when their street was going to be photographed for Street View. So they came out and started to act up: &lt;a title="http://maps.google.com/?ie=UTF8&amp;amp;layer=c&amp;amp;cbll=37.420887,-122.083965&amp;amp;panoid=0JwQNpGw9ctY-fZ7BA0_dA&amp;amp;cbp=1,6.385842491873461,,1,9.789184299225807&amp;amp;ll=37.425781,-122.084069&amp;amp;spn=0.009457,0.014248&amp;amp;z=16&amp;amp;source=embed" href="http://maps.google.com/?ie=UTF8&amp;amp;layer=c&amp;amp;cbll=37.420887,-122.083965&amp;amp;panoid=0JwQNpGw9ctY-fZ7BA0_dA&amp;amp;cbp=1,6.385842491873461,,1,9.789184299225807&amp;amp;ll=37.425781,-122.084069&amp;amp;spn=0.009457,0.014248&amp;amp;z=16&amp;amp;source=embed"&gt;http://maps.google.com/?ie=UTF8&amp;amp;layer=c&amp;amp;cbll=37.420887,-122.083965&amp;amp;panoid=0JwQNpGw9ctY-fZ7BA0_dA&amp;amp;cbp=1,6.385842491873461,,1,9.789184299225807&amp;amp;ll=37.425781,-122.084069&amp;amp;spn=0.009457,0.014248&amp;amp;z=16&amp;amp;source=embed&lt;/a&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-6324866209629467907?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/6324866209629467907/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=6324866209629467907' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/6324866209629467907'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/6324866209629467907'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/08/google-street-view-flashmob.html' title='Google&amp;#39;s street view 
flashmob'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/maykov/SJnyBE58aUI/AAAAAAAAAC0/zZU4qoNsUFY/s72-c/image_thumb%5B1%5D.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-1720613594832405882</id><published>2008-07-15T00:11:00.001-07:00</published><updated>2008-07-15T00:11:32.690-07:00</updated><title type='text'>Parallel Crawlers</title><content type='html'>&lt;p&gt;Junghoo Cho and Hector Garcia-Molina &lt;a href="http://oak.cs.ucla.edu/papers/cho-parallel.pdf"&gt;&amp;quot;Parallel Crawlers.&amp;quot;&lt;/a&gt; &lt;em&gt;In Proceedings of the 11th World Wide Web conference (WWW11),&lt;/em&gt; Honolulu, Hawaii, May 2002.&lt;/p&gt;  &lt;p&gt;The paper mainly concerns different partitioning/communication techniques in building parallel crawlers. The space of all URLs to crawl may be partitioned by URL hash, host hash, etc. Crawlers running on multiple machines may send links to each other, may crawl all links instead of sending them, or may ignore links which are outside of their partition. The paper studies the pros and cons of each approach.&lt;/p&gt;  &lt;p&gt;I think the paper is lacking in its discussion of performance and fault tolerance. 
But it is nice to have papers like this as they help generate/validate ideas.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-1720613594832405882?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/1720613594832405882/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=1720613594832405882' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/1720613594832405882'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/1720613594832405882'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/07/parallel-crawlers.html' title='Parallel Crawlers'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-8419672992192903863</id><published>2008-06-26T15:02:00.001-07:00</published><updated>2008-06-26T15:02:24.924-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='life'/><title type='text'>Japan: 780,000 porn site hits earn demotion</title><content type='html'>&lt;p&gt;A Japanese civil servant was demoted for logging more than 780,000 hits on pornographic Web sites on his office computer over nine months, an official said Friday... 
the man logged 170,000 hits on porn sites in July alone.&lt;/p&gt;  &lt;p&gt;&lt;a title="http://www.msnbc.msn.com/id/24426097/" href="http://www.msnbc.msn.com/id/24426097/"&gt;http://www.msnbc.msn.com/id/24426097/&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;This comes to about 14 hits every minute! Either this man was running a porn crawler or some spyware was hitting these sites.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-8419672992192903863?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/8419672992192903863/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=8419672992192903863' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/8419672992192903863'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/8419672992192903863'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/06/japan-780000-porn-site-hits-earn.html' title='Japan: 780,000 porn site hits earn demotion'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-5215345367424168673</id><published>2008-06-24T11:45:00.001-07:00</published><updated>2008-06-24T11:45:59.960-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='life'/><title type='text'>Commute: Gas 
Prices</title><content type='html'>&lt;p&gt;Met a Suburban driver at a gas station today. It takes $180 to fill up her Suburban. I noticed she didn't fill it up this time. Another tidbit: &lt;a href="http://online.wsj.com/public/article/SB121426584632198453-e_9l_obc2pM_opeqd_Q3hQnnJn0_20090624.html?mod=rss_free"&gt;San Diegans drive to Mexico to buy gas&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;With the high price of housing, some people drive till they qualify. Let's say somebody bought a house in Snoqualmie Ridge and commutes to the Microsoft main campus every day. It is 50 miles per day or 1100 miles per month. &lt;a href="http://www.fueleconomy.gov/feg/bymodel/2005_GMC_Yukon.shtml"&gt;GMC Yukon 1500 AWD 8 cyl, 6 L, Automatic 4-spd, Premium&lt;/a&gt; gets 16 miles per gallon. At a fuel price of $5 per gallon (got to think ahead), that is about 69 gallons, or $343 per month and $4125 a year for fuel. This will buy at most $80000 more of a house (at an APR of 5%, paying interest only). Gas prices have to at least double to stop the urban sprawl.&lt;/p&gt;  &lt;p&gt;Interestingly, you pay for a long commute not only in gas but also in your time. 
Here is another perspective: &lt;a title="http://noisetank.com/hugeasscity/2008/04/14/busting-the-drive-till-you-qualify-myth/" href="http://noisetank.com/hugeasscity/2008/04/14/busting-the-drive-till-you-qualify-myth/"&gt;http://noisetank.com/hugeasscity/2008/04/14/busting-the-drive-till-you-qualify-myth/&lt;/a&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-5215345367424168673?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/5215345367424168673/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=5215345367424168673' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/5215345367424168673'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/5215345367424168673'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/06/commute-gas-prices.html' title='Commute: Gas Prices'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-1474951318548656719</id><published>2008-06-23T23:34:00.001-07:00</published><updated>2008-06-23T23:34:32.009-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='life'/><title type='text'>Trying out Public Transportation</title><content type='html'>&lt;p&gt;I have been trying public transportation in one form or 
another over the last week. Mostly I was using Microsoft-provided shuttle buses. These buses are designed for carrying people from one building to another, with the exception of the &lt;a href="http://www.microsoft.com/presspass/misc/09-06connectorFS.mspx"&gt;connector&lt;/a&gt;. The connector is a network of luxury coaches for getting to and from work. Microsoft also provides a free pass for all kinds of public transportation. Needless to say, public transit is scarce. There is a bus which stops within walking distance of my home on Redmond Educational Hill and goes directly to downtown Bellevue, where I work. The first outgoing bus of the day stops at 5am and the last one stops at 8am. Clearly, this doesn't work for me. The remaining option is to take a connector and then a shuttle. This takes an hour, while driving takes 20 minutes. I really miss the subway in Moscow. Tomorrow I'm going to try biking to work. Another thing to try would be to get one of the &lt;a href="http://transit.metrokc.gov/tops/van-car/vanpool.html"&gt;vanpool&lt;/a&gt; vans. King County provides a van for people who would like to carpool. The van comes with maintenance and gas included. And the cost is covered by the Microsoft bus pass. 
I have at least 4 other Microsoft friends living nearby.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-1474951318548656719?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/1474951318548656719/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=1474951318548656719' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/1474951318548656719'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/1474951318548656719'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/06/trying-out-public-transportation.html' title='Trying out Public Transportation'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-2179501033061447413</id><published>2008-05-20T13:33:00.000-07:00</published><updated>2008-05-20T13:36:22.539-07:00</updated><title type='text'>Egg-throwing Student</title><content type='html'>http://www.youtube.com/watch?v=mtBQ4UCXQeo&lt;br /&gt;&lt;br /&gt;Threw some eggs at Steve Ballmer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-2179501033061447413?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://maykov.blogspot.com/feeds/2179501033061447413/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=2179501033061447413' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/2179501033061447413'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/2179501033061447413'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/05/egg-throwing-student.html' title='Egg-throwing Student'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-1310747431989982033</id><published>2008-05-19T22:38:00.001-07:00</published><updated>2008-05-19T22:38:23.251-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='blog crawl'/><category scheme='http://www.blogger.com/atom/ns#' term='www08'/><category scheme='http://www.blogger.com/atom/ns#' term='algorithms'/><title type='text'>Trading Speed for Memory and for Latency</title><content type='html'>&lt;p&gt;It is widely known that you can trade memory for speed in some circumstances. For instance, instead of doing lengthy computations each time, pre-compute them, store the results in memory and look them up at run time. Another example is caching, in RAM or on disk. A less known fact is that sometimes you can significantly speed things up at the price of allowing more latency. For instance, instead of sending data over the wire, burn it onto DVDs and ship them. 
&lt;/p&gt;  &lt;p&gt;Here is an interesting technique which is often used in crawlers and was described in the IRLbot paper: &lt;a title="IRLbot- Scaling to 6 Billion Pages and Beyond" href="http://irl.cs.tamu.edu/people/hsin-tsang/papers/www2008.pdf"&gt;IRLbot- Scaling to 6 Billion Pages and Beyond&lt;/a&gt;. The problem is that for each new URL, you need to determine, among other things, whether you have already crawled it. It is logical that you will need to have a database of seen URLs. The number of lookups and/or inserts per second will determine a potential crawler bottleneck. The conventional technique of one DB lookup per URL will soon hit a scaling wall as the number of seen URLs reaches into the billions. Instead, lookup requests can be batched and then executed all at once by doing a merge. A merge is a much more complex operation than a lookup. However, lots of requests can be executed in one merge operation, while otherwise each request would require a separate lookup. &lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-1310747431989982033?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/1310747431989982033/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=1310747431989982033' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/1310747431989982033'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/1310747431989982033'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/05/trading-speed-for-memory-and-for.html' title='Trading Speed for Memory and for Latency'/><author><name>Alexey 
Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-8945100111746005853</id><published>2008-04-22T19:03:00.001-07:00</published><updated>2008-04-22T19:03:45.045-07:00</updated><title type='text'>WWW Opening ceremony</title><content type='html'>&lt;p&gt;&lt;a href="http://lh3.ggpht.com/maykov/SA6Y-r5-MUI/AAAAAAAAABY/ROooHV2e8vc/wwwOpeningzCeremony%5B3%5D.jpg"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="308" alt="wwwOpeningzCeremony" src="http://lh4.ggpht.com/maykov/SA6Y_75-MVI/AAAAAAAAABg/suDSvBFlpIg/wwwOpeningzCeremony_thumb%5B1%5D.jpg" width="772" border="0" /&gt;&lt;/a&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;What was the reason behind one of the dancers being male and all the others female? 
I hope it was not an attempt to play by political correctness rules.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-8945100111746005853?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/8945100111746005853/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=8945100111746005853' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/8945100111746005853'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/8945100111746005853'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/04/www-opening-ceremony.html' title='WWW Opening ceremony'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/maykov/SA6Y_75-MVI/AAAAAAAAABg/suDSvBFlpIg/s72-c/wwwOpeningzCeremony_thumb%5B1%5D.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-6368870192354876462</id><published>2008-04-22T19:01:00.001-07:00</published><updated>2008-04-22T19:01:11.637-07:00</updated><title type='text'>Kai Fu Lee Keynote</title><content type='html'>&lt;p&gt;Mainly Google propaganda, reiterating known things.&lt;/p&gt;  &lt;p&gt;Cloud computing, everything moves to the cloud, same old story. 
He showed an example where Google's machine translation from Chinese into English is better than its competitor's. This doesn't belong in a keynote speech.&lt;/p&gt;  &lt;p&gt;One thing I didn't know: Google can deploy a new data center in 3 days. Their boxes (i.e. machines) do not have boxes :). Google came up with a way to stuff motherboards directly into racks. It is unclear where the hard drives are located. &lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-6368870192354876462?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/6368870192354876462/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=6368870192354876462' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/6368870192354876462'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/6368870192354876462'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/04/kai-fu-lee-keynote.html' title='Kai Fu Lee Keynote'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-1641587587770184072</id><published>2008-04-20T09:31:00.001-07:00</published><updated>2008-04-20T09:31:37.208-07:00</updated><title type='text'>@Beijing for WWW '08</title><content type='html'>&lt;p&gt;LiveJournal is still denied for 
access from China. So much for &lt;a href="http://www2008.org/index.html"&gt;&amp;quot;One World, One Web&amp;quot;,&lt;/a&gt; the theme of the conference.&lt;/p&gt;  &lt;p&gt;Great tutorial day tomorrow&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/maykov/SAtv39clxBI/AAAAAAAAABI/fLUQuflSNwU/rubbish%5B2%5D.jpg"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="100" alt="rubbish" src="http://lh4.ggpht.com/maykov/SAtv49clxCI/AAAAAAAAABQ/DzsebCD1Pv0/rubbish_thumb.jpg" width="244" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-1641587587770184072?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/1641587587770184072/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=1641587587770184072' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/1641587587770184072'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/1641587587770184072'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/04/beijing-for-www.html' title='@Beijing for WWW &amp;#39;08'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://lh4.ggpht.com/maykov/SAtv49clxCI/AAAAAAAAABQ/DzsebCD1Pv0/s72-c/rubbish_thumb.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-6402095749520800651</id><published>2008-04-16T18:09:00.000-07:00</published><updated>2008-04-16T18:11:55.983-07:00</updated><title type='text'>PhD in Computer Science</title><content type='html'>A great essay on getting a PhD in CS:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www-2.cs.cmu.edu/~harchol/gradschooltalk.pdf"&gt;http://www-2.cs.cmu.edu/~harchol/gradschooltalk.pdf&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-6402095749520800651?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/6402095749520800651/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=6402095749520800651' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/6402095749520800651'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/6402095749520800651'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/04/phd-in-computer-science.html' title='PhD in Computer Science'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-763687486882857540</id><published>2008-04-15T00:00:00.001-07:00</published><updated>2008-04-15T00:00:54.748-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='blog crawl'/><category scheme='http://www.blogger.com/atom/ns#' term='social media'/><title type='text'>TREC Blog Track. Nominal? fee.</title><content type='html'>&lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://ir.dcs.gla.ac.uk/test_collections/access_to_data.html"&gt;Price&lt;/a&gt;&amp;#160; &amp;#163;400 = $787&lt;/p&gt;  &lt;p&gt;Total size: 135 GB on a 3.5&amp;quot; hard drive = $70 for the hard drive&lt;/p&gt;  &lt;p&gt;&lt;a href="http://ir.dcs.gla.ac.uk/test_collections/blog06info.html"&gt;Number of HTTP requests&lt;/a&gt;: 324,880 homepages + 3,215,171 permalinks + 100,649 feeds * 11 weeks = 4,647,190&lt;/p&gt;  &lt;p&gt;Spread over 11 weeks, this number of requests works out to about 42 requests per minute, or less than one request per second. I am averaging 20-50 requests per second on my office box. My home Comcast cable service allows only 2 simultaneous requests, and a page load every 2 seconds is not a wild assumption. The service cost for 11 weeks is about $100. So, you can save around $600, minus an unknown amount of dev and ops cost, if you choose to run your own crawler. 
However, you won't be able to submit a paper to TREC.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-763687486882857540?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/763687486882857540/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=763687486882857540' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/763687486882857540'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/763687486882857540'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/04/trec-blog-track-nominal-fee.html' title='TREC Blog Track. Nominal? fee.'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-7745101921046072163</id><published>2008-04-14T23:26:00.001-07:00</published><updated>2008-04-14T23:27:09.535-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='icwsm2008'/><category scheme='http://www.blogger.com/atom/ns#' term='conference'/><category scheme='http://www.blogger.com/atom/ns#' term='social media'/><category scheme='http://www.blogger.com/atom/ns#' term='icwsm'/><title type='text'>ICWSM Papers</title><content type='html'>&lt;p&gt;Here are some highlights.&lt;/p&gt;  &lt;p&gt;Lots of papers on &lt;strong&gt;LSA-LDA&lt;/strong&gt;. 
I am fascinated by the elegance and unsupervised nature of this technique.&lt;/p&gt;  &lt;p&gt; &amp;#8226;&lt;a href="http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2007-ramesh.pdf"&gt;Link-PLSA-LDA: A new unsupervised model for topics and influence of blogs&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&amp;#8226;The Psychology of Word Use in Depression Forums in English and in Spanish&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;a href="http://trec.nist.gov/"&gt;TREC&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;This is a conference where all papers are about solving a fixed set of problems on a given set of data. So, everybody understands 100% of all papers and is fully engaged. &lt;/p&gt;  &lt;p&gt;&amp;#8226;&lt;a href="http://www.cs.cmu.edu/~jaime/ICWSM1ArguelloJ.pdf"&gt;Document Representation and Query Expansion Models for Blog Recommendation&lt;/a&gt; basically, the winner of Blog'07&lt;/p&gt;  &lt;p&gt;&amp;#8226;On TREC Blog Track. The dataset is available for a nominal fee of 400 pounds.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;With over 200 world languages and the most widely used language being oddly the simplest one, I thought that good language coverage could be achieved only by teams of a zillion people. 
However, there are some tricks with machine translation:&lt;/p&gt;  &lt;p&gt;&amp;#8226;&lt;a href="http://www.cs.sunysb.edu/~mbautin/pdf/int_senti_analysis.pdf"&gt;International Sentiment Analysis for News and Blogs&lt;/a&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-7745101921046072163?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/7745101921046072163/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=7745101921046072163' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/7745101921046072163'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/7745101921046072163'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/04/icwsm-papers.html' title='ICWSM Papers'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-9109562887291143305</id><published>2008-04-12T12:07:00.001-07:00</published><updated>2008-04-12T12:07:47.149-07:00</updated><title type='text'>Keeping on</title><content type='html'>&lt;p&gt;I must say that keeping a blog going is much harder than starting one. You start all excited and jazzed up and then do not have the motivation to keep going. I have seen lots of blogs with only a few entries. 
It would be an interesting study to see what the distribution of the total number of posts looks like and to check whether it correlates with other parameters.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-9109562887291143305?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/9109562887291143305/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=9109562887291143305' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/9109562887291143305'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/9109562887291143305'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/04/keeping-on.html' title='Keeping on'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-461761739727804638</id><published>2008-03-30T19:22:00.000-07:00</published><updated>2008-03-30T23:11:40.628-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='icwsm2008'/><category scheme='http://www.blogger.com/atom/ns#' term='conference'/><category scheme='http://www.blogger.com/atom/ns#' term='social media'/><category scheme='http://www.blogger.com/atom/ns#' term='icwsm'/><title type='text'>ICWSM '08, Day 0</title><content type='html'>Today I attended Day 0 of &lt;a 
href="http://icwsm.org/2008/index.shtml"&gt;ICWSM '08&lt;/a&gt;. ICWSM is the conference organized by a group of researchers in the Social Media area along with Matt Hurst.&lt;br /&gt;&lt;br /&gt;Both tutorials were quite interesting. Jan Wiebe talked about subjectivity and sentiment analysis. According to some conference attendees, sentiment is a hot topic right now. Lots of business people would like to use the results of sentiment mining. However, they're sceptical about how reliable these results are. Jan provided lots of details about the different types of sentiment and objectivity-related annotations that exist for data. The analysis part, though, wasn't as developed.&lt;br /&gt;&lt;br /&gt;Mary McGlohon talked about different ways to study big graphs. Examples of big graphs are web link graphs, friend graphs on social networks, and blog links. Such graphs have interesting properties and measures, such as diameter and in/out-degree. Since graphs can also be represented as matrices, matrix decompositions such as SVD can be applied, which leads to interesting results. Interestingly, the evolution of graphs over time can be studied using tensor analysis. Turns out, a tensor here is essentially a three-dimensional matrix. I really should've paid more attention to my Math and Physics classes instead of hacking C++ code! I highly recommend checking out Mary's slides at &lt;a href="http://icwsm.org/2008/tutorials.shtml"&gt;http://icwsm.org/2008/tutorials.shtml&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Tomorrow will be a great day, which will start with a talk by LiveJournal founder Brad Fitzpatrick. 
Stay tuned for more updates!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-461761739727804638?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/461761739727804638/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=461761739727804638' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/461761739727804638'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/461761739727804638'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/03/icwsm-08-day-0.html' title='ICWSM &apos;08, Day 0'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8205906155075223532.post-8849075458646970081</id><published>2008-03-29T23:08:00.000-07:00</published><updated>2008-03-30T23:11:53.152-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bio'/><title type='text'>Introduction</title><content type='html'>&lt;p&gt;Hello!&lt;/p&gt;  &lt;p&gt;My name is Alexey Maykov.&amp;#160; I live and work in Redmond, WA. I've been with Microsoft for 7 years. I have spent the last two years in &lt;a href="http://labs.live.com/Live+Labs+Manifesto.aspx"&gt;Live Labs&lt;/a&gt;. &lt;/p&gt;  &lt;p&gt;I'm working on data mining Social Media. 
My immediate coworkers include &lt;a href="http://datamining.typepad.com/"&gt;Matt Hurst&lt;/a&gt;, &lt;a href="http://micro-workflow.com/"&gt;Dragos Manolescu&lt;/a&gt; and &lt;a href="http://justinrudd.wordpress.com/"&gt;Justin Rudd&lt;/a&gt;. The rest of the team doesn't have blogs yet, so I won't reveal them yet. &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8205906155075223532-8849075458646970081?l=maykov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maykov.blogspot.com/feeds/8849075458646970081/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8205906155075223532&amp;postID=8849075458646970081' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/8849075458646970081'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8205906155075223532/posts/default/8849075458646970081'/><link rel='alternate' type='text/html' href='http://maykov.blogspot.com/2008/03/introduction.html' title='Introduction'/><author><name>Alexey Maykov</name><uri>http://www.blogger.com/profile/06295810199316638919</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://bp2.blogger.com/_xAF3Lelh2gI/R_CD78Tt0oI/AAAAAAAAAAM/oICc1U9O-kE/S220/amaykov.jpg'/></author><thr:total>0</thr:total></entry></feed>
