<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Michael Renner&apos;s macro-blog</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/" />
    <link rel="self" type="application/atom+xml" href="http://blogs.amd.co.at/robe/atom.xml" />
    <id>tag:blogs.amd.co.at,2008-12-13:/robe//1</id>
    <updated>2010-04-30T17:22:17Z</updated>
    <subtitle>Since micro-blogs only take you so far...</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.23-en</generator>

<entry>
    <title>PostgreSQL talks in June</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/2010/04/postgresql-talks-in-june.html" />
    <id>tag:blogs.amd.co.at,2010:/robe//1.24</id>

    <published>2010-04-30T16:36:35Z</published>
    <updated>2010-04-30T17:22:17Z</updated>

    <summary>Here&apos;s a short recap about my speaking engagements in June AMOOCON AMOOCON takes place from 4th to 6th June in Rostock, Germany. The conference has its roots in the FOSS VoIP communities but has a broader focus these days....</summary>
    <author>
        <name>Michael Renner</name>
        <uri>http://amd.co.at/</uri>
    </author>
    
    <category term="amoocon" label="amoocon" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="osdc" label="osdc" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="de" xml:base="http://blogs.amd.co.at/robe/">
        <![CDATA[<p>Here's a short recap about my speaking engagements in June</p>

<h2><span class="caps">AMOOCON</span></h2>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="amoocon.png" src="http://blogs.amd.co.at/robe/2010/04/30/amoocon.png" width="231" height="99" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></span></p>

<p><a href="http://www.amoocon.de/"><span class="caps">AMOOCON</span></a> takes place from <b>4th to 6th June</b> in <b>Rostock, Germany</b>. The conference has its roots in the <span class="caps">FOSS</span> VoIP communities but has a broader focus these days. I will give a "full length" PostgreSQL advocacy <a href="http://www.amoocon.de/talks/137">talk</a> as well as a 20-minute PostgreSQL 9.0 <a href="http://www.amoocon.de/talks/138">primer</a>.</p>



<h2>Netways <span class="caps">OSDC</span></h2>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="Iam speaking_button_.png" src="http://blogs.amd.co.at/robe/Iam%20speaking_button_.png" width="300" height="234" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></span></p>

<p>It's my <a href="http://www.netways.de/osdc/y2009/programm/w/michael_renner_postgresql_repliziert_ein_ueberblick/">second time</a> at the <a href="http://www.netways.de/osdc/y2010/"><span class="caps">OSDC</span></a>, again in <b>Nuremberg, Germany</b> on <b>23rd &amp; 24th of June</b>. I'll give a talk and a short presentation on the native replication mechanisms PostgreSQL is going to provide with the upcoming 9.0 release.</p>


<p>As every year, there might be talks &amp; presentations on these topics at the <a href="http://metalab.at/">Metalab</a> shortly before or after the two events; follow me on <a href="http://twitter.com/terrorobe/">twitter</a> or watch the <a href="http://metalab.at/calendar/">event list</a> for updates if you're interested in these.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Amsterdam, Volcanoes, Transport in Europe, Conferences, Projects and probably even more</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/2010/04/amsterdam-volcanoes-transport-in-europe-conferences-projects-and-probably-even-more.html" />
    <id>tag:blogs.amd.co.at,2010:/robe//1.23</id>

    <published>2010-04-18T22:06:54Z</published>
    <updated>2010-04-18T23:08:58Z</updated>

    <summary>So it&apos;s April by now, more than half a year since my last post. Maybe this blogging business was just a temporary thing after all. I guess my reluctance to post anything new was caused by the lack of definitives...</summary>
    <author>
        <name>Michael Renner</name>
        <uri>http://amd.co.at/</uri>
    </author>
    
    
    <content type="html" xml:lang="de" xml:base="http://blogs.amd.co.at/robe/">
        <![CDATA[<p>So it's April by now, more than half a year since my last post. Maybe this blogging business was just a temporary thing after all. I guess my reluctance to post anything new was caused by the lack of definitives in my life lately, but this will hopefully change in the near future.</p>]]>
        <![CDATA[<p>At the moment I'm sitting in Amsterdam, grounded since Saturday evening by the Icelandic volcano business (which is 80% <span class="caps">FUD </span>and 20% facts by my guess), juggling the options of flying home to Vienna tomorrow (if they decide to open shop again and I can actually get a ticket) or just staying here till Thursday for my second round of PowerDNS trainings. I already ruled out going by car (one-way Amsterdam -&gt; Vienna is far too expensive for my taste, and I won't be doing ~22 hours of driving in 4 days).</p>

<p>Going by train across Europe is still a complete no-go for the spoiled Web generation I'd call myself a member of: neither <a href="http://www.oebb.at/">oebb.at</a> nor <a href="http://www.ns.nl/">ns.nl</a> lists fares and seat availability, and <a href="http://www.nshispeed.nl/">nshispeed.nl</a> doesn't even know Austria! <a href="http://www.deutschebahn.com/">Deutschebahn.com</a> at least had pricing and seat (un)availability information for the trains from Frankfurt to Vienna. This information, combined with the first-hand experience of a friend of mine ("The terror! The terror I've seen!"), suggested that the train was not the way to go.</p>

<p>So right now I'm wearing a borrowed Slipknot t-shirt, waiting for my laundry to finish and cancelling the appointments for the coming week.</p>


<h1>Definitives, possibilities and maybes</h1>

<p>I quit my job at Geizhals back in November because of cultural differences too large to bridge. Looking at my professional experience so far and the Austrian (especially Viennese) job market, my best guess was to start freelancing. The support of the <a href="http://www.ams.or.at/"><span class="caps">AMS</span></a> in this regard and the assistance the <a href="http://www.wirtschaftskammer.at/">Wirtschaftskammer</a> offers for founding a company in Austria made this step much easier than I had anticipated.</p>

<p>In the meantime I also got a few job offers which I'm looking into, the most notable being from the big G. I'll be over in Zurich at the beginning of May, so stay tuned for further updates.</p>

<p>There's also stuff happening in the Viennese Web 2.0 area and on the <a href="http://www.powerdns.com/">PowerDNS</a> front, and lately I've also been pondering startup ideas with <a href="http://fittl.com/">Lukas Fittl</a>.</p>

<p>Oh well, I hope I've got a sound plan by no later than the end of the year ;).</p>

<h1>Conferences</h1>

<p>Conferences! I have a <a href="http://amoocon.de/speakers/214">confirmed appearance</a> at <a href="http://www.amoocon.de/"><span class="caps">AMOOCON</span></a> and a pretty unconfirmed one at the <a href="http://www.netways.de/osdc/osdc_2010/program/">Netways <span class="caps">OSDC</span></a>. All talks are about PostgreSQL; see the conference pages for the exact focus.</p>

<h1>Sideprojects</h1>

<p>On the sideproject front I can welcome</p>

<h2>Titanpad</h2>

<p><a href="http://titanpad.com/">Titanpad</a> was launched as an <a href="http://etherpad.com/">EtherPad</a> replacement, because the latter stopped allowing new pads to be created on 14th April and will close shop on 14th of May.</p>

<h2>etherhack</h2>

<p><a href="http://etherhack.wordpress.com/">etherhack</a> is a project of a friend of mine who wants to bring Linux to generic (somewhat-)managed switches. The research is still at the beginning; if you're into hardware hacking, writing toolchains and all that stuff, it's definitely something to look into.</p>

<h2><span class="caps">FM4</span></h2>

<p>The ever-present unofficial <span class="caps">FM4 </span>stream is currently on <a href="http://fm4.amd.co.at/">hiatus</a>. We'll see if we can find a new home for it.</p>

<h2>PostgreSQL Benchfarm</h2>

<p>I <a href="https://workbench.amd.co.at/hg/pgperftrace/">dabbled a bit</a> in object-oriented Perl with <a href="http://www.iinteractive.com/moose/">Moose</a>, trying to build an automated and customizable benchmarking framework for PostgreSQL. It was quite a trip, but I've put the project on hold for now because I'm all out of time and focus ;).</p>


<p>That's all from me for now. I'm off to bed and will be hunting for flights tomorrow.</p>]]>
    </content>
</entry>

<entry>
    <title>PostgreSQL Performance Slides, Solaris stuff</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/2009/09/postgresql-performance-slides-solaris-stuff.html" />
    <id>tag:blogs.amd.co.at,2009:/robe//1.22</id>

    <published>2009-09-23T18:10:00Z</published>
    <updated>2009-09-23T18:20:30Z</updated>

    <summary>As promised, here are the slides of my presentation I held at the Metalab, titled &quot;PostgreSQL Performance: Eine Landvermessung&quot;. And I just stumbled over this blog posting by Brendan Gregg who works for Sun&apos;s Fishworks team and was amazed by...</summary>
    <author>
        <name>Michael Renner</name>
        <uri>http://amd.co.at/</uri>
    </author>
    
    <category term="presentation" label="presentation" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="slides" label="slides" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="solaris" label="solaris" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sun" label="sun" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="de" xml:base="http://blogs.amd.co.at/robe/">
        <![CDATA[<p>As promised, <a href="http://amd.co.at/blogdata/postgresql_performance_web.pdf">here are the slides</a> of the presentation I gave at the Metalab, titled "PostgreSQL Performance: Eine Landvermessung".</p>

<p>And I just stumbled over this <a href="http://blogs.sun.com/brendan/entry/7410_hardware_update_and_analyzing">blog posting</a> by <a href="http://www.blogcdn.com/www.engadget.com/media/2009/01/shout-at-disk-array-eng.jpg">Brendan Gregg</a>, who works for Sun's <a href="http://blogs.sun.com/bmc/entry/fishworks_now_it_can_be">Fishworks team</a>, and I was amazed by the level of detail that Solaris' instrumentation data provides. Good stuff!</p>]]>
        
    </content>
</entry>

<entry>
    <title>FrOSCon 2009</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/2009/08/froscon-2009.html" />
    <id>tag:blogs.amd.co.at,2009:/robe//1.21</id>

    <published>2009-08-31T21:16:22Z</published>
    <updated>2009-08-31T21:48:25Z</updated>

    <summary>FrOSCon FrOSCon 2009 was a nice break from the stress at work, replacing it by stress in the weekend. The atmosphere was nice as usual and the planning good as every year. And with Andreas Scherbaum playing airport- and venue...</summary>
    <author>
        <name>Michael Renner</name>
        <uri>http://amd.co.at/</uri>
    </author>
    
    <category term="conference" label="conference" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="froscon" label="froscon" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="geizhals" label="geizhals" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="postgresql" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="de" xml:base="http://blogs.amd.co.at/robe/">
        <![CDATA[<h1>FrOSCon</h1>

<p>FrOSCon 2009 was a nice break from the stress at work, replacing it with stress on the weekend. The atmosphere was nice as usual and the planning as good as every year. And with <a href="http://andreas.scherbaum.la/blog/">Andreas Scherbaum</a> playing airport and venue taxi, the transportation didn't leave any room for improvement ;).</p>


<p>A few of the things that stuck with me were</p>


<h2>Virtualization &amp; Cloud Management</h2>

<p>There's a lot of stuff going on in the virtualization world, since by now everybody has noticed that just hypervising things doesn't cut it and that you need to manage the stuff you deployed somehow. Which is a good thing, by the way.</p>]]>
        <![CDATA[<p><a href="http://www.openqrm.com/">OpenQRM</a> gets <a href="http://reductivelabs.com/products/puppet/">Puppet</a> support, finally beating the <a href="http://madstop.com/2009/02/04/golden-image-or-foil-ball/">foil ball</a>. <a href="http://www.eucalyptus.com/">Eucalyptus</a> could emerge as a strong player in the IaaS area (Infrastructure as a Service) especially with the <a href="http://open.eucalyptus.com/wiki/EucalyptusStorage_v1.4">Walrus</a> storage service. And there's a plethora of other projects like <a href="http://www.opennebula.org/">OpenNebula</a>, <a href="http://workspace.globus.org/">Nimbus</a>, <a href="http://grid.ucalgary.ca/projects/DataCentre/index.html">Aspen</a>, <a href="http://www.enomaly.com/">Enomaly</a> and <a href="http://www.reservoir-fp7.eu/">Reservoir</a>, which also might have their respective strong points.</p>


<p>But all of them have one thing in common:</p>

<p>They are hardly usable in production environments.</p>

<p>If you need finished products stick to your VMware for now. If you go for one of the <span class="caps">FOSS </span>products in a large environment be prepared to hit a few speedbumps along your way and hack lots of essential stuff by yourself.</p>


<h2>Apache Hadoop &amp; Mahout</h2>

<p>Isabel Drost talked about <a href="http://lucene.apache.org/mahout/">Mahout</a>, which is a project focused on <a href="http://en.wikipedia.org/wiki/Machine_learning">machine learning</a>, extending from <a href="http://lucene.apache.org/">Lucene</a>, doing its magic with <a href="http://hadoop.apache.org/">Hadoop</a>. Again very abstract, and I get the impression that MapReduce-based stuff still isn't quite ready for the unwashed masses.</p>

<p>Side notes: The Apache people start to get scary by now, I bet they're about to start an operating system project very soon now. And both Hadoop and Mahout have Elephant logos!</p>


<h2>PostgreSQL &amp; Performance</h2>

<p>My presentation was plagued by far too little time to prepare, an audience with very diverse levels of experience and far too much content. For the future, I'll pick a skill level in advance and stick to it. And do timings upfront. Promised. ;)</p>



<h2>Perl::Critic</h2>

<p><a href="http://www.renee-baecker.de/">René Bäcker</a> gave a talk on <a href="http://search.cpan.org/~elliotjs/Perl-Critic-1.104/lib/Perl/Critic.pm">Perl::Critic</a>, an interesting module to enforce coding standards in Perl projects (You wouldn't believe it's not an oxymoron!). Most of its rule set is based on Damian Conway's <a href="http://oreilly.com/catalog/9780596001735/">Perl Best Practices</a>, so it gives you a good head start for maintainable code. It's very easy to extend, so enforcing all your major and minor pet peeves isn't much of a problem.</p>


<h2>PostgreSQL (in the real world)</h2>

<p>Stefan Kaltenbrunner gave us insights into what the PostgreSQL project infrastructure looks like and what role <a href="http://www.postgresql.org/about/servers">Panama</a> plays in the big picture ;).</p>

<p>In the afternoon we had a few lightning talks in the PostgreSQL Developer Room. Two excerpts:</p>

<p>I talked a bit about Geizhals.at and how we use PostgreSQL over here.</p>

<p>Marek Swierzy from <span class="caps">OSSCAD</span> GmbH described how they use PostgreSQL to store temperature readings (among other measurements) which they collect from fiber optic cables over distances of up to 12 km with a resolution of 0.5 m, with just the cable as sensor!</p>

<p>See the <a href="http://wiki.postgresql.org/wiki/FrOSCon_2009">PostgreSQL Wiki</a> for a complete list of talks.</p>

<h2>OpenSQL Camp Database Panel Discussion</h2>

<p>More an interactive question time than a panel discussion but interesting nevertheless. Now I finally know what the main market of <a href="http://www.firebirdsql.org/">Firebird</a> is (Embedded database engine for applications). And <a href="http://www.blackray.org/">Blackray</a> seems to be an interesting contender on the <span class="caps">FTS </span>market, when you've got enough <span class="caps">RAM </span>to throw at the problem.</p>

<h2>Summing it up</h2>

<p>All in all it was a nice conference with a lot of content and far too many <a href="http://programm.froscon.org/2009/day_2009-08-22.en.html">parallel tracks</a> ;).</p>


<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><a href="http://blogs.amd.co.at/robe/assets_c/2009/08/froscon-25.html" onclick="window.open('http://blogs.amd.co.at/robe/assets_c/2009/08/froscon-25.html','popup','width=1600,height=1200,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://blogs.amd.co.at/robe/assets_c/2009/08/froscon-thumb-200x150-25.jpg" width="200" height="150" alt="froscon.jpg" class="mt-image-none" style="" /></a></span></p>

<p>Würstel Queue</p>]]>
    </content>
</entry>

<entry>
    <title>An interim update</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/2009/08/an-interim-update.html" />
    <id>tag:blogs.amd.co.at,2009:/robe//1.20</id>

    <published>2009-08-16T19:13:18Z</published>
    <updated>2009-08-16T20:50:17Z</updated>

    <summary>The last two months were very interesting and positively demanding....</summary>
    <author>
        <name>Michael Renner</name>
        <uri>http://amd.co.at/</uri>
    </author>
    
    <category term="amd" label="amd" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="debian" label="debian" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="froscon" label="froscon" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="geizhals" label="geizhals" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="hp" label="hp" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="intel" label="intel" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="performance" label="performance" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="postgresql" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sicekit" label="sicekit" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="talk" label="talk" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="de" xml:base="http://blogs.amd.co.at/robe/">
        <![CDATA[<p>The last two months were very interesting and positively demanding.</p>]]>
        <![CDATA[<h2>Geizhals</h2>

<p>I took up a regular (and interesting!) job again at <a href="http://geizhals.at/">Geizhals</a> (a price comparison platform with origins in Austria), which seems to be one of the few interesting web projects in Austria. The job title says "Head of IT services"; at the moment I'm interviewing candidates for the newly created sysadmin team as well as testing components<sup class="footnote"><a href="http://blogs.amd.co.at/robe/2009/08/an-interim-update.html#fn1">1</a></sup> for a future-proof platform to run all Geizhals services. The project has come a long way in the few years I didn't follow it closely, and the service landscape has become quite a bit more complex in the meantime.</p>

<p class="footnote" id="fn1"><sup>1</sup> Currently at the foundation: <span class="caps">HP,</span> Supermicro, Debian, Xen, <span class="caps">DRBD, </span>one of the projects formerly known as Heartbeat, Puppet, etc.</p>


<h2>Nehalem &amp; HP ProLiant G6</h2>

<p>With the official presentation of the Nehalem architecture HP also launched their ProLiant Generation 6. The <a href="http://en.wikipedia.org/wiki/Intel_QuickPath_Interconnect">QuickPath interconnect</a> was long overdue and is a stab in the heart of <span class="caps">AMD</span>'s meticulously built-up foothold in the server market. It'll be interesting to see how the vendors who switched to <span class="caps">AMD </span>in the last few years<sup class="footnote"><a href="http://blogs.amd.co.at/robe/2009/08/an-interim-update.html#fn2">2</a></sup> will plan their future strategy. </p>


<p>As for HP servers - we've currently got two <span class="caps">DL360 G6, E5530,</span> 72GB <span class="caps">RAM,</span> 8&#215; 300GB <span class="caps">SAS </span>machines in the office.</p>

<p>Things were a bit bumpy at the start (quite a few "must-have" firmware updates), but this is to be expected when a new <span class="caps">CPU </span>architecture and chipset are launched. I'll probably follow up as soon as the servers are in production.</p>

<p>A few things I noticed when comparing <span class="caps">DL360</span> G6 to G5:</p>

<p>The servers:</p>


<ul>
<li>are quieter</li>
<li>use less power</li>
<li>take ages to <span class="caps">POST</span></li>
<li>are flaky in heavy reboot/test cycles, especially when using virtual media and broken boot loaders</li>
<li>are otherwise what you'd expect from properly engineered Tier 1 servers</li>
</ul>



<p class="footnote" id="fn2"><sup>2</sup> The main reason being Opteron/HyperTransport, because Intel's <span class="caps">FSB</span>-architecture didn't scale nicely to more than a few processors</p>

<h2>HP &amp; Debian</h2>

<p>On the plus side the Debian support for the server tools is much better these days. The effort (apparently spearheaded by <a href="http://dannf.org/bloggf/">Dann Frazier</a>) resulted in an apt-cdrom-readable <a href="http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&amp;cc=us&amp;prodNameId=3562405&amp;prodTypeId=15351&amp;prodSeriesId=3454575&amp;swLang=8&amp;taskId=135&amp;swEnvOID=4032"><span class="caps">ISO </span>image</a> which is going to be replaced by a proper Debian repository<sup class="footnote"><a href="http://blogs.amd.co.at/robe/2009/08/an-interim-update.html#fn3">3</a></sup> eventually.</p>

<p>If you run ProLiant x86 servers the hp-health tools are very nice to have and if you're using SmartArray controllers you'll be delighted by the properly packaged hpacucli. </p>


<p class="footnote" id="fn3"><sup>3</sup> I've set up <a href="http://amd.co.at/hpstuff/">http://amd.co.at/hpstuff/</a> in the meantime. Use "deb http://amd.co.at/hpstuff lenny/8.25 non-free" as the sources.list entry on your Debian Lenny systems.</p>

<h2>Debian &amp; Bootloaders</h2>

<p>During the testing &amp; deployment of the new HP servers I stumbled over a few gotchas in Debian Lenny.</p>

<p>Installing an <span class="caps">LVM</span>-root based system with no standalone /boot partition is apparently unsupported (it results in an unbootable system); if you go with Debian's defaults for <span class="caps">LVM</span>-root installations, you get a system with <span class="caps">LILO.</span></p>

<p>Nothing that manually copying the /boot files and installing <span class="caps">GRUB</span> 2 can't fix.</p>

<p>But as soon as you try using <span class="caps">GRUB</span> 2 on a Xen Dom0 you notice that the grub.cfg generator doesn't support generating Xen compatible config stanzas.</p>

<p>I'll file a few bug/discussion reports in the near future as soon as I've got all the details worked out.</p>

<h2>PostgreSQL 8.4</h2>

<p>In the meantime PostgreSQL 8.4 was also released. No major breakthroughs, but lots of small functionality and performance improvements. See the <a href="http://www.postgresql.org/about/news.1108">announcement</a> and the <a href="http://www.postgresql.org/docs/8.4/static/release-8-4.html">Release notes</a> for details.</p>

<p>The thing I'm interested in the most is the introduction of <a href="http://www.postgresql.org/docs/8.4/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR">fadvise calls</a> which make asynchronous kernel-side multithreaded IO-prefetching possible. This is very helpful in situations where index scans of a single backend hit disk and your storage backend can handle more (read) IOs than a single thread can generate on its own. Expect more on this topic in the near future.</p>
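<p>To make the idea a bit more tangible, here is a minimal Python sketch of the pattern. It is not PostgreSQL's code: the function name prefetch_and_read and the 8 kB block size are assumptions made purely for illustration, and it needs a Unix-like system. All wanted blocks are announced to the kernel first, which may then fetch them in parallel in the background, and only afterwards are they read one by one.</p>

```python
import os

BLOCK_SIZE = 8192  # assumption for the example: 8 kB blocks


def prefetch_and_read(path, block_numbers, block_size=BLOCK_SIZE):
    """Announce all wanted blocks to the kernel, then read them in order."""
    blocks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Step 1: hint every block we are going to need. These calls return
        # immediately; the kernel is free to start the reads in the background.
        if hasattr(os, "posix_fadvise"):  # not available on every platform
            for block in block_numbers:
                os.posix_fadvise(fd, block * block_size, block_size,
                                 os.POSIX_FADV_WILLNEED)
        # Step 2: the ordinary synchronous reads, ideally served from the
        # page cache by the time we get to them.
        for block in block_numbers:
            blocks.append(os.pread(fd, block_size, block * block_size))
    finally:
        os.close(fd)
    return blocks
```

<p>Whether this actually helps depends on the storage underneath: the hints only pay off if the backend can serve several outstanding read requests at once.</p>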



<h2>Performance</h2>

<p>At work I had the chance to replace an 8-drive <span class="caps">SATA SAN </span>with a 32-drive <span class="caps">SAS SAN </span>along with a server replacement (Intel Core2/FSB Xeons -&gt; HyperTransport Opterons). Benchmarking these things was quite fun and enlightening, but I had far too little time to properly document everything. One thing I learned is that ample amounts of write cache and proper command queuing depth go a long way in storage systems ;). Still no solid state devices though.</p>


<h2>PostgreSQL talk</h2>

<p>I'll give a talk on <a href="http://programm.froscon.org/2009/events/432.en.html">PostgreSQL Performance</a> at <a href="http://www.froscon.org/">FrOSCon</a>. Still not finished, but it's targeted at beginners (when it comes to performance-related topics) and not entirely PostgreSQL specific. I'll probably repeat it at the <a href="http://metalab.at/">Metalab</a> if there's enough interest.</p>



<h2><span class="caps">SICEKIT</span></h2>

<p>Christian Hofstädtler and I started to generalize the infrastructure documentation framework we began back at <a href="http://www.inqnet.at/">Inqnet</a>. The current progress can be seen at <a href="http://sicekit.org/">http://sicekit.org/</a>; it will probably take a few months till we've got a usable product though.</p>]]>
    </content>
</entry>

<entry>
    <title>Data-sniffing trojans burrow into Eastern European ATMs</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/2009/06/data-sniffing-trojans-burrow-into-eastern-european-atms.html" />
    <id>tag:blogs.amd.co.at,2009:/robe//1.19</id>

    <published>2009-06-04T17:35:44Z</published>
    <updated>2009-06-04T18:52:43Z</updated>

    <summary>A catchy headline, as written by The Register. To quote more from the story (Full report with tech details): The malware logs the magnetic-stripe data and personal identification number of cards used at an infected machine and provides an intuitive...</summary>
    <author>
        <name>Michael Renner</name>
        <uri>http://amd.co.at/</uri>
    </author>
    
    <category term="cryptography" label="cryptography" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="management" label="management" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="security" label="security" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="de" xml:base="http://blogs.amd.co.at/robe/">
        <![CDATA[<p>A catchy headline, as written by <a href="http://www.theregister.co.uk/2009/06/03/atm_trojans/">The Register</a>. To quote more from the story (<a href="http://regmedia.co.uk/2009/06/03/trust_wave_atm_report.pdf">Full report</a> with tech details):</p>

<blockquote><p>The malware logs the magnetic-stripe data and personal identification number of cards used at an infected machine and provides an intuitive interface for retrieving the information using the <span class="caps">ATM'</span>s receipt printer, [..] Since late 2007 or so, there have been at least 16 updates to the software, an indication that the authors are working hard to perfect their tool.</p></blockquote>

<p>This is a nice example of what happens when you ignore the things that are necessary to run an important area of your core business. The business area being the operation of the <span class="caps">ATM</span>s (guess what bank teller utilization would look like if you threw out all <span class="caps">ATM</span>s). And a few of the things needed to run such a part competently would be: security (of the systems, the network), <a href="http://en.wikipedia.org/wiki/Information_Technology_Infrastructure_Library#Overview_of_the_ITIL_v3_library">service lifecycle management</a> and <a href="http://en.wikipedia.org/wiki/Configuration_management">configuration management</a>.</p>]]>
        <![CDATA[<p>To put the situation in perspective:</p>

<blockquote><p>We've got a network of Windows 98/2000/XP devices, supplied by an <span class="caps">ISV, </span>hardly maintained, with the <span class="caps">ISV </span>having a proven track record of "<a href="http://www.sos.ca.gov/elections/voting_systems/security_analysis_of_the_diebold_accubasic_interpreter.pdf">being</a> <a href="http://www.blackboxvoting.org/BBVtsxstudy.pdf">challenged</a>" WRT IT security, running on a scarcely secured network<sup class="footnote"><a href="http://blogs.amd.co.at/robe/2009/06/data-sniffing-trojans-burrow-into-eastern-european-atms.html#fn1">1</a></sup> which deals with cash transactions. Is there any reason to worry?</p></blockquote>

<p>Yes. Deploying machines in such sensitive environments without having a plan for how you're going to deploy updates, without having a plan for how you're going to spot tampering, and without having a plan for assessing the security of the system is blatantly incompetent. I can see the guys in charge stating "Oh, we don't need this, it's an internal network. Nobody is going to have access there!" in meetings when the discussion touches on one of the aforementioned topics. And you can bet on the <a href="http://www.youtube.com/watch?v=Ug83sF_3_Ec">corporate culture</a> of banks to honor such reasoning. Until it's too late.</p>

<p>And the sad part is that the whole system is most likely in such bad shape that a proper approach to the situation would take at least months (or weeks, given the availability of domain experts and allowing for outages in production systems). And so they will do what they always do when faced with a problem which escapes the scope they're fit for: fix the symptoms! There's probably a sorry lad driving around the country right now, checking every <span class="caps">ATM </span>and deleting the trojan if it's installed. And maybe, only maybe, also fixing up the holes the crooks used to get in in the first place.</p>

<p>The interesting part of the story is the amount of professionalism shown by the bad boys. You can rely on the powers of the market economy, the finesse and level of competence of the Russian IT crooks and a software/infrastructure ecosystem which almost screams to be abused to lead to the exact situation at hand.</p>

<p>And I'm glad it happened. Now the companies involved will run through their <a href="http://en.wikipedia.org/wiki/Kübler-Ross_model">stages of grief</a>, probably skipping a phase or two, emerging stronger. A pretty well-known case of publicly displayed corporate grieving would be the timeline of the <a href="http://en.wikipedia.org/wiki/MIFARE#Security_of_MIFARE_Classic">Mifare Classic</a> security problems. Back then it was basically "There are no security issues, filthy liars!", followed by "I baked you an injunction, but I failed it", which finally resulted in a "Mifare Plus is an <span class="caps">AES</span>-based drop-in replacement for Mifare Classic and will be available later this year". </p>

<p>The sad part in both cases is that it always takes an event of such gigantic proportions to get the affected companies moving and to make them accept/adopt best practices from the industry.</p>

<p>Proper system administration practices have existed since the first "hosts" started to run batch jobs (and were much better back then, as I'm told by IT veterans). And they're even codified in <span class="caps">ITIL </span>these days.</p>

<p><a href="http://en.wikipedia.org/wiki/Cryptanalysis">Cryptanalysis</a> has existed since mankind started to hide messages from each other and was heavily professionalized in <span class="caps">WWII</span>, making it possible for the Allies to tilt the odds in their favor. And in the case of <a href="http://en.wikipedia.org/wiki/Crypto-1">Crypto-1</a> it doesn't even take a domain expert to get suspicious. It was a sound <em>obfuscation</em> solution back in '94, but product management should've acted on it in the last 14 years, especially because Mifare Classic started to get used heavily in electronic access control systems in offices and government departments (I can't say much about police or military; one hopes that they've got better standards there).</p>

<p>In the end, the world will have safer &amp; better systems. Maybe better educated vendors. All at the expense of much stress, pain, finger-pointing and shouting. And all of that could've been much easier if the people in question had the room/balls/brains to actually question what they're doing, and whether it is, after all, <a href="http://www.flickr.com/photos/niallkennedy/67820957/">Good For The Company?</a></p>


<p class="footnote" id="fn1"><sup>1</sup> I had a picture of an <span class="caps">ATM</span> in a bank foyer, supposedly somewhere in Eastern Europe, showing networking equipment and an abundance of cabling right next to it, for everyone to access. But I lost it somewhere on the internet ;).</p>]]>
    </content>
</entry>

<entry>
    <title>In defense of architecture diagrams</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/2009/05/in-defense-of-architecture-documentation.html" />
    <id>tag:blogs.amd.co.at,2009:/robe//1.18</id>

    <published>2009-05-31T20:27:20Z</published>
    <updated>2009-05-31T20:27:24Z</updated>

    <summary>I just stumbled over an old architecture diagram from one of the projects I used to work on. The type of services and project in question are left as an exercise to the curious reader, since this is not the...</summary>
    <author>
        <name>Michael Renner</name>
        <uri>http://amd.co.at/</uri>
    </author>
    
    <category term="architecture" label="architecture" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="documentation" label="documentation" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="de" xml:base="http://blogs.amd.co.at/robe/">
        <![CDATA[<p>I just stumbled over an old architecture diagram from one of the projects I used to work on. The type of services and project in question are left as an exercise to the curious reader, since this is not the point of this posting. </p>

<p>What I want to show is how complex multi-tiered applications can be these days, especially when you phase in new services or try to replace old ones by running the new services in parallel to the existing ones.</p>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><a href="http://blogs.amd.co.at/robe/assets_c/2009/05/HighLevelArch_cropped-22.html" onclick="window.open('http://blogs.amd.co.at/robe/assets_c/2009/05/HighLevelArch_cropped-22.html','popup','width=1275,height=883,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://blogs.amd.co.at/robe/assets_c/2009/05/HighLevelArch_cropped-thumb-200x138-22.png" width="200" height="138" alt="HighLevelArch_cropped.png" class="mt-image-right" style="float: right; margin: 0 0 20px 20px;" /></a></span></p>]]>
        <![CDATA[<p>Imagine the following scenarios:</p>

<h2>New team members</h2>

<p>A new member joins the project. How long does it take them to understand the project from the technical side? How long would it take if they aren't familiar with the area of business and aren't a <a href="http://en.wikipedia.org/wiki/Problem_domain_expert">domain expert</a>? Chances are high that new project members will create drawings on their own to get a complete picture of the architecture.</p>

<h2>Discussions</h2>

<p>There's a (maybe even heated) debate over a particular area of the architecture. Nobody has a complete &amp; clear picture of the architecture, since the last discussion was a few weeks ago. How long does it take to get your point across when all you have is a flipchart? How long will it take when you can use an accurate &amp; legible overview as the basis for your discussion?</p>

<h2>Operations</h2>

<p>Your ops team gets alerted because some part of your project's infrastructure misbehaves. How much time is going to be spent finding the source of the problem when there's no overview of the project and the team has to figure out which symptoms point to the cause and which are just side effects?</p>

<h2>Summing it up</h2>

<p>Even if you don't need that kind of documentation right now, chances are high that you're going to need it very soon. And if you don't do it nice'n'thorough once, you (or other team members) will repeat the effort multiple times and throw the results away after you're done with them.</p>

<p>So in the name of efficiency, get out the Visio (or Dia, or OmniGraffle...) and draw away!</p>]]>
    </content>
</entry>

<entry>
    <title>System Administrator centric online community launched</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/2009/05/system-administrator-centric-online-community-launched.html" />
    <id>tag:blogs.amd.co.at,2009:/robe//1.17</id>

    <published>2009-05-29T11:02:20Z</published>
    <updated>2009-05-29T11:03:06Z</updated>

    <summary>To quote Jeff Atwood in his blog: Server Fault is a sister site to Stack Overflow, which we launched back in September 2008. It uses the same engine, but it&apos;s not just for programmers any more: Server Fault is for...</summary>
    <author>
        <name>Michael Renner</name>
        <uri>http://amd.co.at/</uri>
    </author>
    
    <category term="community" label="community" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sysadmin" label="sysadmin" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="de" xml:base="http://blogs.amd.co.at/robe/">
        <![CDATA[<p>To quote Jeff Atwood in his <a href="http://www.codinghorror.com/blog/archives/001269.html">blog</a>:</p>

<blockquote><p><a href="http://serverfault.com/">Server Fault</a> is a sister site to <a href="http://stackoverflow.com/">Stack Overflow</a>, which we launched back in September 2008. It uses the same engine, but it's not just for programmers any more:</p>

<p>Server Fault is for system administrators and IT professionals, people who manage or maintain computers in a professional capacity. If you are in charge of ...<br />
* servers<br />
* networks<br />
* many desktop PCs (other than your own)<br />
... then you're in the right place to ask your question! Well, as long as the question is about your servers, your networks, or desktops you support, anyway.</p></blockquote>]]>
        <![CDATA[<p>I'm really delighted to see this. I liked what Jeff and his friends did with Stack Overflow and always thought that System Administrators lacked a sensible and well-visited forum of some sort.</p>

<p>With software developers there are various boards, groups, etc. (albeit mostly language/framework-specific) where one can get sane and considerate suggestions from people who know their box and can think outside of it.</p>

<p>But for system administrators no such generic &amp; popular places existed (maybe some Usenet groups and probably some areas in the wake of <a href="http://www.usenix.org/events/lisa09/"><span class="caps">LISA</span></a>/<a href="http://www.usenix.org/"><span class="caps">USENIX</span></a>, but those are as well-established in Old Europe as Monster Trucks and <span class="caps">WWF</span> wrestling).</p>

<p>One of the main challenges System Administrators face is that, compared to most developers, who might work in a single language/framework on a single product for weeks or months, sysadmins are (depending on the environment) tasked with a very broad range of responsibilities and topics.</p>

<p>At the bare minimum every site should have:</p>


<ul>
<li>Backup</li>
<li>Restore (think: Disaster Recovery)</li>
<li>Monitoring</li>
<li>Performance data collection</li>
<li>Documentation</li>
<li>Virtualization (by now!)</li>
<li>Patch/Update management</li>
<li>Configuration Management (if the amount of nodes warrants it)</li>
<li>Defined &amp; communicated availability information for the system</li>
</ul>



<p>Excluding any services which are going to run on the infrastructure, you need a good understanding of products from at least 7 different vendors to set up &amp; maintain this infrastructure. And may god help you if you need to design your infrastructure upfront with products you don't know yet. Especially when they're open source products<sup class="footnote"><a href="http://blogs.amd.co.at/robe/2009/05/system-administrator-centric-online-community-launched.html#fn1">1</a></sup>.</p>

<p>And this is where <a href="http://serverfault.com/">Server Fault</a> comes to the rescue.</p>

<p>You're looking for a backup solution and want to check upfront if <a href="http://www.bacula.org/">Bacula</a> or <a href="http://www.amanda.org/">Amanda</a> are any good or if you should go for the commercial offerings? Heck, you might even want to know about different approaches to short-term backups, like <a href="http://www.netapp.com/us/products/platform-os/snapshot.html">NetApp Snapshots</a>?</p>

<p>You're relatively new to the Virtualization bandwagon and want to know what the production-relevant impediments and features of  <a href="http://www.xen.org/">Xen</a>, <a href="http://www.linux-kvm.org/"><span class="caps">KVM</span></a>, <a href="http://www.openvz.org">OpenVZ</a>/<a href="http://www.parallels.com/products/virtuozzo/">Virtuozzo</a> and <a href="http://www.vmware.com/">VMware</a> are?</p>

<p>Those are a few examples of things one can learn over many years in System Administration, in the right environment with the right sort of colleagues.</p>

<p>And this process can be shortened considerably when you've got the right sort of forum, where interested persons can mingle with experienced ones and where even controversial topics (Container-based or Full Virtualization? I dare you!) can be discussed in a civilized manner.</p>

<p>So let's see how this develops, I'll be trolling the site in the meanwhile ;).</p>


<p class="footnote" id="fn1"><sup>1</sup> As the infrastructure/installation gets larger, proper integration of all tools becomes more and more important. You don't want to find out that your tool doesn't have proper <span class="caps">AAA</span> integration for central identity management. You don't want to hack up your own monitoring interfaces, going directly into the product's native database because the vendor didn't anticipate that you'd want automatic monitoring of your job runs. Those are expected features when a given tool handles more nodes than you can count with all your limbs.</p>]]>
    </content>
</entry>

<entry>
    <title>Testing PostgreSQL replication solutions: Slony-I</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/2009/05/testing-postgresql-replication-solutions-slony-i.html" />
    <id>tag:blogs.amd.co.at,2009:/robe//1.16</id>

    <published>2009-05-16T00:00:23Z</published>
    <updated>2009-11-17T13:57:20Z</updated>

    <summary>Slony-I is a trigger-based replication solution which allows you to replicate database tables and sequences asynchronously from one master to several read-only slaves (which can also be cascaded). Trigger-based means, that each table and sequence which gets replicated has triggers...</summary>
    <author>
        <name>Michael Renner</name>
        <uri>http://amd.co.at/</uri>
    </author>
    
    <category term="pgrep" label="pgrep" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="postgresql" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="replication" label="replication" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="slony" label="slony" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="de" xml:base="http://blogs.amd.co.at/robe/">
        <![CDATA[<p><a href="http://www.slony.info/">Slony-I</a> is a trigger-based replication solution which allows you to replicate database tables and sequences asynchronously from one master to several read-only slaves (which can also be cascaded).</p>

<p>Trigger-based means that each table and sequence which gets replicated has triggers assigned, which will fire whenever the content of the given database object changes. The stored procedures associated with the triggers will then record the changes and store them in a replication log table. Separate daemons monitor the log table for changes and distribute the changes according to their defined rules.</p>
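
<p>To make this a bit more tangible, here's a heavily simplified sketch of the capture side in PL/pgSQL. It is <em>not</em> what Slony-I actually installs: the table "demo_replication_log", the function "demo_log_change" and the trigger name are made up for illustration, and "bid" merely stands in for any table you'd want to replicate.</p>

<pre>
-- Hypothetical change-capture sketch, NOT Slony-I's real implementation.
CREATE TABLE demo_replication_log (
    log_id     bigserial   PRIMARY KEY,
    table_name text        NOT NULL,
    operation  text        NOT NULL,  -- 'INSERT', 'UPDATE' or 'DELETE'
    row_data   text        NOT NULL,  -- text representation of the affected row
    logged_at  timestamptz NOT NULL DEFAULT now()
);

CREATE FUNCTION demo_log_change() RETURNS trigger AS $$
BEGIN
    -- For a DELETE only OLD is available, otherwise log the new row.
    IF TG_OP = 'DELETE' THEN
        INSERT INTO demo_replication_log (table_name, operation, row_data)
        VALUES (TG_TABLE_NAME, TG_OP, OLD::text);
        RETURN OLD;
    END IF;

    INSERT INTO demo_replication_log (table_name, operation, row_data)
    VALUES (TG_TABLE_NAME, TG_OP, NEW::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Fire the logging function for every changed row of the table.
CREATE TRIGGER demo_log_bid
    AFTER INSERT OR UPDATE OR DELETE ON bid
    FOR EACH ROW EXECUTE PROCEDURE demo_log_change();
</pre>

<p>A real solution additionally has to track which node still needs which log row and in which order the changes have to be applied; that bookkeeping is the job of the replication daemons.</p>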

<p>This approach allows for extremely flexible setups, having different master servers for different tables, but this comes at a price.</p>]]>
        <![CDATA[<p>First, this kind of replication solution is very complex. There are triggers, stored procedures and a lot of meta-information (think "What has to get sent where?") in the database, with separate daemons doing the actual work.</p>

<p>Furthermore, dealing with the triggers also necessitates strict rules when it comes to <span class="caps">DDL </span>changes. The Slony-I <a href="http://www.slony.info/documentation/">documentation</a> has further information on <a href="http://www.slony.info/documentation/ddlchanges.html">this topic</a>.</p>

<p>And last but not least, writing every change twice ("in place" and in the logging table) adds write overhead: you end up writing approximately 2.5 times the data you'd write without Slony-I (numeric and date/time values are much larger in the log table, since they only get stored in their <span class="caps">ASCII</span> representation there).</p>
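
<p>For date/time values the size difference is easy to see with plain PostgreSQL functions. The query below only illustrates binary versus text representation; it is not a measurement of Slony-I's log table, and the column aliases are made up:</p>

<pre>
-- Compare the binary size of a value with the length of its text form.
SELECT pg_column_size(now())            AS timestamptz_binary_bytes,
       octet_length(now()::text)        AS timestamptz_text_bytes,
       pg_column_size(current_date)     AS date_binary_bytes,
       octet_length(current_date::text) AS date_text_bytes;
</pre>

<p>A timestamp with time zone takes 8 bytes and a date 4 bytes in binary form, while their text forms are typically several times longer.</p>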

<p>See also the Slony-I <a href="http://www.slony.info/documentation/slonyintro.html">introduction on their site</a>.</p>


<p>That being said, let's see how this works:</p>

<h1>Under the hood</h1>


<h2>Slony-I components</h2>

<p>There are a few things that make Slony-I tick:</p>

<h3>PostgreSQL</h3>

<p>Since most of the interesting things happen inside PostgreSQL in the form of triggers and stored procedures, Slony-I can naturally not work without PostgreSQL ;).</p>

<p>All Slony-I related information (nodes, replication sets, log entries, etc.) is stored in a schema called "_$SLONYCLUSTERNAME".</p>

<h3>slon</h3>

<p>slon is the daemon which takes care of the actual data replication, monitoring the Slony-I log tables and applying the changes to the various nodes.</p>

<h3>slon_tools.conf</h3>

<p>The "shape" of the cluster should be accurately documented in slon_tools.conf. Many Slony-I helper scripts use the information in the slon_tools.conf to generate the necessary slonik commands.</p>

<h3>slonik</h3>

<p>slonik is the Slony-I command processor, parsing slonik commands and calling stored procedures on the various nodes to reflect the desired changes.</p>


<p>Please also read <a href="http://www.slony.info/documentation/concepts.html">Slony-I Concepts</a> to understand the terms I'm going to use from now on ;).</p>


<h2>The pgexerciser schema</h2>

<p>Since using Slony-I requires a good understanding of the schema your application uses, I'll explain how pgexerciser does its magic. pgexerciser tries to implement an overly trivialized auction application. There are users, who can create auctions and bid on auctions. Every bid is "sanity checked" in the database.</p>

<h3>user</h3>



<pre>
 Column |  Type   |                     Modifiers
--------+---------+---------------------------------------------------
 id     | integer | not null default nextval('user_id_seq'::regclass)
 name   | text    |
</pre>



<p>Boring table, two columns, one Primary Key doubling as the user id, one for usernames.</p>

<h3>auction</h3>



<pre>
   Column    |           Type           |                      Modifiers
-------------+--------------------------+------------------------------------------------------
 id          | integer                  | not null default nextval('auction_id_seq'::regclass)
 creator     | integer                  | not null
 description | text                     | not null
 current_bid | numeric                  | not null default 0
 end_time    | timestamp with time zone | not null default now()
Indexes:
    &quot;auction_pkey&quot; PRIMARY KEY, btree (id)
Foreign-key constraints:
    &quot;auction_creator_fkey&quot; FOREIGN KEY (creator) REFERENCES &quot;user&quot;(id)
</pre>




<p>Primary Key as auction id, the auction's creator (foreign key constraint on user table), auction description, current highest bid (updated via a trigger on the bid table) and the auction's end time.</p>

<h3>bid</h3>



<pre>
 Column  |           Type           |                    Modifiers
---------+--------------------------+--------------------------------------------------
 id      | integer                  | not null default nextval('bid_id_seq'::regclass)
 bidder  | integer                  | not null
 auction | integer                  | not null
 bid     | numeric                  | not null
 time    | timestamp with time zone | not null default now()
Indexes:
    &quot;bid_pkey&quot; PRIMARY KEY, btree (id)
Foreign-key constraints:
    &quot;bid_auction_fkey&quot; FOREIGN KEY (auction) REFERENCES auction(id) ON DELETE CASCADE
    &quot;bid_bidder_fkey&quot; FOREIGN KEY (bidder) REFERENCES &quot;user&quot;(id)
Triggers:
    update_auction_current_bid BEFORE INSERT OR UPDATE ON bid FOR EACH ROW EXECUTE PROCEDURE update_auction_current_bid()
</pre>



<p>Primary Key as bid id, the bidder (FK constraint on user table), the auction id (FK on auction table), the bid amount and a timestamp.</p>

<p>There's a trigger which validates every bid (checks if the new bid is higher than the current highest bid and if the auction hasn't ended already) and if it's valid, updates the current_bid in the auction table.</p>
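
<p>A trigger function along the following lines would implement those checks. Treat it as a sketch under assumptions: the name "sketch_validate_bid" is made up, and the real update_auction_current_bid() shipped with pgexerciser may well differ, for instance in how it treats an <span class="caps">UPDATE</span> of an existing bid.</p>

<pre>
-- Hypothetical sketch of the bid validation described above.
CREATE OR REPLACE FUNCTION sketch_validate_bid() RETURNS trigger AS $$
DECLARE
    auc auction%ROWTYPE;
BEGIN
    -- Lock the auction row so concurrent bids are checked one at a time.
    SELECT * INTO auc FROM auction WHERE id = NEW.auction FOR UPDATE;

    IF NOT FOUND THEN
        RAISE EXCEPTION 'auction % does not exist', NEW.auction;
    END IF;

    IF auc.end_time &lt;= now() THEN
        RAISE EXCEPTION 'auction % has already ended', NEW.auction;
    END IF;

    IF NEW.bid &lt;= auc.current_bid THEN
        RAISE EXCEPTION 'bid % is not higher than the current bid %', NEW.bid, auc.current_bid;
    END IF;

    -- The bid is valid: record it as the new highest bid.
    UPDATE auction SET current_bid = NEW.bid WHERE id = NEW.auction;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
</pre>

<p>Raising an exception aborts the offending statement, so an invalid bid never ends up in the bid table and therefore never reaches the replication log either.</p>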


<h1>Getting started</h1>

<p>As always, please make sure that your environment looks as described in <a href="http://blogs.amd.co.at/robe/2009/05/testing-postgresql-replication-solutions-basic-setup.html">this post</a>.</p>

<h2>Preparing the environment</h2>

<p>As a first step, run</p>



<pre>
master1:~/pgworkshop# ./envorcer slony
</pre>



<p>This will </p>


<ul>
<li>create a PostgreSQL superuser called "slony" on both nodes</li>
<li>disable all access constraints on all databases network-wise</li>
<li>create a slon_tools.conf prepared for the pgexerciser schema</li>
<li>copy the pgexerciser schema to the "slave node"</li>
<li>add startup entries for the slon daemons on master1.</li>
</ul>



<h2>The slon_tools.conf</h2>

<p>The slon_tools.conf is not necessary for normal operation of a Slony-I cluster; it's just a reference for the <a href="http://www.slony.info/documentation/adminscripts.html#ALTPERL">altperl Scripts</a> which we will use for cluster administration.</p>

<p>There's little documentation for the config file itself, but it's heavily commented.</p>

<p>/etc/slony1/slon_tools.conf contains the version edited for our schema, /usr/share/doc/slony1-bin/examples/slon_tools.conf-sample.gz is the original file as supplied by Slony-I, which contains more comments.</p>

<h2>slonik et al</h2>

<p>I won't go into much detail about slonik and the commands it expects - the userland tools we use (mostly) do what they're supposed to do, so there's no need to dive into this right now. See the Slony-I <a href="http://slony.info/documentation/commandreference.html">command reference</a> for more information about the slonik commands.</p>

<h2>Bootstrapping slony</h2>

<p>Running "slonik_init_cluster" generates the necessary slonik commands based on /etc/slony1/slon_tools.conf to initialize a Slony-I cluster, which basically means that slonik will create the special Slony-I schema on all configured nodes. You can either review the commands or just pipe the output to slonik to get started. Afterwards make sure to start the slon daemons which are necessary to actually replicate data.</p>



<pre>
master1:~/pgworkshop# slonik_init_cluster | slonik
&lt;stdin&gt;:10: Set up replication nodes
&lt;stdin&gt;:13: Next: configure paths for each node/origin
&lt;stdin&gt;:16: Replication nodes prepared
&lt;stdin&gt;:17: Please start a slon replication daemon for each node
master1:~/pgworkshop# /etc/init.d/slony1 start
Starting Slony-I daemon: 1 2.
master1:~/pgworkshop#
</pre>



<p>From now on you can monitor the actions of the slon daemons in "/var/log/slony1" on master1.</p>

<p>Now is also a good time to start pgexerciser to get some movement in the database.</p>


<h2>The Slony-I schema</h2>

<p>I already mentioned that Slony-I stores a lot of replication-related information in a special schema; to see what's actually in there you can use</p>



<pre>
master1:~/pgworkshop# psql sqlsim -c '\dt _slonytestcluster.'
</pre>



<p>See the <a href="http://www.slony.info/documentation/schema.html">Slony-I schema documentation</a> for further information on the tables and stored procedures.</p>


<h2>Replicating our first few tables</h2>

<p>To start the replication of data to the other node, we need to define a replication set first.</p>

<p>I've prepared the set in the slon_tools.conf already, there is a set called "set1" consisting of the tables "user", "bid" and "auction". To create the replication set in the slony schema in the database, we need to run slonik_create_set:</p>



<pre>
master1:~# slonik_create_set 1 | slonik
&lt;stdin&gt;:16: Subscription set 1 created
&lt;stdin&gt;:17: Adding tables to the subscription set
&lt;stdin&gt;:21: Add primary keyed table public.user
&lt;stdin&gt;:25: Add primary keyed table public.bid
&lt;stdin&gt;:29: Add primary keyed table public.auction
&lt;stdin&gt;:32: Adding sequences to the subscription set
&lt;stdin&gt;:33: All tables added
master1:~#
</pre>



<p>As always, you can check the commands slonik is going to run by omitting the piped call to the slonik interpreter.</p>

<p>Creating the set alone won't buy us anything though, we also need to subscribe a second node to it:</p>



<pre>
master1:~# slonik_subscribe_set 1 2 | slonik
&lt;stdin&gt;:10: Subscribed nodes to set 1
master1:~#
</pre>



<p>In the logfile of node2 we can now see that the data is going to be copied from the master server:</p>



<pre>
[..]
2009-05-16 00:16:19 CEST DEBUG2 remoteWorkerThread_1: Received event 1,1674 ENABLE_SUBSCRIPTION
2009-05-16 00:16:19 CEST DEBUG1 copy_set 1
2009-05-16 00:16:19 CEST DEBUG1 remoteWorkerThread_1: connected to provider DB
2009-05-16 00:16:19 CEST DEBUG2 remoteWorkerThread_1: prepare to copy table &quot;public&quot;.&quot;user&quot;
2009-05-16 00:16:19 CEST DEBUG2 remoteWorkerThread_1: prepare to copy table &quot;public&quot;.&quot;bid&quot;
2009-05-16 00:16:19 CEST DEBUG2 remoteWorkerThread_1: prepare to copy table &quot;public&quot;.&quot;auction&quot;
[..]
</pre>



<p>and later on that new data created by pgexerciser is periodically transferred:</p>



<pre>
2009-05-16 00:19:41 CEST DEBUG2 remoteListenThread_1: queue event 1,1840 SYNC
2009-05-16 00:19:41 CEST DEBUG2 remoteWorkerThread_1: Received event 1,1840 SYNC
2009-05-16 00:19:41 CEST DEBUG2 calc sync size - last time: 1 last length: 4012 ideal: 14 proposed size: 3
2009-05-16 00:19:41 CEST DEBUG2 remoteListenThread_1: queue event 1,1841 SYNC
2009-05-16 00:19:41 CEST DEBUG2 remoteWorkerThread_1: SYNC 1840 processing
2009-05-16 00:19:41 CEST DEBUG2 remoteWorkerThread_1: syncing set 1 with 3 table(s) from provider 1
2009-05-16 00:19:41 CEST DEBUG2  ssy_action_list length: 0
2009-05-16 00:19:41 CEST DEBUG2 remoteWorkerThread_1: current local log_status is 0
2009-05-16 00:19:41 CEST DEBUG2 remoteWorkerThread_1_1: current remote log_status = 0
2009-05-16 00:19:41 CEST DEBUG2 remoteHelperThread_1_1: 0.001 seconds delay for first row
2009-05-16 00:19:41 CEST DEBUG2 remoteHelperThread_1_1: 0.003 seconds until close cursor
2009-05-16 00:19:41 CEST DEBUG2 remoteHelperThread_1_1: inserts=3 updates=2 deletes=0
2009-05-16 00:19:41 CEST DEBUG2 remoteWorkerThread_1: new sl_rowid_seq value: 1000000000000000
2009-05-16 00:19:41 CEST DEBUG2 remoteWorkerThread_1: SYNC 1840 done in 0.025 seconds
</pre>




<p>And when we check the slave server the data also looks good:</p>




<pre>
slave1:~# psql sqlsimslave -c &quot;SELECT * FROM bid ORDER BY id DESC LIMIT 3&quot;
  id  | bidder | auction |  bid   |             time
------+--------+---------+--------+-------------------------------
 2164 |     11 |      86 |   9.86 | 2009-05-16 00:34:15.510123+02
 2163 |      7 |      83 |  46.96 | 2009-05-16 00:34:15.177281+02
 2162 |     11 |      64 | 267.12 | 2009-05-16 00:34:15.16756+02
(3 rows)

slave1:~#
</pre>




<h2>About <span class="caps">SYNC</span>s</h2>

<p>Data between nodes is only replicated with every <span class="caps">SYNC</span> event. Additionally, Slony-I will introduce <span class="caps">SYNC</span> events periodically as a way to allow monitoring solutions to check if a node has fallen too far behind.</p>

<p>The Debian packaged slon will check for new data every second and introduce a <span class="caps">SYNC </span>event if it finds any. If there was no <span class="caps">SYNC </span>event for 10 seconds it will introduce a "keep-alive" SYNC.</p>
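
<p>If you want to see how far a subscriber has fallen behind, the sl_status view in the cluster schema is the place to look. A minimal query, assuming the cluster name "slonytestcluster" from this setup and the column names as given in the Slony-I schema documentation (double-check them against your version), typically run on the origin node:</p>

<pre>
-- One row per subscriber: how many events and how much time it lags behind.
SELECT st_origin,
       st_received,
       st_lag_num_events,
       st_lag_time
  FROM _slonytestcluster.sl_status;
</pre>

<p>Thanks to the keep-alive SYNCs, a steadily growing lag on an otherwise idle database is a good hint that a slon daemon has stopped doing its job.</p>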


<h2>Adding new objects to replication</h2>

<p>We knowingly ignored the sequences (used for the primary keys) in our schema when defining the first replication set - a quick check on the subscriber server shows that they're troublingly low compared to the origin:</p>



<pre>
master1:~# psql -h slave1 sqlsimslave -c &quot;SELECT nextval('bid_id_seq')&quot;
 nextval
---------
       1
(1 row)

master1:~# psql sqlsim -c &quot;SELECT nextval('bid_id_seq')&quot;
 nextval
---------
    2931
(1 row)

master1:~#
</pre>
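
<p>Keep in mind that nextval() advances the sequence every time you call it, so the check above changes what it measures. To peek without side effects, you can select from the sequence relation directly (shown for "bid_id_seq"; run it against whichever node you want to inspect):</p>

<pre>
-- Read the current state of the sequence without advancing it.
SELECT last_value, is_called FROM bid_id_seq;
</pre>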



<p>Slony-I doesn't allow you to add new objects to an existing replication set; you have to define a new set and then merge it into an existing one:</p>



<pre>
master1:~# slonik_create_set 2 | slonik
&lt;stdin&gt;:16: Subscription set 2 created
&lt;stdin&gt;:17: Adding tables to the subscription set
&lt;stdin&gt;:20: Adding sequences to the subscription set
&lt;stdin&gt;:24: Add sequence public.auction_id_seq
&lt;stdin&gt;:28: Add sequence public.bid_id_seq
&lt;stdin&gt;:32: Add sequence public.user_id_seq
&lt;stdin&gt;:33: All tables added
master1:~# slonik_subscribe_set 2 2 | slonik
&lt;stdin&gt;:10: Subscribed nodes to set 2
master1:~#
</pre>



<p>And now the sequence on the slave server is also correct again:</p>



<pre>
master1:~# psql -h slave1 sqlsimslave -c &quot;SELECT nextval('bid_id_seq')&quot;
 nextval
---------
    3267
(1 row)

master1:~#
</pre>




<p>And to reduce the number of sets to maintain:</p>



<pre>
master1:~# slonik_merge_sets 1 1 2 | slonik
&lt;stdin&gt;:10: Replication set 2 merged in with 1 on origin 1
master1:~#
</pre>




<p>Be sure to update the set definition in slon_tools.conf every time you modify a set!</p>


<h2>Homework!</h2>

<p>I think by now you've got the hang of the slonik tools.</p>

<p>Try to play through the following scenario:</p>

<h3>Defining some data</h3>

<p>Since <span class="caps">DDL </span>changes in Slony-I environments are <a href="http://slony.info/documentation/ddlchanges.html">not to be taken lightly</a>, try applying the script in /root/pgworkshop/configs/slony/add_start_time.sql with slonik_execute_script.</p>

<h3>Moving on</h3>

<p>Node1 needs to have some maintenance downtime. Move the replication set from Node1 to Node2. Check the last bid in pgexerciser. Restart it with "./pgexerciser -h slave1 -d sqlsimslave".</p>

<h3>Shit hits the fan</h3>

<p>Node2/slave1 experiences a horrible case of "killall -9 postgres". Fail over the replication set back to Node1. Check pgexerciser.</p>

<h3>Rebuilding our shattered dreams</h3>

<p>Restart PostgreSQL on slave1. Since Node2 is now in an indeterminate state as far as Slony-I is concerned, you need to rebuild it from scratch. Cheat sheet: slonik_drop_node, slonik_store_node, slonik_subscribe_set.</p>


<h1>Final words</h1>

<p>Slony-I is not for the faint of heart. To quote the documentation:</p>


<blockquote><p>Thus, examples of cases where Slony-I probably won't work out well would include:</p>

<p>[..]<br />
Sites where configuration changes are made in a haphazard way.<br />
[..]</p>
</blockquote>

<p>And regarding <span class="caps">DDL </span>changes:</p>

<blockquote><p>Unfortunately, this nonetheless implies that the use of the <span class="caps">DDL </span>facility is somewhat fragile and fairly dangerous. Making <span class="caps">DDL </span>changes must not be done in a sloppy or cavalier manner. If your applications do not have fairly stable <span class="caps">SQL </span>schemas, then using Slony-I for replication is likely to be fraught with trouble and frustration.</p></blockquote>

<p>So, test your procedures beforehand, document everything, monitor everything and be extra-sure when modifying the cluster.</p>

<p>Be aware that the slon daemons are as important as the PostgreSQL databases themselves, so treat them as such (especially when it comes to HA/failover).</p>


<p>But in the end, if you treat Slony-I nicely it's a trusty, reliable and proven solution for your asynchronous master-to-multiple-slaves replication needs.</p>]]>
    </content>
</entry>

<entry>
    <title>Testing PostgreSQL replication solutions: Log shipping with walmgr</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/2009/05/testing-postgresql-replication-solutions-log-shipping-with-walmgr.html" />
    <id>tag:blogs.amd.co.at,2009:/robe//1.15</id>

    <published>2009-05-12T17:19:14Z</published>
    <updated>2009-05-13T08:32:07Z</updated>

    <summary>As we&apos;ve seen in our previous example, doing log shipping with pg_standby can be quite a hassle if you take your slave servers regularly online to use them for queries and then want to resume replication again. The guys from...</summary>
    <author>
        <name>Michael Renner</name>
        <uri>http://amd.co.at/</uri>
    </author>
    
    <category term="pgrep" label="pgrep" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="postgresql" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="replication" label="replication" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="walmgr" label="walmgr" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="de" xml:base="http://blogs.amd.co.at/robe/">
        <![CDATA[<p>As we've seen in our <a href="http://blogs.amd.co.at/robe/2009/05/testing-postgresql-replication-solutions-log-shipping-with-pg-standby.html">previous example</a>, doing log shipping with pg_standby can be quite a hassle if you regularly bring your slave servers online to use them for queries and then want to resume replication again.</p>

<p>The guys from Skype were probably faced with exactly the same problems when they decided to write walmgr.</p>

<p>If you're not familiar with log shipping I strongly suggest reading the <a href="http://blogs.amd.co.at/robe/2009/05/testing-postgresql-replication-solutions-log-shipping-with-pg-standby.html">previous post</a> first.</p>]]>
        <![CDATA[<h1>walmgr?!</h1>

<p><a href="https://developer.skype.com/SkypeGarage/DbProjects/SkyTools/WalMgr">walmgr</a> is a tool written in Python, which eases deployment and maintenance of log shipping slaves. It provides easy one-shot-commands to create backups from running PostgreSQL servers, implements <span class="caps">WAL</span>-file-management (deleting files not needed anymore) and makes bringing slave-servers online for production use a breeze. </p>

<p>Furthermore, it can also be configured to periodically sync the currently used <span class="caps">WAL</span>-segment. This greatly reduces the number of lost transactions when a slave server has to be brought online "as is".</p>

<h1>I'm sold! Please tell me how this works, Jim!</h1>

<h2>A few words of warning upfront</h2>

<p>For basic setup of the virtual machines see the <a href="http://blogs.amd.co.at/robe/2009/05/testing-postgresql-replication-solutions-basic-setup.html">first article</a> in the series. Prepare the walmgr environment with</p>



<pre>
master1:~/pgworkshop# ./envorcer walmgr
</pre>



<p>walmgr relies on fairly extensive configuration files that point to all the infrastructure it needs to do its magic. Additionally, you have to take care to perform all operations as the "postgres" user, since walmgr does a lot of copying around and does not itself enforce correct ownership of the files it touches. Permission issues can be tedious to work out, and walmgr isn't especially helpful in pointing out which files/directories need to be corrected.</p>

<h2>A short word on configuration</h2>

<p>All necessary tools, configuration files and documentation for walmgr reside in /root/pgworkshop/walmgr. Most of the parameters in wal-[master|slave].ini are self-explanatory; the puzzling ones are documented in walmgr.txt.</p>

<p>The whole directory including the wal-slave.ini is copied to the slave server when running the "envorcer" script. The wal-master.ini is only used on the master server and the wal-slave.ini is only used on the slave server. Because of this, they contain a bit of redundant information.</p>

<h2>Setting up the master</h2>

<p>Since I've already prepared all the necessary configuration, we can dive right in.</p>

<p>First, we need to prepare the master server for log shipping with walmgr:</p>



<pre>
postgres@master1:/root/pgworkshop/walmgr$ ./walmgr.py wal-master.ini setup
</pre>



<p>This sets archive_command, enables archive_mode in the postgresql.conf of the given cluster and creates the directory structure needed by walmgr on the slave server. You should also set archive_timeout to 60 seconds to get some segment switching in our test scenario.</p>

<p>Then restart the PostgreSQL cluster ("pg_ctlcluster 8.3 walmgr restart") and start the pgexerciser.</p>

<p>At this point PostgreSQL happily writes transactions to its <span class="caps">WAL, </span>whose segments get switched every 60 seconds and copied to the slave server as per the configuration in wal-master.ini.</p>

<h2>Starting recovery</h2>

<p>To start recovery on the slave server you just need to run:</p>



<pre>
postgres@slave1:/root/pgworkshop/walmgr$ ./walmgr.py wal-slave.ini restore data.master
</pre>



<p>It's important to explicitly specify the name of the backup (which can be listed/shown with the "listbackups" command) to have walmgr <em>copy</em> the backup from its archive directory ("/srv/walmgr-data") to the $PGDATA path; if you don't specify it, walmgr will <em>move</em> the latest backup to $PGDATA, making that particular backup unavailable for any future recovery operations.</p>

<p>To see if the log shipping is working, see the "Doing some transactions" section in the <a href="http://blogs.amd.co.at/robe/2009/05/testing-postgresql-replication-solutions-log-shipping-with-pg-standby.html">previous post</a>.</p>

<h2>Bringing up the Slave</h2>

<p>If you want to use the slave server to do some actual work you have to bring it online first:</p>



<pre>
postgres@slave1:/root/pgworkshop/walmgr$ ./walmgr.py wal-slave.ini boot
</pre>



<p>This will stop recovery and bring the database online, voiding the copy for further recovery/replication use.</p>


<h2>Resuming Replication</h2>

<p>To resume the recovery operation a simple</p>



<pre>
postgres@slave1:/root/pgworkshop/walmgr$ ./walmgr.py wal-slave.ini restore data.master
</pre>



<p>does the trick. Be aware, though, that PostgreSQL has to replay all <span class="caps">WAL</span>-files which have accumulated since the backup was taken. On databases with write-heavy loads this can take quite some time.</p>
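
<p>To get a rough feeling for how much <span class="caps">WAL</span> has to be replayed, the following Python sketch estimates an upper bound from two segment names. It is only an illustration: the function names and the two segment names are made up, and it assumes 16MiB segments with 255 segments per logical log file, as on PostgreSQL 8.3.</p>

<pre>
SEGMENT_MIB = 16             # compile-time default segment size
SEGMENTS_PER_LOGFILE = 255   # assumption: PostgreSQL 8.3 with 16MiB segments


def segment_index(name):
    """Turn a WAL segment name into a running segment number."""
    logid = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return logid * SEGMENTS_PER_LOGFILE + segment


def replay_volume_mib(backup_segment, newest_segment):
    """Upper bound, in MiB, of the WAL between two segments of one timeline."""
    count = segment_index(newest_segment) - segment_index(backup_segment) + 1
    return count * SEGMENT_MIB


print(replay_volume_mib("000000010000000300000010", "00000001000000030000008E"))
</pre>

<p>For these two names the sketch counts 127 segments and prints 2032, i.e. roughly 2GiB of <span class="caps">WAL</span> that the slave would have to work through.</p>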

<h2>Cutting the losses</h2>

<p>walmgr can also be daemonized to synchronize the currently active <span class="caps">WAL</span>-segment at periodic intervals. This reduces the number of lost transactions from "transactions since last segment switch" to "transactions in the last $loop_delay seconds" when bringing the slave server online.</p>

<p>I suggest running the following command in a screen terminal:</p>



<pre>
postgres@master1:/root/pgworkshop/walmgr$ ./walmgr.py wal-master.ini syncdaemon
</pre>



<p>since walmgr won't detach from the terminal and reports what's happening on <span class="caps">STDOUT.</span></p>

<p>If you bring the slave online when syncdaemon is running, the most recent entry in the bid table shouldn't be older than the interval configured in the config file.</p>]]>
    </content>
</entry>

<entry>
    <title>Testing PostgreSQL replication solutions: Log shipping with pg_standby</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/2009/05/testing-postgresql-replication-solutions-log-shipping-with-pg-standby.html" />
    <id>tag:blogs.amd.co.at,2009:/robe//1.14</id>

    <published>2009-05-02T19:52:00Z</published>
    <updated>2009-05-07T09:07:17Z</updated>

    <summary>Log shipping?! PostgreSQL offers support for &quot;shipping&quot; its WAL, the Write Ahead Log, where the changes of every transaction are recorded, to other database systems. The other database system then reads the changes from the WAL file and applies the...</summary>
    <author>
        <name>Michael Renner</name>
        <uri>http://amd.co.at/</uri>
    </author>
    
    <category term="pgrep" label="pgrep" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="postgresql" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="replication" label="replication" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="de" xml:base="http://blogs.amd.co.at/robe/">
        <![CDATA[<h1>Log shipping?!</h1>

<p>PostgreSQL offers support for "<a href="http://www.postgresql.org/docs/8.3/interactive/warm-standby.html">shipping</a>" its <span class="caps">WAL, </span>the <a href="http://www.postgresql.org/docs/8.3/interactive/wal.html">Write Ahead Log</a>, where the changes of every transaction are recorded, to other database systems. The other database system then reads the changes from the <span class="caps">WAL </span>file and applies them to its local data store.</p>

<p>Log shipping has the drawback that the slave servers can't be used for queries as long as they are replicating data and cannot be put back into replication after they've been taken online. Additionally, the replication isn't very granular: PostgreSQL itself will only accept completed <span class="caps">WAL </span>files.</p>

<p>On the other hand this mechanism is very efficient and very reliable since the <span class="caps">WAL </span>is at the core of normal PostgreSQL operation.</p>]]>
        <![CDATA[<h2>The <span class="caps">WAL</span></h2>

<p>The <span class="caps">WAL </span>files of a PostgreSQL database can be found under $PGDATA/pg_xlog; on Debian, $PGDATA is usually /var/lib/postgresql/&lt;VERSION&gt;/&lt;CLUSTERNAME&gt;. Every <span class="caps">WAL </span>segment is 16MiB in size (compile-time default) and its name consists of three separate counters:</p>

<h3>Naming</h3>

<p>If we take the name "00000001000000030000008E" it tells us that the <a href="http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html#BACKUP-TIMELINES">timeline</a> of the file is "1", that it belongs to the logical log file (logid) "3" and that it's segment number 142 (0x8E) of the given logfile.</p>

<p>The segment counter increments with every segment switch; the logical log file counter is incremented (and the segment counter reset to 0) whenever a new segment would overflow the 32bit address space (or "4GiB") of a logical logfile. With a standard segment size of 16MiB this happens every 255 segments (0x00 through 0xFE).</p>
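
<p>To make the naming scheme concrete, here is a small Python sketch that splits a segment name into its three counters and computes the name of the following segment. The function names are made up for this illustration and are not part of PostgreSQL; the rollover after 255 segments is the behaviour described above for 16MiB segments.</p>

<pre>
SEGMENTS_PER_LOGFILE = 255  # as described above for 16MiB segments


def parse_wal_name(name):
    """Split a 24-digit WAL segment name into (timeline, logid, segment)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    # Three hexadecimal counters of eight digits each, in this order.
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def next_wal_name(name):
    """Return the name of the segment that follows the given one."""
    timeline, logid, segment = parse_wal_name(name)
    segment += 1
    if segment == SEGMENTS_PER_LOGFILE:
        # The logical log file is full: bump logid, restart at segment 0.
        logid, segment = logid + 1, 0
    return "%08X%08X%08X" % (timeline, logid, segment)


print(parse_wal_name("00000001000000030000008E"))
print(next_wal_name("0000000100000003000000FE"))
</pre>

<p>The first call prints (1, 3, 142), the second one prints 000000010000000400000000, i.e. the first segment of the next logical log file.</p>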

<h3>Switching segments</h3>

<p>A <span class="caps">WAL </span>segment gets switched when one of the following things happens:</p>


<ul>
<li>it's full (16MiB worth of changes have been written)</li>
<li><a href="http://www.postgresql.org/docs/8.3/interactive/runtime-config-wal.html#GUC-ARCHIVE-TIMEOUT">archive_timeout</a> is exceeded</li>
<li><a href="http://www.postgresql.org/docs/8.3/interactive/functions-admin.html#FUNCTIONS-ADMIN-BACKUP-TABLE">pg_switch_xlog</a> is called</li>
</ul>



<h2>Replicating</h2>

<p>The mechanism used for reading in <span class="caps">WAL </span>files on a slave server is very close to the mechanism that is used when PostgreSQL recovers from an unclean shutdown:</p>

<p>The daemon doesn't know in what state the heap files (tables, indexes, etc.) are and therefore consults the <span class="caps">WAL, </span>where changes of every transaction are written to, replaying every transaction since the last <span class="caps">CHECKPOINT.</span></p>

<p>Because the same code-infrastructure is used, the replaying of <span class="caps">WAL </span>files is called "recovery mode".</p>

<h3>Shipping the files</h3>

<p>PostgreSQL has an <a href="http://www.postgresql.org/docs/8.3/interactive/runtime-config-wal.html#GUC-ARCHIVE-COMMAND">archive_command</a> parameter which can be used to configure a command which gets called after every segment switch. This makes it easy to copy completed <span class="caps">WAL </span>segments from the master server to a remote system with various mechanisms, e.g. nfs, scp, rsync, etc.</p>
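
<p>What such a command has to do can be sketched in a few lines of Python. This is a hypothetical helper, not something shipped with PostgreSQL or the workshop repository: it copies one segment into an archive directory, refuses to overwrite an existing file and signals success with 0, since PostgreSQL treats an exit status of zero as success and retries the segment otherwise. The demo uses throwaway directories instead of a real pg_xlog.</p>

<pre>
import os
import shutil
import tempfile


def archive_segment(segment_path, archive_dir):
    """Copy one completed WAL segment; return 0 on success, 1 on failure."""
    target = os.path.join(archive_dir, os.path.basename(segment_path))
    if os.path.exists(target):
        # Refuse to overwrite a segment that was archived before.
        return 1
    # Copy under a temporary name, then rename, so that a reader of the
    # archive directory never sees a half-written segment.
    partial = target + ".partial"
    shutil.copyfile(segment_path, partial)
    os.rename(partial, target)
    return 0


# Demo with throwaway directories and an empty, made-up segment file.
work = tempfile.mkdtemp()
archive = os.path.join(work, "archive")
os.mkdir(archive)
segment = os.path.join(work, "00000001000000030000008E")
open(segment, "wb").close()
print(archive_segment(segment, archive))  # first attempt
print(archive_segment(segment, archive))  # second attempt is refused
</pre>

<p>The demo prints 0 for the first attempt and 1 for the second. A real script would take the segment path from the %p placeholder of archive_command and hand the return value to sys.exit().</p>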

<h3>Recovering</h3>

<p>To configure a server for recovery you need to place a file named "recovery.conf" into its $PGDATA directory. A sample recovery.conf might look something like this:</p>



<pre>
restore_command = '/usr/lib/postgresql/8.3/bin/pg_standby -l -t /var/lib/postgresql/logship.trigger /srv/logship-archive %f %p'
log_restartpoints = 'true'

# for PITR
#recovery_target_time = '2009-04-21 19:00:00'
</pre>



<p>Additionally, the server needs a consistent backup in its $PGDATA directory and access to all <span class="caps">WAL </span>files that have been written since the backup.</p>

<p>When started in recovery mode, PostgreSQL will replay <span class="caps">WAL </span>files until the program referenced in <a href="http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html#RECOVERY-CONFIG-SETTINGS">restore_command</a> reports a failure (a non-zero exit status). After that it will take the database online, increment the timeline counter of the <span class="caps">WAL </span>file and thereby prevent the current database from being used as a target for recovery again. This is necessary because modifications can happen to the tables as soon as the database is taken online.</p>

<h3>pg_standby</h3>

<p><a href="http://www.postgresql.org/docs/8.3/interactive/pgstandby.html">pg_standby</a> is a contrib tool that watches a given directory for new <span class="caps">WAL </span>files and makes these available to PostgreSQL by copying/linking the given files into its pg_xlog directory.</p>

<p>When using pg_standby there are two main mechanisms for ending replication:</p>


<ul>
<li>"Pulling the trigger", meaning: creating the specified trigger file</li>
<li>Feeding an incomplete <span class="caps">WAL</span>-file: Imagine a crashed server that doesn't boot anymore: if you can salvage the active <span class="caps">WAL </span>segment and copy it to the recovery server, PostgreSQL will notice that the <span class="caps">WAL </span>segment is incomplete, perform its normal startup procedure and increment the timeline.</li>
</ul>
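
<p>The trigger mechanism can be pictured as a polling loop. The following Python sketch is a deliberately simplified model of one polling pass, not pg_standby's actual code; the function name and the returned labels are invented for this illustration.</p>

<pre>
import os
import tempfile


def next_action(archive_dir, wanted_segment, trigger_file):
    """Decide what a pg_standby-like helper does on one polling pass."""
    if os.path.exists(os.path.join(archive_dir, wanted_segment)):
        return "restore"   # hand the segment to PostgreSQL, replay continues
    if os.path.exists(trigger_file):
        return "finish"    # report failure to PostgreSQL, recovery ends
    return "wait"          # nothing to do yet: sleep and poll again


# Demo with a throwaway directory and a made-up segment name.
archive = tempfile.mkdtemp()
trigger = os.path.join(archive, "logship.trigger")
wanted = "00000001000000030000008F"
print(next_action(archive, wanted, trigger))
open(trigger, "w").close()
print(next_action(archive, wanted, trigger))
</pre>

<p>The demo prints "wait" while neither file exists and "finish" once the trigger file has been created.</p>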



<h3>Resuming recovery</h3>

<p>After a slave server has been taken online (and its timeline was switched) you must copy a fresh backup from the master server and create a new recovery.conf to resume log shipping operation.</p>

<h2>Doing it all</h2>

<p>Now that we know what to do and how these things work, let's break a few things!</p>

<h3>Preparing</h3>

<p>Preparing the environments should be rather easy. First make sure that your machines are <a href="http://blogs.amd.co.at/robe/2009/05/testing-postgresql-replication-solutions-basic-setup.html">set up correctly</a>.</p>

<p>When both machines are running, run the following command:</p>



<pre>
master1:~/pgworkshop# ./envorcer logship
</pre>



<p>This creates a cluster named "logship" on both servers, creates a database for pgexerciser on master1 and installs its schema into the database.</p>

<p>Additionally, it creates a directory on slave1 where the <span class="caps">WAL </span>files will be copied to, enables archive_mode among a few other settings on master1 and copies a base backup of the database &amp; an appropriate recovery.conf to slave1.</p>

<h3>Doing some transactions</h3>

<p>Start the databases on both servers with pg_ctlcluster and run pgexerciser (no arguments needed) on master1.</p>

<p>archive_timeout is set to 60 seconds, so a logswitch should occur every minute. This can be monitored in a few places:</p>


<ul>
<li>The "archiver" process on master1 and the "startup" process on slave1 will show in their process title which <span class="caps">WAL </span>file they have handled or are expecting next</li>
<li>PostgreSQL also keeps track of which files have already been copied on master1 in $PGDATA/pg_xlog/archive_status </li>
<li>The PostgreSQL logfile on slave1 (found in /var/log/postgresql/) will show when the <span class="caps">WAL </span>files have been processed</li>
</ul>
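
<p>The archive_status directory is easy to inspect with a script: PostgreSQL keeps a marker file named after each segment there, with a .ready suffix while the segment waits to be archived and a .done suffix once archive_command has succeeded. The Python sketch below groups segments by that suffix. The function name is made up, and the demo runs against a fake directory with invented segment names instead of a real cluster.</p>

<pre>
import os
import tempfile


def archive_progress(status_dir):
    """Group segment names by the suffix of their archive_status marker."""
    progress = {"ready": [], "done": []}
    for entry in sorted(os.listdir(status_dir)):
        name, _, suffix = entry.rpartition(".")
        if name and suffix in progress:
            progress[suffix].append(name)
    return progress


# Demo against a fake archive_status directory with made-up segment names.
status_dir = tempfile.mkdtemp()
for marker in ("00000001000000030000008D.done", "00000001000000030000008E.ready"):
    open(os.path.join(status_dir, marker), "w").close()
print(archive_progress(status_dir))
</pre>

<p>On master1 you would point the function at $PGDATA/pg_xlog/archive_status instead; a steadily growing "ready" list is a hint that archive_command keeps failing.</p>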




<h3>Breaking stuff</h3>

<p>Now it's up to you. You could either create the trigger file pg_standby watches, "killall -9 postgres" on the master and copy over the active <span class="caps">WAL </span>segment, or try a <a href="http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html#RECOVERY-TARGET-TIME"><span class="caps">PITR</span></a> (point-in-time recovery).</p>

<h3>Resuming recovery, this time for real</h3>

<p>After you have taken the slave online, use the following steps to get back into recovery mode:</p>



<pre>
slave1:~# killall -9 postgres
master1:~# psql postgres -c &quot;select pg_start_backup('foo')&quot;
master1:~# rsync -avH --delete --delete-excluded --exclude pg_xlog/*  /var/lib/postgresql/8.3/logship/ root@slave1:/var/lib/postgresql/8.3/logship
master1:~# psql postgres -c &quot;select pg_stop_backup()&quot;
master1:~# scp pgworkshop/configs/logship/recovery.conf root@slave1:/var/lib/postgresql/8.3/logship/
</pre>



<p>When you start the PostgreSQL cluster on slave1 again, it should come up in recovery mode. More on backing up PostgreSQL databases can be found in the <a href="http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html#BACKUP-BASE-BACKUP">documentation</a>.</p>]]>
    </content>
</entry>

<entry>
    <title>Testing PostgreSQL replication solutions: Basic Setup</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/2009/05/testing-postgresql-replication-solutions-basic-setup.html" />
    <id>tag:blogs.amd.co.at,2009:/robe//1.13</id>

    <published>2009-05-02T16:43:00Z</published>
    <updated>2009-05-02T18:19:34Z</updated>

    <summary>I want to provide an introduction, annotated examples and an easy to setup test environment for a few common and &quot;simple&quot; PostgreSQL replication solutions. I planned on providing images, but after what I&apos;ve seen so far it seems to be...</summary>
    <author>
        <name>Michael Renner</name>
        <uri>http://amd.co.at/</uri>
    </author>
    
    <category term="pgrep" label="pgrep" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="postgresql" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="replication" label="replication" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="de" xml:base="http://blogs.amd.co.at/robe/">
        <![CDATA[<p>I want to provide an introduction, annotated examples and an easy-to-set-up test environment for a few common and "simple" PostgreSQL replication solutions.</p>

<p>I planned on providing images, but after what I've seen so far it seems to be much easier to just provide a HowTo ;).</p>

<h2>Prerequisites </h2>

<p>I chose Debian as test platform because I'm familiar with it and the PostgreSQL related packages are in excellent shape there.</p>

<p>What you need is</p>


<ul>
<li>two separate Debian Lenny instances</li>
<li>with the following packages installed:
<ul>
<li>postgresql postgresql-contrib postgresql-8.3-slony1 slony1-bin mercurial less libdbd-pg-perl libpoe-perl rsync psmisc ssh screen python-psycopg2 libstring-random-perl</li>
</ul>
</li>
<li>which are reachable via the respective hostnames "master1" and "slave1"</li>
<li>where both the root and the postgres user from master1 can ssh into root and postgres on slave1</li>
</ul>

]]>
        <![CDATA[<h2>Using VirtualBox</h2>

<p>If you're using <a href="http://www.virtualbox.org/">VirtualBox</a> you can use this as a rough draft:</p>


<ul>
<li>Create master1 machine, 8GB dynamic disk, 256MB <span class="caps">RAM, </span>three <span class="caps">NIC</span>s:
<ul>
<li>Adapter 1: <span class="caps">NAT</span></li>
<li>Adapter 2: Internal Network "intnet"</li>
<li>Adapter 3: Host-Only network (optional, only needed if you don't like using VBox's console)</li>
</ul>
</li>
<li>Install <a href="http://cdimage.debian.org/debian-cd/current/i386/iso-cd/">Debian Lenny</a> (look for "netinst"), set hostname to master1, don't select any profiles in the tasksel screen since it's not necessary</li>
</ul>




<pre>
apt-get install postgresql postgresql-contrib postgresql-8.3-slony1 slony1-bin mercurial less libdbd-pg-perl libpoe-perl rsync psmisc ssh screen python-psycopg2 libstring-random-perl perl-doc
cd /root; hg clone https://workbench.amd.co.at/hg/pgworkshop/
ssh-keygen -q -t dsa -f ~/.ssh/id_dsa -N &quot;&quot;
cp /root/.ssh/id_dsa.pub /root/.ssh/authorized_keys
cp -a /root/.ssh /var/lib/postgresql
chown -Rv postgres:postgres /var/lib/postgresql/.ssh/
rm -v /etc/udev/rules.d/*-persistent-net*
echo &quot;10.1.0.11       slave1&quot; &gt;&gt; /etc/hosts
</pre>



<ul>
<li>Then configure eth1:</li>
</ul>




<pre>
cat &lt;&lt; HERE &gt;&gt; /etc/network/interfaces

auto eth1
iface eth1 inet static
address 10.1.0.10
netmask 255.255.255.0
HERE
</pre>



<ul>
<li>Stop the instance and snapshot it, for good measure</li>
<li>Add a second machine named slave1, identical configuration to master1, choose the <u>same</u> disk as master1. This will cause VirtualBox to use the state of master1 as snapshot source for slave1.</li>
<li>Boot slave1, change the IP of eth1 in /etc/network/interfaces to 10.1.0.11 and change the hostname in /etc/hostname to slave1</li>
<li>Boot master1, reboot slave1</li>
<li>ssh root@slave1, ssh postgres@slave1 from master1 should work now.</li>
</ul>



<p>And you're done!</p>

<h2>PostgreSQL on Debian</h2>

<p>Debian offers a few tools to manage multiple Postgres "clusters" (as in "instance").</p>

<p>"ls -l /usr/bin/pg_*cluster" shows all available commands; we will use pg_ctlcluster regularly to start, stop, restart or reload clusters.</p>

<h2>Custom tools</h2>

<p>I've written two tools to make testing replication scenarios easier. These can be found in the mercurial repository at <a href="https://workbench.amd.co.at/hg/pgworkshop/">https://workbench.amd.co.at/hg/pgworkshop/</a>. The tutorials assume that the repository has been checked out to "/root/pgworkshop".</p>

<h3>The envorcer</h3>

<p>There is a script called "envorcer", which is basically an "environment enforcer". It prepares the PostgreSQL databases &amp; needed configuration for the test cases.</p>

<p>It is very destructive, so it has a hardcoded hostname check that ensures it can only be run from master1.</p>

<p>Running it without arguments shows a short usage example; the source code is pretty self-explanatory and reasonably well commented ;).</p>

<h3>The pgexerciser</h3>

<p>The pgexerciser is in the same directory as the envorcer and is used for exercising a given PostgreSQL database. See ./pgexerciser --help for documentation.</p>]]>
    </content>
</entry>

<entry>
    <title>PostgreSQL replicated: a workshop</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/2009/05/postgresql-repliziert-ein-workshop.html" />
    <id>tag:blogs.amd.co.at,2009:/robe//1.12</id>

    <published>2009-05-02T15:40:03Z</published>
    <updated>2009-05-02T19:55:13Z</updated>

    <summary>The workshop didn&apos;t go the way I had expected. I was still somewhat worn out from the half-flu I had brought along, only three people attended in total, VirtualBox didn&apos;t work out of the box for anyone (Fuck...</summary>
    <author>
        <name>Michael Renner</name>
        <uri>http://amd.co.at/</uri>
    </author>
    
    <category term="conference" label="conference" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="osdc" label="osdc" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postgresql" label="postgresql" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="workshop" label="workshop" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="de" xml:base="http://blogs.amd.co.at/robe/">
        <![CDATA[<p>The workshop didn't go the way I had expected.</p>

<p>I was still somewhat worn out from the half-flu I had brought along, only three people attended in total<sup class="footnote"><a href="http://blogs.amd.co.at/robe/2009/05/postgresql-repliziert-ein-workshop.html#fn1">1</a></sup>, VirtualBox didn't work out of the box for <u>anyone</u> (Fuck this, I'm going back to VMware), and scoping is always extremely hard with a very small group whose levels of experience vary widely.</p>

<p>But so that the rest of the world gets something out of it too (and so that I can give the two of them something to test), I will write up the practical part in a few articles and make it available here.</p>

<p>The slides from the workshop are available <span class="mt-enclosure mt-enclosure-file" style="display: inline;"><a href="http://blogs.amd.co.at/robe/2009/05/02/osdc-pgreplikation-web.pdf">here</a></span>.</p>


<p class="footnote" id="fn1"><sup>1</sup> And after an hour I was alone with Kristian, because one participant had to catch his train and another preferred to catch one more talk because of the dysfunctional VirtualBox setups.</p>]]>
        
    </content>
</entry>

<entry>
    <title>The OSDC 2009 is over</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/2009/05/the-osdc-2009-is-over.html" />
    <id>tag:blogs.amd.co.at,2009:/robe//1.11</id>

    <published>2009-05-02T15:27:00Z</published>
    <updated>2009-05-02T15:31:17Z</updated>

    <summary>It was a nice conference, the guys and gals from Netways surely know how to run an event. It&apos;s all the nice little details which make up a great experience. I was also surprised by NH Hoteles, the Nuremberg City...</summary>
    <author>
        <name>Michael Renner</name>
        <uri>http://amd.co.at/</uri>
    </author>
    
    <category term="conference" label="conference" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="drbd" label="drbd" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="netways" label="netways" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="osdc" label="osdc" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="postfix" label="postfix" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="puppet" label="puppet" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="spamassassin" label="spamassassin" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="de" xml:base="http://blogs.amd.co.at/robe/">
        <![CDATA[<p>It was a nice conference, the guys and gals from <a href="http://www.netways.de/">Netways</a> surely know how to run an event. It's all the nice little details which make up a great experience<sup class="footnote"><a href="http://blogs.amd.co.at/robe/2009/05/the-osdc-2009-is-over.html#fn1">1</a></sup>.</p>

<p>I was also surprised by NH Hoteles: the <a href="http://www.nh-hotels.com/nh/en/hotels/germany/nurnberg/nh-nuernberg-city.html">Nuremberg City</a> one greeted us with one of the most attractive parking garages I've ever seen (_very_ clean, "follow me"-lines on the floor, automatic hinged safety doors, complimentary window cleaning for hotel guests, etc.) and the hotel lived up to the standards it set in its garage ;). The only problem I noticed was that the dining area was constantly understaffed for the 70-something people who attended the conference.</p>

<p>The lineup of the conference was quite nice although I prefer "war stories" told from real world scenarios over feature presentations of a single solution. Fortunately <a href="http://kris.koehntopp.de/">Kristian Köhntopp</a> was able to speak about his experiences from his times as a MySQL consultant and the stuff he's doing over at booking.</p>

<p>More on that after the break.</p>

<p class="footnote" id="fn1"><sup>1</sup> A few examples:</p>


<ul>
<li>Everything planned &amp; communicated in advance, no uncertainties on what/how/where</li>
<li>Taped down cables everywhere</li>
<li>Enough power sockets for laptops and other gadgets in every conference room and the lounge area</li>
<li>Nameplates on the speakers' podiums</li>
<li>Constantly refilled/replaced bottles &amp; glasses</li>
</ul>

]]>
        <![CDATA[<p>The highlights of the conference for me were (in chronological order):</p>

<h2>Puppet</h2>

<p><a href="http://madstop.com/">Luke Kanies</a> (<a href="http://reductivelabs.com/">Reductive Labs</a>) talked a bit about <a href="http://reductivelabs.com/products/puppet/">Puppet</a>, which most of the attendees already knew. It's still the best configuration management solution for heterogeneous environments where the "foil ball" approach (his words!) of golden master images doesn't cut it anymore. Another part of his talk was about how the Puppet development approach and community integration are way better than what he experienced with the author(s) of cfengine back then, which eventually caused him to start his own thing. Puppet shows progress in critical areas (dropping <span class="caps">XML</span>-RPC in favor of <span class="caps">REST </span>to increase performance, especially when serving static files) but still has a long way to go. One of the issues Kristian mentioned is that Facter only supports scalar values natively and no complex data structures. This is very limiting when you need to analyze complex data structures, e.g. the <span class="caps">LVM </span>configuration of a server.</p>

<h2><span class="caps">DRBD, </span>the stuff that was formerly known as Heartbeat &amp; <span class="caps">KVM</span></h2>

<p><a href="http://blogs.linbit.com/florian">Florian Haas</a> (<a href="http://www.linbit.com/en/">Linbit</a>, the company behind <a href="http://www.drbd.org/"><span class="caps">DRBD</span></a>) showed how Virtualization &amp; HA play together with the building blocks being <span class="caps">KVM,</span> Pacemaker, OpenAIS and <span class="caps">DRBD.</span> He talked a bit about the infighting in the Linux-HA/Heartbeat community, which eventually led to the current Pacemaker &amp; OpenAIS solution (which is not yet available in stock Debian systems). One of the issues full and paravirtualization techniques have compared to container-based solutions like OpenVZ and Solaris Zones is performance. He presented <a href="http://www.percona.com/ppc2009/PPC2009_virtual_block_perf.pdf">a few slides</a> from his talk at the <a href="http://conferences.percona.com/percona-performance-conference-2009/schedule.html">Percona Performance Conference 2009</a>, showing latency issues in <span class="caps">KVM, </span>which are very bad in systems with large amounts of unbatched transactions. Since his results were only a week old, it's too early to comment on the reasons and resolutions. The bottom line was that it might be too early to bury Xen until these things are resolved.</p>

<p>Over a talk with Florian I was finally able to stop worrying and love shared-nothing architectures. Florian told me that my association of <span class="caps">DRBD </span>with "something to keep services on shoddy hardware online" wasn't too far-fetched, since the first version of <span class="caps">DRBD </span>was written out of the need to run complex computation jobs on rickety machines in a CS lab without losing the complete calculation if one of the nodes bit the dust in A Bad Way. But <span class="caps">DRBD </span>has evolved considerably since then, and with the overwhelmingly positive feedback of other conference attendees and <span class="caps">DRBD'</span>s availability in stock Debian and RedHat distributions I'm finally convinced that it's A Good Thing ;).</p>

<h2>Incubation completed in 3... 2... 1...</h2>

<p>The other talks of the day that I attended weren't that interesting and the latent flu I brought with me from last week finally started to kick in, causing me to call it a day at 19:00 and sweat through the night.</p>

<h2>Systematic management of 1000 heterogeneous nodes</h2>

<p>On the second day <a href="http://kris.koehntopp.de/">Kristian Köhntopp</a> (<a href="http://www.booking.com/">Booking.com</a>) started the day with a talk about how they do systems (and database) management at their shop (in short: HP hardware for easy deployment and <span class="caps">MAC </span>address management, <span class="caps">PXE </span>and atftpd (with custom database backend) in combination with Kickstart for basic setup, Puppet and yum for everything else). The basic system they install, which is identical for every server, is a minimal CentOS installation with a Puppet client; all customization is done afterwards with Puppet. Kris also told some stories about outstanding Puppet issues (Facter/Puppet only handling scalar values, random Facter state corruptions, horrible fileserving with Puppet 0.24, etc.), but it is still the best tool for the job and way more flexible than cfengine, which they used previously.</p>

<h2>Why has MySQL still got a market share in professional environments?</h2>

<p>I had an interesting talk with Kris over a glass of peach juice (NH is sooo exclusive!) about why MySQL's oddities don't hurt that much in <a href="http://queue.acm.org/detail.cfm?id=1394128"><span class="caps">BASE</span></a>-environments and why a simple and (somewhat) flexible replication solution is of utmost importance in such scenarios. Expect more on that topic in this blog in the future. I won't say that I'm convinced that MySQL is the best solution in those environments, but at least now I  understand that it's a viable choice ;).</p>

<h2>Postfix ate my Spam!</h2>

<p><a href="http://kuehnast.com/s9y/">Charly Kühnast</a> (RZ Niederrhein) then presented his Postfix-based spam filtering solution. I forgot the exact numbers, but it looked very promising. The basic components were (off the top of my head; I think he talked about 6 tiers but I was only able to remember 5...)</p>


<ul>
<li>Policyd with <span class="caps">RBL</span>s</li>
<li>Header checks (HELO, sender/recipient verification, etc.)</li>
<li>SpamAssassin</li>
<li>FuzzyOCR</li>
<li>ClamAV with custom definition files</li>
</ul>



<p>which were quite effective in combination and, in his own words, very low in maintenance requirements.</p>

<h2>Wrapping it up</h2>

<p>After that it was time for my workshop (more on that later), followed by discussions and a final beer with the guys from Netways and the few attendees who were still around.</p>

<p>I hit the road with <a href="http://michael-prokop.at/">Mika</a> at 18:15 and we were back in Vienna 5 hours later...</p>]]>
    </content>
</entry>

<entry>
    <title>Graphing heterogenous data sets with multiple axes</title>
    <link rel="alternate" type="text/html" href="http://blogs.amd.co.at/robe/2009/03/graphing-related-data-sets-with-multiple-axes.html" />
    <id>tag:blogs.amd.co.at,2009:/robe//1.10</id>

    <published>2009-03-15T21:44:47Z</published>
    <updated>2009-03-16T00:45:01Z</updated>

    <summary>A while ago I wrote a small script which runs benchmarks against given filesystems and collects performance data for each run. What I wanted to find out was how expensive (IO-wise) various standard filesystem operations are. The collected information proved...</summary>
    <author>
        <name>Michael Renner</name>
        <uri>http://amd.co.at/</uri>
    </author>
    
    <category term="benchmark" label="benchmark" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="filesystem" label="filesystem" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="graphs" label="graphs" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="linux" label="linux" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="de" xml:base="http://blogs.amd.co.at/robe/">
        <![CDATA[<p>A while ago I wrote a small script which runs benchmarks against given filesystems and collects performance data for each run. What I wanted to find out was how expensive (IO-wise) various standard filesystem operations are.</p>

<p>The collected information proved to be quite extensive and very hard to visualize.</p>]]>
        <![CDATA[<h2>But why?!</h2>

<p>I always wondered how much faster a given filesystem is for specific tasks and more importantly - why? </p>

<p>Some dogmas which exist are:</p>


<ul>
<li>ext2 is fast for sequential I/O</li>
<li>reiserfs is fast for handling many small files</li>
<li>xfs is fast for deletes</li>
<li>everything except the ext* family of filesystems will eat your data for breakfast at the slightest hint of block device or kernel issues</li>
</ul>



<p>but will those live up to scrutiny?</p>

<h2>The benchmarks</h2>

<p>What I did was to define some basic isolated filesystem workloads which are supposed to benchmark different areas of a given filesystem. What I came up with was:</p>


<ul>
<li>write a 4GB file with cp</li>
<li>read a 4GB file with cp</li>
<li>delete a 4GB file</li>
<li>create many files (untar 2.6.[0,5,10,15,20,25] linux sources)</li>
<li>read many files (rsync given files to an empty directory)</li>
<li>stat many files (rsync given files to the previously filled directory)</li>
<li>delete many files (delete given files)</li>
</ul>



<p>The tested filesystem was unmounted between runs to simulate cold caches. I collected the extended IO statistics from <code>/proc/diskstats</code>, the interesting bits being the number of IOs and the sectors read/written during the run as well as the total duration.</p>
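
<p>To illustrate the idea (this is not the script I actually used), here is a minimal Python sketch that snapshots the counters for one device before and after a workload and reports the difference. The names <code>parse_diskstats</code>, <code>delta</code> and <code>measure</code> are made up for this example, and the column positions assume the usual <code>/proc/diskstats</code> layout for whole devices (major, minor, name, then the read and write counters).</p>

```python
import time
from collections import namedtuple

# Only the counters discussed above; /proc/diskstats has more columns.
DiskStats = namedtuple("DiskStats", "read_ios read_sectors write_ios write_sectors")


def parse_diskstats(text, device):
    """Return the counters for `device` from the text of /proc/diskstats."""
    for line in text.splitlines():
        fields = line.split()
        # fields: major, minor, name, reads completed, reads merged,
        # sectors read, ms reading, writes completed, writes merged,
        # sectors written, ms writing, ...
        if len(fields) >= 11 and fields[2] == device:
            return DiskStats(
                read_ios=int(fields[3]),
                read_sectors=int(fields[5]),
                write_ios=int(fields[7]),
                write_sectors=int(fields[9]),
            )
    raise KeyError(device)


def delta(before, after):
    """Counter-wise difference between two snapshots."""
    return DiskStats(*(a - b for a, b in zip(after, before)))


def measure(device, workload, path="/proc/diskstats"):
    """Run `workload()` and return (IO counter deltas, elapsed seconds)."""
    def snapshot():
        with open(path) as f:
            return parse_diskstats(f.read(), device)

    before = snapshot()
    start = time.monotonic()
    workload()
    elapsed = time.monotonic() - start
    return delta(before, snapshot()), elapsed
```

<p>The workload passed to <code>measure</code> would be responsible for mounting the filesystem, running one of the operations from the list above and unmounting again, so that every run starts with cold caches.</p>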

<p>The system I used for testing is an Athlon64 <span class="caps">X2, </span>running Debian Lenny with the stock kernel. The filesystems were created on a very dated <a href="http://www.seagate.com/support/disc/specs/ata/st3200822a.html">Seagate Barracuda 7200.7</a>.</p>

<p>Although these tests are highly unscientific<sup class="footnote"><a href="http://blogs.amd.co.at/robe/2009/03/graphing-related-data-sets-with-multiple-axes.html#fn1">1</a></sup>, they already yielded some very interesting and, much more importantly, reproducible results.</p>

<h2>The data</h2>

<p>A sample result set from one benchmark run can be found <a href="http://nopaste.narf.at/f49e727c5">here</a>. If you've got a high pain threshold (and/or a soft spot for raw numbers) you can already deduce some interesting facts from this list, e.g. that ext4 is much faster than ext3 for most operations, or that xfs is embarrassingly slow when creating many files. But to get a big picture of what's going on here you need to visualize the data.</p>

<p>I did a bit of sketching and came up with something like this:</p>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><a href="http://blogs.amd.co.at/robe/assets_c/2009/03/fs-graph-sketch-12.html" onclick="window.open('http://blogs.amd.co.at/robe/assets_c/2009/03/fs-graph-sketch-12.html','popup','width=1536,height=2048,scrollbars=yes,resizable=yes,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://blogs.amd.co.at/robe/assets_c/2009/03/fs-graph-sketch-thumb-200x266-12.jpg" width="200" height="266" alt="fs-graph-sketch.jpg" class="mt-image-none" style="" /></a></span></p>

<p>I wanted to stack identical units (e.g. read &amp; write IOs or read &amp; written sectors) to form a single bar and save space this way. Additionally, I wanted to group the bars together to make comparison easier and improve the overall graph layout. To make things even more complicated, I wanted to combine three different units (IOs, bytes and seconds) on a single graph.</p>

<p>After a bit of reading I found out that the result is supposedly called a "grouped, stacked bar graph with variable y axes". That was my goal.</p>
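
<p>To make that mouthful a bit more concrete, here is a rough Python sketch of the geometry behind such a graph. It is only an illustration under my own assumptions, not something any of the tools below do for you: the function name <code>layout_bars</code> and the input layout (one dictionary of measurements per group) are hypothetical. Every unit gets its own y axis by scaling its bars against the tallest stacked bar of that unit, series sharing a unit are stacked, and the bars of one group sit next to each other.</p>

```python
def layout_bars(results, units, bar_width=1.0, group_gap=1.0):
    """Compute rectangles for a grouped, stacked bar graph.

    results: {group_name: {series_name: value}}
    units:   list of (unit_name, [series stacked into that unit's bar])

    Returns (rects, axis_max). Heights and bottoms are fractions of the
    axis of the respective unit, so every unit can have its own scale.
    """
    # One y axis per unit: its maximum is the tallest stacked bar.
    axis_max = {}
    for unit, series in units:
        totals = [sum(values[s] for s in series) for values in results.values()]
        axis_max[unit] = max(totals) or 1

    rects = []
    x = 0.0
    for group, values in results.items():   # one cluster of bars per group
        for unit, series in units:          # one bar per unit
            bottom = 0.0
            for s in series:                # stack the segments of that bar
                height = values[s] / axis_max[unit]
                rects.append({
                    "group": group, "unit": unit, "series": s,
                    "x": x, "bottom": bottom, "height": height,
                })
                bottom += height
            x += bar_width
        x += group_gap                      # blank space between clusters
    return rects, axis_max
```

<p>The resulting rectangles could then be handed to whatever drawing backend is at hand; the values in <code>axis_max</code> are what the tick labels of the individual y axes would be derived from.</p>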


<h2>Tools of trade</h2>

<p>Having hardly any experience with data visualization, I turned to <a href="http://www.gnuplot.info/">gnuplot</a> and was disappointed: only up to two axes per graph and dimension, sparse documentation for the things I wanted to achieve, and a mailing list which never accepted my "anonymous" <a href="http://gmane.org/">Gmane</a> post, which got stuck in the moderation queue.</p>

<p>The various Flash rendering frameworks like <a href="http://code.google.com/apis/chart/">Google Chart</a> seemed promising but didn't live up to my rather specific expectations.</p>

<p>Then a friend of mine pointed me to <a href="http://www.sigmaplot.com/">SigmaPlot</a> which he used for his diploma thesis and spoke highly of.</p>

<p>I gave it a try, and after a bit of trial and error (and dropping the stacked bar requirement) I had my first graphs. Implementing multiple axes isn't too easy with SigmaPlot either (and seems very "bolted on" rather than nicely integrated), but at least I had my first visualized data sets.</p>

<h2>The results</h2>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><a href="http://blogs.amd.co.at/robe/assets_c/2009/03/testplot-15.html" onclick="window.open('http://blogs.amd.co.at/robe/assets_c/2009/03/testplot-15.html','popup','width=1228,height=887,scrollbars=yes,resizable=yes,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://blogs.amd.co.at/robe/assets_c/2009/03/testplot-thumb-200x144-15.png" width="200" height="144" alt="testplot.png" class="mt-image-none" style="" /></a></span></p>

<p>This graph was the first I did and was grouped by filesystem because this was much easier to accomplish.</p>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><a href="http://blogs.amd.co.at/robe/assets_c/2009/03/testplot2-18.html" onclick="window.open('http://blogs.amd.co.at/robe/assets_c/2009/03/testplot2-18.html','popup','width=1228,height=927,scrollbars=yes,resizable=yes,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://blogs.amd.co.at/robe/assets_c/2009/03/testplot2-thumb-200x150-18.png" width="200" height="150" alt="testplot2.png" class="mt-image-none" style="" /></a></span></p>

<p>The second graph resembles the one I drafted in the beginning, minus the stacked bar graphs (which is a pity, since there's interesting information lost<sup class="footnote"><a href="http://blogs.amd.co.at/robe/2009/03/graphing-related-data-sets-with-multiple-axes.html#fn2">2</a></sup>).</p>

<p>So what do these graphs tell us? For the given test case (create a few hundred thousand files):</p>


<ul>
<li>ext4's performance is almost identical to ext2's, which is great to hear</li>
<li>the numbers of sectors that need to be read and written are pretty closely grouped, except for xfs (with reiserfs setting the lower boundary)</li>
<li>for the ext* family and reiserfs, the number of IOs correlates with the overall runtime</li>
<li>Both xfs and jfs seem unsuitable for general usage, at least with standard mkfs and mount parameters on Debian Lenny.</li>
</ul>



<h2>And now?</h2>

<p>To be honest, I'm not too fond of the results I got. The amount of time necessary to get the graphs in question seems prohibitively high. Also, the results will never satisfy all people since they're rather static and may contain too much "noise" or not the right combination of data points for a given question you want to answer.<sup class="footnote"><a href="http://blogs.amd.co.at/robe/2009/03/graphing-related-data-sets-with-multiple-axes.html#fn3">3</a></sup></p>

<p>If you've got any suggestions for different tools or approaches, they would be highly appreciated.</p>

<p>And if I don't get any new input I'll eventually re-run the benchmarks with a more recent kernel (adding a stable <a href="http://ext4.wiki.kernel.org/">ext4</a> and <a href="http://btrfs.wiki.kernel.org/">btrfs</a> to the mix), check if the jfs and xfs results are representative and last but not least average a few iterations and increase the working set to get solid results.</p>

<p>And always remember: </p>




<a href="http://www.toothpastefordinner.com/"><img alt="toothpaste for dinner" src="http://www.toothpastefordinner.com/031509/the-more-graphs-you-make.gif" width="600" height="372" border=0 /></a><br /><a href="http://www.toothpastefordinner.com">toothpastefordinner.com</a>




<h2>Scripts, etc.</h2>

<p>In case you want to run your own benchmarks, you can find the highly undocumented and uncommented scripts <a href="https://workbench.amd.co.at/hg/benchmarks/">here</a>. Basic instructions for creating the graphs with SigmaPlot can be found <a href="https://workbench.amd.co.at/dokuwiki/doku.php?id=fs_blockdevice_overhead">here</a>. </p>


<h3>Footnotes</h3>

<p class="footnote" id="fn1"><sup>1</sup> What's the buffer size which cp uses for copying? Am I stalled by reading/writing from/to the "helper" filesystems? Are the collected numbers representative for "normal" usage with warm caches?</p>

<p class="footnote" id="fn2"><sup>2</sup> E.g. "How many read IOs does a filesystem need to do to delete a single large file?"</p>

<p class="footnote" id="fn3"><sup>3</sup> Interestingly, you run into similar issues when comparing tools like <a href="http://munin.projects.linpro.no/">Munin</a> and <a href="http://www.zabbix.com/">Zabbix</a>. The former is rather easy to set up but will bite you when you try the simplest form of data correlation, especially for older data sets. The latter is a huge <span class="caps">PITA </span>to set up but offers very sophisticated and dynamic tools for data analysis and correlation.</p>]]>
    </content>
</entry>

</feed>
