Comments on Jermdemo: Big-Ass Servers™ and the myths of clusters in bioinformatics
Jermdemo (http://www.blogger.com/profile/01662705354227625640), updated 2024-03-18T06:58:46.073-04:00

Jermdemo, 2013-07-24T10:11:56.846-04:00

wow: http://vimeo.com/64637406

Jermdemo, 2012-03-29T19:47:51.318-04:00

Yannick Wurm tells me the British equivalent to "big-ass servers" is "fuck-off big machines". This is good to know.
http://biostar.stackexchange.com/questions/18873/peer-reviewed-justification-for-big-ass-servers

Jermdemo, 2012-01-10T12:16:56.075-05:00

The discussion is reignited once again on biostar:
http://biostar.stackexchange.com/questions/16129/big-ass-servers-storage

Kevin, 2011-07-14T13:32:00.445-04:00

Good post! Going to blog this on kevin-gattaca.blogspot.com

I almost totally agree with you, but I think a BAS isn't a very shareable resource.
It is useful to have loads of RAM and loads of cores for one person's use.
But when it is shared, you have a hard time juggling resources fairly, especially in bioinformatics, where walltimes and RAM requirements are only known after the analysis.
That said, cloud computing is having trouble keeping up with I/O-bound work like bioinformatics, and smaller cloud computing services are all trying to show that they have faster interconnects, but you can't really beat a BAS that's on a local network.

Fatal Error, 2011-07-07T16:54:50.013-04:00

Interesting post

Some points:

1: You don't have to program in Java to use Hadoop; it supports scripting languages via Hadoop Streaming and other frameworks (Dumbo etc.)

2: You mention I/O as a bottleneck. What is the I/O subsystem that feeds your BAS? One of the reasons why people move to clusters is that I/O from many internal disks/spindles typically outperforms

3: What is your plan when you outgrow your BAS? Buy a bigger one for 75K? What if there is no bigger one?

4: You mention "Many bioinformatics programs are not even threaded, much less designed to work amongst several machines."

How are such programs going to benefit from a BAS?

Anonymous, 2011-07-07T13:28:52.380-04:00

I generally agree with the post, but am going to come from the contrarian viewpoint in general.

A BAS makes sense under a few conditions: your utilization is predictable and high, and you can take some downtime risk. For many applications that is more than sufficient.

Having said that, there are some flaws in the argument:

1. BLAST is hardly a high-end application, and MPIBlast shouldn't exist. It does because we don't know how to write distributed systems.

2. At scale, a BAS is always a bad idea because hardware will fail. You should buy cheaper servers and make your software fault tolerant, so unless you are an occasional user working with small data sizes (which is absolutely true for many people), you should consider other options.

3. Google does a LOT more than search. They have different data stores and processing engines optimized for different problems precisely because they have so many.
Their problem set and data complexity are beyond anything NGS has to deal with (and boy should we be paying attention).

There are two real nits I do have.

1. We are terrible programmers and the quality of our science is going to suffer as long as we remain there. You are arguing for laziness and a lack of interest in solving hard computational problems. Then all the smart people are going to keep going to Facebook because their skills get appreciated there.

2. Parallel programming is not one thing. We need a distributed systems approach, i.e. assume your networks are crappy, disk is slow, and compute is fast and cheap. You can bet that the infrastructure at Google is way, way, way cheaper than the university cluster. Yes, they need to invest in the software, but there are lessons to be learned. Otherwise, you incur a ton of long-term debt.

Of course, as scientists, we don't care about that.

Nick Loman, 2011-06-25T09:42:06.452-04:00

EC2 is great (with or without Galaxy) and for some groups the large memory instance will suffice as a big (or medium) ass server. But for lots of applications, including BLASTing against a big database (>64Gb) or doing complex assemblies, you will need more memory. It may be that EC2 will offer larger instances in the future. But I completely agree with the original post - a big-ass server is often a way better investment than a cluster.
Unfortunately the guys that run high-performance compute facilities tend to be stuck in a cluster mindset, and that may not be the best solution for a lot of bioinformatics projects.

Jerm - I am definitely going to consider getting one of those bad boys!

Marcin Piechota, 2011-06-24T14:02:34.621-04:00

Of course, it is like buying 20% of a BAS for a price that is about 40% of a BAS price. Moreover, it is limited in size (68.4 GB of RAM in the case of Amazon). However, we don't have the human resources to maintain (invite tenders, back up, update, etc.) either a BAS or a cluster. We have only two bioinformaticians in the lab and I am one of them. I suppose it would be one of us responsible for maintaining the server...

Jermdemo, 2011-06-24T10:56:28.022-04:00

Hi Marcin,
I don't understand how a high-memory virtual machine can effectively span multiple nodes, so there must be an actual physical BAS that is supported by your research. So it kind of sounds like you have, albeit indirectly, bought 20% of a BAS.
-jerm

Marcin Piechota, 2011-06-24T04:42:23.755-04:00

The first answer: Galaxy - it makes it easy to parallelize everything. The second answer: one large VM in Amazon EC2. In Poland we also have a central scientific computing facility that provides, among other things, large virtual machines in a similar way to Amazon.
We have heavy computations once per week, so we only utilize about 20% of available computational time. So there is no need to buy a costly BAS.
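
The utilization argument in the exchange between Marcin and Jerm can be put into rough numbers. What follows is only a back-of-the-envelope sketch: the 20% and 40% figures are the ones quoted in the comments, prices are normalized so that one whole BAS costs 1.0, and the function name, the linear pricing of the shared machine, and the idea of measuring "cost per unit of capacity actually used" are assumptions made for illustration, not anything the commenters stated.

```python
def cost_per_used_unit(price: float, capacity_used: float) -> float:
    """Price paid divided by the fraction of one BAS's capacity actually used."""
    if capacity_used <= 0:
        raise ValueError("capacity_used must be positive")
    return price / capacity_used


# Normalized, hypothetical inputs taken from the figures in the comments.
BAS_PRICE = 1.0      # one whole BAS
SHARED_PRICE = 0.4   # "about 40% of a BAS price"
SHARED_SHARE = 0.2   # "20% of a BAS"
UTILIZATION = 0.2    # "about 20% of available computational time"

# Owning a BAS but keeping it busy only 20% of the time.
owned = cost_per_used_unit(BAS_PRICE, UTILIZATION)

# Renting a 20% share and using all of it.
shared = cost_per_used_unit(SHARED_PRICE, SHARED_SHARE)

# Utilization at which owning costs the same per used unit as renting,
# assuming (hypothetically) that the rented share scales linearly in price.
break_even = BAS_PRICE / shared

print(f"owned at {UTILIZATION:.0%} utilization: {owned:.1f} per used unit")
print(f"shared machine: {shared:.1f} per used unit")
print(f"owning breaks even at {break_even:.0%} utilization")
```

Under these assumptions the shared machine is cheaper per unit of work at 20% utilization, and owning only catches up once the lab keeps a BAS busy about half the time. Maintenance effort, which Marcin raises separately, is not modeled here.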