Computational Genomics Group

Covid19 Epidemic. Doubling times responding to lock-downs

3/25/2020

In a recent blog post about the Covid-19 epidemic I tried to tackle the problem of variability in the reporting of confirmed SARS-CoV-2 cases by estimating the case doubling times for various countries. A few things of note from that analysis:

a. Doubling times differed widely between countries, which suggested that some had effectively managed to slow down the spread of the virus. China had by then almost brought the epidemic to a halt and so was not included in the analysis.
b. Even though the estimated doubling times were rather stable, they showed interesting dynamics that called for a more careful analysis. This is true for almost everything that has to do with this epidemic, which is the first of this scale to be monitored in real time.
c. There was no correlation between the number of cases and doubling times (Italy being the most problematic case, with a high number of infected people and a low doubling time). There was also no correlation between the doubling times and the days since the first case was reported in each country. This means that even in countries where the "spill" occurred at more or less the same time, the situation has not been developing in the same way.

All of the above implies that a more careful examination, factoring in other aspects, is required. In this post I have tried to dissect the different modes of the developing situation in countries with more than 300 cases (as of March 24th), excluding China and South Korea, which appear to have contained the situation (and will hopefully remain so).

The plot below is the same graph as in the previous post, updated with data from four days later. A number of things regarding the dynamics can be seen by comparing this plot to the one for March 21st. Italy is showing signs of slowing down the spread, moving up the y-axis (Doubling Time, DT). France and Germany have done this more quickly, but Spain has not done much (with a constant DT of < 4 days). The US is the most worrying case (as many have already noted). With a big population and a doubling time of < 3 days, it is a question of days before it becomes the center of the outbreak.

One main difference between this plot and the previous one is the colouring of the countries, which now represents the time since the first reported case (in weeks). All countries with more than 10k cases belong to the category of "early outbreaks" and are thus coloured light blue. But not all "early outbreak" countries have many cases, and this is largely due to their high doubling times (Japan and Singapore being the prime examples). In the plot below, I have tried to group countries based on the combination of doubling times and weeks since first case, by drawing regression lines that represent a (sort of) Slow-Down Rate (SDR). The higher the slope, the more effective the slowing down of the spread. You may think of the SDR as the fight to increase the doubling time while dealing with an increasing number of cases. For most epidemics the natural course of things is that doubling times increase (and the spread slows down) as cases accumulate, but unfortunately this only happens after a large percentage of the population has been infected. As we try to flatten the curve we cannot afford this, and so we hope that the slowing-down will come from case isolation through quarantine. It remains to be seen whether points on this plot will move quickly towards the bottom right (bad scenario) or slowly towards the top right (good scenario).

Using four SDR regression lines, I have split the plot into four areas, each of which contains at least one country where the outbreak was detected early. This means that different SDRs cannot be attributed to insufficient time for observations. In this way, we can correlate the development of the situation in each country group with other characteristics, one of which is the mitigation measures that have been imposed, as we will see right after.
 

Looking at the plot below (and note that this is a rough, eye-balled split) we can see four different groups. From best to worst: Japan and Singapore have SDR > 12, as do Qatar, Slovenia and other countries which were, however, late in reporting a case. Their goal is to stay in that area of the plot. A large group with SDR between 4.5 and 6.5 includes the Scandinavian countries (except Denmark, which was late in reporting a case and is doing rather well) and a number of other "late outbreak" countries. Some struggle to stay in this area (Belgium, Israel, the Netherlands and Australia) while others (like Greece and Iceland) may be more optimistic. Unfortunately, the countries that account for more than 90% of the active cases worldwide lie in SDR areas below 4.5. Italy is fighting to get into the "OK" zone of SDR = 4.5-6.5, but Germany, France and (in particular) Spain are far from that. The US is even further down: the spread there shows no sign of slowing down. Quite the opposite.
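[Figure: doubling times per country, coloured by weeks since the first reported case, with the four SDR regression lines]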
The question then is: what is making Japan and Singapore so successful, and what will help Belgium and the Netherlands avoid the fate of Italy? One answer suggested by the short history of this developing situation is extreme self-isolation measures with effective lock-downs. This is in fact the approach that most countries have been adopting. The best example of its effectiveness in slowing down the spread (even to a halt) has come from China, which has now effectively contained the virus. Below we see the steady increase of doubling time in China from January 24th (the first date for which we have data) to February 8th. There is a marked increase in doubling time that starts more or less before the decision for a full lock-down in Hubei province (the epicenter of the outbreak). While the slowing down was already under way before the measures were put into place, the curve rises notably from February 3rd, a week after the lock-down.

To say we have learned from China is easy, but actually doing as they did is more complicated. A number of special characteristics made it much easier to impose and maintain the lock-down in Hubei. The lock-down did not affect the whole country, and the relationship between the state and the people (to put it gently) made things easier.
[Figure: 5-day rolling doubling time of confirmed cases in China, January 24th to February 8th]
What about the rest of the world then? It was not easy to collect data on exactly when measures were taken in every country. There is variability in this respect too, as some countries have not gone into full lock-down or did so only after milder measures were put into place. To keep things as homogeneous as possible, I obtained the dates when school closure was decided in 13 countries from Wikipedia articles on the Covid-19 epidemic. I then went back to the confirmed-case data and calculated 5-day rolling doubling times for the last three weeks. This means that, starting from March 1st, I calculated average doubling time estimates over five consecutive days. (Note: this was done to get smoother estimates. It was also the approach I used for the data from China in the plot above.)
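A rolling doubling time of this kind can be sketched as follows. This is only one simple way to do it and not necessarily the exact procedure used for the plots: it takes the ratio of the last to the first cumulative count in each 5-day window, which amounts to averaging the daily growth geometrically. The function name and the case counts are made up for illustration.

```python
import math

def rolling_doubling_times(cumulative, window=5):
    """Doubling time (in days) for each run of `window` consecutive daily steps.

    Assumption: growth within a window is summarised by the ratio of its
    last to its first cumulative count (a geometric average of daily growth).
    """
    times = []
    for start in range(len(cumulative) - window):
        first, last = cumulative[start], cumulative[start + window]
        if first <= 0:
            times.append(math.nan)   # no cases yet, doubling time undefined
        elif last <= first:
            times.append(math.inf)   # no growth within this window
        else:
            times.append(window / math.log2(last / first))
    return times

# Hypothetical cumulative confirmed cases for one country (not real data).
counts = [50, 70, 95, 130, 170, 215, 260, 300, 335, 360]
print([round(dt, 1) for dt in rolling_doubling_times(counts)])
```

Each value in the returned list would fill one cell of a country's row in a heatmap like the one described below.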

The results are shown below for 11 European countries (including my native Greece), the US and Israel. I hope to include more in the future (provided I find time to read Wikipedia articles, or get a reliable comprehensive data source). The heatmap shows doubling times from March 1st to March 22nd (because of the 5-day averaging, the final dates refer to the central date of their window, i.e. the last day minus 2). Doubling times range from 1 to 12 days, with Iran in the best situation. You read the plot from left to right and hope to see a steady (and as fast as possible) move to darker colours.

What does it tell us?

1. First of all, things are encouraging for Italy. Its steadily increasing doubling time puts it in the same cluster as Sweden, Greece and the obvious outperformer of the plot, Iran.
2. A second cluster containing France, Belgium, the Netherlands and Spain is seriously lagging behind, but things look positive for France and the Netherlands, which show a slowing down (darker colours as we move from left to right). It is still too early to tell about Spain and Belgium.
3. The last cluster is quite inhomogeneous. Germany looks more like France than the UK or Portugal, but it is probably the large number of cases that obscures its performance. The US is an obvious outlier. Things do not seem to be getting any better there.

But remember, we did all this to see whether social isolation measures can actually work. Can we reproduce the effect they had in China? The answer is yes, but not entirely. There appears to be a mild positive association between how early the measures were taken and how the situation developed. Greece, Sweden, Iran and Italy all closed their schools on March 11th. This did not happen in the UK before March 19th, more than a week later. On the other hand, the lag in school closure between Spain and Italy appears too small to explain the difference in their doubling time dynamics. Italy is getting better, Spain is not. But it could well be that two days, or even one, can really make a difference.

There are also a number of factors that we cannot easily account for. The red lines below show the date schools were shut down, but that is not the only measure. Most countries have gone into a complete lock-down, shutting down bars and restaurants and even imposing curfews, and the time when this happened was not the same everywhere. In some countries, like the US, even the date schools closed was not the same across the country. As data accumulate and the situation develops we will be in a better position to discuss the effectiveness of lock-downs and (perhaps more importantly) the time when it would be OK to lift them.

Other kinds of data need to be factored in as well. Population densities and demographics may make lock-downs more or less effective, and some countries may need to consider additional (or other) means of mitigation.
It sounds frustratingly repetitive, but we can only wait and see.
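[Figure: heatmap of 5-day rolling doubling times per country, March 1st to March 22nd, with red lines marking the date of school closure]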

Covid-19 Epidemic. Confirmed case doubling times among countries

3/21/2020

The Covid-19 pandemic is affecting our lives in every possible way. It is a first in many respects, one being that this is the first epidemic for which data on cases, deaths and recoveries are being made available in almost real time.
A number of data scientists have been trying to make sense of the available data from many possible angles. Nevertheless, little more than the obvious (and expected) exponential increases has come out of most of the analyses. We now know that the virus is highly contagious and that exponential growth of cases should be the norm. The goal is to tame this growth as much as possible, and in this sense social distancing and case isolation are likely to be the only way to delay the peak of the epidemic (that is, the time when the number of sick people reaches its maximum), or (as you should know by now) to "flatten the curve".

One main problem with most data-analytical approaches is the variability of reported case data. While the number of deaths can hardly be questioned, the way each country reports confirmed cases is very different. Some countries (like South Korea) have opted for extensive testing in the general population, while others (like Greece, for instance) have explicitly targeted serious/critical cases, recommending that people with mild symptoms avoid over-crowding hospitals and diagnostic labs. I shall refrain from discussing the arguments for and against these two extreme approaches and focus on the main problem that this variability poses for the analysis of the epidemic dynamics.
If a country tests a lot, the number of cases will be high but the case fatality rate (i.e. the number of deaths per confirmed case) will be lower. Countries that test only people with severe symptoms will report a low number of cases but higher fatality rates. In either case, it is difficult to tell how the spread differs from one country to another.

How then can we really know what is going on?
One solution is to focus on increase rates instead of the number of cases. Assuming that the way a country reports cases does not change over time, we can try to estimate the rate of increase in confirmed cases from one day to the next. Regardless of whether one tests a lot or a little, the number of new cases relative to the previous total is representative of the spread. This is nothing new, and people have tried to estimate this rate from the slopes of log-linear fits, but the problem is that these slopes are very prone to random fluctuations, especially when case numbers are small.

In the following I will present a simple approach to address the problem and, more importantly, to gauge differences in the approaches that different countries employ to tackle the spread of the epidemic.

Data
I used data from the Johns Hopkins University Center for Systems Science and Engineering (CSSE), which are updated daily for all countries that have reported at least one confirmed case and which are freely accessible here:
https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data. 

Case Doubling Time
Instead of looking at the slopes of increase curves, I tried to estimate Case Doubling Times. This is the time (in days) it takes for the number of cases to double, given the rate of increase over a certain period of time. This means that if we have N cases on day x, we can estimate the number of days it will take to reach 2N cases, assuming a constant daily growth factor r (one plus the daily increase rate).
This is easily calculated from the following equation: Nr^(dt)=2N, where dt is the doubling time in days. Solving the equation for dt gives dt=1/log2(r), which means that a good estimate of r is equivalent to a reliable prediction of the doubling time.
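As a quick check of the formula, here is a minimal Python sketch. The function name is made up for illustration, and r is taken to be the factor by which the case count is multiplied from one day to the next.

```python
import math

def doubling_time(r: float) -> float:
    """Days needed for cases to double at a constant daily growth factor r."""
    if r <= 1.0:
        raise ValueError("cases are not growing, so the doubling time is undefined")
    return 1.0 / math.log2(r)

# Cases that double every day have r = 2 and hence dt = 1.
print(doubling_time(2.0))
# A daily factor of 2**0.25 (about 1.19) corresponds to doubling every 4 days.
print(doubling_time(2 ** 0.25))
```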

What I did
1. I took the timeseries data (that is, cases per day for many days in a row) from the link above and estimated a rolling increase rate for every country that has reported at least 300 cases. Rolling, in this sense, means that starting from the day each country reported its first case, I calculated the rate of increase between every two consecutive days until today (March 21st).
2. As this gives a series of increase rates equal in length to the number of days of reporting (minus 1), it is, as expected, very noisy, especially while the number of cases is still low. You expect it to converge once sufficient cases have been reported (and then you hope it slowly drops to zero). This is why I used only the mean rolling rate of the last 5 days for each country. I then used this mean rate to estimate the Doubling Time explained above.
3. Even so, the rate values may vary significantly in countries where cases have been reported for only a short period, and this is why I combined the mean rate with its standard deviation (how much it varied over the last 5 days) and the days since the first case was reported.
4. I then plotted the estimated Doubling Times against the total number of cases, taking into account the variability of increase rates. Each country is represented by a dot. Doubling time is on the y-axis and you want this to be as high as possible, since a high value means you have efficiently "flattened the curve". Total cases is on the x-axis and a high value means, well, the obvious: that a lot of people are sick (but caution: not all countries report in the same way). How reliable are the data? The darker the color of the label, the smaller the standard deviation of the increase rate, and thus the more confident we can be of the estimated doubling time.
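The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the plots: the function names and the case counts are hypothetical, the "increase rate" is treated as the daily growth factor (today's cumulative count divided by yesterday's), and the mean is a plain arithmetic mean over the last 5 factors.

```python
import math
from statistics import mean, stdev

def daily_growth_factors(cumulative):
    """Ratio of each day's cumulative count to the previous day's count."""
    return [today / yesterday
            for yesterday, today in zip(cumulative, cumulative[1:])
            if yesterday > 0]

def doubling_time_estimate(cumulative, window=5):
    """Mean growth factor of the last `window` days, its spread, and the implied doubling time."""
    factors = daily_growth_factors(cumulative)[-window:]
    if len(factors) < 2:
        raise ValueError("need at least two daily growth factors")
    r = mean(factors)
    sd = stdev(factors)  # large values flag an unreliable estimate
    dt = 1.0 / math.log2(r) if r > 1.0 else math.inf
    return r, sd, dt

# Hypothetical cumulative confirmed cases (not real data).
cases = [310, 420, 560, 700, 860, 1040, 1230]
r, sd, dt = doubling_time_estimate(cases)
print(f"mean factor {r:.2f}, sd {sd:.2f}, doubling time {dt:.1f} days")
```

In a plot like the one described in step 4, `dt` would give the y-coordinate of a country and `sd` the shade of its label.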


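[Figure: estimated doubling time (y-axis) against total confirmed cases (x-axis) for each country; darker labels indicate a smaller standard deviation of the increase rate]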
Conclusions?
So, what can we make of this figure?

Japan is the way to go
First of all, one should focus on the dark blue labels. These represent countries with a low standard deviation of increase rates and thus reliable estimates of doubling times. That said, you want to be as high on the y-axis as possible. We knew already that Japan is doing a great job. Even though it was among the first countries to report a case, it has kept cases low (~1000) and, more importantly, has a doubling time of 25 days. This means that, provided things don't change, Japan is not expected to have more than about 3000 cases by the end of April.
  
Good signs from Italy 
Italy has, on the other hand, been described as the horror story so far: an exploding number of cases coupled with high fatality, both of which are probably due to strong exponential increase rates in the early days after the outbreak. Nevertheless, Italy's doubling time is now 5.4 days, almost double that of Germany (2.8 days) and more than double that of the US (a little over 2 days). Whether this is due to the heavy restrictions on movement and social distancing that were imposed a bit more than a week ago remains to be seen. Even though dt = 5.4 is far from perfect (if stable, it means a roughly 50-fold increase within one month), Italy's doubling time was < 3 days a few days ago and less than 2 days in the early days of the epidemic, which means that significant progress is being made in stopping the spread. Another possibility is that Italy has simply reached a saturation point in terms of testing capacity and can now only perform a certain number of tests every day, most of which come out positive. This may also be the case for Iran, one of the countries that suffered the most but which now reports a low number of cases and a doubling time of more than 10 days. This will become clearer in the days to come.
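Projections like the ones quoted for Japan and Italy follow from assuming that the doubling time stays constant: after t days the case count is multiplied by 2^(t/dt). A minimal sketch of that arithmetic, in which the function name is illustrative, the 40-day horizon assumes counting from March 21st to April 30th, and a month is taken as 30 days:

```python
def fold_increase(doubling_time_days: float, days_ahead: float) -> float:
    """Factor by which cases grow over `days_ahead` days at a constant doubling time."""
    return 2 ** (days_ahead / doubling_time_days)

# Japan: about 1000 cases, doubling time of 25 days, 40 days until the end of April.
japan_projection = 1000 * fold_increase(25, 40)
print(round(japan_projection))

# Italy: doubling time of 5.4 days held constant for 30 days.
italy_fold = fold_increase(5.4, 30)
print(round(italy_fold))
```

With these inputs Japan comes out at just over 3000 cases and Italy at about a 47-fold increase, so projections of this kind are very sensitive to the doubling time.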

Central Europe should look towards Scandinavia
What about the rest? Bad news for the US, Germany, the UK and Austria, for all of which the estimated doubling times are low. The Netherlands, Belgium and Switzerland are doing a bit better. The Scandinavian countries are performing rather well and, in spite of large numbers of cases (taking their population into account), they seem to have slowed down the spread. Not much can be said for countries like Greece, Iceland or Slovenia, where the spread looks to be halted but the number of cases is still too low to allow for accurate estimates. Countries in the bottom left part of the plot have very low numbers of cases and large standard deviations of increase rates. It is just too early to tell.

What more can we look into?
These are highly volatile data, so one needs to keep looking as the timeseries become longer and the rolling estimates of increase rates converge. It would be very interesting to look into how doubling times changed for each country, taking into account the sort of counter-measures imposed and, perhaps more importantly, how soon after the first case they were put into place. This will probably give us a better idea of what the best approach is. Everybody agrees we are in uncharted waters and need to approach every analysis and its interpretation with a lot of caution.

Bias, Prejudice and a simple "litmus test" to detect them

11/23/2018

As part of my (quite loaded) teaching schedule I get to give introductory lectures on statistics and experimental planning to undergraduate students (mostly freshmen). One of the concepts I have the greatest difficulty explaining is that of "bias", which in statistics is the difference between an estimator's expected value and the actual value of the parameter being estimated. As it is understandably hard to explain a complex concept to freshmen, who are becoming more and more mathematically illiterate, through equally complex concepts such as "estimators" and "expected values", I often have to take less rigorous approaches. One of these is to resort to more mundane explanations of the terms. According to Wikipedia, bias is "a disproportionate weight in favour of or against a certain person, thing or group against another", a definition that is much easier for students to grasp as it resonates with the more commonplace notion of prejudice. In fact, most of the examples I use to explain bias are very "non-statistical", ranging from my old favourite story on how sharks prefer eating men to women, to everyday issues such as the reporting of crimes committed by immigrants in mainstream media.

These are, of course, issues not to be taken lightly, as biased ways of reporting, writing about and discussing events are becoming ever more frequent. In my brief spell as an aspiring reporter (back in the day) I was surprised to realize how easy it is to let prejudice infiltrate your reporting (Noam Chomsky and Edward Herman devote a whole chapter of their seminal "Manufacturing Consent" to this topic). In our times, however, of a general return to conservatism and ever-expanding bigotry, what is more striking is the way we fail to perceive prejudice and bias in everyday life. It seems as if we are becoming blind to even the most outright bias in expressed opinions, and this, of course, is not at all helpful for my students (as both students and citizens).
To this end I have been thinking about more straightforward ways to detect bias, and I have recently come up with some ideas after exchanging opinions with friends on Facebook (yes, it is possible). The examples I am posting below were inspired by three very different topics that came up on my Facebook timeline on the same day.

The first was brought to my attention by an old schoolmate and had to do with a somewhat famous Greek actress defending her decision not to have kids, after being repeatedly asked in various interviews why she wouldn't. My friend, herself married and happily childless, was infuriated by the way the actress (Katerina Lechou) had to defend "a woman's right" not to have kids. What immediately struck me was the fact that we were discussing this as a "woman's right", and I went on to ask why men are never asked this question. This is a very straightforward way to realize that the content of the question "why won't you have kids?" is only part of the problem, as long as it is only addressed to women. My friend and many other women were rightfully offended by the content of the question but failed to realize that it was also greatly prejudiced, since it implied that having children is either something to be decided by women alone or something that men should not really care about.

The second story had to do with a British food editor being forced to resign after making an admittedly bad joke about vegans in an email. Even though I respectfully understand that some groups of people may be more sensitive to comments than others, you will have to agree with me that there is absolutely no chance that William Sitwell would have quit his job had he asked in his message that "all meat eaters be killed". Here again, we see how easy it is to spot bias simply by substituting the object of the statement with its conceptual counterpart (here "vegans" become "meat eaters"). This is the essence of what I call the "bias touchstone". Invert the argument and see if it makes sense or not. If it still sounds reasonable, then bias is not so likely. If, however, it doesn't, prejudice may be implied. In this case, the prejudice is a positive one, aimed at "protecting" the sensitivity of vegans. It remains a prejudice nonetheless.

The last example is a bit more personal, as it has to do with the negative evaluation of a grant application, which I received yesterday. The sole reviewer of my proposal had made an honest effort to read it and had a few plausible arguments for rejecting it, especially given the very low acceptance rate of the call. What was alarming, however, was his/her blunt statement regarding our work. In the part of the assessment where the rejection was being justified, he/she had no reservation in writing that "This is a purely computational biology project". The way it read made it sound almost offensive to be proposing computational work in a Life Sciences panel. Besides the fact that I have been working in Biology and Biomedical institutes for my entire adult life, I could not resist applying the "bias touchstone" to the statement. The same reviewer has surely never used a comment like "this is a purely developmental/molecular/cellular biology project" as a justification for rejection (even though I can think of other subdisciplines, such as structural or evolutionary biology, that may have been targeted in a similar manner). A simple substitution of the object of bias in the statement quickly reveals the bias itself.

As double standards are increasingly becoming the norm in many forms of public discourse, this very simple idea can easily be extended into a first assessment of any sort of statement. In science, it can also serve as a rough evaluation of the originality of a given finding. Take any sentence like "X is found to interact with/regulate/inhibit Y" and form its negation: "X is found NOT to interact with/regulate/inhibit Y". If it still sounds plausible, then the finding is quite interesting. As regards X and Y anything could be going on, but now, thanks to this work, we know that it is X that regulates Y. If, however, the negative statement sounds quite improbable, then the original finding is suddenly not so original. In this case, the goal is not to spot bias but to confirm an imbalance between two initially equivalent possibilities.

At this point you may have realized that this mental experiment is silently conducted all the time, especially by editors and reviewers of scientific journals when assessing the possible "impact" of a scientific finding. It forms the basis of a rather special kind of bias, called "confirmation bias".
But this is a story for another post.

A first take on Footballomics: Analysis of football data

3/30/2017

What we are doing

Being in science combines a number of rewarding activities that make the daily working routine fulfilling in many ways. You get to learn new things every day, you (sometimes) even understand how things work (or, even better, how nature works) and you get to interact, through teaching, with young people who are full of contagious optimism and aspiration. Last but not least, you have the freedom to make your own working schedule and, more often than in other jobs, find time to apply what you learn to things you were always curious about.

Footballomics: Take #1

And I am, and have always been, curious (nay, crazy!) about football in all its aspects: playing, watching, talking, thinking and dreaming about it. Over the years, job, family and age have caught up with me, and I have grown into a more mature way of appreciating the “beautiful game”, from worshiping players to admiring managers and from chanting in the stands to reading about football tactics. My professional involvement in data analysis and statistics has also led me to develop a more “quantitative” approach to football, and I have always wanted to apply some of the simple (or not so simple) principles of my everyday work routine, which involves making sense of data for biological problems, to more “mundane” questions regarding football. For this, my first ever attempt to analyze football data, I took the opportunity (OK, I took advantage) of teaching a (hopefully) interesting graduate class on “R for Bioinformatics” at the University of Crete Medical School. After having introduced the basic concepts of R to the students, I thought of giving them an example of how we can use it to attack simple questions based on data. And since they are (or will soon be) fed up with biological problems, I thought of giving them a different kind of puzzle, which brings us to:

The Question: Are Liverpool performing significantly better against top-of-the-table teams than against bottom-of-the-table “minnows”?

Being a big (OK, huge) fan of Liverpool Football Club in the post-90s era can be exhilarating and frustrating at the same time. You get to experience glorious moments like the Miracle of Istanbul or last year’s come-back against Borussia Dortmund, but you also get to see them miss out on one league title after another through unexpected slip-ups against “lesser” teams like Crystal Palace in 2014. This year in particular, this trend of being imperious in big games, only to lose their nerve against teams like Burnley, Bournemouth or Swansea, has been more apparent than ever. Liverpool do very well when playing big opponents that are title challengers and somehow sink when they find themselves against tough-to-crack defenses. I am, of course, not the first to address this issue, which has been brought up by former managers and former players turned football pundits. The question, though, when it comes to punchlines such as “Liverpool sink against lesser sides”, is how well founded they are on real data, and this is exactly the question I posed to my (patient) students. What they had to do was to test whether Liverpool indeed performed worse than expected against teams at the bottom of the table, the word “expected” being the key.

If you are ready for a long read on football, data mining and some medium level R code you can see the rest of the details here

Mathematical illiteracy #1

9/5/2014

More and more often we see, read and hear about scientific (or at least scientific-looking) studies from this or that university. For the most part these studies concern topics of general interest that also arouse curiosity (foods that help you lose weight or prevent cancer, new animal species being discovered or old ones going extinct, and so on). Even more often they concern organisms that have somehow become connected with what we call "pop culture" through films, books or works of art. Thus, in the not so distant past we were impressed by the "alien bacteria" of Mono Lake that have arsenic instead of phosphorus in their genetic material (something that turned out to be a magnificent hoax), while from time to time we read with (slightly guilty) interest articles about piranhas, anacondas, fish of the abyss and the like.

Unfortunately, both for members of the scientific community and for readers with a simple interest in science, most of these articles are not only badly written but, most of the time, so detached from reality that they do more harm than good. So instead of "educating" an audience to better understand certain things, they bombard it with advertising-style catchphrases (like "liquid collagen" and the "two active fluorides"), with the result that people who constantly read nonsense keep multiplying among us.

A whole category of such publications are those that I collectively call "mathematically illiterate". They usually concern epidemiological studies or studies on the effects of foods and substances on human health, and although they usually draw material from real scientific papers, they present an interpretation of the results that is not only unfounded but very often utterly unsound. A typical example is a short article published today on the website in.gr that "informs" us that great white sharks prefer to eat men. This follows from studies of the record of great white attacks in Australia which, according to the in.gr writer, "showed that in 84% of attacks the victims were men". The conclusion "sharks prefer men" thus follows rather naturally, and the reader is asked to swallow it whole, without any mention whatsoever of the simplest explanation, namely that men swim more, surf more and venture into deeper waters much more often than women. This very simple explanation did not (fortunately) escape the attention of the study's lead scientist, Professor Daryl McPhee at the University of Queensland, Australia, who attributed the results of his work purely to the fact that "men spend far more hours in the sea than women". This fact alone is enough to make shark attacks on men much more likely than attacks on women. In this light, writing that sharks prefer men is as silly (and woefully wrong) as claiming that "the lions of the savannah avoid white people".

Among the various terms of statistics there is one of very great importance. In association studies (A depends on B, or C has a preference for D) one must always keep in mind the so-called "confounding principle". This refers to a "hidden" variable that is responsible for the observed association; taking it into account (and removing it) is enough to make the association disappear. In our case, the hours spent in the water are exactly such a "confounding" variable. So if one divided the number of attacks by the estimated hours spent in the sea for each of the two sexes (a procedure we call normalization), one would most likely see that sharks have no particular preference. This analysis is not feasible, as no one has (fortunately, as yet) recorded the time bathers spend in the sea. Nevertheless this very simple explanation, besides being mentioned by the authors of the study themselves, should be obvious to anyone who has ever bothered to remember junior high school arithmetic (chapter: Proportions).
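To make the normalization concrete, here is a small Python sketch. Only the 84% share of male victims comes from the article; the total of 100 attacks and the hours spent in the sea are invented purely for illustration, precisely because such exposure data have not been recorded.

```python
def attacks_per_million_hours(attacks: int, hours_in_water: float) -> float:
    """Attack count normalized by exposure (hours spent in the sea)."""
    return attacks / hours_in_water * 1_000_000

# 84% of victims were men (from the article); 100 attacks in total is hypothetical.
attacks = {"men": 84, "women": 16}
# Hypothetical exposure: men spend about five times as many hours in the sea.
hours = {"men": 42_000_000, "women": 8_000_000}

for group in attacks:
    rate = attacks_per_million_hours(attacks[group], hours[group])
    print(group, round(rate, 2))
```

With these made-up exposure figures the two normalized rates come out equal, which is what "no particular preference" would look like once the confounding variable is accounted for.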
​
Unfortunately, the writer for the most visited news website in Greece is not among them. Even worse, he is "educating" a generation of readers to resemble him.

    It's all about...

    Bioinformatics and computational biology with a focus on chromatin and genome architecture, plus a little bit of football and occasional aspects of  University education.
