Computational Genomics Group

European Super League and other demons (a "long" post, originally in Greek)

4/23/2021


 
The stillborn European Super League, which came and went within 48 hours, was neither unexpected nor inevitable.
At the same time, it is neither the missed opportunity of the visionaries of the future, nor the nightmare from which the real fans saved us. Whatever has been proposed once will come back.
Before the next time, and by way of preparation, if not acceptance, a few thoughts.

To begin with: why now?
The proposal for a closed league, modelled on the way sports are organized in America, did not come by chance in the middle of the most battered season of all time.
As with everything else, the pandemic acts as the "great accelerator", speeding up trends that already existed, while yesterday's worries become faits accomplis.
Real and Barcelona have for years operated as freeloaders on the back of the Spanish banks (and therefore of the Spanish state), with debts exceeding one billion euros. The same goes for Juventus and Inter, though they are not in quite as bad a state. In England, even clubs with tidy finances find themselves in the unpleasant position of recording losses at precisely the moment they expected to collect profits, Liverpool being the most characteristic example.
None of this is new, nor is it surprising that football's corporate giants are struggling financially in a world where one in five jobs is at risk of being lost within 2021. So the question "why now" is not that hard to answer. For the instigators of the plan it is obvious that the sooner, the better. Moreover, for something that was always expected to find the fans against it, doing it in the middle of a year with closed stadiums offers the added advantage of an absence of confrontation, at least of the direct kind.

Beyond that, however, it is worth looking at some further questions, precisely now that the project appears clinically "dead". My feeling, as I wrote above, is that we will soon see a resurrection. Sooner or later, and probably sooner than we would like to hope. Here is why:

1. Those who "run" the shops are different from before.
When, in 1983, Thatcher abolished the famous Rule 34 and football clubs ceased to be non-profit associations, the first salaried chairmen were local businessmen from the community. Today, the people who call the shots at the big clubs almost never come from the community that gave birth to the team, or even from the same country. Liverpool, Manchester United, Milan and Arsenal belong to Americans. Paris Saint-Germain and Manchester City practically belong to small Arab states. The relationship of the boards with the community is very different. In essence it has more in common with the relationship that non-parliamentary ministers have with the electorate: a supposed authority, often with the traits of a Saviour, with minimal accountability. If Henry leaves Liverpool tomorrow, or Glazer leaves United, their business profile will suffer very little. The same holds for the people in key positions. Woodward, the executive director of United, or Hogan of Liverpool are what we (don't) like to call "market people", not football people. They all see football as a "product". From this point of view, the metrics by which they are evaluated are not the ones we have in mind. They are not (necessarily) titles and tickets, but the opening to new markets and bigger audiences, even if this puzzles (or even alienates) the teams' traditional base. Even for Abramovich or the Arabs of the Emirates, who have to some degree also invested their public image in their involvement with the game, running a team is a project, and as such it is successful when it is profitable. Is that bad? For them, no. For many of us, yes. But not for the reasons you imagine.

2. What counts as success for the clubs today is not what you think.
Between 2019 and 2020, within a span of 12 months, FSG's Liverpool became champions of Europe, of the world and then of England. For the board it was a huge success with no payoff. They expected to capitalize on the titles through more sponsors, greater commercial exploitation and an opening to a younger audience that would at last associate Liverpool with a "dynasty". The epidemic killed many of these plans, but even so, a traditional owner in the 80s or the 90s would have felt happy. This is no longer the case. By the same logic that, in the global economy, dictates that no one may stand still, the fact that Liverpool is not increasing its profits while it is a superpower creates worries that it will struggle badly when it ceases to be one (and in the accelerated time of the pandemic, this has already happened). For FSG, the Glazers and Agnelli, there is no time to lose. For Pérez and Laporta every day that passes is additional cost. And the cost is not measured in lost cups and championships, but in television rights (mainly) and ancillary commercial exploitation. Tottenham is the most characteristic example here. Its last title is the 2008 League Cup, and yet it would rather be a pariah, losing every match, in the club of 12 with a guaranteed income, than a possible title contender in England with reduced revenues.

3. The football that makes money will soon not be the one you knew
Have you wondered why every year more and more superhero films come out? Why 50% of Hollywood films are made with the 15-25 age group as their target? Why nobody pays to buy music any more? The overwhelming majority of marketing at every level is aimed at the young. And when we say young, we mean very young. So young that none (or very few) of them will have lasted long enough to read this far into this text. Young people today cannot listen to an entire music album. They get bored of watching even a whole YouTube video before moving on to the next suggested one. When Florentino Pérez worries that 90-minute matches with a 15-minute half-time are not attractive to the young, he is not crazy. He is telling the truth. We, the 30-40+, grew up with a much greater passion for football than our children. Today's babies will hardly last a whole 90 minutes in the stadium without screens, their mobile phone, prompts for stories and TikTok videos, or whatever other distraction has been invented by then to bind their minds to consumption. The ESL announcement spoke of legacy fans, that is, the "traditional fans" who follow the team knowing its history, its old glories, its rivalries and so on. It was speaking, in other words, about us 40+, with the difference that it did not refer to us as an "asset" for the team, but as a liability. Those aged 30 and over are no longer the audience that the boards are targeting. The bet is to win over the younger ones. Which brings us to the next point.

4. What used to be context will become content
In the years that we thirty- and forty-somethings and older have been watching football, the matches are history. I can tell you where I was every May from 1985 onwards because I remember where I watched the European Cup final (later the Champions League). The final matters, whether it was the 3-3 of Istanbul or the 0-0 between Marseille and Red Star in a half-full stadium in Bari. Obviously some matches are legendary, while others are less so. But it is the competition that gives them their lustre. In a competitive sport, even one with surprises, the teams that advance in a competition are, as a rule, good teams, and for that reason they are worth watching. That is the context, and the context counts when you prefer to watch a Champions League semi-final between Porto and Deportivo La Coruña over a Spanish Super Cup between Barcelona and Real, which is practically a pre-season friendly. But this too has already changed. Do not bother asking a channel executive or an advertiser which of the two they would now put their money on for higher ratings. The combination of overexposure and a trend towards "content" that can create added value (stardom, stories, chatter on morning shows and fan radio stations) is what now shapes the market. If Mourinho's Porto of 2004 played again today against Irureta's Deportivo, we would be talking about an anti-commercial semi-final. The problem is exactly here. In 2004, Deportivo had eliminated Juventus with two wins and had put an emphatic four goals past Milan. Porto were the UEFA Cup holders and had eliminated United at Old Trafford. They were in the semi-final because they were great teams. This matters little today. Just think how quickly we forgot Lyon putting three past Manchester City last year. Ajax has four European Cups, is an entire school of the sport, and two years ago came within a few seconds of the final. Yet it was not even invited to the 12.
Tottenham, who eliminated Ajax, made it into the 12. Because it has a board that shares the views on the evolution of the product called football and shares with the others a delusion of grandeur.

5. Capitalism vs Cannibalism
But delusions of grandeur obviously do not come by themselves. Today's mega-clubs are living their golden age. In the last ten years, incredible records have been set in all the major European leagues. In three of them, the 100-point barrier was broken for the first time. In 13 of the 54 European leagues, the longest run of titles by the same team in their history is currently under way. This is not simply a case of "it's capitalism, silly". It is the ultimate version of "and the rich shall be richer", the so-called Matthew principle. You qualify for the Champions League one year and earn money, which allows you to qualify the next year as well, and again and again, in a cycle of self-sustaining prosperity. The problem arises when this success takes on the character of privilege (entitlement). When a Tottenham or an Arsenal considers that not qualifying for the Champions League is not something it lost but something that is being "taken from it". Even worse, the prospect of a year without Champions League money for a Barcelona or a Real is not merely failure but catastrophe. The clubs grew up like banks that were spoiled by constant subsidy, became something like institutions and now feel that they are "too big to fail". The fans who revolt against the "greed" of the clubs are the "sillies" of the story. The problem is not that the teams are run by capitalists. All of them are run by such people. The problem is that all clubs operate under a kind of extreme capitalism which, in its ultimate version, cancels itself out. It has no competition and becomes parasitic. Guardiola put it better.

6. So what do we do now?
As at all levels of hyper-capitalism, the criticism is not directed at the operation of the teams as companies aiming at profit. That has been accepted for decades. The problem, as in all sectors of economic activity, is the explosive inequality that comes with full deregulation. The closed league is such a full deregulation, one that removes even the last constraint tying financial rewards to sporting success. A closed league is as if the supermarket chains Vasilopoulos, Sklavenitis, Masoutis and Chalkiadakis got together and decided that whoever launches a new product must sell it in their shops first, before thinking of giving it to any small neighbourhood grocery. (Oh wait...).
What I mean to say is that what Pérez, Agnelli and the American gang want to do is exactly what you expect to happen in a fully deregulated capitalist system. No limits on earnings, no compulsory redistribution of profits to the smaller ones, no obligations for co-management and so on. Fans and governments revolting against a hyper-capitalist monopoly that aims to kill competition is obviously not a bad thing, but there is something disturbingly grotesque in hearing Johnson and Macron discuss how they will "kill" the closed-league project with "legislative bombs" when they have no problem whatsoever with banks and multinationals doing exactly the same thing. With exactly this unshakeable argument, the clubs will bring the project back, either intact, or slightly altered, or disguised within a change to the structure of the Champions League. And if the fans or any other body want to have any hope of stopping it, they will have to push now, especially now that there is a small window of initiative, for radical, structural changes in the way competitions are run.

It goes without saying that it would not hurt if, besides Agnelli, Pérez and Glazer, we also took on the Sklavenitises, the Bezoses and the JP Morgans of this world, but I fear that here we may be asking for too much.

mens, manus et privilegiis

12/5/2020


 
The story is more or less known.

At some point in 1913, a young Indian named Srinivasa Ramanujan, the son of a clerk and a housewife, wrote a letter to G.H. Hardy of Cambridge. Ramanujan had a certain gift for mathematics but he was largely unaware of his potential, having grown up as a poor, mostly self-taught boy with very limited access to books, proper education and tutors. He had only been exposed to the academic micro-environment of the subcontinent, at the time very confined and narrow. Hardy was already the uncontested star of British mathematics, a member of the Cambridge Apostles, one of the youngest ever lecturers at Trinity College and a reformer of both math education and research, alongside another prodigy, John Littlewood. Littlewood, roughly the same age as Ramanujan, immediately recognized the talent of the young Indian and convinced Hardy to invite him to England. In the brief period of the next seven years, the three of them would work on a variety of problems in number theory with groundbreaking results, before Ramanujan's health deteriorated, forcing him back to India, where he died in 1920.

Besides a large number of important mathematical accomplishments, which even spawned their own scientific journal, and a series of mythical anecdotes (the most famous of which led to the concept of "taxicab numbers"), the common story of the three mathematicians is interesting because it makes one think about privilege and humility in science and in life in general. Or, at least, it makes me think about these things.

I remembered this story, of the poor, unknown young man from the colonies who is recognized by his established, highly esteemed peers in the metropolis, while listening to a controversial podcast excerpt by MIT Professor Manolis Kellis. In it, Kellis describes how his becoming an academic somehow reflects his genetic predisposition for being clever. He then goes on to suggest that his kids, being similarly endowed genetically, have the additional benefit of finding themselves in the company of other gifted children of Kellis' university colleagues in the scholarly micro-environment of Cambridge, Massachusetts. The comment has caused quite a stir in US academic circles, with Caltech's Lior Pachter calling out Kellis for promoting eugenics and Berkeley's Mike Eisen following suit.

Even though Pachter has a long-standing, open feud with (fellow computational biologist) Kellis and Eisen may be somehow sounding echoes of underlying, west-vs-east academic rivalries, it is hard not to see their point. Kellis' comments rank somewhere between incredibly arrogant and downright prejudiced, even when put in the context of an interview with Lex Fridman (himself not exactly the best example of academic -or social- tolerance). The problem is that Kellis is probably neither a bigot nor even -that- arrogant. The way I see it, his comments are more likely the reflection of an increasingly lazy approach by academics (and other privileged individuals) to explain (and sometimes justify) their place in life. 

I have personally met Manolis Kellis a couple of times at scientific meetings (actually at meeting dinners). Both being Greek, we were able to chat casually in our native tongue about his origins in Lesbos, good ouzo, sardines and the fine Mediterranean climate compared to the harsh Boston winters. My perception of him is that of a very intelligent individual and yes, perhaps one who is rather aware of the fact, but not of someone who would look down on people who are less smart or educated than he is, even though he may belong to that -ever expanding- group of academics who find themselves in trouble when having to talk or relate to laymen. His "genetically inspired" comments (to put it mildly) are thus less a reflection of a sense of superiority and more of naivety and insensitivity. And this is why they are more alarming.

In Kellis' description of Cambridge one finds a worldview containing only Hardys and Littlewoods, those who belong in academic environments by virtue of their endowments, be they genetic or otherwise. But Kellis doesn't realize that in this same view of the world, the Ramanujans remain rare oddities. Only a handful will be able to fight their way into scientific institutions, while most will spend their lives in ignorance. There is simply no way, established procedure or even space to ensure that very intelligent people from under-developed countries (or poor people from developed ones) can get into MIT or Harvard, let alone to allow them to "stick around and become professors" (in Kellis' own words). Those that do make it may indeed carry superior "cognitive systems" (again, his words) but this has much less to do with a genetic predisposition and a lot more with a combination of privilege, geographical and cultural advantage (themselves, let's not forget, linked to centuries of colonialism) and, basically, pure luck.

In this sense, Kellis' arguments are disturbing not so much for picturing a world of genetically superior scholars inbreeding in Ivy League campuses. They are more an ominous sign of the detachment of academics, scholars and university teachers from the rest of society. A world in which clever, gifted people indulge themselves in believing their position to be the result of genetically inherited excellence is a very dark place.
It is exactly from this sort of people that we expect better. Better interpretations of the unequal state of affairs in our universities and other workplaces and a better understanding of the complexity of societal factors in shaping our own worldviews. 
     
We also expect more. More empathy towards the disenfranchised strata of our unequal societies and more radical ideas on the organization of academic institutions than the ones of 19th century biometricians.  

One of the key attributes of a good researcher is to not give in to complacency. To unceasingly challenge his/her own perceptions. I hope, for the sake of us all, that intelligent people like Manolis Kellis rise to this challenge. 



Covid19 Epidemic. Doubling times responding to lock-downs

3/25/2020


 
In a recent blog post about the Covid-19 epidemic I tried to tackle the problem of variability in the reporting of SARS-CoV-2 confirmed cases by estimating the case doubling times for various countries. A few things of note from that analysis:

a. Doubling times differed widely between countries, which suggests that some have effectively managed to slow down the spread of the virus. China had by then already almost brought the epidemic to a halt and so was not included in the analysis.
b. Even though the estimated doubling times were rather stable, they showed interesting dynamics that called for a more careful analysis. This is true for almost everything that has to do with this epidemic, which is the first of this scale to be monitored in real time.
c. There was no correlation between the number of cases and doubling times (Italy being the most problematic case, with a high number of infected people and a low doubling time). There was also no correlation between the doubling times and the days since the first case was reported in each country. This means that even in countries where the "spill" occurred at more or less the same time, the situation has not been developing in the same way.

All of the above imply that a more careful examination, factoring in other aspects, is required. In this post I have tried to dissect the different modes of the developing situation in countries with more than 300 cases (as of March 24th), excluding China and South Korea, which appear to have contained the situation (and hopefully will remain thus).

In the plot below I have created the same graph as in the previous post with an update on the data four days after. A number of things regarding the dynamics can be seen by comparing this plot to this one for March 21st. Italy is showing signs of slowing down the spread moving up the y-axis (Doubling Time, DT). France and Germany have done this quicker but Spain has not done much (with a constant DT of <4). The US is the most worrying case (as many have already noted). With a big population and a doubling time of < 3 it is a question of days before they become the center of the outbreak.

One main difference between this plot and the previous one is the colouring of the countries, which now represents the time since the first reported case (in weeks). All countries with more than 10k cases belong to the category of "early outbreaks" and are thus coloured light blue. But not all "early outbreak" countries have many cases, and this is largely due to their high doubling times (Japan and Singapore being the prime examples). In the plot below, I have tried to group countries based on the combination of doubling times and weeks since first case, by drawing regression lines that represent a (sort of) Slow-Down Rate (SDR). The higher the slope, the more effective the slowing down of the spread is. You may think of the SDR as the fight to increase the doubling time while dealing with an increasing number of cases. For most epidemics the natural course of things is that doubling times increase (and the spread slows down) with the accumulation of cases, but unfortunately this only happens after a large percentage of the population has been infected. As we try to flatten the curve we cannot afford this, and so we hope that this slowing-down will happen through case isolation and quarantine. It remains to be seen whether points on this plot show a trend of moving quickly towards the bottom right (bad scenario) or slowly towards the top right (good scenario).

Through four SDR regression lines, I have split the plot into four areas, each of which contains at least one country where the outbreak was detected early. This means that different SDRs cannot be attributed to insufficient time for observations. In this way, we can then correlate the development of the situation in each country group with other characteristics, one of which is the mitigation measures that have been imposed, as we will see right after.
 

Looking at the plot below (and note that this is a rough, eye-balled split) we can see four different groups. From best to worst, Japan and Singapore have SDR > 12, as do Qatar, Slovenia and other countries which were -nonetheless- late in reporting a case. Their goal is to stay in that area of the plot. A large group with SDR between 4.5 and 6.5 includes the Scandinavian countries (except Denmark, which was late in reporting a case and is doing rather well) and a number of other "late outbreak" countries. Some struggle to stay in this area (Belgium, Israel, the Netherlands and Australia) while others (like Greece and Iceland) may be more optimistic. Unfortunately the countries that represent more than 90% of the active cases worldwide lie in SDR areas below 4.5. Italy is fighting to get into the "OK" zone of SDR=4.5-6.5 but Germany, France and Spain (in particular) are far from that. The US is even further down. The spread shows no sign of slowing down. Quite the opposite.
Picture
The question then is: What is making Japan and Singapore so successful, and what will help Belgium and the Netherlands avoid the fate of Italy? One answer that has been suggested by the short history of this developing situation is: extreme self-isolation measures with effective lock-downs. And this is in fact the approach that most countries have been adopting. The best example of its effectiveness as a means of slowing down the spread (even to a halt) has come from China, which has now effectively contained the virus. Below we see the steady increase of doubling time in China from January 24th (the first date for which we have data) to February 8th. There is a marked increase in doubling time that more or less starts before the decision for a full lock-down in Hubei province (the epicenter of the outbreak). While the slowing down was already under way before the measures were put into place, there is a notable rise of the curve from February 3rd, which comes a week after the lock-down.

To say we have learned from China is easy but to actually do it like them is more complicated. A number of special characteristics made it much easier to impose and maintain the lock-down in Hubei. The lock down did not affect the whole country and the relationship between the state and the people (to put it gently) made things easier.  
Picture
What about the rest of the world then? It was not very easy to collect data on when exactly measures were taken in every country. There is variability in this respect too, as some countries have not gone into full lock-down or did so only after milder measures were put into place. To keep things as homogeneous as possible, I obtained the dates when school closure was decided in 13 countries from Wikipedia articles on the Covid-19 epidemic. I then went back to the confirmed case data and calculated 5-day rolling doubling times for the last three weeks. This means that, starting from March 1st, I calculated average doubling time estimates over five consecutive days. (Note: this was done to get smoother estimates. It was also the approach I used for the data from China in the plot above).
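The rolling calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the plots: the function names `daily_growth_factors` and `rolling_doubling_times` are made up for this example, the input is assumed to be a list of strictly positive cumulative confirmed cases (one value per day), and the averaging is assumed to be done on the daily growth factor before converting to a doubling time, which is one of two plausible readings of the text.

```python
import math


def daily_growth_factors(cumulative):
    """Ratio of each day's cumulative count to the previous day's count."""
    if any(count <= 0 for count in cumulative):
        raise ValueError("cumulative counts must be strictly positive")
    return [today / yesterday
            for yesterday, today in zip(cumulative, cumulative[1:])]


def rolling_doubling_times(cumulative, window=5):
    """Doubling time (in days) for every run of `window` consecutive daily factors.

    Assumption: the daily growth factors are averaged over the window and the
    mean factor is then converted to a doubling time with dt = 1 / log2(r).
    """
    factors = daily_growth_factors(cumulative)
    doubling_times = []
    for start in range(len(factors) - window + 1):
        mean_factor = sum(factors[start:start + window]) / window
        if mean_factor > 1.0:
            doubling_times.append(1.0 / math.log2(mean_factor))
        else:
            # No growth within the window: cases never double at this rate.
            doubling_times.append(math.inf)
    return doubling_times


# Hypothetical series in which cases double exactly every three days.
example_cases = [100 * 2 ** (day / 3) for day in range(10)]
print(rolling_doubling_times(example_cases))
```

Each value in the returned list belongs to one 5-day window, so a series of n daily counts yields n - 5 estimates. Whether a window is labelled by its first, last or central date is a plotting choice that this sketch leaves open.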

The results are shown below for 11 European countries (including my native Greece), the US and Israel. I hope to include more in the future (provided I find time to read Wikipedia articles, or get a reliable comprehensive data source). The heatmap shows doubling times from March 1st to March 22nd (because of the 5-day averaging, the two final dates refer to a central date -2). Doubling times lie in a range of 1 to 12, with Iran being in the best situation. You basically read the plot from left to right and hope to be moving steadily (and as fast as possible) to darker colours.

What does it tell us?

1. First of all, things are encouraging for Italy. Its dynamics of increasing doubling time put it in the same cluster as Sweden, Greece and the obvious outperformer of the plot, Iran.
2. A second cluster containing France, Belgium, the Netherlands and Spain is seriously lagging behind, but things are positive for France and the Netherlands, which show a slowing down (darker colours as we move from left to right). It is still too early to tell about Spain and Belgium.
3. The last cluster is quite inhomogeneous. Germany looks more like France than the UK or Portugal, but it is probably the large number of cases that obscures its performance. The US is an obvious outlier. Things do not seem to be getting any better there.

But remember we did all this to see if the measures on social isolation can actually work. Can we reproduce the effect they had on China? The answer is yes, but not entirely. There appears to be a mild positive association between how early the measures were taken and the development of the situation. Greece, Sweden, Iran and Italy all closed their schools on March 11th. This didn't happen in the UK before March 19th, more than a week later. On the other hand, the lag in school closure between Spain and Italy appears too small to explain the difference in the dynamics of their doubling times. Italy is getting better, Spain is not. But it could well be that two days, or even one, can really make a difference.

There are also a number of factors that we cannot easily account for. The red lines below show the date schools were shut down, but that is not the only measure. Most countries have gone into a complete lock-down, shutting down bars and restaurants and even imposing curfews. The time when this happened was not the same everywhere. In some countries, like the US, even the time schools closed was not the same across the country. As data accumulate and the situation develops we will be in a better position to discuss the effectiveness of the lock-downs and (perhaps more importantly) the time it would be OK to lift them.

Other kinds of data need to be factored in as well. Population densities and demographics may make lock-downs more or less effective, and some countries may need to consider additional (or other) means of mitigation.
It sounds frustratingly repetitive but we can only wait and see. 
Picture

Covid-19 Epidemic. Confirmed case doubling times among countries

3/21/2020


 
The Covid-19 pandemic is affecting our lives in every possible way. This is a first in many respects, one of which being that this is the first epidemic for which data regarding cases, deaths and recoveries are being made available in almost real time.
A number of data scientists have been trying to make sense of the available data from many possible angles. Nevertheless, little more than the obvious (and expected) exponential increases have come out of most of the analyses. We now know that the virus is highly contagious and that exponential growth of cases should be the norm, but the goal is to tame this growth as much as possible and in this sense social distancing and case isolation are likely to be the only way to delay the peak of the epidemic (that is, the time when the number of sick people reaches the maximum), or (as you should know by now) to "flatten the curve".

One main problem with most data analytical approaches is the variability of reported data in terms of cases. While the number of deaths cannot be questioned, the way each country reports confirmed cases is very different. Some countries (like South Korea) have opted for extensive testing in the general population, while others (like for instance Greece) have explicitly targeted serious/critical cases, recommending that people with mild symptoms avoid over-crowding hospitals and diagnostic labs. I shall refrain from discussing arguments that exist both for and against these two extreme approaches and focus on the main problem that this variability poses in the analysis of the epidemic dynamics.
If some countries test a lot, the number of cases will be high but the case fatality rate (i.e. the number of deaths divided by the number of confirmed cases) will be smaller. Countries that test only people with severe symptoms will report low numbers of cases but higher fatality rates. In either case, it is difficult to tell how the spread differs from one country to the other.
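A small numerical illustration may help here. The sketch below takes the case fatality rate to be deaths divided by confirmed cases (an assumption about the exact definition) and uses entirely made-up numbers: two hypothetical countries with the same number of deaths, one of which confirms many more of its infections through testing than the other.

```python
def case_fatality_rate(deaths, confirmed_cases):
    """Deaths divided by confirmed cases (the definition assumed in this example)."""
    return deaths / confirmed_cases


# Hypothetical numbers: both countries have the same number of deaths,
# but they confirm very different numbers of infections through testing.
deaths = 100
confirmed_with_broad_testing = 8_000   # extensive testing of the general population
confirmed_with_narrow_testing = 1_000  # testing of severe cases only

cfr_broad = case_fatality_rate(deaths, confirmed_with_broad_testing)
cfr_narrow = case_fatality_rate(deaths, confirmed_with_narrow_testing)

print(f"broad testing:  {cfr_broad:.2%}")
print(f"narrow testing: {cfr_narrow:.2%}")
```

The same underlying epidemic therefore looks eight times deadlier in the country that tests narrowly, which is why raw case counts and fatality rates are hard to compare across countries.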

How then can we really know what is going on?
One solution is to focus on increase rates instead of the number of cases. Assuming that the way each country reports cases doesn't change over time, we can try to estimate the rate of increase in confirmed cases from one day to the next. Regardless of whether one tests a lot or a little, the number of new cases relative to the previous total is representative of the spread. This is not something new, and people have tried to figure out this rate from the slopes of log-linear fits, but the problem is that these slopes are very prone to random fluctuations, especially when case numbers are small.

In the following I will present a simple approach to address the problem and, more importantly, to gauge differences in the approaches that different countries employ to tackle the spread of the epidemic.

Data
I used data from Johns Hopkins University, Center for Systems Science and Engineering (CSSE), which are updated daily for all countries that have reported at least one confirmed case and are freely accessible here: 
https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data. 

Case Doubling Time
Instead of looking into slopes of increase curves, I tried to estimate Case Doubling Times. This is the time (in days) it takes for the number of cases to double, given the rate of increase over a certain period of time. This means that if we have N cases on day x, we can estimate the number of days it will take to reach 2N cases, assuming a constant increase rate r.
This is easily calculated from the following equation: N*r^(dt) = 2N, where dt is the doubling time in days. Solving the equation for dt gives: dt = 1/log2(r), which means that a good estimate for r is equivalent to a reliable prediction of the doubling time (obvious).
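Spelled out step by step, and assuming (as the equation implies) that r is the factor by which the number of cases is multiplied from one day to the next, the derivation reads:

```latex
\begin{align}
N\, r^{dt} &= 2N \\
r^{dt} &= 2 \\
dt \cdot \log_2 r &= 1 \\
dt &= \frac{1}{\log_2 r} = \frac{\ln 2}{\ln r}
\end{align}
```

As a purely illustrative number (not taken from the data), r = 1.25, i.e. a 25% daily increase, would give dt = 1/log2(1.25), or roughly 3.1 days.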

What I did
1. I took the timeseries data (that is, cases per day for many days in a row) from the link above and estimated a rolling increase rate for every country that has reported at least 300 cases. Rolling, in this sense, means that starting from the day that each country reported the first case, I calculated the mean rate of increase between two consecutive days until today (March 21st). 
2. As this gives a series of increase rates that is equal in length to the number of days of reporting (minus 1), it is, as expected, very noisy, especially while the number of cases is still low. You expect it to converge once sufficient cases have been reported (and then you hope it slowly drops to zero). This is why I used only the mean rolling rate of the last 5 days for each country. I then used this mean rate to estimate the Doubling Time explained above. 
3. Even so, the rate values may vary significantly in countries where cases have been reported for only a short period, and this is why I combined the mean rate with its standard deviation (how much it varied over the last 5 days) and the days since the first case was reported.
4. I then plotted the estimated Doubling Times against the total number of cases, taking into account the variability of increase rates. Each country is represented by a dot. Doubling time is on the y-axis and you want this to be as high as possible, since a high value means you have efficiently "flattened the curve". Total cases are on the x-axis, and a high value there means, well, the obvious: that a lot of people are sick (but caution: not all countries report in the same way). How reliable are the data? The darker the color of the label, the smaller the standard deviation of the increase rate, and thus the more confident we can be of the doubling time we estimate. 
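The four steps above can be sketched in a few lines of Python. This is a minimal illustration and not the code used for the figure: the function names are made up, the rate r is assumed to be the ratio of consecutive cumulative counts (which is what N*r^dt = 2N requires), and the input series is invented rather than taken from the CSSE files.

```python
import math
from statistics import mean, stdev


def daily_rates(cumulative):
    """Day-over-day ratios of cumulative cases, skipping days with zero previous cases."""
    return [today / prev for prev, today in zip(cumulative, cumulative[1:]) if prev > 0]


def doubling_time(rate):
    """dt = 1 / log2(r); treated as infinite when cases are not growing (r <= 1)."""
    if rate <= 1.0:
        return math.inf
    return 1.0 / math.log2(rate)


def summarize(cumulative, window=5, min_cases=300):
    """Return (mean rate, sd of rate, doubling time) over the last `window` days.

    Returns None for series below `min_cases` or too short for the window.
    """
    if not cumulative or cumulative[-1] < min_cases:
        return None
    rates = daily_rates(cumulative)
    if len(rates) < window:
        return None
    recent = rates[-window:]
    r = mean(recent)
    return r, stdev(recent), doubling_time(r)


# Invented cumulative counts that roughly double every two days (NOT real data).
example = [100, 141, 200, 283, 400, 566, 800]
r, sd, dt = summarize(example)
print(f"mean rate = {r:.3f}, sd = {sd:.4f}, doubling time = {dt:.2f} days")
```

If the rate is instead expressed as a fractional increase (say 0.25 for 25% per day), 1 has to be added to it before it is passed to `doubling_time`.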


Picture
Conclusions?
So, what can we make of this figure?

Japan is the way to go
First of all, one should focus on the dark blue labels. These represent countries with low standard deviation of increase rates and thus reliable estimates of doubling times. That said, you want to be as high on the y-axis as possible. We knew already that Japan is doing a great job. Even though they were among the first countries to report a case, they have kept cases low (~1000) and, more importantly, have a doubling time of 25 days. This means that, provided things don't change, Japan is not expected to have many more than 3000 cases by the end of April. 
  
Good signs from Italy 
Italy has, on the other hand, been described as the horror story so far: an exploding number of cases, coupled with a high fatality rate, both of which are probably due to strong exponential increase rates in the early days after the outbreak. Nevertheless, Italy's doubling time is now 5.4 days, almost double that of Germany (2.8 days) and more than double that of the US (a little over 2 days). Whether this is due to the heavy restrictions on movement and social distancing that were imposed a bit more than a week ago remains to be seen. Even though dt=5.4 is not perfect (if stable, it means a nearly 50-fold increase within one month), Italy's doubling time was under 3 days a few days ago and less than 2 days in the early days of the epidemic, which means that significant progress is being made in slowing the spread. Another possibility is that Italy has simply reached a saturation point in terms of testing capacity and can now only perform a certain number of tests every day, most of which come out positive. This may also be the case for Iran, one of the countries that suffered most but which now reports low numbers of cases and a doubling time of more than 10 days. This will become clearer in the days to come.
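To see what a given doubling time implies over a fixed horizon, one can simply compute 2^(days/dt). The snippet below is a small sketch with made-up function names; it assumes the doubling time stays constant over the whole period, which is precisely the assumption that is unlikely to hold for long.

```python
def fold_increase(doubling_time_days, horizon_days):
    """Factor by which cases grow over `horizon_days` at a constant doubling time."""
    return 2.0 ** (horizon_days / doubling_time_days)


def projected_cases(current_cases, doubling_time_days, horizon_days):
    """Projected cumulative cases after `horizon_days`, under the same assumption."""
    return current_cases * fold_increase(doubling_time_days, horizon_days)


# Doubling times quoted in the text; 30 days stands in for "one month".
italy_month = fold_increase(5.4, 30)     # roughly 47-fold
germany_month = fold_increase(2.8, 30)   # well over a thousand-fold

# Japan: ~1000 cases on March 21st, 40 days until the end of April, dt = 25 days.
japan_end_of_april = projected_cases(1000, 25, 40)

print(round(italy_month), round(germany_month), round(japan_end_of_april))
```

The contrast between Italy and Germany shows how strongly the outcome depends on dt: halving the doubling time does not double the monthly increase, it squares it.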

Central Europe should look towards Scandinavia
What about the rest? Bad news for the US, Germany, the UK and Austria, for all of which doubling times are estimated to be low. The Netherlands, Belgium and Switzerland are doing a bit better. The Scandinavian countries are performing rather well and, in spite of large numbers of cases (taking their population into account), they seem to have slowed down the spread. Not much can be said for countries like Greece, Iceland or Slovenia, where the spread looks to be halted but the number of cases is still too low to allow for accurate estimates. Countries in the bottom left part of the plot have very low numbers of cases and large standard deviations for increase rates. It is just too early to tell.

What more can we look into?
These are highly volatile data, so one needs to keep looking as the timeseries become longer and the rolling estimates of increase rates converge. It would be very interesting to look into how doubling times changed for each country, taking into account the sort of counter-measures imposed and, perhaps more importantly, how early after the first case they were put into place. This will probably give us a better idea of what the best approach is. Everybody agrees we are in uncharted waters and need to approach every analysis and its interpretation with a lot of caution.


Bias, Prejudice and a simple "litmus test" to detect them

11/23/2018

0 Comments

 
As part of my (quite loaded) teaching schedule I get to give introductory lectures on statistics and experimental planning to undergraduate students (mostly freshmen). One of the concepts I have the greatest difficulty in explaining is that of "bias", which in statistics is the difference between an estimator's expected value and the actual value of the parameter that is being estimated. As it is understandably hard to explain a complex concept to freshmen, who are becoming more and more mathematically illiterate, through equally complex concepts such as "estimators" and "expected values", I often have to take less rigorous approaches. One such approach is to resort to more mundane explanations of the terms. According to Wikipedia, bias is "a disproportionate weight in favour of or against a certain person, thing or group against another", a definition that is much easier for students to grasp as it resonates with the more commonplace notion of prejudice. In fact, most of the examples I use to explain bias are very "non-statistical", ranging from my old favourite story on how sharks prefer eating men to women to everyday issues such as the reporting of crimes committed by immigrants in mainstream media.

These are, of course, issues not to be taken lightly, as biased ways to report, write about and discuss events are becoming ever more frequent. In my brief spell as an aspiring reporter (back in the day) I was surprised to realize how easy it is to let prejudice infiltrate your reporting (Noam Chomsky and Edward Herman devote a whole chapter of their seminal "Manufacturing Consent" to this topic). In our times, however, of a general return to conservatism and ever-expanding bigotry, what is more striking is the way we fail to perceive prejudice and bias in everyday life. It seems as if we are becoming blind to even the most outright bias in expressed opinions and this, of course, is not at all helpful for my students (as both students and citizens).
To this end I have been thinking about more straightforward ways to detect bias and I have recently come up with some ideas, after exchanging opinions with friends on facebook (yes, it is possible). The examples I am posting below were inspired by three very different topics that came up on my facebook timeline on the same day.

The first was brought to my attention by an old schoolmate and it had to do with a somewhat famous Greek actress defending her decision not to have kids after being repeatedly asked in various interviews why she wouldn't. My friend, herself also married and happily childless, was infuriated by the way the actress (Katerina Lechou) had to defend "a woman's right" to not have kids. What immediately struck me was the fact that we were discussing this as a "woman's right", and I went on to ask why men are never asked this question. This is a very straightforward way to realize that the content of the question "why won't you have kids?" is only part of the problem, as long as it is only addressed to women. My friend and many other women were rightfully offended by the content of the question but failed to realize it was also greatly prejudiced, since it implied that having children is either something to be decided by women alone or something that men should not really care about.

The second story had to do with a British food editor being forced to resign after making an admittedly bad joke about vegans in an email. Even though I respectfully understand that some groups of people may be more sensitive to comments than others, you will have to agree with me that there is absolutely no chance that William Sitwell would have quit his job had he asked in his message that "all meat eaters be killed". Here again, we see how easy it is to spot bias simply by substituting the object of the statement with its conceptual counterpart (here "vegans" become "meat eaters"). This is the essence of what I call the "bias touchstone". Invert the argument and see if it makes sense or not. If it still sounds reasonable, then bias is not so likely. If, however, it doesn't, prejudice may be implied. In this case, the prejudice is a positive one, aimed at "protecting" the sensitivity of vegans. It remains a prejudice nonetheless.

The last example is a bit more personal, as it has to do with the negative evaluation of a grant application, which I received yesterday. The sole reviewer of my proposal had made an honest effort to read it and had a few plausible arguments for rejecting it, especially given the very low acceptance rate of the call. What was alarming, however, was his/her blunt statement regarding our work. In the part of the assessment where rejection was being justified, he/she had no reservation in writing that "This is a purely computational biology project". The way it read made it sound almost offensive to be proposing a computational work in a Life Sciences panel. Besides the fact that I have been working in Biology and Biomedical institutes for my entire adult life, I could not resist applying the "bias touchstone" to the statement. The same reviewer has surely never used a comment like "this is a purely developmental/molecular/cellular biology project" as a justification for rejection (even though I can think of other subdisciplines, such as structural or evolutionary biology, that may have been targeted in a similar manner). A simple substitution of the object of bias in the statement quickly reveals the bias itself.

As double standards are increasingly becoming the norm in many forms of public discourse, this very simple idea can be easily extended as a first assessment for any sort of statement. In science, it can also serve as a rough evaluation of the originality of a given finding. Take any sentence like "X is found to interact with/regulate/inhibit Y" and form its negation: "X is found NOT to interact with/regulate/inhibit Y". If it still sounds plausible, then the finding is quite interesting. As regards X and Y anything could be going on, but now, thanks to this work, we know that it's X that regulates Y. If, however, the negative statement sounds quite improbable, then the original finding is suddenly not so original. In this case, the goal is not to spot bias but to confirm an imbalance between two initially equivalent possibilities.

At this point you may have realized that this mental experiment is silently conducted all the time, especially by editors and reviewers of scientific journals when assessing the possible "impact" of a scientific finding. It forms the basis of a rather special kind of bias, called "confirmation bias". 
But this is a story for another post.  

Invisible Cities, Utopias and Genome Architectures

9/8/2017

32 Comments

 
I have always been fascinated by the way analogies can be found between biology and the most seemingly unrelated topics. In the past I have worked on linguistic analogies of the genomic texts (see examples here and here) and we have also been particularly interested in drawing analogies for genomes as self-organising entities (see for instance this nice paper).

In our group, one of our most active interests is the evolution of genome architecture in an almost literal sense of the term. Thus we try to envision the genomes, especially those of eukaryotes, as structures with an inherent organization, which we cannot (obviously) yet fully decipher. In this respect, we have been studying genomic characteristics for which we can get a lot of data, such as gene expression, regulation, chromatin structure, conservation etc., through the lens of their spatial distribution in the genome in both linear and three-dimensional space. 

So earlier this year, starting from an already published dataset on which we had worked before, we tried to answer a simple question: Do genes with similar properties cluster in the linear space of eukaryotic genomes? This is obviously not a new question. In fact, since the very first gene expression analysis papers, people had already realized that gene expression is correlated with gene position (see for example here and here). Soon after came analyses that showed that the proximity between genes was also associated with common regulation (like e.g. in this great paper). In our work we tried to combine all of the above properties, plus some more, to show that genes in a simple eukaryotic genome like that of yeast tend to cluster in particular domains with clear structural and organizational characteristics. You can read more in our paper here and in two relevant blog posts here and there, but the main point is that the yeast genome looks to have evolved in such a way that there is a "division of labour" of genes in distinct neighbourhoods of the genome, the clearest of which is a distinction between the edges and the centers (i.e. the centromeres) of the chromosomes. 
Picture
This work led to some immediate follow-up questions. Thus, we were able to show that genes located close to the centromeres tend to be more conserved than the genome average, and that genes located away from the centromeres tend to be separated by longer intergenic regions, but a more direct question would be to define some more detailed aspects of these domains in terms of macroscopic properties discernible by the cellular machinery. Using another urbanistic analogy, one could wonder what these "gene neighborhoods" look like. Imagine flying above a city that you don't know very well. If the city has been planned in a way that everything is homogeneous, you would not be able to say much about its basic landmarks. This holds true for a certain type of urban landscape called "utopian". These are utopias not in the sense of "earthly paradises" but in the literal sense of the term ("ου τόπος", which is Greek for "no place"), meaning that there is a lack of landmarks precisely because they have been ultra-planned to lack any such point of reference. Everything is similar to everything else, and this overall homogeneity is supposed to enforce equality of access and maximal sharing of space.

Back in the 20s and 30s these concepts were highly popular, as can be seen in Le Corbusier's Ville Radieuse, which was the inspiration for Oscar Niemeyer's work on the city of Brasília. Utopian architecture has also inspired Italo Calvino, who in one of his Invisible Cities describes the city of Zoe as a place where "the lack of signs does not allow you to understand the function of each building; you are lost in an indivisible environment". The Ville Radieuse (shown below) or an artistic representation of Zoe (above right) are examples of how the lack of landmarks may appear all-encompassing and at the same time hard to navigate. In fact, utopian architectures are nowadays only used in particular cases such as airport terminals or metro stations, which you can only navigate when assisted by additional signals (gate numbers, names of stations) and very often the constant help of the voice of an announcer. Most modern city landscapes are not like this. They have particular landmarks: parks, boulevards, avenues and buildings that are distinct from each other, which gives them a symbolic character. They were not designed to be so but have probably "evolved" (in a relaxed sense of the term) like this through the gradual aggregation of elements. They are in this sense easier to "parse", as one is able to distinguish the City Hall from the General Hospital, the University campus from an industrial zone etc.
Picture
But what is the case for the genomes of eukaryotes? Do they look like uniform Utopias or are they embedded with particular landscapes? To answer this question we set out to define one set of possible landmarks in the genome of S. cerevisiae by looking into a particular class of genomic "neighborhoods". Starting from a recent observation of topologically associated domains in yeast (see here), we split the genome into such domains, which are expected to be structurally self-contained (that is, they tend to have many more physical interactions within them than between them), and then looked into one particular chromatin property. After splitting the yeast genome into topologically defined territories we turned to a more short-scale characteristic, one that we know all too well from our previous works on promoter architectures in both yeasts and men: nucleosome positioning.

Using publicly available nucleosome occupancy data, we looked into the patterns of nucleosome positioning around the transcription start sites (TSS) of the genes contained in each of the defined domains. Even though there is a general pattern of nucleosome positioning in yeast genes, with a clear nucleosome-free region (NFR) located immediately upstream of the TSS, we were surprised to see that different topological domains harboured genes with very different nucleosomal architectures. We were, moreover, able to cluster yeast TADs in six different categories, according to their patterns of nucleosome positioning. In the Figure below, one can see how radically different the short-scale chromatin structures can be. A clear distinction between a more "open" and a more "closed" chromatin conformation can be directly linked to regulatory and functional properties of these domains. The main point, though, was that we were able to answer our initial question: There appear to be structurally-defined landmarks that distinguish different "neighbourhoods" in the yeast genome. Even though a large number of additional aspects can be thought of (enrichment in transcription factors, epigenetic marks etc.), nucleosome positioning patterns represent a more general characteristic that is directly linked to (and possibly affects) most others.
Picture

The results of this work were published in a recent paper by our group, in which, besides putting forward some interesting hypotheses on genome evolution (more on that soon), we are also proud to have "squeezed" a reference to one of our favourite books in the title.

TADs in yeast and how you can go around a reviewer (if you are right)

4/20/2017

29 Comments

 
Just a few weeks ago we published a paper on Genome Urbanization, a concept describing the spatial clustering of genes in the genome of the unicellular eukaryote S. cerevisiae (you can read more about it here). One of the things we put forward in that paper was the existence of discrete topological domains in the yeast genome that strongly resembled the Topologically-Associated Domains (TADs) initially discovered in mammals and now found in most complex eukaryotes. We based our arguments on some rather clear topological boundaries that we were able to observe on HiC contact maps obtained from a widely cited (Duan et al, 2010) public dataset. As you may see at the bottom of the Figure below (adapted from Figure 4D of our paper), one can locate boundaries between TAD-like domains even by eye. In order to define them, we used an insulation approach (as described by Crane et al, Nature 2015) that is largely independent of the local read density. After defining such insulating regions we went on to show that genes that are up-regulated upon topological stress tend to cluster within these regions. 

But since TADs had not (at the time) been reported in yeast, one of the main criticisms that we received during the review process was directed at this analysis. The reviewer's comment was:
"Authors declare the existence of TAD-like globules in budding yeast. However, these kinds of structures have, so far, never been detected in Saccharomyces cerevisiae. If the authors want to establish the existence of such TAD like structures, they must reinforce their analysis."

At the time, we were eager to get the paper accepted and so we down-played the original term "TAD-like" to "insulated domains" in the final version. Figure 4D remained, though, and was further supported by a number of analyses that showed the robustness of the boundaries under different normalization strategies and the lack of LTRs in these regions. In our view, it mattered little to get the message of "TADs also exist in yeast" across, as our main interest was to show the tendency for spatial clustering of genes.
Picture
HiC contact maps for yeast chromosome IV. Top: Figure 1B from Eser et al. (PNAS, 2017). Bottom: Figure 4D from our Genome Urbanization paper (Tsochatzidou et al, Nucleic Acids Res, 2017). Even though the original HiC datasets are not the same (top: Noble lab 2017, bottom: Noble lab 2010) the maps show great similarity. The location of the boundaries shows significant discrepancies as the way to define them differs (see text below).

We were nevertheless right all along, as was only recently shown by the Noble lab (whose original data we had used in our analysis). In a paper that just came out in PNAS, Eser et al. show that TADs do exist in yeast and that they have some very interesting properties. Eser et al. use a new HiC dataset (it seems that you cannot escape the curse of having to RE-do the experiments even if you were the one to perform them originally) but apply a different method to call the boundaries. Their "coverage score" is interesting as an approach because it is insensitive to the resolution of the obtained boundaries (a problem we had to deal with by arbitrarily choosing a 10kb window). It leads to fewer TADs than we were able to define, but this is likely related to thresholds in the calling process (we used a 5% percentile approach, while Eser et al. use a local minimum function). Eser et al. find 41 TADs (we found 85) with a median size that is more than double the one we found (260kb vs our 100kb). The fact remains that there is significant coincidence between both the maps and the boundaries, as you can see in the Figure above (adapted from Figure 1B of Eser et al., 2017 and Figure 4D of our paper). 
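For readers unfamiliar with insulation-based boundary calling, here is a heavily simplified sketch of the general idea in Python. It is neither our actual pipeline nor the exact procedure of Crane et al. (normalization steps are left out), the function names are made up, and the way the percentile threshold is applied is an assumption made for illustration. For each bin of a contact matrix it averages the contacts that cross that bin within a fixed window, and then flags the bins with the lowest scores as candidate boundaries.

```python
def insulation_scores(matrix, window):
    """Mean contact frequency between the `window` bins upstream and the
    `window` bins downstream of each bin. Bins too close to the chromosome
    ends to fit the window are left as None."""
    n = len(matrix)
    scores = [None] * n
    for i in range(window, n - window):
        total = 0.0
        for a in range(i - window, i):              # upstream bins
            for b in range(i + 1, i + window + 1):  # downstream bins
                total += matrix[a][b]
        scores[i] = total / (window * window)
    return scores


def low_percentile_bins(scores, fraction=0.05):
    """Indices of the bins whose score is among the lowest `fraction` of all
    scored bins (at least one bin is always returned)."""
    scored = sorted((s, i) for i, s in enumerate(scores) if s is not None)
    k = max(1, int(len(scored) * fraction))
    return sorted(i for _, i in scored[:k])


# Toy contact map (NOT real HiC data): two domains of 6 bins each, with dense
# contacts inside each domain (1.0) and sparse contacts between them (0.1).
n, block = 12, 6
toy = [[1.0 if a // block == b // block else 0.1 for b in range(n)] for a in range(n)]

scores = insulation_scores(toy, window=2)
boundaries = low_percentile_bins(scores, fraction=0.25)
print(boundaries)  # bins flanking the junction between the two toy domains
```

In this toy case the lowest scores fall on the two bins on either side of the junction between the domains. On real data the window size and the threshold both matter, which is part of why two calling methods can produce different numbers of domains from similar maps.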

More importantly, the properties shared by the TAD boundaries in Eser et al. match our observations in many respects, as they are shown to be enriched in transcription activity (as originally shown by the group of Sergei Razin) and activating histone marks. Above all, Eser et al. report that regions between the defined TADs significantly overlap areas of topoII depletion, which constitutes an immediate link to our finding that genes that tend to be up-regulated by topoII inactivation are enriched in insulated regions (remember this was our way to call TAD boundaries, without using the term "TAD"). Thus, even though one of the main arguments in Eser et al. is that TADs in yeast are more related to DNA replication than to transcription, it seems that you cannot really do away with transcriptional effects in relation to chromatin organization, especially in a unicellular eukaryote genome where the two processes are expected to be more tightly connected.


In closing, we can now be confident that TADs, or TAD-like domains if you will, do exist in yeast and that they are inherently related to both DNA replication and transcription (even though perhaps indirectly). Our observations under topological stress lie at the interface between the two processes, as torsional stress accumulation inevitably affects the DNA replication process. Even more interesting, in our view, is the fact that the embedded constraints in the organization of genome architecture are reflected in the evolution of gene distribution along chromosomes, but then again we have already discussed this elsewhere. One last point that we can make is that it is reassuring to see you can constructively argue with a reviewer (provided the reviewer's sanity) if your hypothesis is solid and supported by the data, and that it is always nice to see you were right in the first place, even though sometimes courtesy towards a reviewer obliges you to be less audacious in the choice of terminology.

Standardization and availability of data come first. Reproducibility second.

4/9/2017

0 Comments

 
Last week a new paper came out from our group's collaboration with the group of Niki Kretsovali at the IMBB, FORTH. Appearing in Stem Cell Reports, this work by lead author and friend Christiana Hadjimichael shows that Promyelocytic Leukemia Protein (PML) is a key factor in the maintenance of pluripotency in mouse ES cells, exerting its control on a number of pathways including that of TGF-β at an early stage. The main point of this work is that it provides a link between a protein that has been extensively studied in a different context (its role in cell cycle and apoptosis) and a developmental process such as cell differentiation.

Our group's involvement in this work was principally related to the analysis of gene expression data, the identification and prioritization of differentially expressed genes and (at the revision stage) the comparison of our data with publicly available gene expression profiles in order to validate the main finding of the paper, that PML knockdown cells have a profile that is similar to differentiated epiblast-like cells. 
Antonis Klonizakis, an undergraduate student from our group, had to go through the mill of finding, analyzing and comparing public gene expression datasets to show that PML knockdown cells resemble a state of primed differentiation, lying intermediate between ES and epiblast cells. 
Picture
But the question that prompted this post was exactly this: Why should Antonis go through the mill to do something that sounded perfectly straightforward in the first place? There are now thousands of gene expression experiments conducted and reported every year. Why did he then suffer so much to locate just a couple to compare with Christiana's dataset? The answer is that when it comes to data availability and standardization the situation is far from "straightforward". I supervised most of Antonis' search from the beginning, only to realize that getting data was much more difficult than we had initially thought.

First of all, there is biological variability that you can't do away with. Stem cells come in different types, "flavours" and various cell lines that are as different from each other as they are from other cell types. Even then, after having located profiles of the same kind of stem cells we were dealing with, we were faced with problems that had to do with the standardization of data processing. Many (most) papers failed to adequately report the data processing steps and thus we were unable to reproduce the results they were reporting by analyzing the raw data ourselves. This may sound like an excuse for irreproducibility but in my view it is the main reason behind many of the irreproducibility issues in research, which have recently even caught the attention of media such as the Wall Street Journal (as if they didn't have enough Wall Street-related problems to deal with already). Lack of standardization is a major issue for two reasons: first, it makes it very likely that the results that you come up with by repeating a series of complicated processing steps (that are not thoroughly reported in the original paper) do not match the ones reported and second, it makes the whole idea of comparing data so discouraging that in many cases it is preferable to repeat the whole experiment yourself. To the non-biomedically-oriented readers this may sound like an incredible waste of time, money and human resources but it is so commonplace that it was the original suggestion by the reviewers of Christiana's paper. What they asked for was to conduct gene expression analyses in other ESC lines, for which data were surely available already.

Going beyond standardization, the situation becomes even worse when one considers the availability of data. In their search for datasets to compare with, Antonis and Christiana came up with papers such as this one, for which the data were not only not standardized but not even reported. That is right! You skim the paper for GEO or ArrayExpress links and find none, you read it carefully, you go through the (rudimentary) supplementary data and you still find nothing, then you (in this case I) write to both the corresponding author and the editors of a respectable journal and you are still waiting for an answer three months later. Such situations may (and should) be unheard of in other contexts but are somehow acceptable in the highly competitive field of biomedicine. To people like us, though, who are hoping that the accumulation, cross-comparison and validation of data may be a way to acquire new knowledge, all this is particularly disturbing. Not only because it makes our work harder, but because it also makes everyone else's less significant.




Footballomics: A modified Gini Coefficient for Club Performance

4/8/2017

0 Comments

 
Rank Gini coefficient
The last time we talked about Footballomics (or the analysis of football data) we discussed a marked disparity in the performance of a certain Premier League club (Liverpool FC) in dealing with top competitors as opposed to lesser opponents. In that post we saw that LFC were doing remarkably well against better teams, while the norm is that most of the clubs do better when playing inferior teams (as expected) and a few do equally well against all (PL leaders Chelsea are a notable example). 
Picture
The observed disparity prompted me to attempt to summarise this trend and its fluctuation among clubs with one value. The difference between top10/bottom10 or even top6/bottom6 may not be very useful, since the margins may vary depending on the skewed point distributions in the league. Some leagues may be tighter, while others (e.g. the Spanish or the German, not to talk of the Greek) are one- or two-horse races. A concept that may be useful here is that of the Gini coefficient. To the uninitiated, the Gini coefficient is a measure of statistical dispersion. First introduced by Corrado Gini in the early 20th century, it has earned significant attention at the turn of the 21st since it can be used to describe distribution disparity, as it has been, repeatedly, in the case of income distributions. In a nutshell, the Gini index tells you whether a certain quantity is distributed evenly or in a highly skewed manner. If all N citizens of country X share its GDP equally, and thus each earns GDP/N, the Gini index is 0, while the extreme case in which one person earns all the GDP, leaving 0 to everybody else, gives a Gini index of 1. Real-life Gini coefficients for income typically range between 0.30 and 0.70.
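For the curious, the standard (unmodified) Gini coefficient can be computed in a few lines. The sketch below uses one common rank-based formula and a made-up function name; it is not the modified version discussed in the linked post, and the points list is invented for illustration. Note that for a finite group of N the "one takes all" case gives 1 - 1/N with this formula, which approaches 1 as N grows.

```python
def gini(values):
    """Gini coefficient of a list of non-negative values (rank-based formula):
    G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n, with x sorted ascending."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


equal_shares = gini([5, 5, 5, 5])     # everyone gets the same share
winner_takes_all = gini([0, 0, 0, 10])  # one member gets everything

# Invented end-of-season points for a six-club league (NOT real data).
points = [30, 42, 45, 51, 66, 90]
print(equal_shares, winner_takes_all, round(gini(points), 3))
```

Applied to league points rather than incomes, a value close to 0 would describe a tight league, while a higher value would describe one dominated by a few clubs.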

The Question: How can we apply a Gini coefficient in the case of football performance?
If you are interested in knowing more about how a mundane statistic like the Gini index can provide insight into the performance of a club or the structure of a whole league, you may want to read on here.

A first take on Footballomics: Analysis of football data

3/30/2017


 
What we are doing

Being in science combines a number of rewarding activities that make the daily working routine fulfilling in many ways. You get to learn new things every day, you (sometimes) even understand how things work (or, even better, how nature works) and you get to interact, through teaching, with young people who are full of contagious optimism and aspiration. Last but not least, you have the freedom to make your own working schedule and, more often than in other jobs, find time to apply what you learn to things you were always curious about.

Footballomics: Take #1

And I am, and have always been, curious (nay! crazy) about football in all its aspects: playing, watching, talking, thinking and dreaming about it. Over the years, job, family and age have caught up with me and I have grown into a more mature way of appreciating the “beautiful game”, from worshiping players to admiring managers and from chanting in the stands to reading about football tactics. My professional involvement in data analysis and statistics has also led me to develop a more “quantitative” approach to football, and I have always wanted to apply some of the simple (or not so simple) principles of my everyday work routine, which involves making sense of data for biological problems, to more “mundane” questions regarding football. In this, my first ever attempt to analyze football data, I took the opportunity (OK, I took advantage) of teaching a (hopefully) interesting graduate class on “R for Bioinformatics” at the University of Crete Medical School. After introducing the basic concepts of R to the students, I thought of giving them an example of how we can use it to attack simple questions based on data. And since they are (or will soon be) fed up with biological problems, I thought of giving them a different kind of puzzle, which brings us to:

The Question: Are Liverpool performing significantly better against top-of-the-table teams than against bottom-table “minnows”?

Being a big (OK, huge) fan of Liverpool Football Club in the post-90s era can be exhilarating and frustrating at the same time. You get to experience glorious moments like the Miracle of Istanbul or last year’s comeback against Borussia Dortmund, but you also get to see them miss out on league campaign after league campaign through unexpected losses to “lesser” teams like Crystal Palace in 2014. This year in particular, this trend of being imperious in big games, only to lose their nerve against teams like Burnley, Bournemouth or Swansea, has been more apparent than ever. Liverpool are doing very well when playing big opponents that are title challengers and somehow sink when they find themselves against tough-to-crack defenses. I am, of course, not the first to address this issue, which has been brought up by former managers and former players turned football pundits. The question, though, when it comes to punchlines such as “Liverpool sink against lesser sides”, is how well they are founded on real data, and this is exactly the question I posed to my (patient) students. What they had to do was to test whether Liverpool indeed performed worse than expected against teams at the bottom of the table, the word “expected” being the key.

If you are ready for a long read on football, data mining and some medium-level R code, you can see the rest of the details here.

    It's all about...

    Bioinformatics and computational biology with a focus on chromatin and genome architecture, plus a little bit of football and occasional aspects of University education.
