R code: Working Trakus’ Sectionals for Meydan

UPDATE (7/2/14): Since writing this post I have created a Github account and placed the functions in a gist, which any would-be user can source into R from any computer. This will allow me to update the functions as and when I encounter problems. The gist can be found here. To get the functions into R, type the following and the functions should appear:

source("https://gist.github.com/durtal/2834597685162e5dc266/raw/c39211e498c2d5e8a56c7e88609fbd7644dbf245/meydantrakus")

Intro

In this post I want to walk through the R code and functions I developed to help work with the Trakus sectionals. I doubt there will be much interest beyond a couple of people, but any budding sectionalistas might like to know how to go about collecting times.

As mentioned in the Sectional Review post the other day, the code I’ve written could probably be made much more efficient, but for the most part it runs smoothly. The functions are idiosyncratic rather than generic: they have been written with Trakus for Meydan in mind, so they are not immediately transferable to the other courses Trakus cater for. The functions are written for me, so they return outputs in a form that I prefer, which might not suit others. There are a number of separate functions, and they occasionally rely on the outputs of other functions, so they should, more often than not, be run sequentially or errors will arise. Errors or bugs cause a large amount of frustration for an inexperienced programmer like me, but if you try any of the functions and one returns an error, let me know and I will try to help. Ultimately, it would be good to see whether others can utilise the code and functions without any hiccups, or even offer suggestions to make the functions more efficient. Learning how to debug code and tease out any issues should help me in the long run, as I hope to one day begin creating a handicapping package in R.

Step One – getting times into R

The post-ramble in the Meydan Review did highlight that Trakus could make the times more accessible, but at the moment the Trakus sectionals found via the “Printable Sectional Times” link are probably in the best format to work with. For most of the races at Meydan, Trakus provide furlong-by-furlong (or 200m-by-200m) cumulative times; for other courses I’ve looked at, it’s typically the cumulative times for every 2f. Trakus for Meydan can be found here, and the link to the “Printable Sectional Times” is found below the drop-down menu that lets you scroll through individual sectional times, speeds, distance raced data, etc, or the Race Summary. The first image below is a partial screenshot of the Trakus data hosted on the DubaiWorldCup site, with the link to the Printable Sectional Times circled in red; the second image is what the printable sectional times look like. (Clicking any image will open a gallery of larger images.)

To manipulate and clean the sectional times found on the Printable Sectional Times pages we first need to get them into R. I use the Opera web browser (other browsers, such as Chrome, Safari and Explorer, cause problems that I don’t understand). Getting the times into R is simply a case of highlighting the times, as the third image above shows, and hitting Ctrl+C while they’re highlighted to copy them to your computer’s clipboard. Then, in R, use the code on line 1 below to read them in:

[If you want to use a different browser to Opera, an additional step is necessary. First highlight the times, copy them using Ctrl+C and paste them onto an Excel sheet. Once in Excel, the times should have been separated into cells (the highlighting process in Chrome, Explorer and Safari doesn’t separate the elements, I don’t think. Can any web experts explain the issue?). Now highlight the times in Excel, copy them with Ctrl+C, then go into R and enter the code on line 1 below, and it should work fine.]

racedata <- read.delim("clipboard", header=FALSE, sep="\t", stringsAsFactors=FALSE)

# Sometimes an error pops up about an "incomplete final row", so I try re-highlighting and 
# copying and trying the code again, or try pasting the times into Excel and copying them 
# to the clipboard from Excel 

racedata <- read.table("clipboard", header=FALSE, sep="\t", stringsAsFactors=FALSE)

# but if it continues to return an error I usually move on, very frustrated.

# I saved the times to the object "racedata", calling that object we get:

racedata
 V1 V2                   V3         V4         V5         V6         V7           V8           V9          V10          V11          V12
  1  3        GABRIAL (IRE) 15.61[12]  27.07[12]  38.61[12]  49.86[12]  1:01.87[12]  1:13.59[12]  1:25.19[11]  1:36.70[05]  1:48.75[01] 
  2  7 EL ESTRUENDOSO (ARG) 14.72[06]  26.26[06]  37.61[05]  49.12[05]  1:01.13[05]  1:12.99[04]  1:24.62[03]  1:36.27[01]  1:48.77[02] 
  3  8        TARBAWI (IRE) 15.04[08]  26.53[08]  37.90[08]  49.38[08]  1:01.39[07]  1:13.19[07]  1:24.76[04]  1:36.56[04]  1:48.91[03] 
  4 10     SANSHAAWES (SAF) 14.83[07]  26.43[07]  37.89[07]  49.35[07]  1:01.44[08]  1:13.18[06]  1:24.81[05]  1:36.88[07]  1:49.04[04] 
  5  6          VASILY (GB) 14.52[04]  26.03[04]  37.40[03]  48.95[03]  1:00.86[02]  1:12.72[02]  1:24.44[02]  1:36.53[03]  1:49.29[05] 
  6  2        ELLEVAL (IRE) 15.41[11]  26.92[11]  38.52[11]  49.69[11]  1:01.64[10]  1:13.38[09]  1:25.01[08]  1:36.84[06]  1:49.33[06] 
  7  9        AUDITOR (USA) 14.67[05]  26.24[05]  37.67[06]  49.18[06]  1:01.24[06]  1:13.28[08]  1:24.88[06]  1:36.88[08]  1:49.63[07] 
  8  5       STARBOARD (GB) 15.09[10]  26.61[10]  38.15[10]  49.61[10]  1:01.73[11]  1:13.39[10]  1:25.07[10]  1:36.99[09]  1:49.88[08] 
  9 11     WAR MONGER (USA) 14.41[02]  25.78[02]  37.00[01]  48.49[01]  1:00.52[01]  1:12.39[01]  1:24.13[01]  1:36.48[02]  1:49.90[09] 
 10 12  FANTASTIC MOON (GB) 15.07[09]  26.53[09]  37.99[09]  49.42[09]  1:01.45[09]  1:13.43[11]  1:25.32[12]  1:37.52[11]  1:51.21[10] 
 11  4      DO IT ALL (USA) 14.51[03]  26.03[03]  37.43[04]  48.97[04]  1:01.06[04]  1:13.10[05]  1:24.96[07]  1:37.11[10]  1:51.84[11] 
 12  1    WITHOUT FEAR (FR) 14.28[01]  25.77[01]  37.08[02]  48.68[02]  1:00.92[03]  1:12.89[03]  1:25.07[09]  1:37.80[12]   1:52.28[12]
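
For anyone who wants to try the later steps without relying on the clipboard, here is a minimal sketch. It assumes the copied text is tab-separated (as it is when it comes from Opera or Excel) and holds two shortened rows, with only the first two sectionals and the finishing time, in a string. The names raw and racedata_demo are made up for this illustration and are not part of the gist.

```r
# Two shortened rows from the race above, tab-separated, held in a string so the
# example runs without touching the clipboard (raw and racedata_demo are
# hypothetical names used only for this sketch)
raw <- paste(
  "1\t3\tGABRIAL (IRE)\t15.61[12]\t27.07[12]\t1:48.75[01]",
  "2\t7\tEL ESTRUENDOSO (ARG)\t14.72[06]\t26.26[06]\t1:48.77[02]",
  sep = "\n"
)

# Same arguments as the clipboard call, but reading from the string instead
racedata_demo <- read.delim(text = raw, header = FALSE, sep = "\t",
                            stringsAsFactors = FALSE)
racedata_demo
```

As far as I know, reading from "clipboard" this way is dependable on Windows; on a Mac the usual substitute is pipe("pbpaste") in place of "clipboard", though I have not tested that with these pages.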

Step Two – compulsory cleaning

Once Trakus’ sectionals are in R, we can use some functions I wrote to clean the times up. The first function is called cleanup and does what it says; it has one argument, the racedata. The for loop, which begins on line 2, works through the columns where the times are found (starting with column 4, labelled V4 in the above section of code), and for each element the square brackets and their contents are removed. The times are then converted into seconds. Column 2 is removed from the data frame as it is deemed unnecessary info (it is the number the horse was on the racecard). (The numbers inside the square brackets are the positions the horses were in at each sectional; at the moment these are discarded, but I might look to change the function and give an option to keep the positions.)

cleanup <- function(racedata){
    for(i in 4:length(racedata)){
        racedata[,i] <- sapply(strsplit(racedata[,i], "[[]"), function(x) x[1])
    }
    toSeconds <- function(x){
        unlist(
            lapply(x, function(y){
                y <- as.numeric(strsplit(y, ":", fixed=T)[[1]])
                if(length(y)==2)
                    y[1]*60 + y[2]
                else if(length(y)==1)
                    y[1]
            }))
    }
    for(i in 4:length(racedata)){
        racedata[,i] <- toSeconds(racedata[,i])
    }
    racedata <- racedata[,-2]
    return(racedata)
}

# Run the cleanup function on the racedata from above and we see the times have been
# converted into seconds, the 2nd column has been removed (this was deemed superfluous
# info, it was the number the horse was on the racecard)

racedata1 <- cleanup(racedata)
racedata1
 V1                   V3    V4    V5    V6    V7    V8    V9   V10   V11    V12
  1        GABRIAL (IRE) 15.61 27.07 38.61 49.86 61.87 73.59 85.19 96.70 108.75
  2 EL ESTRUENDOSO (ARG) 14.72 26.26 37.61 49.12 61.13 72.99 84.62 96.27 108.77
  3        TARBAWI (IRE) 15.04 26.53 37.90 49.38 61.39 73.19 84.76 96.56 108.91
  4     SANSHAAWES (SAF) 14.83 26.43 37.89 49.35 61.44 73.18 84.81 96.88 109.04
  5          VASILY (GB) 14.52 26.03 37.40 48.95 60.86 72.72 84.44 96.53 109.29
  6        ELLEVAL (IRE) 15.41 26.92 38.52 49.69 61.64 73.38 85.01 96.84 109.33
  7        AUDITOR (USA) 14.67 26.24 37.67 49.18 61.24 73.28 84.88 96.88 109.63
  8       STARBOARD (GB) 15.09 26.61 38.15 49.61 61.73 73.39 85.07 96.99 109.88
  9     WAR MONGER (USA) 14.41 25.78 37.00 48.49 60.52 72.39 84.13 96.48 109.90
 10  FANTASTIC MOON (GB) 15.07 26.53 37.99 49.42 61.45 73.43 85.32 97.52 111.21
 11      DO IT ALL (USA) 14.51 26.03 37.43 48.97 61.06 73.10 84.96 97.11 111.84
 12    WITHOUT FEAR (FR) 14.28 25.77 37.08 48.68 60.92 72.89 85.07 97.80 112.28

# I am not sure what would happen if there was a runner who failed to complete the race, thereby
# not recording a time, I fear the function would break...

Step Three – compulsory naming of variables

The next function, called newnames, creates variable names for the racedata1 data frame. These always depend on the distance over which the race was run, as the sectionals will have been recorded at different distances. This is also a compulsory function if you wish to later calculate finishing speeds, individual sectionals, etc. The first argument is the cleaned data returned by the cleanup function (in this example we stored the cleaned times in racedata1); the second argument is the Distance over which the race was run in metres, which must be numeric, so do not add any letters. (The variable names should be self-explanatory; variable and object names cannot begin with numbers, so I prefer the prefix “c”, easy to remember as cumulative.)

newnames <- function(racedata, Distance){
    if(Distance==1000){
        names(racedata) <- c("Pos", "Horse", "c200", "c400", "c600", "c800", 
                         "c1000")
    } else if(Distance==1200){
        names(racedata) <- c("Pos", "Horse", "c200", "c400", "c600", "c800", 
                         "c1000", "c1200")
    } else if(Distance==1400){
        names(racedata) <- c("Pos", "Horse", "c200", "c400", "c600", "c800", 
                         "c1000", "c1200", "c1400")
    } else if(Distance==1600){
        names(racedata) <- c("Pos", "Horse", "c200", "c400", "c600", "c800", 
                         "c1000", "c1200", "c1400", "c1600")
    } else if(Distance==1800){
        names(racedata) <- c("Pos", "Horse", "c200", "c400", "c600", "c800", 
                         "c1000", "c1200", "c1400", "c1600", "c1800")
    } else if(Distance==1900){
        names(racedata) <- c("Pos", "Horse", "c400", "c600", "c800", "c1000", 
                         "c1200", "c1400", "c1600", "c1800", "c1900")
    } else if(Distance==2000){
        names(racedata) <- c("Pos", "Horse", "c400", "c600", "c800", "c1000", 
                         "c1200", "c1400", "c1600", "c1800", "c2000")
    } else if(Distance==2200){
        names(racedata) <- c("Pos", "Horse", "c400", "c800", "c1000", "c1200", 
                         "c1400", "c1600", "c1800", "c2000", "c2200")
    } else if(Distance==2435){
        names(racedata) <- c("Pos", "Horse", "c400", "c800", "c1200", "c1400", 
                         "c1600", "c1800", "c2000", "c2200", "c2435")
    } else { stop("Invalid Distance: amend function")} 
    racedata$Horse <- tolower(racedata$Horse)
    return(racedata)
}

racedata2 <- newnames(racedata1, 1800)
racedata2
Pos                Horse  c200  c400  c600  c800 c1000 c1200 c1400 c1600  c1800
  1        gabrial (ire) 15.61 27.07 38.61 49.86 61.87 73.59 85.19 96.70 108.75
  2 el estruendoso (arg) 14.72 26.26 37.61 49.12 61.13 72.99 84.62 96.27 108.77
  3        tarbawi (ire) 15.04 26.53 37.90 49.38 61.39 73.19 84.76 96.56 108.91
  4     sanshaawes (saf) 14.83 26.43 37.89 49.35 61.44 73.18 84.81 96.88 109.04
  5          vasily (gb) 14.52 26.03 37.40 48.95 60.86 72.72 84.44 96.53 109.29
  6        elleval (ire) 15.41 26.92 38.52 49.69 61.64 73.38 85.01 96.84 109.33
  7        auditor (usa) 14.67 26.24 37.67 49.18 61.24 73.28 84.88 96.88 109.63
  8       starboard (gb) 15.09 26.61 38.15 49.61 61.73 73.39 85.07 96.99 109.88
  9     war monger (usa) 14.41 25.78 37.00 48.49 60.52 72.39 84.13 96.48 109.90
 10  fantastic moon (gb) 15.07 26.53 37.99 49.42 61.45 73.43 85.32 97.52 111.21
 11      do it all (usa) 14.51 26.03 37.43 48.97 61.06 73.10 84.96 97.11 111.84
 12    without fear (fr) 14.28 25.77 37.08 48.68 60.92 72.89 85.07 97.80 112.28
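
The long if/else chain in newnames could also be written as a lookup table, which may be easier to amend when a new distance turns up. The sketch below is only one possible arrangement and covers just four of the distances, using the timing points from newnames above. The names timing_points and newnames_lookup are made up for this illustration, and the column-count check is an addition of mine.

```r
# Hypothetical lookup table: timing points (in metres) for a subset of the
# distances handled by newnames
timing_points <- list(
  "1200" = seq(200, 1200, by = 200),
  "1800" = seq(200, 1800, by = 200),
  "1900" = c(seq(400, 1800, by = 200), 1900),
  "2435" = c(400, 800, seq(1200, 2200, by = 200), 2435)
)

newnames_lookup <- function(racedata, Distance) {
  points <- timing_points[[as.character(Distance)]]
  if (is.null(points)) stop("Invalid Distance: amend lookup table")
  # added check: position + horse + one column per timing point
  if (ncol(racedata) != length(points) + 2) {
    stop("Number of columns does not match the timing points for this Distance")
  }
  names(racedata) <- c("Pos", "Horse", paste0("c", points))
  racedata$Horse <- tolower(racedata$Horse)
  racedata
}

# Two cleaned rows from the example race
demo <- data.frame(V1 = 1:2,
                   V3 = c("GABRIAL (IRE)", "EL ESTRUENDOSO (ARG)"),
                   V4 = c(15.61, 14.72), V5 = c(27.07, 26.26),
                   V6 = c(38.61, 37.61), V7 = c(49.86, 49.12),
                   V8 = c(61.87, 61.13), V9 = c(73.59, 72.99),
                   V10 = c(85.19, 84.62), V11 = c(96.70, 96.27),
                   V12 = c(108.75, 108.77),
                   stringsAsFactors = FALSE)
named_demo <- newnames_lookup(demo, 1800)
names(named_demo)
```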

Step Four – choices, choices

Now the times have been cleaned and have variable names, we can call a much bigger function which offers a choice of outputs. The function is called fs.extras, and it has two primary arguments: the racedata (in this example racedata2 from the above section of R code, which has been cleaned and named), and Dist, the distance over which the race was run. The output will always include the finishing speeds of each runner (a preference of mine): each runner’s average speed over the closing section of the race, expressed as a proportion of its average speed over the whole race. By default the finishing speeds are merged with the racedata input, creating a larger data frame; if you just want the finishing speeds, provide the argument merge=FALSE when calling the function.

There are other options too. You can return the individual sectionals by providing the argument ISects=TRUE, and the positions of runners during the race with the argument Pos=TRUE. You can also add an empty section, filled with NA values, where you can append the distance raced data at a later date; to add it, include the argument Trip=TRUE.

The positions section is worth paying closer attention to. Any horses that record the same cumulative time are ranked as having the same position; this differs from the positions Trakus provide, where two horses recording the same time will not share a position (the criteria for choosing which is positioned in front are unclear). The problem with the positions returned by the function below is that if two leaders record the same time, each is assigned position 1, okay so far, but the horse behind the two leaders is assigned position 2, because its time is the 2nd fastest, rather than position 3. This needs looking at in more detail.
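
The tie issue described above can be seen on four of the 400m times from this race, where two runners share 26.03. The sketch compares the as.numeric(factor()) approach used in fs.extras with base R's rank() using ties.method="min", which leaves a gap after tied runners. Swapping rank() in is only a suggestion of mine, not something the gist does, and ties_demo is a made-up name.

```r
# Four of the c400 times from the example race; vasily and do it all tie on 26.03
ties_demo <- data.frame(
  Horse = c("gabrial (ire)", "vasily (gb)", "do it all (usa)", "without fear (fr)"),
  c400  = c(27.07, 26.03, 26.03, 25.77),
  stringsAsFactors = FALSE
)

# Approach used in fs.extras: positions are the ordered distinct times,
# so no position is skipped after a tie
ties_demo$dense <- as.numeric(factor(ties_demo$c400))

# Possible alternative: tied runners share a position and the next runner
# is placed behind both of them
ties_demo$competition <- rank(ties_demo$c400, ties.method = "min")

ties_demo
# dense:       3 2 2 1
# competition: 4 2 2 1
```

With only four runners gabrial is last either way, but the factor approach puts him 3rd of 4 while rank() puts him 4th.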

fs.extras <- function(racedata, Dist, ISects=FALSE, Trip=FALSE, Pos=FALSE, merge=TRUE){
	dists <- c()
	firstcol <- which(names(racedata)=="Horse") + 1
	lastcol <- which(names(racedata)==paste("c", Dist, sep=""))
	for(i in (firstcol):(lastcol)){
		dists <- append(dists, sapply(strsplit(names(racedata)[i], "c"), function(x) x[2]))
	}
	FinSpd.df <- racedata[,(firstcol):(lastcol - 1)]
	finaltimes <- racedata[,lastcol]
	FinSpd.df <- finaltimes - FinSpd.df
	findists <- Dist - as.numeric(dists[1:(length(dists)-1)])
	for(i in 1:(length(dists)-1)){
		FinSpd.df[,i] <- round(((finaltimes * findists[i]) * 100 / (FinSpd.df[,i] * Dist)) / 100, 3)
	}
	names(FinSpd.df) <- paste("fs", as.character(findists), sep="")
	ISects.df <- racedata[,(firstcol):(lastcol)]
	names(ISects.df) <- paste("i", dists, sep="")
	l.df <- length(ISects.df)
	for(i in l.df:2){
		ISects.df[,i] <- ISects.df[,i] - ISects.df[,i-1]
	}
	Pos.df <- racedata[,(firstcol):(lastcol)]
	for(i in 1:length(Pos.df)){
		Pos.df[,i] <- as.numeric(factor(Pos.df[,i]))
	}
	names(Pos.df) <- paste("p", dists, sep="")
	Trip.df <- ISects.df
	names(Trip.df) <- paste("d", dists, sep="")
	n <- names(Trip.df)
	for(i in 1:length(n)){
		Trip.df[,n[i]] <- NA
	}
	if(ISects==TRUE){
		newdf <- cbind(FinSpd.df, ISects.df)
	} else { newdf <- FinSpd.df }
	if(Pos==TRUE){
		newdf <- cbind(newdf, Pos.df)
	} else { newdf <- newdf }
	if(Trip==TRUE){
		newdf <- cbind(newdf, Trip.df)
	} else { newdf <- newdf }
	if(merge==TRUE){
		return(cbind(racedata, newdf))
	} else { return(newdf)}
}

# Calling the fs.extras function and leaving the defaults as they are:

racedata3 <- fs.extras(racedata2, 1800)
racedata3
Pos                Horse  c200  c400  c600  c800 c1000 c1200 c1400 c1600  c1800 fs1600 fs1400 fs1200 fs1000 fs800 fs600 fs400 fs200
  1        gabrial (ire) 15.61 27.07 38.61 49.86 61.87 73.59 85.19 96.70 108.75  1.038  1.036  1.034  1.026 1.031 1.031 1.026 1.003
  2 el estruendoso (arg) 14.72 26.26 37.61 49.12 61.13 72.99 84.62 96.27 108.77  1.028  1.025  1.019  1.013 1.015 1.013 1.001 0.967
  3        tarbawi (ire) 15.04 26.53 37.90 49.38 61.39 73.19 84.76 96.56 108.91  1.031  1.028  1.022  1.016 1.019 1.016 1.002 0.980
  4     sanshaawes (saf) 14.83 26.43 37.89 49.35 61.44 73.18 84.81 96.88 109.04  1.029  1.027  1.022  1.015 1.018 1.014 1.000 0.996
  5          vasily (gb) 14.52 26.03 37.40 48.95 60.86 72.72 84.44 96.53 109.29  1.025  1.021  1.013  1.006 1.003 0.996 0.977 0.952
  6        elleval (ire) 15.41 26.92 38.52 49.69 61.64 73.38 85.01 96.84 109.33  1.035  1.032  1.029  1.018 1.019 1.014 0.999 0.973
  7        auditor (usa) 14.67 26.24 37.67 49.18 61.24 73.28 84.88 96.88 109.63  1.026  1.023  1.016  1.008 1.007 1.005 0.984 0.955
  8       starboard (gb) 15.09 26.61 38.15 49.61 61.73 73.39 85.07 96.99 109.88  1.030  1.026  1.021  1.013 1.014 1.004 0.984 0.947
  9     war monger (usa) 14.41 25.78 37.00 48.49 60.52 72.39 84.13 96.48 109.90  1.023  1.016  1.005  0.994 0.989 0.977 0.948 0.910
 10  fantastic moon (gb) 15.07 26.53 37.99 49.42 61.45 73.43 85.32 97.52 111.21  1.028  1.021  1.013  1.000 0.993 0.981 0.955 0.903
 11      do it all (usa) 14.51 26.03 37.43 48.97 61.06 73.10 84.96 97.11 111.84  1.021  1.014  1.002  0.988 0.979 0.962 0.925 0.844
 12    without fear (fr) 14.28 25.77 37.08 48.68 60.92 72.89 85.07 97.80 112.28  1.018  1.009  0.995  0.981 0.972 0.950 0.917 0.862

# choosing not to merge the finishing speeds with the racedata input

fs.extras(racedata2, 1800, merge=FALSE)
fs1600 fs1400 fs1200 fs1000 fs800 fs600 fs400 fs200
 1.038  1.036  1.034  1.026 1.031 1.031 1.026 1.003
 1.028  1.025  1.019  1.013 1.015 1.013 1.001 0.967
 1.031  1.028  1.022  1.016 1.019 1.016 1.002 0.980
 1.029  1.027  1.022  1.015 1.018 1.014 1.000 0.996
 1.025  1.021  1.013  1.006 1.003 0.996 0.977 0.952
 1.035  1.032  1.029  1.018 1.019 1.014 0.999 0.973
 1.026  1.023  1.016  1.008 1.007 1.005 0.984 0.955
 1.030  1.026  1.021  1.013 1.014 1.004 0.984 0.947
 1.023  1.016  1.005  0.994 0.989 0.977 0.948 0.910
 1.028  1.021  1.013  1.000 0.993 0.981 0.955 0.903
 1.021  1.014  1.002  0.988 0.979 0.962 0.925 0.844
 1.018  1.009  0.995  0.981 0.972 0.950 0.917 0.862

# choosing to return the individual sectionals with the finishing speeds, but not to merge

fs.extras(racedata2, 1800, ISects=T, merge=F)
fs1600 fs1400 fs1200 fs1000 fs800 fs600 fs400 fs200  i200  i400  i600  i800 i1000 i1200 i1400 i1600 i1800
 1.038  1.036  1.034  1.026 1.031 1.031 1.026 1.003 15.61 11.46 11.54 11.25 12.01 11.72 11.60 11.51 12.05
 1.028  1.025  1.019  1.013 1.015 1.013 1.001 0.967 14.72 11.54 11.35 11.51 12.01 11.86 11.63 11.65 12.50
 1.031  1.028  1.022  1.016 1.019 1.016 1.002 0.980 15.04 11.49 11.37 11.48 12.01 11.80 11.57 11.80 12.35
 1.029  1.027  1.022  1.015 1.018 1.014 1.000 0.996 14.83 11.60 11.46 11.46 12.09 11.74 11.63 12.07 12.16
 1.025  1.021  1.013  1.006 1.003 0.996 0.977 0.952 14.52 11.51 11.37 11.55 11.91 11.86 11.72 12.09 12.76
 1.035  1.032  1.029  1.018 1.019 1.014 0.999 0.973 15.41 11.51 11.60 11.17 11.95 11.74 11.63 11.83 12.49
 1.026  1.023  1.016  1.008 1.007 1.005 0.984 0.955 14.67 11.57 11.43 11.51 12.06 12.04 11.60 12.00 12.75
 1.030  1.026  1.021  1.013 1.014 1.004 0.984 0.947 15.09 11.52 11.54 11.46 12.12 11.66 11.68 11.92 12.89
 1.023  1.016  1.005  0.994 0.989 0.977 0.948 0.910 14.41 11.37 11.22 11.49 12.03 11.87 11.74 12.35 13.42
 1.028  1.021  1.013  1.000 0.993 0.981 0.955 0.903 15.07 11.46 11.46 11.43 12.03 11.98 11.89 12.20 13.69
 1.021  1.014  1.002  0.988 0.979 0.962 0.925 0.844 14.51 11.52 11.40 11.54 12.09 12.04 11.86 12.15 14.73
 1.018  1.009  0.995  0.981 0.972 0.950 0.917 0.862 14.28 11.49 11.31 11.60 12.24 11.97 12.18 12.73 14.48
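
To check what the finishing speed columns mean, the arithmetic inside fs.extras can be reproduced by hand for the winner, Gabrial, using the times printed above. This is just the formula from the function written out for one horse; the object names are made up for the illustration.

```r
# Gabrial's figures from the output above
final_time <- 108.75   # c1800, cumulative time at the finish
race_dist  <- 1800

# Final 200m: individual sectional i1800
last200_time <- 12.05
fs200 <- round((final_time * 200) / (last200_time * race_dist), 3)

# Final 400m: finishing time minus the cumulative time at 1400m (c1400)
last400_time <- final_time - 85.19
fs400 <- round((final_time * 400) / (last400_time * race_dist), 3)

c(fs200 = fs200, fs400 = fs400)
# fs200 = 1.003, fs400 = 1.026, matching the fs200 and fs400 columns for gabrial
```

A value above 1 means the horse covered the closing section faster than its average speed for the whole race, and a value below 1 means it was slowing relative to that average.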

Full Example

I’ll run through an entire example below. The best way to keep the functions is to save them all in one .R file (which can be created in Notepad, but saved with the suffix .R), then source the file every time you need the functions. All told, it boils down to 5 lines of code, shown below: line 2 sources the functions, line 6 reads the Trakus times into R from either the Opera web browser or Excel (Step One), line 9 cleans the Trakus times (Step Two), line 12 names the variables (Step Three), while line 15 or line 34 adds the finishing speeds and other options (Step Four).

# My file is kept in this working directory, and the file is called "meydantrakus.R"
source("./Protracted Contemplation/Meydan/R work/meydantrakus.R")

# From Step One, getting the times into R, Meydan Race 6 from here 
# http://tnetwork.trakus.com/tnet/t_Sectional.aspx?OtherInfo=MEY&EventID=55359
racedata <- read.delim("clipboard", header=F, sep="\t", stringsAsFactors=F)

# Clean the times
racedata <- cleanup(racedata)

# Add Variable names
racedata <- newnames(racedata, 1800)

# Add Individual Sectionals
racedata <- fs.extras(racedata, 1800, ISects=TRUE)
racedata
Pos                Horse  c200  c400  c600  c800 c1000 c1200 c1400 c1600  c1800 fs1600 fs1400 fs1200 fs1000 fs800 fs600 fs400 fs200  i200  i400  i600  i800 i1000 i1200 i1400 i1600 i1800
  1        gabrial (ire) 15.61 27.07 38.61 49.86 61.87 73.59 85.19 96.70 108.75  1.038  1.036  1.034  1.026 1.031 1.031 1.026 1.003 15.61 11.46 11.54 11.25 12.01 11.72 11.60 11.51 12.05
  2 el estruendoso (arg) 14.72 26.26 37.61 49.12 61.13 72.99 84.62 96.27 108.77  1.028  1.025  1.019  1.013 1.015 1.013 1.001 0.967 14.72 11.54 11.35 11.51 12.01 11.86 11.63 11.65 12.50
  3        tarbawi (ire) 15.04 26.53 37.90 49.38 61.39 73.19 84.76 96.56 108.91  1.031  1.028  1.022  1.016 1.019 1.016 1.002 0.980 15.04 11.49 11.37 11.48 12.01 11.80 11.57 11.80 12.35
  4     sanshaawes (saf) 14.83 26.43 37.89 49.35 61.44 73.18 84.81 96.88 109.04  1.029  1.027  1.022  1.015 1.018 1.014 1.000 0.996 14.83 11.60 11.46 11.46 12.09 11.74 11.63 12.07 12.16
  5          vasily (gb) 14.52 26.03 37.40 48.95 60.86 72.72 84.44 96.53 109.29  1.025  1.021  1.013  1.006 1.003 0.996 0.977 0.952 14.52 11.51 11.37 11.55 11.91 11.86 11.72 12.09 12.76
  6        elleval (ire) 15.41 26.92 38.52 49.69 61.64 73.38 85.01 96.84 109.33  1.035  1.032  1.029  1.018 1.019 1.014 0.999 0.973 15.41 11.51 11.60 11.17 11.95 11.74 11.63 11.83 12.49
  7        auditor (usa) 14.67 26.24 37.67 49.18 61.24 73.28 84.88 96.88 109.63  1.026  1.023  1.016  1.008 1.007 1.005 0.984 0.955 14.67 11.57 11.43 11.51 12.06 12.04 11.60 12.00 12.75
  8       starboard (gb) 15.09 26.61 38.15 49.61 61.73 73.39 85.07 96.99 109.88  1.030  1.026  1.021  1.013 1.014 1.004 0.984 0.947 15.09 11.52 11.54 11.46 12.12 11.66 11.68 11.92 12.89
  9     war monger (usa) 14.41 25.78 37.00 48.49 60.52 72.39 84.13 96.48 109.90  1.023  1.016  1.005  0.994 0.989 0.977 0.948 0.910 14.41 11.37 11.22 11.49 12.03 11.87 11.74 12.35 13.42
 10  fantastic moon (gb) 15.07 26.53 37.99 49.42 61.45 73.43 85.32 97.52 111.21  1.028  1.021  1.013  1.000 0.993 0.981 0.955 0.903 15.07 11.46 11.46 11.43 12.03 11.98 11.89 12.20 13.69
 11      do it all (usa) 14.51 26.03 37.43 48.97 61.06 73.10 84.96 97.11 111.84  1.021  1.014  1.002  0.988 0.979 0.962 0.925 0.844 14.51 11.52 11.40 11.54 12.09 12.04 11.86 12.15 14.73
 12    without fear (fr) 14.28 25.77 37.08 48.68 60.92 72.89 85.07 97.80 112.28  1.018  1.009  0.995  0.981 0.972 0.950 0.917 0.862 14.28 11.49 11.31 11.60 12.24 11.97 12.18 12.73 14.48

# Alternatively, add all options (individual sectionals, positions at each sectional, and an empty section for
# distance raced data, to be filled in from Trakus' site); run this instead of the call above, not after it

racedata <- fs.extras(racedata, 1800, ISects=TRUE, Trip=TRUE, Pos=TRUE)
racedata
Pos                Horse  c200  c400  c600  c800 c1000 c1200 c1400 c1600  c1800 fs1600 fs1400 fs1200 fs1000 fs800 fs600 fs400 fs200  i200  i400  i600  i800 i1000 i1200 i1400 i1600 i1800 p200 p400 p600 p800 p1000 p1200 p1400 p1600 p1800 d200 d400 d600 d800 d1000 d1200 d1400 d1600 d1800
  1        gabrial (ire) 15.61 27.07 38.61 49.86 61.87 73.59 85.19 96.70 108.75  1.038  1.036  1.034  1.026 1.031 1.031 1.026 1.003 15.61 11.46 11.54 11.25 12.01 11.72 11.60 11.51 12.05   12   10   12   12    12    12    10     5     1   NA   NA   NA   NA    NA    NA    NA    NA    NA
  2 el estruendoso (arg) 14.72 26.26 37.61 49.12 61.13 72.99 84.62 96.27 108.77  1.028  1.025  1.019  1.013 1.015 1.013 1.001 0.967 14.72 11.54 11.35 11.51 12.01 11.86 11.63 11.65 12.50    6    5    5    5     5     4     3     1     2   NA   NA   NA   NA    NA    NA    NA    NA    NA
  3        tarbawi (ire) 15.04 26.53 37.90 49.38 61.39 73.19 84.76 96.56 108.91  1.031  1.028  1.022  1.016 1.019 1.016 1.002 0.980 15.04 11.49 11.37 11.48 12.01 11.80 11.57 11.80 12.35    8    7    8    8     7     7     4     4     3   NA   NA   NA   NA    NA    NA    NA    NA    NA
  4     sanshaawes (saf) 14.83 26.43 37.89 49.35 61.44 73.18 84.81 96.88 109.04  1.029  1.027  1.022  1.015 1.018 1.014 1.000 0.996 14.83 11.60 11.46 11.46 12.09 11.74 11.63 12.07 12.16    7    6    7    7     8     6     5     7     4   NA   NA   NA   NA    NA    NA    NA    NA    NA
  5          vasily (gb) 14.52 26.03 37.40 48.95 60.86 72.72 84.44 96.53 109.29  1.025  1.021  1.013  1.006 1.003 0.996 0.977 0.952 14.52 11.51 11.37 11.55 11.91 11.86 11.72 12.09 12.76    4    3    3    3     2     2     2     3     5   NA   NA   NA   NA    NA    NA    NA    NA    NA
  6        elleval (ire) 15.41 26.92 38.52 49.69 61.64 73.38 85.01 96.84 109.33  1.035  1.032  1.029  1.018 1.019 1.014 0.999 0.973 15.41 11.51 11.60 11.17 11.95 11.74 11.63 11.83 12.49   11    9   11   11    10     9     8     6     6   NA   NA   NA   NA    NA    NA    NA    NA    NA
  7        auditor (usa) 14.67 26.24 37.67 49.18 61.24 73.28 84.88 96.88 109.63  1.026  1.023  1.016  1.008 1.007 1.005 0.984 0.955 14.67 11.57 11.43 11.51 12.06 12.04 11.60 12.00 12.75    5    4    6    6     6     8     6     7     7   NA   NA   NA   NA    NA    NA    NA    NA    NA
  8       starboard (gb) 15.09 26.61 38.15 49.61 61.73 73.39 85.07 96.99 109.88  1.030  1.026  1.021  1.013 1.014 1.004 0.984 0.947 15.09 11.52 11.54 11.46 12.12 11.66 11.68 11.92 12.89   10    8   10   10    11    10     9     8     8   NA   NA   NA   NA    NA    NA    NA    NA    NA
  9     war monger (usa) 14.41 25.78 37.00 48.49 60.52 72.39 84.13 96.48 109.90  1.023  1.016  1.005  0.994 0.989 0.977 0.948 0.910 14.41 11.37 11.22 11.49 12.03 11.87 11.74 12.35 13.42    2    2    1    1     1     1     1     2     9   NA   NA   NA   NA    NA    NA    NA    NA    NA
 10  fantastic moon (gb) 15.07 26.53 37.99 49.42 61.45 73.43 85.32 97.52 111.21  1.028  1.021  1.013  1.000 0.993 0.981 0.955 0.903 15.07 11.46 11.46 11.43 12.03 11.98 11.89 12.20 13.69    9    7    9    9     9    11    11    10    10   NA   NA   NA   NA    NA    NA    NA    NA    NA
 11      do it all (usa) 14.51 26.03 37.43 48.97 61.06 73.10 84.96 97.11 111.84  1.021  1.014  1.002  0.988 0.979 0.962 0.925 0.844 14.51 11.52 11.40 11.54 12.09 12.04 11.86 12.15 14.73    3    3    4    4     4     5     7     9    11   NA   NA   NA   NA    NA    NA    NA    NA    NA
 12    without fear (fr) 14.28 25.77 37.08 48.68 60.92 72.89 85.07 97.80 112.28  1.018  1.009  0.995  0.981 0.972 0.950 0.917 0.862 14.28 11.49 11.31 11.60 12.24 11.97 12.18 12.73 14.48    1    1    2    2     3     3     9    11    12   NA   NA   NA   NA    NA    NA    NA    NA    NA

# As mentioned I am still learning about the wonderful ggplot2 package, but I will create a couple of plots
# We need the ggplot2 and reshape packages (I should probably be using the reshape2 package)
# Both packages developed by R wizard Hadley Wickham, who told me there might be a horse racing R book being written

library(ggplot2)
library(reshape)

# ggplot requires data to be in long form, so we need the melt function from reshape

racedatalong <- melt(racedata, id.vars=c("Pos", "Horse"))

# The first 6 rows of the new long form data frame look like:

head(racedatalong)
Pos                Horse variable value
  1        gabrial (ire)     c200 15.61
  2 el estruendoso (arg)     c200 14.72
  3        tarbawi (ire)     c200 15.04
  4     sanshaawes (saf)     c200 14.83
  5          vasily (gb)     c200 14.52
  6        elleval (ire)     c200 15.41

# I want to plot the individual sectionals of runners so I do need to find the rows in which
# these variables are, and subset the data into a new data.frame
isectslong <- racedatalong[205:312,]

# find rows with gabrial's and el estruendoso's times, so we can choose to plot their sectionals
gabrial <- grep("gabrial", isectslong$Horse)
elestru <- grep("el estru", isectslong$Horse)

# The first plot, created with the following very simple code, looks like Plot One below:

ggplot(isectslong, aes(x=variable, y=value, color=Pos)) + geom_jitter()

# The second plot is built up with layers of options found in the following code.  Typing it all out every time
# isn't necessary, just make small changes/additions, create the plot, change/edit the existing code a bit, 
# create a newer plot, edit/change, plot, etc, etc, until satisfied, you can make so many plots in such a short
# amount of time.  The following code produces Plot Two

ggplot(isectslong[-c(gabrial, elestru),], aes(x=variable, y=value, color=Pos)) + 
    geom_jitter(size=3, position=position_jitter(width=.2)) + 
    labs(x="Sectional", y="Time (s)", title="Meydan (23/1/14) Race 6") +
    theme_bw() +
    scale_color_gradient(limits=c(1,12), low="green")  +
    geom_point(data=isectslong[gabrial,], color="red", size=4, alpha=.8) +
    geom_point(data=isectslong[elestru,], color="black", size=3, alpha=.8) +
    geom_text(aes(x=8, y=15.5), label="Gabrial", color="red", size=5) +
    geom_text(aes(x=8, y=15.35), label="El Estruendoso", color="black", size=4)

Plot One

Plot Two
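
One fragile spot in the plotting code is picking out the individual sectionals by row number (205:312), which only holds for this many runners and this set of options. A minimal sketch of selecting them by variable name instead follows, assuming the "i" prefix that fs.extras gives the individual sectionals. The small long_demo data frame is made up from values shown above and stands in for racedatalong.

```r
# A small stand-in for the melted data: two horses, four variables,
# values taken from the output earlier in the post
long_demo <- data.frame(
  Horse    = rep(c("gabrial (ire)", "vasily (gb)"), times = 4),
  variable = rep(c("c200", "fs200", "i200", "i400"), each = 2),
  value    = c(15.61, 14.52, 1.003, 0.952, 15.61, 14.52, 11.46, 11.51),
  stringsAsFactors = FALSE
)

# Keep only rows whose variable is "i" followed by digits; this skips the
# cumulative (c), finishing speed (fs), position (p) and trip (d) columns
isects_demo <- long_demo[grepl("^i[0-9]+$", long_demo$variable), ]
isects_demo
```

The same grepl() call should work on the variable column that melt produces, since grepl converts factors to character before matching.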
