Wednesday, February 21, 2018

R Basic Vector/Matrix Stuff (for the Statistically Inclined but Computer Programming Challenged)

Introduction

After some feedback on my previous R blog I have found that a 'Newbie' R/Statistics person needs a better foundation in the vector arithmetic and representation that underlies R. I thought the cursory look provided in my previous blog would suffice. I realize now that R provides multiple ways of accessing vectors and matrices (especially matrices) that hide the "vectorness" inherent in the language. Many things in R will already be familiar to older programmers. The original vector language, developed at IBM, was APL. Dr. Ken Iverson developed a specialized math notation while at Harvard, and IBM hired him to implement that notation as a computer programming language (original concepts detailed in reference [2]). This all happened in the 1960s, so anyone who learned Computer Science in the 60s and 70s may have had some exposure to the language. It has continued on, and there is even a free GNU version available today [6]. The problem for many people was the strange symbols that were the basis of the language. Since APL there have been many offshoots that carry forward the idea of 'vectors' as the built-in data structure of the language, but with a design change: their syntax uses the characters found on a standard keyboard. The language K is probably the most successful commercial implementation of this offshoot [3]. R is probably the most successful open source implementation of these concepts. My personal favorite is the J language, which the late Dr. Iverson developed as a redesign of his APL concepts. J has an active user forum and a great collection of articles on its website covering the history of APL, Dr. Iverson, and many technical articles showing uses of J in many different areas (see reference [1]).

This history, which many professors and teachers experienced first hand, makes these ideas difficult for them to explain. It is very easy to assume that something is a simple concept because you forget that you didn't learn it in R. You learned it in some other computer language, programming different types of things. Jumping into R was not that difficult, and you appreciate how R has transformed some of the menial tasks into simple function calls. The 'Newbie', on the other hand, is left with many WTF moments as things seem to happen by magic. The goal of this blog post is to show you how the basic vector concepts are in everything that you do. This will help you as you try to dissect your data stored in a table. R has many layers on top of that data that help facilitate creating charts and statistics, but in the end it is all just vectors and matrices (aka arrays and tables).

Vectors/Arrays

A vector has its roots in physics. The idea behind it is that many physical properties are described by a value and a direction. I may push something along at 25 miles per hour but that is only part of the story. I am also pushing it along in a certain direction. Once I come up with a way of telling direction I now must carry 2 values along to let you know exactly what I am doing. So the concept of a vector is a way of carrying around multiple values to describe a single concept. In math and in computers it's not hard to envision that we might want to carry around more than just 2 values. Why not 3? There are after all 3 dimensions. Why not 10? Why not 1000? Hence for our purposes a vector is a way of carrying around multiple pieces of information and referencing them by a single name and an index. Mathematics uses a subscript to identify a particular item in a vector:

\[x = \{2,4,6,8\}\] \[x_1 = 2 \] \[x_4 = 8 \]

In R access to individual vector elements is accomplished as follows:

> x = c(2,4,6,8) #combine 2 4 6 8 into a vector and store it in x
> x
[1] 2 4 6 8
> # since subscripting is a pain in the neck R uses square brackets
> x[2]
[1] 4
> x[1]
[1] 2
> x[4]
[1] 8
> 

Seems easy enough. In math rather than write out every element of a vector we can use an ellipsis to continue an established pattern. So for example to represent the numbers from 1 to 100 in a vector in Math we do the following:

\[x = \{1,2,3,4,\ldots,99,100\} \] \[x_3 = 3 \] \[x_{98} = 98 \]

R rotates the ellipsis and uses the ':' (the colon) to implement similar functionality:

> x = c(1:100)
> x
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
 [22]  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42
 [43]  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63
 [64]  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84
 [85]  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
> x[3]
[1] 3
> x[98]
[1] 98
> 

Now here is where R can be deceiving. The colon operator is like the ellipsis but not exactly alike. The colon is only good for generating an increment-by-one pattern. So, for example, in math:

\[ x = \{2,4,6,\ldots,20,22\} \]

You instinctively understand I mean to count by 2's up to 22. Trying this in R with the colon operator just increments by 1s from 6 to 20:

> x = c(2,4,6:20,22)
> x
 [1]  2  4  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 22
> # from 6 to 20 R counts by 1s it doesn't try to infer my pattern 

That doesn't mean I have to type in every value if I want R to count by 2's, but it does mean I have to be more arithmetically explicit about what I tell R to do. Counting by 2's is just counting by 1's up to half the maximum value and multiplying the result by 2. So to accomplish the same thing in R:

> x = 2 * c(1:11)
> x
 [1]  2  4  6  8 10 12 14 16 18 20 22
>
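Base R also has a seq() function that takes the step size explicitly through its 'by' argument, which is closer to the "count by 2's" idea. A minimal sketch comparing it with the multiply-by-2 trick:

```r
# Count by 2's from 2 to 22 with an explicit step ('by')
x_seq <- seq(2, 22, by = 2)
# The multiply-by-2 trick from the session above
x_mul <- 2 * (1:11)
print(x_seq)
# Both approaches produce the same values
print(all(x_seq == x_mul))
```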

R does do one bit of inference with this operator:

> # one thing R will infer is that if you reverse the order and put the larger number first
> # R will count backwards for you
> x = c(11:1)
> x
 [1] 11 10  9  8  7  6  5  4  3  2  1
> 

But if R didn't do this, it would be easy to reconstruct with some added R functionality: the reverse function 'rev', which returns the elements of a vector in reverse order.

> # Create a reverse order without switching
> x = c(1:11)
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11
> rev(x)
 [1] 11 10  9  8  7  6  5  4  3  2  1
> # in one line
> x = rev(c(1:11))
> x
 [1] 11 10  9  8  7  6  5  4  3  2  1
> 

I hope at this point you can extrapolate and realize that by investigating the functions available in R we can create our own vectors of data without having to resort to reading it in from a file. This comes in handy for putting together some simple testing data.
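For example, a few base R functions (seq(), rep(), rev() and sample()) are enough to build small test vectors. A minimal sketch; the variable names and the seed value 42 are arbitrary choices for illustration:

```r
# A few base R helpers for building small test vectors without reading a file
evens   <- seq(2, 20, by = 2)             # 2 4 6 ... 20
repeats <- rep(c(1, 2, 3), times = 2)     # 1 2 3 1 2 3
pairs   <- rep(c(1, 2, 3), each = 2)      # 1 1 2 2 3 3
down    <- rev(evens)                     # 20 18 ... 2
# Simulated die rolls; set.seed() makes the "random" values repeatable
set.seed(42)
rolls <- sample(1:6, size = 12, replace = TRUE)
print(repeats)
print(pairs)
print(rolls)
```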

Matrix/Matrices

A Matrix wasn't originally a computer driven reality to enslave people to provide power to machines. It is just a mathematical concept for a table of values. It is an extension of the concept of a vector. While a vector has multiple values, it is considered a one-dimensional object. This means I only need one index to obtain a value. If I took a set of vectors of the same length and piled them on top of each other, I would create a table or Matrix. In mathematics notation you just put a table of numbers in parentheses:

\[ M = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 11 & 12 & 13 & 14 & 15 \\ 21 & 22 & 23 & 24 & 25 \end{pmatrix} \]

\[ M_{1,2} = 2 \] \[ M_{3,3} = 23 \]

Matrices can be created directly in R, but first a little segue to go from vectors to matrices. In R, start by creating 3 vectors of 5 elements each: Vector1 = {1,2,3,4,5}, Vector2 = {11,12,13,14,15} and Vector3 = {21,22,23,24,25}. To save typing, call them V1, V2, and V3. Here is the R session to set that up.

> # 3 Vectors of length 5 (notice I use a little math to help create different values)
> V1 = c(1:5)
> V2 = 10+V1
> V3 = 20+V1
> V1
[1] 1 2 3 4 5
> V2
[1] 11 12 13 14 15
> V3
[1] 21 22 23 24 25
> # notice that R added a number to the whole vector V1
>

Even though I had to type each variable name to display the data, notice the natural tabular form that appears in the last 3 lines of numbers above. They look like 3 rows of a table. If I wanted the second element of the first row, the 4th element of the second row and the 1st element of the third row, I could access them all as follows (continuing with the vectors I have set up):

> V1[2]
[1] 2
> V2[4]
[1] 14
> V3[1]
[1] 21
> 

I named the vectors with numbers on purpose. If I could form a table and R could extend its access to account for rows and columns (which it does), I could use one variable name and access any element by just giving the row and column number of that element. V1[2] would be M[1,2] in a table constructed of these vectors and stored in M. Similarly V2[4] -> M[2,4] and V3[1] -> M[3,1]. Not only do I save typing, but I can also write loops that go through every member of the matrix in almost any order I can imagine.

Experimenting with R and its matrix creation function I was able to use the vectors to create a table with each vector above as one row. I did have to use the matrix transpose function 't' (initially). Transpose will flip the matrix by swapping rows for columns (look up matrix transpose if you don't quite understand what it's doing from the session below). In the end I figured out the proper parameters for the matrix function to pile the vectors on top of each other (in row fashion) in one fell swoop.

> # Use matrix function to create a matrix from V1, V2, and V3
> M = matrix(c(V1,V2,V3),nrow=3,ncol=5)
> M
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    4   12   15   23
[2,]    2    5   13   21   24
[3,]    3   11   14   22   25
> # matrix fills columns first not rows what to do?
> t(M)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5   11
[3,]   12   13   14
[4,]   15   21   22
[5,]   23   24   25
> # Let's flip the dimensions around and see what happens
> M = matrix(c(V1,V2,V3),nrow=5,ncol=3)
> M
     [,1] [,2] [,3]
[1,]    1   11   21
[2,]    2   12   22
[3,]    3   13   23
[4,]    4   14   24
[5,]    5   15   25
> # since matrix fills columns first, let's fill a vector per column by switching dimensions
> # like above. Now transpose should get us the form we were looking for which is a 
> # vector per row
> t(M)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]   11   12   13   14   15
[3,]   21   22   23   24   25
> # so let's put it all into one line to make a matrix of our three vectors with each
> # vector in its own row
> M = t(matrix(c(V1,V2,V3),nrow=5,ncol=3))
> M
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]   11   12   13   14   15
[3,]   21   22   23   24   25
> # Now M[1,2] should match V1[2], M[2,4] = V2[4] and M[3,1] = V3[1]
> M[1,2]
[1] 2
> V1[2]
[1] 2
> M[2,4]
[1] 14
> V2[4]
[1] 14
> M[3,1]
[1] 21
> V3[1]
[1] 21
> # Had I dug a little deeper into the matrix function, I would have found a 'byrow' flag
> M = matrix(c(V1,V2,V3),nrow=3,ncol=5,byrow=TRUE)
> M
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]   11   12   13   14   15
[3,]   21   22   23   24   25
> # got the matrix in 1 step

The above session has an important nuance. I assumed that R would think the way I do: put vectors into rows. But as the session unfolded it was clear that R fills a matrix column by column by default. I was able to adjust once I saw the way R was doing things. This is important! As you begin to think in terms of vector and matrix operations you may find the answer coming from R is not formatted properly, or the data doesn't seem to have the right appearance. When you see weird things happening you must break down your operations and make sure you and R are on the same page (and it is you who has to adjust, since R is not going to change). When in doubt, go to one operation per line and display the results of each operation (or a portion of them if you have a considerable amount of data). Verify that each operation you are performing does what you expect. You would be surprised how one small typographical error can cause you hours of debugging and anxiety. Your mind will overlook the small error because it fills in a missing operation as you are looking at it (or ignores an extra one). By breaking things down you are verifying to yourself that each operation works as intended.
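The loop idea mentioned earlier fits nicely with this "check every step" advice. A minimal sketch, assuming the byrow=TRUE version of M from the session; the list name 'rows' is a hypothetical helper introduced here, not part of the original session. It walks M row by row and confirms each element against the vector it came from:

```r
# Rebuild the three row vectors and the 3x5 matrix from the session above
V1 <- 1:5
V2 <- 10 + V1
V3 <- 20 + V1
M  <- matrix(c(V1, V2, V3), nrow = 3, ncol = 5, byrow = TRUE)

# Keep the original vectors in a list so row i can be compared with vector i
# ('rows' is a hypothetical helper name for this example)
rows <- list(V1, V2, V3)

# Walk every element row by row and check it against the matching vector
all_match <- TRUE
for (i in 1:nrow(M)) {
  for (j in 1:ncol(M)) {
    if (M[i, j] != rows[[i]][j]) {
      cat("Mismatch at M[", i, ",", j, "]\n")
      all_match <- FALSE
    }
  }
}
print(all_match)
```

Swapping the two loops (j on the outside, i on the inside) would visit the elements column by column instead, which is the order R stores a matrix in internally.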

Row and Column names

I use the term 'table' rather loosely above. Don't confuse this with any add-on packages that have tables. I mean it in the simplest sense, as a way of describing 2-dimensional data. R has another table-like structure called a 'data frame'. So what's the difference between a matrix (which I have shown as a 'table' of numbers) and an R data frame? In a data frame you can have a mix of data types between columns. Each individual column needs to hold data of a single type, but the next column can have a completely different data type (as long as it's consistent within that column). So in a matrix all the data must be of the same type across all rows and columns, while in a data frame the data type can vary from column to column.

You access data in a 'data frame' by indexing the same way you do with a matrix. The trick is not to perform any operation on that data that is inconsistent with the data type of the column. In a matrix (since all the data is the same type) I can add together any 2 selected elements, as long as the data is numeric.

> # Create a vector of 25 elements from 1 to 25
> v <- 1:25
> v
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
> # Use vector v to create a matrix that is 5x5 of those elements
> m <- matrix (v,5)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
> # Add m[2,3] and m[3,2] together
> m[2,3]
[1] 12
> m[3,2]
[1] 8
> m[2,3]+m[3,2]
[1] 20
>

Nothing surprising. I make a matrix of integer values and I can add them together any way I please.
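The flip side of "all the data is the same type" is that a matrix will quietly convert mixed input to a single type. A minimal sketch (mm and result are names made up for this example): combining numbers and letters turns everything into character strings, so the kind of arithmetic that worked above no longer does:

```r
# c() coerces mixed values to a single type, so this matrix is all character
mm <- matrix(c(1, 2, "a", "b"), nrow = 2)
print(mm)
print(typeof(mm))   # "character"
# The number 1 is now the string "1", so adding two elements fails;
# tryCatch() captures the error message instead of stopping
result <- tryCatch(mm[1, 1] + mm[2, 1],
                   error = function(e) conditionMessage(e))
print(result)
```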

What about naming columns and rows? It turns out there are multiple ways of naming columns and rows, some of which depend on whether the underlying data structure is a matrix or a 'data frame'. The colnames and rownames calls used below work for both. A 'data frame' also has a built-in $ operator that is used to access a whole column of data by name. I include its use in the session below:

> # Give names to the columns and rows
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
> colnames(m) <- c("C1","C2","C3","C4","c5")
> m
     C1 C2 C3 C4 c5
[1,]  1  6 11 16 21
[2,]  2  7 12 17 22
[3,]  3  8 13 18 23
[4,]  4  9 14 19 24
[5,]  5 10 15 20 25
> # Now the rows
> rownames(m) <- c("r1","R2","r3","R4","r5")
> m
   C1 C2 C3 C4 c5
r1  1  6 11 16 21
R2  2  7 12 17 22
r3  3  8 13 18 23
R4  4  9 14 19 24
r5  5 10 15 20 25
> # We can still access with number indexes as before
> m[2,3]
[1] 12
> # But now we can use names as indexes instead
> m ["R2","C3"]
[1] 12
> # Is this where we can start using the $ in the variable name?
> m$C2
Error in m$C2 : $ operator is invalid for atomic vectors
> # No we can't use that type of access for a matrix
> # Turn m into a dataframe d and see what we can do
> d <- as.data.frame(m)
> d
   C1 C2 C3 C4 c5
r1  1  6 11 16 21
R2  2  7 12 17 22
r3  3  8 13 18 23
R4  4  9 14 19 24
r5  5 10 15 20 25
> # It doesn't look that much different but here are the different ways
> # to access data.
> d[2,3]
[1] 12
> d["R2","C3"]
[1] 12
> d["R2",]$C3
[1] 12
> d$C3
[1] 11 12 13 14 15
> d[2,]
   C1 C2 C3 C4 c5
R2  2  7 12 17 22
> d["R2",]
   C1 C2 C3 C4 c5
R2  2  7 12 17 22
> 
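The $ operator is not the only way to pull out a whole column. A minimal sketch, rebuilding m and d from the session above, showing two other base R forms that return the same vector (the col_* names are made up for this example):

```r
# Rebuild the named 5x5 matrix and data frame from the session above
m <- matrix(1:25, 5)
colnames(m) <- c("C1", "C2", "C3", "C4", "c5")
rownames(m) <- c("r1", "R2", "r3", "R4", "r5")
d <- as.data.frame(m)

# Three ways to pull out the whole C3 column of the data frame
col_dollar   <- d$C3
col_brackets <- d[["C3"]]
col_index    <- d[, "C3"]
print(col_dollar)
print(identical(col_dollar, col_brackets) && identical(col_dollar, col_index))

# On the matrix itself, m[, "C3"] works too, and keeps the row names
print(m[, "C3"])
```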

Data Frames

The data frame's strength comes from being able to handle tabular data of different data types. The following session creates a data frame with a mix of data types and shows how careful you have to be about which operations you choose to do. By supplying column names when creating the 'data frame', there is no need for a separate operation to insert them.

> d2 <- data.frame(C1=c(1:5),C2=c("a","b","c","d","e"),C3=c("john","joesph","james","jane","janet"))
> d2
  C1 C2     C3
1  1  a   john
2  2  b joesph
3  3  c  james
4  4  d   jane
5  5  e  janet
> d2[1,1]+d2[3,1]
[1] 4
> d2[1,1]+d2[1,2]
[1] NA
Warning message:
In Ops.factor(d2[1, 1], d2[1, 2]) : ‘+’ not meaningful for factors
> # We can do some comparisons on the character data
> "a" == d2[2,2]
[1] FALSE
> "a" == d2[1,2]
[1] TRUE
> "james" == d2[3,2]
[1] FALSE
> "james" == d2[3,3]
[1] TRUE
> d2[1,]
  C1 C2   C3
1  1  a john
> d2$C2
[1] a b c d e
Levels: a b c d e
> 
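The factor warning and the 'Levels:' line in the session above depend on the R version. Before R 4.0.0, data.frame() turned character columns into factors by default (stringsAsFactors = TRUE); since R 4.0.0 the default is FALSE, so the same commands give plain character columns, d2[1,1]+d2[1,2] stops with an error instead of returning NA with a warning, and d2$C2 prints as quoted strings. A minimal sketch that states the column type explicitly, so the result does not depend on the version (it keeps only C1 and C2 for brevity; d2f and d2c are names made up for this example):

```r
# Same data as the session above, with the column type chosen explicitly
d2f <- data.frame(C1 = 1:5, C2 = c("a", "b", "c", "d", "e"),
                  stringsAsFactors = TRUE)   # C2 becomes a factor
d2c <- data.frame(C1 = 1:5, C2 = c("a", "b", "c", "d", "e"),
                  stringsAsFactors = FALSE)  # C2 stays character
str(d2f)
str(d2c)
# Comparing with a string works either way
print("a" == d2f[1, 2])
print("a" == d2c[1, 2])
# as.character() turns a factor column back into plain strings
print(as.character(d2f$C2))
```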

The other strength of a 'data frame' is that it works seamlessly with the functions that read in comma-separated values (CSV). This allows you to pull in data sets exported from databases or websites and operate on them easily. Since CSV files usually include a first line of column names, the 'data frame' will already have its column names after a read operation.
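A minimal sketch of reading CSV data into a data frame. To stay self-contained it passes a small made-up CSV string through read.csv()'s text argument instead of a file path; with a real file you would give read.csv() the file name or URL instead:

```r
# read.csv() normally reads a file path or URL; the 'text' argument lets us
# feed it a small CSV string instead, which is handy for testing
csv_text <- "name,score
john,90
jane,85
james,78"
scores <- read.csv(text = csv_text)
print(names(scores))        # column names come from the first line
print(scores)
print(mean(scores$score))   # numeric columns are ready for statistics
```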

Conclusion

These topics are covered in more depth in the PDF text "An Introduction to R" [7]. Hopefully this blog has provided some insight into the workings of R and vector languages in general. The purpose here was to give you just enough vector background to get through debugging a statistics assignment when things go wrong. Usually the data is structured in a manner that's different from how your mind is perceiving it, which leads you to make improper function calls. I can't say this enough: when in doubt, break things down! Try functions on smaller pieces of data and make sure you get the answer you expect. Once things are operating the way you expect, you can extrapolate up to larger datasets.

References

  1. http://www.jsoftware.com/ A great vector-based language, with an excellent forum to search on various subjects. There is an R interface to the J language, so you can work in J and use R when you need something statistical that J doesn't have. Search the website for Ken Iverson; they have some excellent essays on the beginnings of APL and vector languages.
  2. Iverson, Kenneth E. "A Programming Language." J Software Inc., 13 Oct. 2009, www.jsoftware.com/papers/APL.htm.
  3. https://kx.com/ The company that produces the K language and Kdb (a database based on the K language).
  4. http://www.r-tutor.com/ Nice tutorials on various aspects of R, plus some nice deep-learning info. Always seems to come up first when googling an R language reference.
  5. https://stackoverflow.com/questions/2281353/row-names-column-names-in-r Discussion on matrix and data frame row and column names.
  6. https://www.gnu.org/software/apl/ GNU's APL implementation.
  7. https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf A good general (not so statistical) introduction to the language that covers many of these details in greater depth. It's a PDF; you should download a copy.

Author: NASTY OLD DOG


Tuesday, February 6, 2018

Simple Statistics and the R Language

Introduction

Statistics plays such an important role in so many different fields that a large portion of college students will find themselves having to take a Statistics course. That course will generally involve learning how to use a statistics package. At the tail end of the second decade of the 21st century, that package will likely be the R language. As a new student of Statistics you aren't privy to the steps leading up to the R language. You didn't have to live through the use of handwritten routines to calculate statistical functions on a computer. You didn't have to live through the days of using a hand calculator with statistical function keys. You don't need to wade through statistical function tables in the appendix of a book to calculate your statistical distribution functions. At the same time, though, you become detached from the actual mathematics that is statistics. The pre-packaged functions of the R statistical library hide all that hard work from you.

My introduction to statistics happened rather late in my life, in the first decade of the 21st century, in the Harvard Extension Biostatistics class. At that time there were various statistics packages that cost a considerable amount of money to purchase (luckily there were educational discounts). The book used for the course provided examples of solutions in a couple of different statistical packages but centered around one called Stata. By seeing the same problem solved with different packages side by side, the actual mathematics of statistics became the basis for understanding the idiosyncrasies between the statistics packages. This maintained a link between the mathematical representation of the statistics formulas and their use in the statistics software.

With R being the popular statistics language, there is less of a tendency to dive back into the mathematical formulas, and you may lose some understanding of the mathematics in the process. This blog post is meant to be an elementary bridge between some of the simple formulas of statistical mathematics and their R built-in counterparts. To do this I am going to use (actually rely on heavily, for example data and answers) a paper by Dr. Keith Smillie of the University of Alberta. He produced a wonderful statistics package for the J language and summarizes it in the paper "J Companion for Statistical Calculations" [1]. I am going to transpose a subset of his implementation into R. Hopefully you will then be able to use that format to extrapolate the mathematical connections for subjects I haven't covered here. Use this type of framework to gain more understanding of both the theory and the implementation of statistics.

R is an Array Language

R uses vectors, or arrays, as a built-in element of its language. If you aren't used to array languages it can seem pretty strange at first, but once you get the hang of it, it's pretty cool. You do have to think in terms of operating on the whole vector, because R has functions that streamline vector operations. You are welcome to break down those operations so they look like a regular programming language, but that may slow down processing time when dealing with large amounts of data.

R is an interpreted language

This just means that R grabs your R code and tries to execute it as soon as you type it in and hit the Enter key. You can also save code in a file and load it into the R console environment, but you can do quite a bit at the command prompt in the R Console. Look at the following R Console session. The session uses the comment operator '#' to add commentary inline. The comment operator '#' is usually called the 'hash' sign (you might know it as the number sign). When R sees the hash sign it ignores the sign and everything else to the end of the line. Anything that precedes the hash is executed as R code.

> # The hash comments out the rest of the line in R
> 2.3 # entering a numeric constant R returns it as a vector of one value
[1] 2.3
> # Assignment using the back arrow operator <-
> w <- 2.3 # assigning 2.3 to the variable w
> w 
[1] 2.3
> # add 2 numeric constants
> 2.3 + 2
[1] 4.3
> # you can use the = sign for assignment as well
> w = 3
> w 
[1] 3
> # see we have changed w from 2.3 to 3

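Creating vectors (arrays) of more than 1 value

You create vectors by concatenating (or combining) values together. R has a c() function for doing just that. (R also has a separate 'list' type, created with list(), that can hold values of different types; c() applied to plain numbers gives an ordinary vector.)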

> w <- c(2.3,5,3.5,6)  # w will now have multiple values in it
> # display w by typing it in at the command prompt
> w
[1] 2.3 5.0 3.5 6.0
> 

Arithmetic Mean

This is what is commonly known as the average of a list of values. We calculate it by taking a list of numbers, summing them, and dividing by how many numbers are in the list.

\[w = \{2.3,5,3.5,6\} \] \[n = 4 \]

Doing this by hand you compute the mean using the following formula: \[mean = \frac{\displaystyle\sum_{i=1}^{n}w_{i}}{n}\]

so

2.3 + 5 is 7.3
7.3 + 3.5 is 10.8
10.8 + 6 is 16.8

divide 16.8 by the total number of values (4): 16.8/4 is 4.2

How do we accomplish this in R? There are 3 categories of solutions:

  • The first is the old fashioned way: Loop through the values of interest that we stored in w, accumulate the sum of those numbers then after the looping is finished divide by the number of values.
  • The second is the vector way: use R vector operations to operate on the whole group of values
  • The third is to use an R built-in function to calculate in one step

Traditional looping average calculation

Let's create a function myave that does it the first way.

myave <- function (values_vector) {
 sum = 0
 for (i in 1:length(values_vector)) {
  sum = sum + values_vector[i]
 }
 # i should be the value of the last index and therefore the number of values
 sum/i
}

In the R console you will need to open the source editor by clicking the blank page icon in the toolbar at the top of the console. Paste the above text into the editor window and save it as a file (I called mine myave.R). Then click the 'source .R' icon and select your script, or type source() with the path to the file at the prompt, as in the session below.

> source("/Users/Nasty/Downloads/LearnBayes/R/myave.R") # where my file ended up
> myave
function (values_vector) {
 sum = 0
 for (i in 1:length(values_vector)) {
  sum = sum + values_vector[i]
 }
 # i should be the value of the last index and therefore the number of values
 sum/i
}
> # myave has a value of the function code so you can see it as above
> # Now lets use it create the vector of values
> w = c(2.3,5,3.5,6)
> w
[1] 2.3 5.0 3.5 6.0
> # plug it into myave
> myave(w)
[1] 4.2
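One caveat with myave: for an empty vector, 1:length(values_vector) becomes 1:0, which R reads as the sequence 1, 0 rather than "no indices", so the loop runs twice instead of not at all. A minimal sketch of a variant (myave_safe is a hypothetical name) that uses seq_along() and divides by length() explicitly; it also names its accumulator 'total' so it doesn't share a name with R's sum() function:

```r
# Loop-based mean that is safe for empty input (hypothetical name myave_safe)
myave_safe <- function(values_vector) {
  total <- 0
  # seq_along() gives 1..n, or no indices at all when n is 0
  for (i in seq_along(values_vector)) {
    total <- total + values_vector[i]
  }
  # divide by the count explicitly instead of relying on the loop index
  total / length(values_vector)
}

w <- c(2.3, 5, 3.5, 6)
print(myave_safe(w))
print(myave_safe(numeric(0)))  # 0/0 gives NaN, R's "not a number"
```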

A more vector way to calculate

> sum(w)
[1] 16.8
> sum(w)/length(w)
[1] 4.2
> 

This is pretty simple and to reduce typing you could create an R function to do it. You don't need the editor since it's a one-liner.

>  myave1 <- function(values) {sum(values)/length(values)}
> myave1
function(values) {sum(values)/length(values)}
> myave1(w)
[1] 4.2
> 

R built-in method

Now, R being a statistics package, it must have a predefined function that will do this. It does have something called 'ave'. But it has a weird idiosyncrasy that the above functions don't share. Let's try it.

> ave(w)
[1] 4.2 4.2 4.2 4.2

It turns out that the 'ave' function averages over groups (subsets) of our values and returns a vector the same length as its input, with each value replaced by the average of its group. Since we gave it no groups, all four values are replaced by the overall average. Not quite what we wanted right now. So guess at a function name at your own risk. In the computer world there is an old acronym, RTFM, which in nice language stands for Read the Stinking Manual. So try to look up what you want to do (Google is your friend here).
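To see what ave() is actually for, give it a grouping vector as its second argument. A minimal sketch; the values and group labels are made up for illustration:

```r
# ave(x, g) replaces each value with the mean of its group
values <- c(1, 2, 3, 4)
groups <- c("a", "a", "b", "b")   # hypothetical group labels
print(ave(values, groups))        # 1.5 1.5 3.5 3.5
# With no grouping argument every value is in one group,
# which is why ave(w) repeated 4.2 four times
print(ave(values))                # 2.5 2.5 2.5 2.5
```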

So the built-in function we really want to use is 'mean':

> mean(w)
[1] 4.2

The mean function has other parameters that have default values, and for our usage the defaults work. Depending on how in-depth your statistics course is, you may discover them. If you're interested, check https://www.rdocumentation.org/packages/base/versions/3.4.3/topics/mean
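Two of those parameters are na.rm and trim. A short sketch of what they do, reusing the vector w from above (the NA value is added here purely for illustration):

```r
w <- c(2.3, 5, 3.5, 6)
# na.rm: by default a missing value (NA) makes the whole mean NA
w_na <- c(w, NA)
print(mean(w_na))                # NA
print(mean(w_na, na.rm = TRUE))  # 4.2, the NA is dropped first
# trim: drop a fraction of the sorted values from each end before averaging;
# with 4 values, trim = 0.25 drops 2.3 and 6, leaving the mean of 3.5 and 5
print(mean(w, trim = 0.25))      # 4.25
```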

Frequencies

It's often helpful to look at frequencies of occurrences when analyzing data. So how would we use R vector operations to build a list of frequencies? The key is to know the complete range of values that the random process we are looking at can take on. This is because in a small sample some of the values may not show up, and their number of occurrences will be 0. To compare all the values of one vector with all the values of a second vector, we use a function known as 'outer product'. Outer product takes an operator and applies it between values of the 2 vectors. R has a built-in outer product function called 'outer'. We give it 3 arguments: 2 vectors and an operator symbol. It then does all the heavy lifting of applying the operator between each element of vector1 and each element of vector2. This produces a table of calculations. In our case, using the equality operator '==', we obtain a matrix of truth values.

> # frequencies from a list of die rolls stored in a vector
> D = c(4,5,1,4,3,6,5,4,6,4,6,1)
> D
 [1] 4 5 1 4 3 6 5 4 6 4 6 1
> # range of values of a single die
> r = c(1:6)
> r
[1] 1 2 3 4 5 6
> # use the outer product function 'outer' to test each value in r against each value in D
> outer(r,D,"==")
      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
[1,] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[3,] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[4,]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE
[5,] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
[6,] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE
> # these are all logical values that we want to sum to create frequencies
> # if we multiply by 1 R will convert these to numeric values for us
> 1*outer(r,D,"==")
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,]    0    0    1    0    0    0    0    0    0     0     0     1
[2,]    0    0    0    0    0    0    0    0    0     0     0     0
[3,]    0    0    0    0    1    0    0    0    0     0     0     0
[4,]    1    0    0    1    0    0    0    1    0     1     0     0
[5,]    0    1    0    0    0    0    1    0    0     0     0     0
[6,]    0    0    0    0    0    1    0    0    1     0     1     0
> # sum the rows of the equality table to obtain the frequencies
> # there happens to be a function called 'rowSums' that will do just that
> rowSums(1*outer(r,D,"=="))
[1] 2 0 1 4 2 3
> 
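The steps above can be wrapped in a small function. This is a sketch; myfreq is a hypothetical name, not an R built-in. It also relies on rowSums() accepting a logical matrix directly (TRUE counts as 1, FALSE as 0), so the multiply-by-1 step becomes optional:

```r
# Wrap the outer()/rowSums() steps in a reusable function.
# 'myfreq' is a hypothetical name; 'values' are the observations and
# 'possible' is the full range of values the process can take on.
myfreq <- function(values, possible) {
  # rowSums() also accepts a logical matrix (TRUE counts as 1),
  # so the multiply-by-1 step is optional here
  counts <- rowSums(outer(possible, values, "=="))
  names(counts) <- possible
  counts
}

D <- c(4, 5, 1, 4, 3, 6, 5, 4, 6, 4, 6, 1)
print(myfreq(D, 1:6))
```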

Next we combine the range values with their respective frequencies into a table.

> # let's assign the frequencies to a variable f
> f = rowSums(1*outer(r,D,"=="))
> f
[1] 2 0 1 4 2 3
> # Now let's combine the range of values with their frequencies
> matrix(c(r,f),length(f),2)
     [,1] [,2]
[1,]    1    2
[2,]    2    0
[3,]    3    1
[4,]    4    4
[5,]    5    2
[6,]    6    3
> # to see them in horizontal representation use the matrix transpose function 't'
> t(matrix(c(r,f),length(f),2))
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6
[2,]    2    0    1    4    2    3
> # R has a 'barplot' function to make a nice graph of the frequencies
> # however it's not quite smart enough to break up our matrix representation
> # so we will go back to using r and f vectors
> barplot(f)
>

> # Not very informative. Looking up the function, we can add some parameters
> # to name the bars and place labels and a title
> barplot(f,names.arg = r,main="Die Roll Frequencies",xlab="Die values",ylab="Occurrences")
> 

Now, R has a function built in to create a frequency table. It's called 'table':

> table(D)
D
1 3 4 5 6 
2 1 4 2 3 
> 

It doesn't quite provide the same thing we calculated. This is the frequency list of only the distinct values that actually appear. Remember that the die value 2 had no rolls in our list, so 'table' doesn't include it.
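If you want 'table' to report the zero for the 2 as well, one option is to tell R the full range of possible values up front by turning D into a factor with levels 1 through 6. A minimal sketch, along with base R's tabulate(), which counts the integers 1 to nbins directly:

```r
D <- c(4, 5, 1, 4, 3, 6, 5, 4, 6, 4, 6, 1)
# Declaring every possible die face as a factor level makes table()
# report faces that never came up, with a count of 0
print(table(factor(D, levels = 1:6)))
# tabulate() counts the integers 1..nbins and returns a plain vector
print(tabulate(D, nbins = 6))
```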

Median and Quartiles

The median is the "middle" observation when you look at a set of sorted data. So rather than being a complex formula, it is more of a positional definition. Sometimes it actually is the exact middle value (which happens for an odd number of items): for example, in 3 4 5 6 7 the median is 5, the actual middle number. But if you had 4 5 6 7, there is no actual middle number. In this case the median is the average of the two middle numbers, (5 + 6)/2 = 5.5. To find the median in R we would perform the following vector operations:

  • sort the data into a new sorted vector
  • find the index of the middle number (if there is an odd number of values) or of the middle 2 numbers (if there is an even number of values)
  • return the number found above, or the average of the middle 2 numbers in the even case.
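Written as a formula, with x_(1), x_(2), ..., x_(n) standing for the sorted values (a notation introduced here, not used elsewhere in the post), the steps above are:

\[ \text{median} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & n \text{ odd} \\ \dfrac{x_{(n/2)} + x_{(n/2+1)}}{2} & n \text{ even} \end{cases} \]

For the sessions below, n = 10 gives \((x_{(5)} + x_{(6)})/2 = (22 + 25)/2 = 23.5\), and n = 11 gives \(x_{(6)} = 25\).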

Median for even length vector

> # Median by hand
> # sample data:
> M = c(22, 14, 32, 30, 19, 16, 28, 21, 25, 31)
> M
 [1] 22 14 32 30 19 16 28 21 25 31
> # need to sort the data. we will cheat and use the sort function in R rather than 
> # doing it by hand
> sM = sort(M)
> sM
 [1] 14 16 19 21 22 25 28 30 31 32
> length(sM)
[1] 10
> # length is even so we divide length by 2 and get the value at the calculated index
> # and we need the value at the calculated index + 1 as well
> midx = length(sM)/2
> midx
[1] 5
> midx+1
[1] 6
> # so jumping ahead a couple of steps and putting it all together
> (sM[midx]+sM[midx+1])/2
[1] 23.5
> # That's the median that lies between the 2 values
> sM[midx]
[1] 22
> # and
> sM[midx+1]
[1] 25
> 

Median for odd length vector

> # let's add a value to our even vector to make the length odd
> M1 = c(M,40)
> M1
 [1] 22 14 32 30 19 16 28 21 25 31 40
> # formula for the middle index when the length is odd
> m1idx = (length(M1)+1)/2
> m1idx
[1] 6
> # we still need to sort before we find the median
> sM1 = sort(M1)
> sM1
 [1] 14 16 19 21 22 25 28 30 31 32 40
> # just select the median value at m1idx now
> sM1[m1idx]
[1] 25
>

Package it into an R script/function

mymedian <- function(myvector) {
# we will need to sort the vector for both cases
# let's do that now
 sM = sort(myvector)
 if (0 == length(myvector)%%2) {
  # even length code
  midx = length(sM)/2
  z = (sM[midx]+sM[midx+1])/2
 } else {
  # odd length code
  midx = (length(sM)+1)/2
  z = sM[midx]
 }
 z
}

Now test the function on the 2 data sets and compare mymedian against R's median function:

> # Test out mymedian and R's median function
> # first what were the data sets?
> M
 [1] 22 14 32 30 19 16 28 21 25 31
> M1
 [1] 22 14 32 30 19 16 28 21 25 31 40
> # source the mymedian function
> source("/Users/Nasty/Downloads/LearnBayes/R/mymedian.R")
> mymedian
function(myvector) {
# we will need to sort the vector for both cases
# let's do that now
 sM = sort(myvector)
 if (0 == length(myvector)%%2) {
  # even length code
  midx = length(sM)/2
  z = (sM[midx]+sM[midx+1])/2
 } else {
  # odd length code
  midx = (length(sM)+1)/2
  z = sM[midx]
 }
 z
}
> mymedian(M)
[1] 23.5
> mymedian(M1)
[1] 25
> # what about R's built in median function
> median(M)
[1] 23.5
> median(M1)
[1] 25
> 
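Because R works on whole vectors, the two cases in mymedian can also be collapsed into a single expression with no if statement. The sketch below (vmedian is a hypothetical name, not a built-in) picks positions floor((n+1)/2) and ceiling((n+1)/2) of the sorted data: for an odd length both are the same middle position, and for an even length they are the two middle positions, so taking their mean covers both cases.

```r
# Hypothetical vector-style median: one expression handles odd and even lengths
vmedian <- function(myvector) {
  sM <- sort(myvector)
  n <- length(sM)
  # odd n: both indices point at the middle value
  # even n: they point at the two middle values
  mean(sM[c(floor((n + 1) / 2), ceiling((n + 1) / 2))])
}

M <- c(22, 14, 32, 30, 19, 16, 28, 21, 25, 31)
M1 <- c(M, 40)
print(vmedian(M))
print(vmedian(M1))
```

For M (length 10) the indices are 5 and 6, giving (22 + 25)/2 = 23.5; for M1 (length 11) both indices are 6, giving 25, the same answers as mymedian and median.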

For any statistical definition, you could code it yourself in R. But R was built with statistics in mind, so someone has probably beaten you to it: there is probably a built-in or library function that already does the job. However, if you go straight from the mathematical definition in your statistics book to a ready-made R function, you skip the translation from mathematics into implementation that would seal the math in your mind. Your professor is going to expect you to know the theoretical math to some extent, so don't overlook it. If you're having trouble understanding what the math is driving at, try to implement it (if you have the time). Your textbook should have some simple examples with data you can test with. You won't have to do this every time, and you can always search for an implementation and just read it to see what it looks like. It may give you some insight into what the textbook is trying to define mathematically.

Quartiles

> # Quartiles are the numbers that mark the boundaries such that
> # one quarter of the data lies between any two consecutive quartile numbers
> # the easiest quartile to find is quartile 2. It's just the median
> Q2 = median
> # Q1 is the median of all numbers less than the value found by Q2
> Q1 <- function (myvec){median(myvec[which(Q2(myvec)>myvec)])}
> # lets set up some test data
> u = c(22,14,32,30,19,16,28,21,25,31)
> Q2(u)
[1] 23.5
> Q1(u)
[1] 19
> # Q3 is the median of all numbers greater than Q2 (ie. the median)
> Q3 <- function (myvec){median(myvec[which(Q2(myvec)<myvec)])}
> Q3(u)
[1] 30
> 

five-statistic

This is a summary of the data (commonly called the five-number summary) consisting of the following values:

min,Q1,Q2,Q3,max

> # five-statistic definition
> five <- function(myvec){c(min(myvec),Q1(myvec),Q2(myvec),Q3(myvec),max(myvec))}
> five(u)
[1] 14.0 19.0 23.5 30.0 32.0
> 
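R also ships a built-in fivenum function that returns Tukey's five-number summary (minimum, lower hinge, median, upper hinge, maximum). For the u data it agrees with our five function. Its hinges are based on the lower and upper halves of the sorted data (the median is shared by both halves when the length is odd), which is not quite the same rule as our Q1 and Q3, since those drop every value equal to the median. So on data with ties at the median, like the dice rolls D, the two can disagree.

```r
u <- c(22, 14, 32, 30, 19, 16, 28, 21, 25, 31)
print(fivenum(u))   # same values as five(u) above

# dice rolls: four of the values tie with the median (4)
D <- c(4, 5, 1, 4, 3, 6, 5, 4, 6, 4, 6, 1)
print(fivenum(D))
```

For D, fivenum gives hinges of 3.5 and 5.5, whereas our Q1 and Q3 would give 1 and 6, because removing all the 4s leaves only 1 1 3 below the median and 5 5 6 6 6 above it.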

Mode

One thing R does not provide a built-in for is the statistical mode (R's mode() function returns the type of an object, such as "numeric", not the most frequent value). The statistical mode is defined as the most frequently occurring data value in your vector of data, so this one we have to define ourselves. If I follow Prof. Smillie's example for J but use R equivalents, this is what the code looks like:

# return the indexes of the most frequent values
# the %in% tests membership and produces a vector of truth values
# which converts the TRUEs to their indexes
# imx - short for indexes of maximum value of a vector
imx <- function(myvec){which(myvec %in% max(myvec))}

# ufrq - returns a frequency list of all accounted for values.
# values in the domain that don't appear are not accounted for
# ufrq short for unique values frequencies
ufrq <- function(myvec){rowSums(selfclassify(myvec))}

# simpleMode - the simple most frequent value (statistical mode) in a vector list
# it relies on the fact that ufrq will produce frequencies in the order that 
# unique returns the unique members of the data
simpleMode <- function(myvec){unique(myvec)[imx(ufrq(myvec))]}

> source("/Users/Nasty/Downloads/LearnBayes/R/simpleMode.R")
> imx
function(myvec){which(myvec %in% max(myvec))}
> ufrq
function(myvec){rowSums(selfclassify(myvec))}
> simpleMode
function(myvec){unique(myvec)[imx(ufrq(myvec))]}
> D
 [1] 4 5 1 4 3 6 5 4 6 4 6 1
> simpleMode(D)
[1] 4
> simpleMode(c(1,2,3,2,3,2,3,4))
[1] 2 3
> simpleMode(c(1,2,3,4))
[1] 1 2 3 4
> 
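selfclassify is not defined in this part of the post. If you want a version that stands on its own, here is a sketch using base R only (simpleMode2 is a hypothetical name): match gives, for each value, its position in unique(myvec), and tabulate counts those positions, so the counts come out in the same order as unique, which is the same ordering the ufrq approach relies on.

```r
# Hypothetical self-contained alternative to simpleMode
simpleMode2 <- function(myvec) {
  u <- unique(myvec)                    # distinct values in order of first appearance
  counts <- tabulate(match(myvec, u), nbins = length(u))  # frequency of each distinct value
  u[counts == max(counts)]              # every value tied for the highest frequency
}

D <- c(4, 5, 1, 4, 3, 6, 5, 4, 6, 4, 6, 1)
print(simpleMode2(D))
print(simpleMode2(c(1, 2, 3, 2, 3, 2, 3, 4)))
print(simpleMode2(c(1, 2, 3, 4)))
```

Like simpleMode, it returns every tied value, so a vector with no repeats returns all of its values.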

Variance

Variance measures how spread out the elements of a data set are around the mean of the data set. It involves summing the squares of the differences between each value and the mean. Some people have trouble with this because it introduces squaring. Logically, if you wanted to find out how far a value is from the mean, you would just subtract the smaller number from the larger one and get your answer. So a naive set of steps might look like the following R code (assuming your data is in a vector called values):

m = mean(values)
distance = numeric(length(values))   # one slot per value
for (i in seq_along(values)) {
 if (m > values[i]) {
  distance[i] = m - values[i]
 } else {
  distance[i] = values[i] - m
 }
}
distance

That gets you the distance of each value from the mean, but it is not a very succinct mathematical formula. The mathematical way of doing this without all the if statements is the following:

  • for each i = 1..N compute $\text{values}_i - \text{mean}$
  • that will give some negative numbers, and one mathematical way of getting rid of negatives is to square the numbers:
  • for each i = 1..N compute $(\text{values}_i - \text{mean})^2$
  • This formula gives all non-negative numbers that are related to the actual distance.
  • but having separate distance-related numbers doesn't tell us much, so one thing you could do is sum all the squared distances and divide by the number of values. In essence, take the mean of the squared distances.
  • Divide by N or N-1. What's up with that?
    • Population - this is the total universe of everybody or everything you want to study. If you have data on every single subject of interest, you divide by N.
    • Sample - this is where you look at some number of things less than the whole population (i.e. you 'sample' the population). Sample values tend to be closer to their own sample mean than to the true population mean, and you could also have been unlucky and chosen things that are very similar, so the population probably has more variation than your sample shows. To account for that, you divide by N-1. This enlarges the variance slightly, but as your sample size grows the difference between dividing by N and by N-1 becomes negligible:
      • for example, with N=3: 1/(N-1) = 1/2 = 0.5 and 1/N = 1/3 ≈ 0.33, a difference of about 0.17
      • but with N=100: 1/99 ≈ 0.0101 and 1/100 = 0.01, a difference of only about 0.0001
      • This makes sense: naively you would expect that the more samples you have, the higher the accuracy. This is just a mathematical way to express it.
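Written as formulas (writing $x_i$ for $\text{values}_i$, $\mu$ for the population mean and $\bar{x}$ for the sample mean), the population variance and the sample variance are:

$$
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2
\qquad\qquad
s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2
$$

As a check with the w data used next (2.3, 5.0, 3.5, 6.0, mean 4.2), the squared deviations sum to 7.98, so the sample variance is 7.98/3 = 2.66, while dividing by N would give 7.98/4 = 1.995.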
> # the variance calculation by hand
> # remember the w data from before?
> w
[1] 2.3 5.0 3.5 6.0
> mean(w)
[1] 4.2
> w - mean(w)
[1] -1.9  0.8 -0.7  1.8
> (w - mean(w))^2
[1] 3.61 0.64 0.49 3.24
> sum((w - mean(w))^2)
[1] 7.98
> sum((w - mean(w))^2)/(length(w)-1)
[1] 2.66
> # or using the built-in var function in r
> var(w)
[1] 2.66
> 
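R's var always divides by N-1, i.e. it computes the sample variance. If your data really is the whole population, a minimal sketch of the divide-by-N version looks like this (popvar is a hypothetical name, not a base R function):

```r
w <- c(2.3, 5.0, 3.5, 6.0)

# Hypothetical helper: population variance divides the sum of squared deviations by N
popvar <- function(x) {
  sum((x - mean(x))^2) / length(x)
}
print(popvar(w))

# rescaling the sample variance by (N - 1)/N gives the same number
print(var(w) * (length(w) - 1) / length(w))
```

Both lines print 1.995, i.e. 7.98/4, compared with 2.66 from var(w).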

Standard Deviation

Variance is useful for a variety of things that you may go into in your course. But you are probably thinking that if you just took the absolute value of the differences, you would have a more direct measure of the distance each value is from the mean. You would be right (that measure is called the mean absolute deviation), but the accepted convention is instead to take the square root of the variance, which also brings the units back to those of the original data, and call it the Standard Deviation.

> var(w)^0.5   # remember that a square root is the same as raising to the 1/2 power
[1] 1.630951
> sqrt(var(w)) # in case you didn't believe me
[1] 1.630951
> sd(w)
[1] 1.630951
>
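To compare the two ideas on the w data, here is a sketch of the absolute-value approach, the mean absolute deviation (meanAbsDev is a hypothetical name). Be careful not to confuse it with R's built-in mad function, which computes the median absolute deviation, scaled by default by a constant of 1.4826.

```r
w <- c(2.3, 5.0, 3.5, 6.0)

# Hypothetical helper: average distance from the mean, using absolute values instead of squares
meanAbsDev <- function(x) mean(abs(x - mean(x)))

print(meanAbsDev(w))
print(sd(w))
```

For these four values the mean absolute deviation is (1.9 + 0.8 + 0.7 + 1.8)/4 = 1.3, a bit smaller than the standard deviation of about 1.63, because squaring gives extra weight to the larger deviations.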

Create a Statistics Summary

Let's put this all together and create a statistics summary: essentially a report or table of everything we have just covered.

ssummary <- function(myvec){
 # print a labeled report of the summary statistics for myvec
 cat("Sample size\t\t\t",length(myvec),"\n")
 cat("Minimum\t\t\t\t",min(myvec),"\n")
 cat("Maximum\t\t\t\t",max(myvec),"\n")
 cat("Arithmetic mean\t\t",mean(myvec),"\n")
 cat("Variance\t\t\t\t",var(myvec),"\n")
 cat("Standard deviation\t",sd(myvec),"\n")
 cat("First Quartile\t\t",Q1(myvec),"\n")
 cat("Median\t\t\t\t",median(myvec),"\n")
 cat("Third Quartile\t\t",Q3(myvec),"\n")
}

So let's use it in R:

> # using ssummary on our dice data variable D
> D
 [1] 4 5 1 4 3 6 5 4 6 4 6 1
> ssummary(D)
Sample size          12 
Minimum    1 
Maximum    6 
Arithmetic mean   4.083333 
Variance   2.992424 
Standard deviation  1.729862 
First Quartile   1 
Median    4 
Third Quartile   6 
> 

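Now if you want something fancier, there are ways to export this to LaTeX or HTML rather than just printing to the screen in the R console. That is for you to look up and figure out. By the way, R has a built-in 'summary' function. Notice that its quartiles (3.75 and 5.25) differ from ours (1 and 6): summary uses R's quantile function, which by default interpolates between neighboring sorted values, while our Q1 and Q3 drop every value equal to the median, and that matters here because D has four rolls of 4: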

> summary(D)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   3.750   4.000   4.083   5.250   6.000 
>
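To see where summary's 3.75 and 5.25 come from, here is a sketch of R's default quantile rule (type 7) done by hand; q7 is a hypothetical name. For a probability p it computes the fractional position h = (n-1)p + 1 in the sorted data and interpolates linearly between the two neighboring values.

```r
# Hypothetical helper: R's default (type 7) quantile computed by hand
q7 <- function(x, p) {
  x <- sort(x)
  h <- (length(x) - 1) * p + 1         # fractional position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])   # linear interpolation between neighbors
}

D <- c(4, 5, 1, 4, 3, 6, 5, 4, 6, 4, 6, 1)
print(q7(D, c(0.25, 0.5, 0.75)))
print(quantile(D, c(0.25, 0.5, 0.75)))
```

For D the sorted values are 1 1 3 4 4 4 4 5 5 6 6 6. For p = 0.25, h = 3.75, so the result is 3 + 0.75 * (4 - 3) = 3.75; for p = 0.75, h = 9.25, giving 5 + 0.25 * (6 - 5) = 5.25.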

Conclusion

There is much more on statistics and its implementation in J in Dr. Smillie's paper, the main reference for this post.

Bibliography

  1. Smillie, K. (1999, January). J Companion for Statistical Calculations. Retrieved February 05, 2018, from https://webdocs.cs.ualberta.ca/~smillie

Author: NASTY OLD DOG

Friday, July 24, 2015

Studying at College

Introduction

This blog has spent a significant amount of time providing insight into AP Computer Science. The next step will ultimately be going to college. AP courses offer a taste of college material, but that material is spread over a whole school year; in college the same material is covered in about 15 weeks. That is the real difference, and it's a huge one. There is very little room for error: there will be fewer tests, each covering more material. Since many who come to this site are just starting out in Computer Science, I thought I would collect some web sites here that can help you with studying.

My own experience, way back when, was a hodgepodge of mistakes and mishaps. I only wish the World Wide Web had been around when I was in college, if only to find out what help my school offered for subjects I was having trouble with. Instead I would ask around, or ask someone who was doing well in the course, but in the end they had limited time to help. I spent my college years muddling through as best I could, studying long hours without much success in these tough courses. In my day, jobs were plentiful, and GPA was important but it wasn't everything. Once you graduated and gained real-life experience, you could shift jobs and change direction without much difficulty.

Those days are gone. Jobs are a little more difficult to come by. US corporations keep lobbying for increases in the H1B visa program so they can bring in foreign STEM workers cheaply. Why should I hire a B student from any state university when I can grab an A student from an Indian Institute of Technology? In a better day US companies were restricted from doing things like that. Now, not so much. That doesn't mean you can't find a job; it just means that now, more than ever, your GPA right out of college is your ticket to getting your foot in the door of corporate America.

Do what you love and never work a day in your life.

There is truth to this old saying. You have to be realistic of course. No one is going to pay you to eat and drink beer all day. But you can get paid for a variety of fun jobs. If you like programming computers and designing software there are jobs available that pay pretty well. If you like training dogs there are ways to make money doing this. So your first step when you go to college is to figure out what Major you are going to love.

I didn't do that. I decided that Computer Science was too easy and that I needed to understand how the electronics worked behind the computer. So I suffered through an Electrical Engineering degree, when I could have breezed through a Computer Science degree. In the end it worked out OK but it took me 4 years after college to finally land a software job. In the end the hardware and software experience would help me start up a company so I can't complain. But I could have taken Computer Science as my major and taken the fun courses in Electrical Engineering as part of my electives and looked a heck of a lot better on paper.

I bore you with this only because, if you are a college student struggling with your course load, I know the feeling; I've been there. I've made every mistake in the book. I've told myself every excuse there was for why it wasn't my fault. The reality was twofold: first, I was in the wrong major, and second, I had no idea how much time I was supposed to spend studying. In high school I needed a couple of hours of studying the night before a test. In college that is not going to cut it unless you're a super genius, and if you are a super genius, thank you for reading my blog! But if you're a mere mortal like me, it would be nice to know the minimum amount of time you should be putting in.

Getting into "The Flow"

You may have experienced what people call "The Flow" when working on a computer project. It's the point where the work ahead is clear in your mind and you are spitting out lines of a program quickly and easily. When you come out of it, you realize you have just been super productive and have gotten 80 to 90% of the work done, and you wonder: why couldn't that have happened sooner? I would have been completely finished by now. You may also have experienced "The Flow" when reading a good book. You get to a particular section where the story sort of tells itself and you don't really notice the time passing. You become "absorbed" in the book.

The key is to get into "The Flow" for whatever you happen to be studying. This can be quite difficult for a required course you don't like very much and it can be simple for a course you love. The point is you need to train yourself to get into "The Flow" quickly whenever you need to whether you like the material or not. That is easier said than done and I don't know of a method to guarantee this happens quickly. My method is to observe yourself. How long do you have to sit there and what things do you do to get yourself into "The Flow"?

Start creating a routine (this is where you start to become OCD and superstitious) that you do before you begin to study. If you have to pace back and forth 5 times after you set up your study material on the desk, so be it. If you need a sip of coffee, red bull or diet coke make sure you have it handy. If you need to wash your hands 5 times OK whatever, it doesn't matter. What does matter is that you develop a routine that you do every time you start to study. The idea is to condition the mind over time that this routine will lead to "The Flow" and sooner or later your mind will start to cooperate.

For me, I found that I basically needed 5 hours to get 3 hours of study (meaning that over the 5 hours I sat down to study, only 3 hours of "good" study happened). Sometimes it was 5 hours to get 2 hours of good study. This is the crux of the matter. In the previous section I alluded to the minimum amount of time you need to study for each course; we will get to that number shortly, but it represents good study time. By observing yourself and objectively assessing your use of study time, you should be able to set up a schedule and map out your entire week. Guess what: if you need 5 hours to get 3, there isn't going to be much time left for extracurricular activities.

The Formula: credit hours to study hours

The University of Michigan at Flint study website (https://www.umflint.edu/advising/surviving_college) says you need:

  • 2 hours of study for every 1 credit hour of class
  • So a 12 credit hour semester = 24 hours of extra study per week.

Based on that, between classes and studying you will chew up 36 hours per week (12 hours in class plus 24 hours of study). So even a light semester schedule is equivalent to a full-time job in terms of hours spent. If you're like me and you need 4 - 5 hours to get 2 hours of good study, you are looking at about 60 hours per week. That's the equivalent of a full-time job plus a part-time job at night. If you want weekends off and you're an efficient studier, you will need to start work at 8am and finish by 5pm Monday - Friday (including a 1-hour lunch). To get to 60 hours and still have weekends off, it's 8am - 10pm with an hour for lunch and an hour for dinner. If you want to join a club and need your evenings free, you need to study more efficiently (good luck with that: if you're still reading, you're probably doomed like me to be an inefficient studier, not because you're reading slowly but because you're still looking for advice) or you will need to use your weekend time to make up the difference.

This may seem absurd to you, but I assure you that in the real world of the salaried employee there are many times when you are called on to work 60 - 80 hours per week. That's the point of college, right? To prepare you for what you might have to do in the real world? I hope you take my advice, because I have tried everything to avoid having to acquiesce to this reality, and in the end I found out that there is no way around hard work. Trust me on this: I spent a lot of time in my life trying to get around this fact. After college, if I wasn't studying at night to perform better at my job, I was taking courses so I could get the job I really wanted. So the hard work eventually gets put in; it's all a matter of timing. It also shows that what employers are really buying when they hire a college graduate is a person who has learned how to study. It gives them confidence that when new stuff crops up on the job, the employee will be able to work through it and figure out a way to accomplish it. In a technical field most of your knowledge will be obsolete in 5 years anyway, so the real value you bring to the table as a college graduate is your ability to study.

Links to other study sites

I would encourage you to search the web frequently on the topic of study skills or studying in college. You may discover something that is very helpful for your own personality. Remember, you must take this information and personalize it for yourself. Some things that work for other people may not work for you. But you need to keep searching for the answers.

Study block or the little voice that won't

In any endeavor there is a part of your mind that tells you: you can't do it, this is a waste of time, you're a fool for trying, this will never work. I first noticed it when I played keeper in soccer. Diving was fun at first, but after a few dives I noticed the thought process increased (i.e. that little voice got louder). Sometimes your internal dialog is helpful, but a lot of times it's your own worst enemy. Somewhere during a game the thoughts would be: you're never going to stop this guy's shots, why even bother to dive? Or: he's in too close, the ball will be well past you even if you try to dive. At that point I would dive no matter what, even if my reaction time was off and I was diving well after the shot. The point was to show that part of my mind: we are diving anyway, there is no way to avoid the pain, so from now on let's get with the program and see if we can stop these shots.

That same thought process happens when you try to study. It starts as: "Let's play a video game first." "Check what's on TV; we can study later." "It's such a long walk to the library, why bother? Just read a little bit here in your room." But if you choose to entertain it, the thought process devolves into: "You're going to fail anyway." "2 hours is enough; all you can get is a C." "You're just not good at this course, so why bother with it?" This is all part of what I call study block. It's like writer's block, but it encompasses the whole study process, not just the writing part.

The way out of the thought process is rituals. By going through rituals you are telling your inner voice I'm not listening. See I'm getting ready to study and there's nothing you can do about it. Over time you train yourself to get into a study mode. I would make the following suggestions if you're having trouble studying:

  • Find a study area that is not part of your living quarters; your home should be the reward and sanctuary for good studying, or used only on special occasions.
  • If possible, find a place where you have access to a white board or chalk board.
  • Try to get a study room, if they are available, rather than a carrel, so you have a table to spread out on.
  • If you must listen to music, choose instrumentals rather than anything with singing. Experiment with your musical selections and see if any produce better studying.
  • If you find that you keep reading the same passage over and over, read aloud for a while. Reading aloud forces you to engage more of your brain. It slows you down a little, because you can read faster silently, but if it's the only way to push through the material, so be it.
  • Make a cheat sheet. Not to cheat, mind you, but hypothetically, if you were to cheat, what could you put down on an 8.5 x 11 inch piece of paper that would guarantee you a good grade? This is a really good method of condensing your notes, but somehow, because you call it a cheat sheet, the "little voice" gets intrigued and starts to help you: "Don't forget to put down stuff for this section."

More on flow

There's a lot on the internet about flow. It's a concept that's been around a while, and everyone rediscovers it from time to time. Back in the late '80s or early '90s I remember an article about helping programmers achieve flow. The single most important thing they found was to provide them an office with a door, literally to shut out distractions.

In Software Development there are always discussions about "The Flow". I believe that the reason so many programmers work into the wee hours of the morning has to do with maintaining the flow state. There are fewer distractions at night. You are tired which I believe stops the internal chatter that can get in the way of achieving the flow state. There is also the reward of going to bed when you're finished. I can't tell you how many times I've told myself, at 1 in the morning, let me finish up the code for this last method and make sure it compiles and I'll call it a night and finally crawl into bed around 3.

Just to show you I'm not making this stuff up check out the following short article:

One thing you will see mentioned about flow in some articles is to do something that you love. That's something college students may not have the luxury of doing; you are not going to love every course you take. But you still need to try to get into flow if you are going to study well. The above article does a good job of describing some of the symptoms of flow. But you don't really know you're in it until you're out of it. Then you realize that wow, you just got a lot done, or you find that 2 hours have gone by and you hardly noticed. This state is highly personal as well. Don't feel too bad if you don't have the highly romanticized experiences described on the web. In the end you're just looking for some highly productive time where things finally seemed to click. "The Flow" is just some good, efficient study time; if you get any more emotional benefits from it, that's just icing on the cake.

Author: Nasty Old Dog

Thursday, October 9, 2014

Android: Working with USB and Arduino

Android Working with USB Peripherals

Introduction

When it comes to working with popular tablets, Google Android and Apple iOS are the likely choices. Google provides a near-normal Java development platform, and that gives it enough of an edge to be my platform of choice. This is more because I'm lazy than because of any real difference; Objective-C is easy enough to get used to. The other advantage in favor of Google is a standard miniUSB connector. I have plenty of those lying around the house, which again is a sign of laziness on my part. Ultimately, if one were to do this sort of thing for a living, learning both platforms would make you more marketable. I do this stuff part time, so using a familiar platform is an important consideration.

In terms of aesthetic appeal I think the Apple iOS platform wins. I seem to prefer using my iPhone and iPad over the Google tablet I have. This is a very subjective consideration, and my laziness wins out over it.

Self deprecation aside what I mean to say is that I am not trying to advocate one platform over another. If something I design is useful enough at some point an Apple application would be built. But most of my design work is based on learning things that interest me and not about market research into which is the best platform to launch an application on.

NOTE: for those who like to do rather than read a link to a zipped Eclipse project with source code is included at the end of this article.

USB

For a while now Google Android has provided a USB API. For the code in this article I have been developing in the Eclipse IDE with the Android SDK. Android provides both a USB Accessory API and a USB Host API. This article is about the Host API, where the Android tablet provides the USB power and a USB device is attached via a USB cable: in particular, an Arduino Mega 2560.

The Android documentation on UsbHost is good, but it is not a tutorial. It is very general because there are a multitude of USB devices you might want to attach. In this particular case there is only one device, the Arduino, and it acts as an RS232-style serial device communicating over USB.

The references below provide some extra documentation, and much of the code I ended up modifying came from: (place link here). Even for such basic communication, quite a bit of code is needed to accomplish the USB tasks.

Most of my changes came from creating an object class to encapsulate as much of the USB API calls into one class so it does not take up room in the MainActivity class. I don't really consider this to be a clean separation because so much of the access to the USB API relies on System level objects placed into the Context object of the MainActivity. The current code is certainly workable and hopefully to future users is not too cumbersome.

Application and USB states

The application (Activity in Android-speak) has lifecycle states that are imposed by the Android API. Some can be ignored in most activities as there is built in default behavior provided by the API. The Activity handles these lifecycle events by overriding onCreate, onStart, onResume, onPause, onStop and onDestroy methods as needed.

Android activities do not run like typical computer applications, where the user starts them and then quits when done. An activity can pause or stop, but the user does not have one-touch access to quitting it. The Android OS decides when to take the Activity out of memory and make it restart from scratch. That is not the most convenient model for controlling a piece of hardware over USB, but it's what we have to work with.

The USB connection has states that don't exist directly in the API but must be dealt with. For example, is a USB device connected or not? If it is connected, has the communication channel been set up? These have to be determined while the Activity is running and checked continually.

Android lifecycle considerations

The main lifecycle method that most activities deal with is onCreate. This is where many of the objects that make up the Activity are initialized. It is usually auto-generated by Eclipse when you first define the activity there. The developer is then free to add any initialization they want to this method.

Android logging - What is it and Where is it?

Android provides a Logging API via the Log class. After placing the Android device in developer debugging mode, this class allows the developer to place logging information (ie. simple text) in the Android system log.

The easy part is placing the logging code into the source code. I tend to use 2 methods from the log class: the "i" method (short for info) and the "e" method (short for error). The info logs are used to tell when a method has been entered or when certain USB functions have been called. The error logs are used for when the code takes an unexpected turn, or when a test for a null pointer is true.

The Log methods take 2 strings. The first string is a TAG that is usually set up to tell you what class you are operating in. The second string is the developer message for what is happening or where it is happening.

public static final String version = "Version 1.12";
private final String TAG = MainActivity.class.getSimpleName() + MainActivity.version;

The version field is made public and static so that other classes can use it in their own TAG fields for logging. Including the version string makes it easy to search the large Android log file and find the messages output by the Activity.

The following is an example of a Log statement using the above TAG field:

Log.i(TAG, "onCreate(): entered");

The more difficult part is accessing the log once the Activity has run. The easiest way is to plug the tablet into a USB port on your development computer, locate your Android SDK installation, open a terminal window (if you're using Mac or Unix) and add the SDK's platform-tools directory (where adb lives) to your PATH. Now you can use the adb command to run logcat on the tablet and dump the log. The only problem is that logcat never returns; it gives a live feed, so you need to press Ctrl-C to get back to the command prompt (alternatively, adb logcat -d dumps the current log and exits).

adb logcat >android.log   
vi android.log

I happen to use vi as an editor for this but any text editor will do.

[Add copy of vi display of the log output]

Intents

Many of the system-level events are communicated back to the MainActivity through Intent objects. Separate receiver objects (BroadcastReceiver subclasses) are required to handle these Intent events. The Android system can also be made to map these events to a particular Activity, which allows the system to open the application when the event occurs.

USB Permission Receiver

This code gets called after the user sees a dialog box asking for permission to use a USB device and clicks the OK button. This is a non-system-level intent, meaning it is created by the Activity itself, so the Activity must tell the Android OS about the Intent and then register a receiver for it as follows:

// register the broadcast receiver
mPermissionIntent = PendingIntent.getBroadcast(this, 0, new Intent(
                ACTION_USB_PERMISSION), 0);
IntentFilter filter = new IntentFilter(ACTION_USB_PERMISSION);
registerReceiver(mUsbReceiver, filter);

This is needed because the Android OS pauses our Activity and displays the permission dialog box itself. It seems a roundabout way to get permission when the application already knows which device it wants to control and the Activity could just as easily have asked the user directly, but Android OS appears to need it to keep track of whether the application may use the USB connection to the Arduino device. So a receiver must be defined in the MainActivity to handle this intent when we ask the OS to trigger it.

Once permission has been granted, this receiver tries to establish USB communication by calling usbInit() and setupUsbComm() on the UsbControls object (usbController). Once that is accomplished, communication should be able to take place between the tablet and the Arduino board.

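private final BroadcastReceiver mUsbReceiver = new BroadcastReceiver() {
        @Override
        public void onReceive(Context context, Intent intent) {
                String action = intent.getAction();
                Log.i(TAG, "mUsbReceiver.onReceive()");
                if (ACTION_USB_PERMISSION.equals(action)) {
                        Log.i(TAG, "mUsbReceiver.onReceive: ACTION_USB_PERMISSION found");
                        synchronized (this) {
                                UsbManager manager = (UsbManager) context
                                                .getSystemService(USB_SERVICE);

                                if (intent.getBooleanExtra(
                                                UsbManager.EXTRA_PERMISSION_GRANTED, false)) {
                                        // Permission granted: finish setup and open the link
                                        usbController.usbInit(manager);
                                        usbController.setupUsbComm(manager);

                                        textDisplayLog
                                                        .append("\npermission receiver: permission granted");
                                        Log.i(TAG,
                                                        "mUsbReceiver: permission granted com link attempted");
                                } else {
                                        textDisplayLog.append("\npermission receiver: permission denied for device");
                                        Log.e(TAG, "mUsbReceiver: Permission denied for device");
                                }
                        }
                }
        }
};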

USB Device Attached Intent Receiver

This code gets called after the user attaches a USB device to the USB port of the Android tablet. This Intent is generated by the Android OS and its ACTION string is already defined by the API, so registering a receiver for it takes a single call in onCreate():

registerReceiver(mUsbDeviceAttachedReceiver, new IntentFilter(
                UsbManager.ACTION_USB_DEVICE_ATTACHED));

The receiver for this intent is complicated by the fact that, if the USB device has just been attached, the UsbControls object that encapsulates the USB functionality (usbController) will not have been set up properly (no device, no setup). If the connected device is the one we are looking for, this handler must initialize it and request permission for it; in other words, this handler must trigger the permission receiver above. There are two possibilities:

  1. There is no UsbController object yet so we need to use the constructor for the class to set one up
  2. Some code has been run that has partially set up the object in the MainActivity but there was no device found to go any further

The second case arises because the device may already be connected when the Activity runs for the first time. In that case no ACTION_USB_DEVICE_ATTACHED intent is sent to the Activity, because the device was attached before the Activity came into being. Since the Activity may therefore have gone through a partial setup, this code is more complex than it would otherwise need to be.

In both cases an important part of the code is getting the UsbDevice found by the Android OS and placing it in the usbController object. Inside this receiver, the intent (through its UsbManager.EXTRA_DEVICE extra) is the most direct source of that UsbDevice.

private final BroadcastReceiver mUsbDeviceAttachedReceiver = new BroadcastReceiver() {

        @Override
        public void onReceive(Context context, Intent intent) {
                String action = intent.getAction();
                Log.i(TAG, "mUsbDeviceAttachedReceiver.onReceive()");
                if (UsbManager.ACTION_USB_DEVICE_ATTACHED.equals(action)) {
                        // The device is provided as part of the intent
                        // in a larger context (ie. android attached to USB hub)
                        // it may not be our device of interest there could be other
                        // devices. But for now assume only one device
                        UsbDevice device = (UsbDevice) intent
                                        .getParcelableExtra(UsbManager.EXTRA_DEVICE);

                        if (usbController == null) {
                                textDisplayLog
                                                .append("\nACTION_USB_DEVICE_ATTACHED: usbController was null now created\n");
                                usbController = new UsbControls(
                                                UsbControls.targetVendorID,
                                                UsbControls.targetProductID,
                                                device);
                                // A newly created controller also needs the receive
                                // handler and sync object that onCreate() installs
                                usbController.setRcvHandler(mHandler1);
                                usbController.setSyncObj(syncObj);
                                Log.i(TAG,
                                                "mUsbDeviceAttachedReceiver: USB Controller set up");
                        } else {
                                textDisplayLog
                                                .append("\nACTION_USB_DEVICE_ATTACHED: usbController not null UsbDevice set\n");
                                usbController.setUsbDevice(device);
                                Log.i(TAG,
                                                "mUsbDeviceAttachedReceiver: USB Controller already available UsbDevice set");
                        }

                        usbController.usbInit((UsbManager) context
                                        .getSystemService(USB_SERVICE));

                        // Get permission to use
                        // mUsbManager.requestPermission(device, mPermissionIntent);
                        if (usbController.getUsbPermission(
                                        (UsbManager) context.getSystemService(USB_SERVICE),
                                        mPermissionIntent)) {
                                Log.i(TAG, "mUsbDeviceAttachedReceiver: USB ready");
                        } else {
                                Log.e(TAG, "mUsbDeviceAttachedReceiver: Cannot get permission for USB");
                        }
                }
        }
};

USB Device Detached Intent Receiver

This code gets called when the device is disconnected from the USB port of the Android tablet. This receiver must also be registered with the Android OS:

registerReceiver(mUsbDeviceDetachedReceiver, new IntentFilter(
                UsbManager.ACTION_USB_DEVICE_DETACHED));

Without a device, all of the USB objects held within the usbController object need to be released, hence the call to usbController.releaseUsb(). It is possible that something went wrong and there is no valid usbController object, so we check it for null before using it.

private final BroadcastReceiver mUsbDeviceDetachedReceiver = new BroadcastReceiver() {
        @Override
        public void onReceive(Context context, Intent intent) {
                String action = intent.getAction();
                Log.i(TAG, "mUsbDeviceDetachedReceiver.onReceive()");
                if (UsbManager.ACTION_USB_DEVICE_DETACHED.equals(action)) {
                        if (usbController != null) {
                                usbController.releaseUsb();
                                textDisplayLog
                                                .append("\nACTION_USB_DEVICE_DETACHED: usbController released\n");
                        } else {
                                textDisplayLog
                                                .append("\nACTION_USB_DEVICE_DETACHED: usbController null no release\n");
                        }
                }
        }
};

Finally, after we have used the Activity to our heart's content and moved on to other Activities on the tablet, the receivers should be unregistered when the Android OS "destroys" the MainActivity. So the following unregister calls are made in the onDestroy() lifecycle callback:

unregisterReceiver(mUsbReceiver);
unregisterReceiver(mUsbDeviceAttachedReceiver);
unregisterReceiver(mUsbDeviceDetachedReceiver);

Strings and bytes

Strings are the usual class for handling text, and for most Android Activities involving graphics and user input this is fine. For USB communication, however, bytes are the coin of the realm. In this program the transmit message is set up directly in bytes, but displaying the message returned by the Arduino over USB requires a conversion from bytes to a String. There may be an easier way, but I chose to terminate each USB message with the ASCII newline (line feed) character 0x0A. This way you just need to loop through the bytes and treat that character as the end of the message:

for (int i = 0; i < result; i++) {
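    if (buf[i] != 0x0a) {
        // Mask to 0..255 and cast; String.format("%c", buf[i]) rejects
        // negative byte values (bytes above 0x7F)
        sb.append((char) (buf[i] & 0xFF));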
    } else {
        synchronized (syncToken) {
            msg = sb.toString();
        }
        sb = new StringBuilder();
        mHandler.sendEmptyMessage(0);
        break; // There is a question whether there is new valid data 
              // after newline but am assuming not for now
    }
}
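The loop above discards anything that arrives after the newline in the same buffer, which its comment flags as an open question. Here is a minimal plain-Java sketch of one alternative, assuming the Arduino sends ASCII text with each message terminated by 0x0A: keep the StringBuilder across reads, return every complete message found, and leave any partial message buffered for the next read. LineAssembler and feed() are hypothetical names, and the sketch leaves out the Handler and synchronization the original uses. It also drops 0x0D bytes, on the assumption that the sketch may be printing with Arduino's Serial.println(), which ends lines with "\r\n".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns a series of USB reads into newline-terminated
// messages, keeping any partial message for the next read.
final class LineAssembler {
    private final StringBuilder pending = new StringBuilder();

    // Feeds the first 'count' bytes of 'buf' (as returned by a USB read)
    // and returns every message completed by a 0x0A byte, in order.
    List<String> feed(byte[] buf, int count) {
        List<String> complete = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            byte b = buf[i];
            if (b == 0x0A) {
                complete.add(pending.toString());
                pending.setLength(0); // start collecting the next message
            } else if (b != 0x0D) {
                // Mask to 0..255 so bytes above 0x7F do not go negative.
                pending.append((char) (b & 0xFF));
            }
        }
        return complete;
    }
}
```

With this approach a read containing "OK\nHel" yields "OK" at once, while "Hel" waits in the buffer until the rest of the message arrives in a later read.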

The Code

MainActivity.java - Simple user interface to test USB communication

This MainActivity has been set up with a title, a scrolling TextView that acts as a display log window giving real-time feedback on how the application is running, and a button that, when pressed, sends a preset message to the Arduino board. onStart() is probably the most complex method, because if a USB board is already attached it tries to establish communication and makes sure we have permission to connect to it. The rest of the code is as explained above, with the Intent receivers in their proper locations. I have implemented all of the lifecycle methods with logging, so when debugging USB connectivity you can see when things are happening. As your Activity becomes more complex, this gives you some insight into where things belong among the lifecycle callbacks.

The Graphical User Interface was set up with Eclipse and looks as follows:

The actual MainActivity code:

package com.llsc.androidusb;


import android.app.Activity;
import android.app.ActionBar;
import android.app.Fragment;
import android.app.PendingIntent;
import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import android.content.IntentFilter;
import android.hardware.usb.UsbConstants;
import android.hardware.usb.UsbDevice;
import android.hardware.usb.UsbManager;
import android.os.Bundle;
import android.os.Handler;
import android.os.Message;
import android.util.Log;
import android.view.LayoutInflater;
import android.view.Menu;
import android.view.MenuItem;
import android.view.View;
import android.view.ViewGroup;
import android.widget.Button;
import android.widget.TextView;
import android.os.Build;

public class MainActivity extends Activity {
        public static final String version = "Version 1.12";
        private final String TAG = MainActivity.class.getSimpleName() + MainActivity.version;


        TextView textDisplayLog;

        Button msgButton;

        UsbControls usbController;

        Boolean fromOnPause = false;

        private static final String ACTION_USB_PERMISSION = "com.android.example.USB_PERMISSION";

        PendingIntent mPermissionIntent;

        Handler mHandler1;
        Object syncObj;

        @Override
        protected void onCreate(Bundle savedInstanceState) {
                super.onCreate(savedInstanceState);
                Log.i(TAG, "onCreate(): entered");

                setContentView(R.layout.activity_main);

                if (savedInstanceState == null) {
                        getFragmentManager().beginTransaction()
                                        .add(R.id.container, new PlaceholderFragment()).commit();
                }

                // register the broadcast receiver
                mPermissionIntent = PendingIntent.getBroadcast(this, 0, new Intent(
                                ACTION_USB_PERMISSION), 0);
                IntentFilter filter = new IntentFilter(ACTION_USB_PERMISSION);
                registerReceiver(mUsbReceiver, filter);

                registerReceiver(mUsbDeviceAttachedReceiver, new IntentFilter(
                                UsbManager.ACTION_USB_DEVICE_ATTACHED));
                registerReceiver(mUsbDeviceDetachedReceiver, new IntentFilter(
                                UsbManager.ACTION_USB_DEVICE_DETACHED));

                mHandler1 = new Handler() {
                        public void handleMessage(Message msg) {
                                onMsgRcv();
                        }
                };
                syncObj = new Object();

                usbController = new UsbControls(UsbControls.targetVendorID,
                                UsbControls.targetProductID);
                usbController.setRcvHandler(mHandler1);
                usbController.setSyncObj(syncObj);
                /*
                 * deviceFound = this.getUsbDevice(targetVendorID, targetProductID); if
                 * (deviceFound == null) usbAttached = false; else usbAttached = true;
                 */
        }

        @Override
        protected void onStart() {
                super.onStart();
                Log.i(TAG, "onStart(): entered ");

                textDisplayLog = (TextView) findViewById(R.id.displaylog);
                msgButton = (Button) findViewById(R.id.button1);

                textDisplayLog.append("\nonStart");

                // Make sure the USB link is ready to go.
                // The device may already have been plugged in before the Activity
                // started; in that case no ACTION_USB_DEVICE_ATTACHED intent will
                // arrive, so onStart() must try to establish connectivity itself.
                // Caution: if no device is plugged in, the calls below may run into
                // null pointers.
                UsbManager manager = (UsbManager) this.getSystemService(USB_SERVICE);

                // The device may be attached but not yet initialized: look it up,
                // initialize it, then either set up communication (permission already
                // granted) or ask the user for permission first.
                usbController.setUsbDevice(manager);
                usbController.usbInit(manager);
                if (usbController.hasPermission(manager)) {
                        usbController.setupUsbComm(manager);
                } else if (usbController.getUsbPermission(manager, mPermissionIntent)) {
                        Log.i(TAG, "onStart: USB ready");
                } else {
                        Log.e(TAG, "onStart: Cannot get permission for USB");
                }
                fromOnPause = false;
        }

        @Override
        public void onResume() {
                super.onResume();
                Log.i(TAG, "onResume(): entered");
                textDisplayLog.append("\nonResume: called");
        }

        @Override
        public void onPause() {
                super.onPause();
                Log.i(TAG, "onPause(): entered");
                textDisplayLog.append("\nonPause: called");
                fromOnPause = true;
        }

        @Override
        protected void onStop() {
                super.onStop();
                Log.i(TAG, "onStop(): entered");
                if (usbController.usbAttached()) {
                        usbController.releaseUsb();
                }
        }

        @Override
        protected void onDestroy() {
                super.onDestroy();
                Log.i(TAG, "onDestroy(): entered");
                textDisplayLog.append("\nonDestroy");
                unregisterReceiver(mUsbReceiver);
                unregisterReceiver(mUsbDeviceAttachedReceiver);
                unregisterReceiver(mUsbDeviceDetachedReceiver);
        }

        public void onButton1Pressed(View view) {
                byte[] bytesHello = new byte[] { (byte) 'H', 'e', 'l', 'l', 'o', ' ',
                                'f', 'r', 'o', 'm', ' ', 'A', 'n', 'd', 'r', 'o', 'i', 'd' };
                usbController.usbSend(bytesHello);
        }

        private void onMsgRcv() {
                // Message has come in over USB get it from the USB thread
                // Do it in a sync way so buffers don't get stepped on by different
                // threads
                String s = usbController.getRcvMsg();
                textDisplayLog.append("\n\nData Received");
                textDisplayLog.append("\n\n" + s);
        }

        /*
         * @Override protected void onCreate(Bundle savedInstanceState) {
         * super.onCreate(savedInstanceState);
         * setContentView(R.layout.activity_main);
         * 
         * if (savedInstanceState == null) { getFragmentManager().beginTransaction()
         * .add(R.id.container, new PlaceholderFragment()).commit(); } }
         */
        @Override
        public boolean onCreateOptionsMenu(Menu menu) {

                // Inflate the menu; this adds items to the action bar if it is present.
                getMenuInflater().inflate(R.menu.main, menu);
                return true;
        }

        @Override
        public boolean onOptionsItemSelected(MenuItem item) {
                // Handle action bar item clicks here. The action bar will
                // automatically handle clicks on the Home/Up button, so long
                // as you specify a parent activity in AndroidManifest.xml.
                int id = item.getItemId();
                if (id == R.id.action_settings) {
                        return true;
                }
                return super.onOptionsItemSelected(item);
        }

        private final BroadcastReceiver mUsbReceiver = new BroadcastReceiver() {

                @Override
                public void onReceive(Context context, Intent intent) {
                        String action = intent.getAction();
                        Log.i(TAG, "mUsbReceiver.onReceive()");
                        if (ACTION_USB_PERMISSION.equals(action)) {
                                Log.i(TAG, "mUsbReceiver.onReceive: ACTION_USB_PERMISSION found");
                                synchronized (this) {
                                        UsbManager manager = (UsbManager) context
                                                        .getSystemService(USB_SERVICE);

                                        if (intent.getBooleanExtra(
                                                        UsbManager.EXTRA_PERMISSION_GRANTED, false)) {
                                                usbController.usbInit(manager);
                                                usbController.setupUsbComm(manager);

                                                textDisplayLog
                                                                .append("\npermission receiver: permission granted");
                                                Log.i(TAG,
                                                                "mUsbReceiver: permission granted com link attempted");
                                        } else {
                                                textDisplayLog.append("\npermission receiver: permission denied for device");
                                                Log.e(TAG, "mUsbReceiver: Permission denied for device");
                                        }
                                }
                        }
                }
        };

        private final BroadcastReceiver mUsbDeviceAttachedReceiver = new BroadcastReceiver() {

                @Override
                public void onReceive(Context context, Intent intent) {
                        String action = intent.getAction();
                        Log.i(TAG, "mUsbDeviceAttachedReceiver.onReceive()");
                        if (UsbManager.ACTION_USB_DEVICE_ATTACHED.equals(action)) {
                                // The device is provided as part of the intent
                                // in a larger context (ie. android attached to USB hub)
                                // it may not be our device of interest there could be other
                                // devices. But for now assume only one device
                                UsbDevice device = (UsbDevice) intent
                                                .getParcelableExtra(UsbManager.EXTRA_DEVICE);

                                if (usbController == null) {
                                        textDisplayLog
                                                        .append("\nACTION_USB_DEVICE_ATTACHED: usbController was null now created\n");
                                        usbController = new UsbControls(
                                                        UsbControls.targetVendorID,
                                                        UsbControls.targetProductID,
                                                        device);
                                        // A newly created controller also needs the receive
                                        // handler and sync object that onCreate() installs
                                        usbController.setRcvHandler(mHandler1);
                                        usbController.setSyncObj(syncObj);
                                        Log.i(TAG,
                                                        "mUsbDeviceAttachedReceiver: USB Controller set up");
                                } else {
                                        textDisplayLog
                                                        .append("\nACTION_USB_DEVICE_ATTACHED: usbController not null UsbDevice set\n");
                                        usbController.setUsbDevice(device);
                                        Log.i(TAG,
                                                        "mUsbDeviceAttachedReceiver: USB Controller already available UsbDevice set");
                                }

                                usbController.usbInit((UsbManager) context
                                                .getSystemService(USB_SERVICE));

                                // Get permission to use
                                // mUsbManager.requestPermission(device, mPermissionIntent);
                                if (usbController.getUsbPermission(
                                                (UsbManager) context.getSystemService(USB_SERVICE),
                                                mPermissionIntent)) {
                                        Log.i(TAG, "mUsbDeviceAttachedReceiver: USB ready");
                                } else {
                                        Log.e(TAG, "mUsbDeviceAttachedReceiver: Cannot get permission for USB");
                                }
                        }
                }
        };

        private final BroadcastReceiver mUsbDeviceDetachedReceiver = new BroadcastReceiver() {
                @Override
                public void onReceive(Context context, Intent intent) {
                        String action = intent.getAction();
                        Log.i(TAG, "mUsbDeviceDetachedReceiver.onReceive()");
                        if (UsbManager.ACTION_USB_DEVICE_DETACHED.equals(action)) {
                                if (usbController != null) {
                                        usbController.releaseUsb();
                                        textDisplayLog
                                                        .append("\nACTION_USB_DEVICE_DETACHED: usbController released\n");
                                } else {
                                        textDisplayLog
                                                        .append("\nACTION_USB_DEVICE_DETACHED: usbController null no release\n");
                                }
                        }
                }
        };

        /**
         * A placeholder fragment containing a simple view.
         */
        public static class PlaceholderFragment extends Fragment {

                public PlaceholderFragment() {

                }

                @Override
                public View onCreateView(LayoutInflater inflater, ViewGroup container,
                                Bundle savedInstanceState) {
                        View rootView = inflater.inflate(R.layout.fragment_main, container,
                                        false);
                        return rootView;
                }
        }
}

UsbControls.java - the encapsulating class for USB setup and communication

This is the class that tries to encapsulate most of the USB code. The most difficult part to understand is the setup of communications.

Enumerating USB devices

Enumerating USB devices is covered in the Android USB Host API documentation, and that code is housed in the getUsbDevice(vendorID, productID, UsbManager) method. It takes a UsbManager parameter because the UsbManager is obtained at the level of the MainActivity (through getSystemService()). Originally all of this code lived in the MainActivity class.
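The body of getUsbDevice() is not part of this listing. The usual Android approach is to walk the HashMap<String, UsbDevice> returned by UsbManager.getDeviceList() and compare each device's getVendorId() and getProductId() against the targets. Since the Android classes are not available off-device, here is a plain-Java sketch of just that matching logic. DeviceInfo is a hypothetical stand-in for UsbDevice, and this is not the author's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for android.hardware.usb.UsbDevice, holding only
// the fields needed for matching.
final class DeviceInfo {
    final String name;
    final int vendorId;
    final int productId;

    DeviceInfo(String name, int vendorId, int productId) {
        this.name = name;
        this.vendorId = vendorId;
        this.productId = productId;
    }
}

final class DeviceLookup {
    // Returns the first device whose vendor and product IDs both match,
    // or null when none is attached (like a null UsbDevice).
    static DeviceInfo find(Map<String, DeviceInfo> deviceList, int vendorId, int productId) {
        for (DeviceInfo d : deviceList.values()) {
            if (d.vendorId == vendorId && d.productId == productId) {
                return d;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Keys mimic the device-name keys of UsbManager.getDeviceList();
        // the first entry is a made-up "other" device.
        Map<String, DeviceInfo> devices = new HashMap<>();
        devices.put("/dev/bus/usb/001/002", new DeviceInfo("/dev/bus/usb/001/002", 0x1234, 0x5678));
        devices.put("/dev/bus/usb/001/003", new DeviceInfo("/dev/bus/usb/001/003", 0x2341, 0x42));
        DeviceInfo arduino = find(devices, 0x2341, 0x42);
        System.out.println(arduino == null ? "No device connected" : "Found " + arduino.name);
    }
}
```

Returning null rather than throwing matches how the class already treats a missing device (deviceFound == null).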

Initializing and Endpoint enumeration

It turns out that some investigation of the USB device can take place before permission to communicate is granted. These steps have been separated out into usbInit(UsbManager). Endpoints are USB-speak for the transmit, receive and control channels available on the board; they need to be found and initialized before the board can actually transmit and receive information.
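The searchEndPoint() method in the listing looks for an interface that offers both a bulk OUT and a bulk IN endpoint. The selection rule can be sketched in plain Java as follows. Endpoint and EndpointSearch are hypothetical stand-ins for UsbEndpoint and the original search loop. The constants are the USB specification values that android.hardware.usb.UsbConstants also uses (bulk transfer type 2, direction bit 0x80 for IN), and the endpoint addresses in main() are illustrative, not read from a real Arduino.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for android.hardware.usb.UsbEndpoint.
final class Endpoint {
    // USB specification values (same numbers as UsbConstants.USB_ENDPOINT_XFER_BULK,
    // USB_ENDPOINT_XFER_INT, USB_DIR_OUT and USB_DIR_IN).
    static final int XFER_BULK = 2;
    static final int XFER_INT = 3;
    static final int DIR_OUT = 0x00;
    static final int DIR_IN = 0x80;

    final int address; // endpoint address; bit 7 gives the direction
    final int type;    // transfer type (control, isochronous, bulk, interrupt)

    Endpoint(int address, int type) {
        this.address = address;
        this.type = type;
    }

    int direction() {
        return address & 0x80;
    }
}

final class EndpointSearch {
    // Returns {bulkIn, bulkOut} for one interface, or null if either is missing.
    static Endpoint[] findBulkPair(List<Endpoint> endpoints) {
        Endpoint in = null;
        Endpoint out = null;
        for (Endpoint e : endpoints) {
            if (e.type != Endpoint.XFER_BULK) {
                continue; // skip interrupt and control endpoints
            }
            if (e.direction() == Endpoint.DIR_IN) {
                in = e;
            } else if (e.direction() == Endpoint.DIR_OUT) {
                out = e;
            }
        }
        return (in != null && out != null) ? new Endpoint[] { in, out } : null;
    }

    // Returns the index of the first interface with both bulk endpoints, or -1.
    static int findDataInterface(List<List<Endpoint>> interfaces) {
        for (int i = 0; i < interfaces.size(); i++) {
            if (findBulkPair(interfaces.get(i)) != null) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Illustrative CDC-style layout: a notification interface with one
        // interrupt IN endpoint, then a data interface with bulk OUT and IN.
        List<Endpoint> control = Arrays.asList(new Endpoint(0x82, Endpoint.XFER_INT));
        List<Endpoint> data = Arrays.asList(
                new Endpoint(0x04, Endpoint.XFER_BULK),
                new Endpoint(0x83, Endpoint.XFER_BULK));
        System.out.println(findDataInterface(Arrays.asList(control, data)));
    }
}
```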

Communications setup

Once permission to use the USB device has been granted, the setupUsbComm(UsbManager) method makes the final connection arrangements and establishes common RS-232 control parameters such as baud rate, data bits and stop bits. This is accomplished with calls to the USB Host API method UsbDeviceConnection.controlTransfer().
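setupUsbComm() itself is not shown in this part of the listing, so the exact requests it sends are not reproduced here. Assuming the board enumerates as a USB CDC ACM device (true of Arduino boards that use a CDC serial firmware on their USB chip; other USB-serial chips use vendor-specific requests instead), the serial parameters are normally sent with the class request SET_LINE_CODING (bRequest 0x20, bmRequestType 0x21), whose 7-byte payload the sketch below builds. LineCoding is a hypothetical plain-Java helper; on a device the array would be passed to controlTransfer().

```java
// Hypothetical helper: builds the 7-byte line-coding structure that the USB CDC
// class specification defines for the SET_LINE_CODING request.
final class LineCoding {
    // bmRequestType: host-to-device, class request, recipient = interface
    static final int REQUEST_TYPE_CLASS_INTERFACE_OUT = 0x21;
    // bRequest code for SET_LINE_CODING
    static final int SET_LINE_CODING = 0x20;

    // bCharFormat values (stop bits)
    static final int STOP_BITS_1 = 0;
    static final int STOP_BITS_1_5 = 1;
    static final int STOP_BITS_2 = 2;

    // bParityType values
    static final int PARITY_NONE = 0;
    static final int PARITY_ODD = 1;
    static final int PARITY_EVEN = 2;

    // Encodes the baud rate (32-bit little-endian), stop bits, parity and data bits.
    static byte[] encode(int baudRate, int stopBits, int parity, int dataBits) {
        return new byte[] {
            (byte) (baudRate & 0xFF),          // dwDTERate, lowest byte first
            (byte) ((baudRate >> 8) & 0xFF),
            (byte) ((baudRate >> 16) & 0xFF),
            (byte) ((baudRate >> 24) & 0xFF),
            (byte) stopBits,                   // bCharFormat
            (byte) parity,                     // bParityType
            (byte) dataBits                    // bDataBits
        };
    }

    public static void main(String[] args) {
        byte[] payload = encode(9600, STOP_BITS_1, PARITY_NONE, 8);
        StringBuilder hex = new StringBuilder();
        for (byte b : payload) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        // On a device (not run here) the payload would go out roughly as:
        // connection.controlTransfer(REQUEST_TYPE_CLASS_INTERFACE_OUT, SET_LINE_CODING,
        //         0, interfaceNumber, payload, payload.length, timeoutMs);
        System.out.println(hex.toString().trim()); // prints "80 25 00 00 00 00 08"
    }
}
```

9600 baud is 0x2580, which is why the payload starts with 0x80, 0x25: the rate is sent least significant byte first.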

Sending data

The usbSend(byte[]) method sends data down the USB link via the bulkTransfer() method of UsbDeviceConnection. This routine is rather straightforward and, if communication has been established properly, works as expected.
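The body of usbSend() is not part of this listing. One detail such a method can handle is that UsbDeviceConnection.bulkTransfer() returns the number of bytes transferred, or a negative value on failure, so a long message may need more than one call. Here is a plain-Java sketch of that loop. BulkWriter is a hypothetical interface shaped like the bulkTransfer(endpoint, buffer, offset, length, timeout) overload (with the endpoint left out) so that it runs without a device. The message bytes come from String.getBytes(), which gives the same bytes as the hand-built array in onButton1Pressed().

```java
import java.nio.charset.StandardCharsets;

final class UsbSendSketch {
    // Hypothetical stand-in for
    // UsbDeviceConnection.bulkTransfer(endpointOut, buffer, offset, length, timeout).
    interface BulkWriter {
        int transfer(byte[] buffer, int offset, int length, int timeoutMs);
    }

    // Calls the writer until every byte is sent.
    // Returns the number of bytes written, or -1 if a transfer fails.
    static int sendAll(BulkWriter writer, byte[] data, int timeoutMs) {
        int offset = 0;
        while (offset < data.length) {
            int n = writer.transfer(data, offset, data.length - offset, timeoutMs);
            if (n <= 0) {
                return -1; // failure or no progress: stop instead of looping forever
            }
            offset += n;
        }
        return offset;
    }

    public static void main(String[] args) {
        byte[] hello = "Hello from Android".getBytes(StandardCharsets.US_ASCII);
        // Fake writer that accepts at most 8 bytes per call.
        BulkWriter fake = (buf, off, len, t) -> Math.min(len, 8);
        System.out.println(sendAll(fake, hello, 1000)); // prints 18
    }
}
```

Treating a zero return as a failure is a design choice of this sketch: it prevents an endless loop if the device stops accepting data.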

The rest is just some flag fields to try to maintain some state information about the communication link.

/**
 * AndroidUsb
 * UsbControls.java
 *
 * Author: Nasty Old Dog
 * Code modified from:
 * http://android.serverbox.ch/?p=370 
 * https://github.com/mik3y/usb-serial-for-android
 */
package com.llsc.androidusb;

import java.util.HashMap;
import java.util.Iterator;

import android.app.PendingIntent;
import android.content.Context;
import android.hardware.usb.UsbConstants;
import android.hardware.usb.UsbDevice;
import android.hardware.usb.UsbDeviceConnection;
import android.hardware.usb.UsbEndpoint;
import android.hardware.usb.UsbInterface;
import android.hardware.usb.UsbManager;
import android.os.Handler;
import android.util.Log;

/**
 * @author tmcguire
 *
 */
public class UsbControls {
        private final String TAG = UsbControls.class.getSimpleName();
        public static final int targetVendorID = 0x2341;
        public static final int targetProductID = 0x42;

        UsbDevice deviceFound = null;
        UsbInterface usbInterfaceFound = null;
        UsbEndpoint endpointIn = null;
        UsbEndpoint endpointOut = null;
        UsbReceiver rcvThrd;
        UsbInterface usbInterface;
        UsbDeviceConnection usbDeviceConnection;
        int vendorID = 0;
        int productID = 0;
        private String infoText = "No device connected";

        boolean usbAttached = false;
        boolean usbConnected = false;

        private Handler rcvHandler;
        public void setRcvHandler(Handler rcvHandler) {
                this.rcvHandler = rcvHandler;
        }

        private Object syncObj;

        public void setSyncObj(Object syncObj) {
                this.syncObj = syncObj;
        }

        public UsbControls(int vID, int pID) {
                vendorID = vID;
                productID = pID;
        }

        public UsbControls(int vID, int pID, UsbDevice usbDevice) {
                vendorID = vID;
                productID = pID;
                this.deviceFound = usbDevice;
        }

        public void setUsbDevice(UsbManager manager){
                deviceFound = this.getUsbDevice(vendorID, productID, manager);
        }

        public void setUsbDevice(UsbDevice usbDevice) {
                deviceFound = usbDevice;
        }
        public String getInfoText() {
                return infoText;
        }

        public String getRcvMsg() {
                return this.rcvThrd.getMsg();
        }

        public void usbInit(UsbManager manager) {

                Log.i(TAG,"usbInit");

                if (!usbAttached) {
                        deviceFound = this.getUsbDevice(vendorID, productID, manager);
                        if (!usbAttached) {
                                Log.e(TAG, "usbInit: No device connected");
                                return;
                        } else {
                                Log.i(TAG, "usbInit: device found");
                        }
                }

                searchEndPoint(manager);
/*
                if (usbInterfaceFound != null) {
                        setupUsbComm(manager);
                        Log.i(TAG,"connectUsb: usbInterface setup attempted");
                }
                else {
            Log.e(TAG,"connectUsb: usbInterface not found, usb not setup");
                }
*/
        }

        public void releaseUsb() {
                Log.i(TAG," releaseUsb");

                if (rcvThrd != null) {
                        rcvThrd.stopUsbReceiver();

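                        // Give the receiver thread time to stop before the connection
                        // is closed. Note that this blocks the calling thread (the UI
                        // thread when called from onStop() or a receiver) for 5 seconds.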
                        try {
                                Thread.sleep(5000);
                        } catch (InterruptedException e) {
                                Log.e(TAG, "sleep in releaseUsb failed");
                        }

                        rcvThrd = null;
                }
                if (usbDeviceConnection != null) {
                        if (usbInterface != null) {
                                usbDeviceConnection.releaseInterface(usbInterface);
                                usbInterface = null;
                        }
                        usbDeviceConnection.close();
                        usbDeviceConnection = null;
                }

                deviceFound = null;
                usbInterfaceFound = null;
                endpointIn = null;
                endpointOut = null;
                usbAttached = false;
                usbConnected = false;
        }

        public boolean isUsbConnected() {
                return usbConnected;
        }

        public void setUsbConnected(boolean usbConnected) {
                this.usbConnected = usbConnected;
        }

        public void searchEndPoint(UsbManager manager) {

                Log.i(TAG, "searchEndPoint");
                usbInterfaceFound = null;
                endpointOut = null;
                endpointIn = null;

                Log.i(TAG, "searchEndPoint: entered");
                // Search device for targetVendorID and targetProductID
                if (deviceFound == null) {
                        deviceFound = this.getUsbDevice(vendorID, productID, manager);
                }
                if (deviceFound == null) {
                        Log.e(TAG, "searchEndPoint: Device not found.");
                        this.usbAttached = false;
                        return;
                } else {
                        String s = deviceFound.toString() + "\n" + "DeviceID: "
                                        + deviceFound.getDeviceId() + "\n" + "DeviceName: "
                                        + deviceFound.getDeviceName() + "\n" + "DeviceClass: "
                                        + deviceFound.getDeviceClass() + "\n" + "DeviceSubClass: "
                                        + deviceFound.getDeviceSubclass() + "\n" + "VendorID: "
                                        + deviceFound.getVendorId() + "\n" + "ProductID: "
                                        + deviceFound.getProductId() + "\n" + "InterfaceCount: "
                                        + deviceFound.getInterfaceCount();
                        infoText = s;

                        Log.i(TAG, "searchEndPoint: " + s);

                        // Search for UsbInterface with Endpoint of USB_ENDPOINT_XFER_BULK,
                        // and direction USB_DIR_OUT and USB_DIR_IN

                        for (int i = 0; i < deviceFound.getInterfaceCount(); i++) {
                                UsbInterface usbif = deviceFound.getInterface(i);

                                UsbEndpoint tOut = null;
                                UsbEndpoint tIn = null;

                                int tEndpointCnt = usbif.getEndpointCount();
                                if (tEndpointCnt >= 2) {
                                        for (int j = 0; j < tEndpointCnt; j++) {
                                                if (usbif.getEndpoint(j).getType() == UsbConstants.USB_ENDPOINT_XFER_BULK) {
                                                        if (usbif.getEndpoint(j).getDirection() == UsbConstants.USB_DIR_OUT) {
                                                                tOut = usbif.getEndpoint(j);
                                                        } else if (usbif.getEndpoint(j).getDirection() == UsbConstants.USB_DIR_IN) {
                                                                tIn = usbif.getEndpoint(j);
                                                        }
                                                }
                                        }

                                        if (tOut != null && tIn != null) {
                                                // This interface has both a USB_DIR_OUT and a
                                                // USB_DIR_IN endpoint of type USB_ENDPOINT_XFER_BULK
                                                usbInterfaceFound = usbif;
                                                endpointOut = tOut;
                                                endpointIn = tIn;
                                        }
                                }

                        }

                        if (usbInterfaceFound == null) {
                                Log.e(TAG,"searchEndPoint: No suitable interface found!");
                        } else {
                                Log.i(TAG,"searchEndPoint: UsbInterface found: "
                                                + usbInterfaceFound.toString() + "\n\n"
                                                + "Endpoint OUT: " + endpointOut.toString() + "\n\n"
                                                + "Endpoint IN: " + endpointIn.toString());

                        }
                }
        }

        public boolean setupUsbComm(UsbManager manager) {

                // USB CDC class requests; for more info, search SET_LINE_CODING and SET_CONTROL_LINE_STATE

                final int RQSID_SET_LINE_CODING = 0x20;
                final int RQSID_SET_CONTROL_LINE_STATE = 0x22;
                // final int RQSID_SET_LINE_CODING = 0x32;
                // final int RQSID_SET_CONTROL_LINE_STATE = 0x34;

                boolean success = false;

                Log.i(TAG,"setupUsbComm");

                if (!usbAttached) {
                        Log.e(TAG, "setupUsbComm: device not attached");
                        return success;
                }
//              UsbManager manager = (UsbManager) getSystemService(Context.USB_SERVICE);
                if (deviceFound == null) {
                        Log.e(TAG, "setupUsbComm: device not initialized");
                        return success;
                }
                if (usbInterfaceFound == null) {
                        Log.e(TAG, "setupUsbComm: no bulk IN/OUT interface found");
                        return success;
                }

                boolean permitToRead = manager.hasPermission(deviceFound);

                if (permitToRead) {
                        Log.i(TAG, "setupUsbComm: permitted to read");
                        usbDeviceConnection = manager.openDevice(deviceFound);
                        if (usbDeviceConnection != null) {
                                if (!usbDeviceConnection.claimInterface(usbInterfaceFound, true)) {
                                        Log.e(TAG, "setupUsbComm: claimInterface failed");
                                }

                                // showRawDescriptors(); // skip it if you do not need to show
                                // the raw descriptors
                                Log.i(TAG, "setupUsbComm: device connection made");

                                int usbResult;
                                usbResult = usbDeviceConnection.controlTransfer(0x21, // requestType
                                                // usbResult = usbDeviceConnection.controlTransfer(0x40,
                                                // // requestType
                                                RQSID_SET_CONTROL_LINE_STATE, // SET_CONTROL_LINE_STATE
                                                0, // value or in 0x01 for DTR or 0x02 for RTS
                                                0, // index
                                                null, // buffer
                                                0, // length
                                                0); // timeout

                                Log.i(TAG,"setupUsbComm: controlTransfer(SET_CONTROL_LINE_STATE): " + 
                                                usbResult);

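                                // Line coding 8N1 at 9600 baud:
                                // bytes 0-3 = baud rate 9600 (0x00002580, little-endian)
                                // byte 4 = 1 stop bit, byte 5 = no parity, byte 6 = 8 data bits
                                byte[] encodingSetting = new byte[] { (byte) 0x80, 0x25, 0x00,
                                                0x00, 0x00, 0x00, 0x08 };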
                                usbResult = usbDeviceConnection.controlTransfer(0x21, // requestType
                                                // usbResult = usbDeviceConnection.controlTransfer(0x40,
                                                // // requestType
                                                RQSID_SET_LINE_CODING, // SET_LINE_CODING
                                                0, // value
                                                0, // index
                                                encodingSetting, // buffer
                                                7, // length
                                                0); // timeout

                                Log.i(TAG,"setupUsbComm: controlTransfer(RQSID_SET_LINE_CODING): " + 
                                                usbResult);

                                // can set up receive thread
                                // This needs to be in its own routine.
                                rcvThrd = new UsbReceiver(usbDeviceConnection, endpointIn,
                                                rcvHandler, syncObj);
                                rcvThrd.start();
                                success = true;
                                usbConnected = true;
                        }

                } else {
//                      manager.requestPermission(deviceFound, mPermissionIntent);
                        Log.i(TAG, "setupUsbComm: no USB permission yet, request it first");
                }
                return success;
        }

        public boolean hasPermission(UsbManager manager) {
                if (deviceFound != null) {
                        return manager.hasPermission(deviceFound);
                }
                Log.e(TAG, "hasPermission: device is null, it should be initialized first");
                return false;
        }

        public boolean getUsbPermission(UsbManager manager, PendingIntent mPermissionIntent) {
                if (this.usbAttached) {
                        if (deviceFound != null) {
                                // requestPermission() is asynchronous: the user's answer arrives
                                // later through mPermissionIntent, so hasPermission() below is
                                // only true if permission had already been granted.
                                manager.requestPermission(deviceFound, mPermissionIntent);
                                Log.i(TAG, "getUsbPermission: Permission requested");
                                if (manager.hasPermission(deviceFound)) {
                                        Log.i(TAG, "getUsbPermission: Permission granted");
                                        return true;
                                } else {
                                        Log.i(TAG, "getUsbPermission: Permission not granted yet");
                                        return false;
                                }
                        }
                        Log.i(TAG, "getUsbPermission: USB attached but device null");
                        return false;
                }
                Log.i(TAG, "getUsbPermission: USB not attached");
                return false;
        }

        public UsbDevice getUsbDevice(int vendor, int product, UsbManager manager) {
//              UsbManager manager = (UsbManager) getSystemService(Context.USB_SERVICE);
                HashMap<String, UsbDevice> deviceList = manager.getDeviceList();

                // Return the first attached device whose vendor and product IDs match
                for (UsbDevice device : deviceList.values()) {
                        if (device.getVendorId() == vendor && device.getProductId() == product) {
                                this.usbAttached = true;
                                return device;
                        }
                }
                this.usbAttached = false;
                return null;
        }

        public boolean usbAttached() {
                return usbAttached;     // was deviceFound != null
        }

        public void setUsbAttached(boolean attach) {
                usbAttached = attach;
        }

        public void resetConnection(UsbManager manager) {
                deviceFound = this.getUsbDevice(vendorID, productID, manager);
                if (deviceFound == null) {
                        Log.e(TAG, "resetConnection: cannot find a device");
                        return;
                }
                usbInit(manager);
                if (this.hasPermission(manager)) {
                        setupUsbComm(manager);
                }
        }

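        public void usbSend(byte[] msg) {
                if (endpointOut != null && usbDeviceConnection != null) {
                        // bulkTransfer returns the number of bytes sent, or a negative value on failure
                        int usbResult = usbDeviceConnection.bulkTransfer(endpointOut, msg,
                                        msg.length, 1000);
                        if (usbResult < 0) {
                                Log.e(TAG, "usbSend: bulkTransfer failed: " + usbResult);
                        } else {
                                Log.i(TAG, "usbSend: transmitted bytes: " + usbResult);
                        }
                } else {
                        Log.i(TAG, "usbSend: endpointOut null no xmit");
                }
        }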
}
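The seven bytes in encodingSetting follow the USB CDC line-coding layout: a 4-byte little-endian baud rate, then the stop-bit code (0 = 1 stop bit), the parity code (0 = none) and the number of data bits. Since 9600 is 0x2580, the first two bytes are 0x80 and 0x25. The plain-Java sketch below builds the same array for any baud rate, assuming you want something other than 9600; the class and method names (LineCoding, encode) are hypothetical and not part of the Android API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper that builds the 7-byte USB CDC SET_LINE_CODING payload. */
public final class LineCoding {
    // bCharFormat: 0 = 1 stop bit, 1 = 1.5 stop bits, 2 = 2 stop bits
    public static final byte ONE_STOP_BIT = 0;
    // bParityType: 0 = none, 1 = odd, 2 = even, 3 = mark, 4 = space
    public static final byte NO_PARITY = 0;

    private LineCoding() {}

    public static byte[] encode(int baud, byte stopBits, byte parity, byte dataBits) {
        return ByteBuffer.allocate(7)
                .order(ByteOrder.LITTLE_ENDIAN) // the baud rate field is little-endian
                .putInt(baud)                   // bytes 0..3: baud rate
                .put(stopBits)                  // byte 4: stop bits code
                .put(parity)                    // byte 5: parity code
                .put(dataBits)                  // byte 6: data bits
                .array();
    }

    public static void main(String[] args) {
        byte[] setting = encode(9600, ONE_STOP_BIT, NO_PARITY, (byte) 8);
        StringBuilder hex = new StringBuilder();
        for (byte b : setting) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // 80 25 00 00 00 00 08
    }
}
```

On the Android side the resulting array would be passed as the buffer of the same controlTransfer(0x21, RQSID_SET_LINE_CODING, 0, 0, buffer, 7, 0) call used in setupUsbComm.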

UsbReceiver.java - code to receive from USB and notify MainActivity

This class creates its own thread to capture data over the USB connection. Because a message may not arrive in one fell swoop, the thread loops, looking for a terminating newline character before it places the data in the msg field. To create a receiver you must pass in a Handler back to MainActivity, so that MainActivity can be signaled when there is data to inspect and display. You also need the inbound endpoint, the UsbDeviceConnection, and an object used to synchronize the two threads.
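The accumulate-until-newline step can be pulled out of the thread and tested without a device. The sketch below is a plain-Java variation, not the author's code: unlike the listing, it keeps any bytes that follow a newline in the same chunk instead of discarding them, and returns every complete line. The class name LineAssembler is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns arbitrary byte chunks into newline-terminated lines. */
public final class LineAssembler {
    private final StringBuilder pending = new StringBuilder();

    /**
     * Feeds the first count bytes of chunk (for example, what bulkTransfer returned).
     * Returns every line completed by this chunk, without its '\n'.
     * A partial line stays buffered until a later chunk supplies the newline.
     */
    public List<String> feed(byte[] chunk, int count) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            char c = (char) (chunk[i] & 0xFF); // map each byte to 0..255, never negative
            if (c == '\n') {
                lines.add(pending.toString());
                pending.setLength(0); // start collecting the next message
            } else {
                pending.append(c);
            }
        }
        return lines;
    }
}
```

Inside run(), each positive bulkTransfer result would go to feed(buf, result) and each returned line to the Handler. If several lines can arrive in one chunk, a single msg field may be overwritten before MainActivity reads it; carrying the line in the message itself, for example mHandler.obtainMessage(0, line).sendToTarget(), is one possible way around that, assuming MainActivity is changed to read msg.obj.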

The bulkTransfer method is used to receive data on the inbound endpoint. Before the receive loop starts, a controlTransfer call sends SET_CONTROL_LINE_STATE with the value 0x01, which asserts DTR to tell the device that the host is ready.
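The numbers passed to controlTransfer are not arbitrary. As a request type, 0x21 means host-to-device, class request, interface recipient; 0x22 is the CDC SET_CONTROL_LINE_STATE request, whose value has DTR in bit 0 and RTS in bit 1 (matching the "0x01 for DTR or 0x02 for RTS" comments). The commented-out 0x40 alternative decodes to a vendor request addressed to the device. The plain-Java sketch below derives these constants instead of hard-coding them; the class and method names are hypothetical.

```java
/** Hypothetical constants and helpers for the control transfers used by UsbReceiver. */
public final class CdcControl {
    // bmRequestType fields from the USB 2.0 specification
    public static final int DIR_HOST_TO_DEVICE = 0x00; // bit 7 = 0
    public static final int TYPE_CLASS = 0x01 << 5;    // bits 6..5 = 01
    public static final int TYPE_VENDOR = 0x02 << 5;   // bits 6..5 = 10
    public static final int RECIPIENT_DEVICE = 0x00;   // bits 4..0
    public static final int RECIPIENT_INTERFACE = 0x01;

    // CDC class request codes
    public static final int SET_LINE_CODING = 0x20;
    public static final int SET_CONTROL_LINE_STATE = 0x22;

    private CdcControl() {}

    public static int requestType(int direction, int type, int recipient) {
        return direction | type | recipient;
    }

    /** Value for SET_CONTROL_LINE_STATE: bit 0 = DTR, bit 1 = RTS. */
    public static int controlLineState(boolean dtr, boolean rts) {
        return (dtr ? 0x01 : 0) | (rts ? 0x02 : 0);
    }

    public static void main(String[] args) {
        System.out.printf("0x%02X%n", requestType(DIR_HOST_TO_DEVICE, TYPE_CLASS, RECIPIENT_INTERFACE)); // 0x21
        System.out.printf("0x%02X%n", requestType(DIR_HOST_TO_DEVICE, TYPE_VENDOR, RECIPIENT_DEVICE));   // 0x40
        System.out.println(controlLineState(true, false)); // 1, the DTR-only value sent at the start of run()
    }
}
```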

package com.llsc.androidusb;

import android.hardware.usb.UsbDeviceConnection;
import android.hardware.usb.UsbEndpoint;
import android.os.Handler;
import android.util.Log;

public class UsbReceiver extends Thread {
        private final String TAG = UsbReceiver.class.getSimpleName() + MainActivity.version;
        private volatile boolean urStop = false;  // read by run() without a lock, so it must be volatile
        private Handler mHandler;
        private Object syncToken;
        private UsbDeviceConnection mConnection;
        private UsbEndpoint epIn;
        private String msg;
        final int RQSID_SET_LINE_CODING = 0x20;
        final int RQSID_SET_CONTROL_LINE_STATE = 0x22;

        UsbReceiver(UsbDeviceConnection mConnection, UsbEndpoint epIn, Handler mh,
                        Object st) {
                this.mConnection = mConnection;
                this.epIn = epIn;
                mHandler = mh;
                syncToken = st;
                Log.i(TAG,"UsbReceiver(): Constructor initialized object instance");
        }

        public void stopUsbReceiver() {
                synchronized (syncToken) {
                  urStop = true;
                }
        }

        @Override
        public void run() {
                byte[] buf = new byte[64];         // The Arduino Mega 2560 sends data in packets of up to 64 bytes
                StringBuilder sb = new StringBuilder();
                if (mConnection == null) return;
                // send DTR
                int result = mConnection.controlTransfer(0x21, // requestType
                                // usbResult = usbDeviceConnection.controlTransfer(0x40, //
                                // requestType
                                RQSID_SET_CONTROL_LINE_STATE, // SET_CONTROL_LINE_STATE
                                0x01, // value or in 0x01 for DTR or 0x02 for RTS
                                0, // index
                                null, // buffer
                                0, // length
                                0); // timeout
                Log.i(TAG, "run(): USB Receiver thread set up");
                while (true) {
                        if (urStop) {
                                break;
                        }
                        if (mConnection == null) return;
                        if (epIn == null) return;

                        result = mConnection.bulkTransfer(epIn, buf, buf.length, 1000);

/*                      int nresult = mConnection.controlTransfer(0x21, // requestType
                                        // usbResult = usbDeviceConnection.controlTransfer(0x40, //
                                        // requestType
                                        RQSID_SET_CONTROL_LINE_STATE, // SET_CONTROL_LINE_STATE
                                        0x00, // value or in 0x01 for DTR or 0x02 for RTS
                                        0, // index
                                        null, // buffer
                                        0, // length
                                        0); // timeout
*/
                        // A full message will be terminated with a Newline character '\n'
                        // so read all characters collected so far. If newline detected that's the
                        // end of the message so store full message send signal to main thread and
                        // create new StringBuilder to get next message, break out of loop 
                        if (result > 0) {
                                for (int i = 0; i < result; i++) {
                                        if (buf[i] != 0x0a) {
                                                sb.append((char) (buf[i] & 0xFF)); // String.format("%c") would throw for bytes >= 0x80
                                        } else {
                                                synchronized (syncToken) {
                                                        msg = sb.toString();
                                                }
                                                sb = new StringBuilder();
                                                mHandler.sendEmptyMessage(0);
                                                break; // There is a question whether there is new valid data 
                                                       // after newline but am assuming not for now
                                        }
                                }
                        }
                }
        }

        public String getMsg() {
                synchronized (syncToken) {
                        return msg;
                }
        }

}
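releaseUsb has to wait for the receiver thread to leave its loop before the connection is closed. Because the loop re-checks its stop flag at least once per 1000 ms bulkTransfer timeout, a volatile flag plus a bounded Thread.join lets the caller wait only as long as needed. The plain-Java sketch below shows that pattern; StoppableReader and blockingRead are hypothetical, and Thread.sleep stands in for the blocking bulkTransfer call.

```java
/** Hypothetical sketch: a receive loop that can be stopped and joined promptly. */
public class StoppableReader extends Thread {
    // volatile so the write in requestStop() is seen by the loop in run()
    private volatile boolean stopRequested = false;

    public void requestStop() {
        stopRequested = true;
    }

    // Stand-in for mConnection.bulkTransfer(epIn, buf, buf.length, 1000):
    // blocks for up to timeoutMs and, in this simulation, never returns data.
    private int blockingRead(long timeoutMs) throws InterruptedException {
        Thread.sleep(timeoutMs);
        return 0;
    }

    @Override
    public void run() {
        try {
            while (!stopRequested) {
                blockingRead(50); // 50 ms instead of 1000 ms keeps the demo short
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // treat interruption as a stop request
        }
    }

    /** Asks the loop to stop, then waits at most maxWaitMs; true if the thread has ended. */
    public static boolean stopAndJoin(StoppableReader reader, long maxWaitMs)
            throws InterruptedException {
        reader.requestStop();
        reader.join(maxWaitMs);
        return !reader.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        StoppableReader reader = new StoppableReader();
        reader.start();
        Thread.sleep(200);
        System.out.println("stopped: " + stopAndJoin(reader, 500));
    }
}
```

Applied to UsbReceiver, the same idea means declaring urStop volatile and having releaseUsb call rcvThrd.join(...) with a bound somewhat above the 1000 ms read timeout.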

ArduinoUsbTest - Arduino code to accept a message on USB and spit it back out.

This is just some simple Arduino code that echoes the incoming message back, prefixed with "How are you, ". It also toggles the LED, so you have some feedback that data was received and a round-trip message was attempted.

/**
 * AndroidUsbTest - testing platform for usb communication with 
 *                  Android usb host
 */
char charRead = 0;
char buffer[41];    // up to 40 characters plus the terminating NUL
char printout[20];  //max char to print: 20
int ledPin = 13;

void setup() {
  // put your setup code here, to run once:
  //Setup Serial Port with baud rate of 9600
  Serial.begin(9600);
  pinMode(ledPin, OUTPUT);
  digitalWrite(ledPin,LOW);
  buffer[0] = 0;
}

void loop() {
  // put your main code here, to run repeatedly:
    while (Serial.available()) {
      int chrs = Serial.readBytesUntil('\n',buffer,40);
      buffer[chrs] = 0;
    }

  if (buffer[0] != 0) {
    digitalWrite(ledPin, digitalRead(ledPin) ^ 1);   // toggle LED pin
    Serial.print("How are you, ");
    Serial.println(buffer);
    buffer[0] = 0;
  }

}
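One detail of the round trip: Serial.println ends its output with "\r\n", while UsbReceiver only splits on '\n', so the string returned by getMsg() should still end with a carriage return. Messages sent from Android with usbSend should also end in '\n'; otherwise readBytesUntil only returns after Serial's default one-second timeout. The plain-Java sketch below models the reply and trims that trailing '\r'; the class and method names are hypothetical.

```java
/** Hypothetical helpers for the round trip with the ArduinoUsbTest sketch. */
public final class EchoReply {
    private EchoReply() {}

    /** What the sketch writes: print("How are you, ") then println(msg), which appends "\r\n". */
    public static String expectedWireReply(String sent) {
        return "How are you, " + sent + "\r\n";
    }

    /** Removes one trailing '\r' left over after splitting on '\n'. */
    public static String stripCarriageReturn(String line) {
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }

    public static void main(String[] args) {
        String wire = expectedWireReply("Android");
        // UsbReceiver keeps everything before the '\n', including the '\r'
        String received = wire.substring(0, wire.indexOf('\n'));
        System.out.println("[" + stripCarriageReturn(received) + "]"); // [How are you, Android]
    }
}
```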

Conclusion

This article is somewhat hastily put together because I spent a large amount of my part-time effort on the code. I wanted to get the code out there in the hope that it will help people and save them some time setting up Android USB communication. As time allows I will try to refine the article to be a little more explanatory. For now, hopefully the code and the comments will help you get your USB project going.

All the references below were extremely helpful. Much of what I learned was gleaned from reference 3, mixed with some of reference 4. Wherever I could simplify the code to match the Android UsbHost API documentation, I did so. What this code adds over the other applications is that it behaves as expected if you disconnect and reconnect the USB cable. Getting that to work the way I believe a connected device should behave with an Activity was troublesome.

Finally, the Android UsbHost API allows you to specify intent filters in the XML manifest. This will certainly make the application come up when a board is connected, but I had a great deal of difficulty making it work well when the board is already plugged in or when it is connected and disconnected repeatedly.

Zipped Eclipse Project Containing Source Code

Author: Nasty Old Dog
