Abstract

This paper describes how to extract stock quote data and display it with dynamic updates, using free but delayed data streams. As part of the program's architecture, we make use of the observer, mediator and command design patterns, together with the JTable and JTableModel. Finally, we show how a multi-threaded update can be performed using the web as a data source. The methodology for converting the web data source into internal data structures is based on using HTML as input. This type of screen scraping has become a popular means of inputting data into programs. Simple string manipulation techniques are shown to be adequate for these types of well-formed data streams.

1 THE PROBLEM

Stock price data is generally available on the web (using a browser to format the HTML data). Given an HTML data source, we would like to find a way to create an underlying data structure that is type-safe and well formulated.

We are motivated to study these problems for a variety of reasons. First, for the purpose of conducting empirical studies, entering the data into the computer by hand is both error-prone and tedious. We seek a means to obtain this data, using free data feeds, so that we can build dynamically updated displays and perform data mining functions. Second, we find that easy-to-parse data enables us to teach our students the basic concepts of data mining using a simple first example. This example is used in a first course on network programming.

2 FINDING THE DATA

Finding the data on-line, and free, is a necessary first step toward this type of data mining. Quotes have been available from Yahoo for years. We can obtain the quotes as comma-separated values (CSV) by constructing a URL and entering it into a browser. For example:

This creates an output on the screen that looks like:

The quote is typically returned in the form:

Before the market opens, these numbers return N/A. US markets are open from 9:30am-4:00pm, Eastern.
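The N/A convention and the trading-hours window described above can be made explicit in code. The following is a minimal sketch under stated assumptions, not the code used in this paper: the class name MarketHours, the methods isRegularSession and parsePrice, and the choice of the java.time API are all introduced here for illustration. Exchange holidays and early closes are ignored.

```java
import java.time.DayOfWeek;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.OptionalDouble;

/** Hypothetical helper; the names are illustrative and not taken from the paper. */
final class MarketHours {
    // US equity markets keep Eastern time, regardless of the caller's zone.
    private static final ZoneId EASTERN = ZoneId.of("America/New_York");
    private static final LocalTime OPEN = LocalTime.of(9, 30);
    private static final LocalTime CLOSE = LocalTime.of(16, 0);

    private MarketHours() { }

    /**
     * True from 9:30am (inclusive) to 4:00pm (exclusive) Eastern, Monday
     * through Friday. Assumption: holidays are not taken into account.
     */
    static boolean isRegularSession(ZonedDateTime when) {
        ZonedDateTime eastern = when.withZoneSameInstant(EASTERN);
        DayOfWeek day = eastern.getDayOfWeek();
        if (day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY) {
            return false;
        }
        LocalTime t = eastern.toLocalTime();
        return !t.isBefore(OPEN) && t.isBefore(CLOSE);
    }

    /**
     * Parses one numeric CSV field. The N/A placeholder maps to an empty
     * result; any other non-numeric text raises NumberFormatException.
     */
    static OptionalDouble parsePrice(String field) {
        String s = field.trim();
        if (s.equalsIgnoreCase("N/A")) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Double.parseDouble(s));
    }

    public static void main(String[] args) {
        System.out.println("Regular session now? " + isRegularSession(ZonedDateTime.now()));
        System.out.println("N/A parses to: " + parsePrice("N/A"));
        System.out.println("125.50 parses to: " + parsePrice("125.50"));
    }
}
```

Returning an OptionalDouble, rather than a sentinel such as zero, forces the caller to decide what a missing price means before it reaches a display or a computation.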
If there is trading during extended hours (8am-8pm Eastern), as shown for AAPL, the extended-hours volume is listed. However, the price is as of the close of business plus settlement time (which accounts for why CY was listed at 4:05pm). To synthesize the URL needed to get the data, we use:

In order to fetch the data, given the URL, we write a simple helper utility that contains:

The goal of such a program is to convert the URL into text, with one string per line, as retrieved from the web page. This is the core of the data retrieval phase.

3 ANALYSIS

In order to process the text data, we need to decide how we are going to store and parse the data. To store the quote we create a Quote class:

The class has appropriate getters and setters. To store multiple quotes, we create a container class that has high-level quote processing methods:

At this point, we build an ad-hoc parsing facility, based on a combination of string processing and a StringTokenizer:

The ad-hoc nature of the parsing scheme is even more evident when we examine the string manipulations needed to obtain low-level data types:

Handling errors locally in this way can undermine the robustness of the code. Of particular concern is what to do when the data format from the service provider changes. I suspect there will be a large cost associated with such a change, in terms of code rework.

4 DISPLAY

We are interested in a new "killer application" for development, called the JAddressBook program. This program is able to display stock quotes (and manage an address book, dial the phone, print labels, do data mining, etc.). The program can be run (as a web start application) from:

It provides an interactive GUI for stock data.

Figure 4-1. The Stock Quote Viewer

Stock quotes are updated dynamically using a JTable and JTableModel, in Swing. Figure 4-1 shows an image of the JTable. In order to alter the list of stocks, the user selects the Edit:Customize menu.
This displays a dialog box, shown in Figure 4-2. The list of stock symbols is stored in user preferences.

Figure 4-2. Stock Symbol Dialog

Figure 4-2 shows an image of the stock symbol dialog.

5 IMPLEMENTING A JTABLEMODEL

The JTable is updated dynamically by a threaded update to a JTableModel. The model is created by subclassing the AbstractTableModel as shown below:

When the fireTableDataChanged method is invoked, observers of the TableModel use the getValueAt method in order to update the cells in the table view.

6 JTABLE

The StockTable class has a has-a relationship with both the StockTableModel and the JTable. Since it connects the observer with the observable, we say that it has the role of the mediator.

The command design pattern is used for the menu items, as described in [Lyon 04B]. The RunJob uses the command design pattern applied to threads (known as Project Imperion) and is described in [Lyon 04C].

7 RESOURCE BUNDLING

The stock symbols are stored in a serialized bean called the TickerSymbolsBean:

This makes use of the resource bundling techniques described in [Lyon 05A].

8 CONCLUSION

We show how an ad-hoc parsing technique can be used to obtain data from the web. The problem is that the way data is represented on the web is not consistent. Some sites are well-formed sources of data and others are verbose. For more sophisticated data mining tasks, systems like NoDoSE are worth exploring [Adelberg].

The basic idea of obtaining data from the web and parsing it is a powerful concept. The web is a huge and growing source of data. However, we have found that relying upon others to keep data consistent over time can create problems in a data-mining program. We have used Yahoo as a source of stock quotes and it has been reliable, both in quality of service and in consistency of format. However, other sources of data have not been nearly as stable. The question of how to deal with ever-changing data formats remains open.
There are many sources of financial data on the web. For example, the Chicago Board Options Exchange (CBOE) now has a means to query option volume. Yahoo Finance has historical end-of-day market data on individual stocks. These data sources can be very useful for the purpose of writing empirical finance papers or backtesting trading strategies. The implementation of the data mining mechanism for these alternative sources of data remains a topic of future work.

REFERENCES

[Adelberg] Brad Adelberg, "NoDoSE–a tool for semi-automatically extracting structured and semistructured data from text documents", in SIGMOD 1998, pp. 283-294.

[Lyon 04B] Douglas A. Lyon, "Project Imperion: New Semantics, Facade and Command Design Patterns for Swing", Journal of Object Technology, vol. 3, no. 5, May-June 2004, pp. 51-64. http://www.jot.fm/issues/issue_2004_05/column6

[Lyon 04C] Douglas A. Lyon, "The Imperion Threading System", Journal of Object Technology, vol. 3, no. 7, July-August 2004, pp. 57-70. http://www.jot.fm/issues/issue_2004_07/column5

[Lyon 05A] Douglas A. Lyon, "Resource Bundling for Distributed Computing", Journal of Object Technology, vol. 4, no. 1, January-February 2005, pp. 45-58. http://www.jot.fm/issues/issue_2005_01/column4

Cite this column as follows: Douglas A. Lyon, "Displaying Updated Stock Quotes", in Journal of Object Technology, vol. 6, no. 8, September-October 2007, pp. 19-31. http://www.jot.fm/issues/issue_2007_09/column2