Tracking
Tracking technology and projects.

  Tuesday, August 8, 2006


While text mining 330,000 New York Times articles poses an interesting challenge, it's not as interesting as sifting through 70 million words (from over 70,000 unique documents) found in the Congressional Record. A team of political science researchers has done just that (PDF), and found that their software was able to answer questions too difficult for humans to handle on their own.

The Congressional Record is a unique source of political information. It contains verbatim transcripts of floor speeches made in both the House and the Senate and provides a view of political debate far more nuanced than the one provided by election returns, opinion polls, and vote counts.

But how to make use of that vast treasure trove of words? The research team notes that the Record has rarely been used as a source for analysis because "it contains too much information to absorb manually." Even with a large team of grad students at their disposal, researchers find it difficult to tag more than a small subset of the speeches in question, and computers have not traditionally been useful for mining text.


12:50:44 PM    

Tracking the Congressional Attention Span. Turismo writes "Ars Technica covers a new research project that uses computers to look at 70 million words from the Congressional Record. The project's goal was to track what our representatives were talking about at any given time, and researchers were able to do it without human training or intervention. From the article: '...researchers found, for instance, that "judicial nominations" have consumed steadily more Congressional attention between 1997 and 2004. In fact, the topic produced the most number of words published in a single "day" of the Congressional Record: 230,000 on November 12, 2003.' It looks like automated topic analysis has truly arrived."[Slashdot]

Editor: Just remember, not everything in the Congressional Record was actually said on the floor by the member it is attributed to. Members are allowed to insert large amounts of prepared written statements in such a way that you can't tell the difference.

12:48:15 PM    

Did AOL Betray Its Users? In what could be a breach of federal law, AOL releases search logs on 650,000 users to researchers. While the files are down, they are in the wild and lawyers may be circling. In 27B Stroke 6. [Wired News: Top Stories]
12:27:56 PM    

Ray Beckerman of Recording Industry vs. The People put together an article that explains how the RIAA's militant enforcement arm, its legal team, finds, obtains records on, and sues ISP account holders who may or may not have ever been users of P2P applications. It's a great reference, but (no offense intended to Ray) it's dry like a bread sandwich.

I decided to take a stab at rewriting it in something closer to English than lawyer, in hopes that it would be more accessible.

So, with thanks to Ray Beckerman, let's take a look at The RIAA vs. John Doe, in what I hope serves as a layperson's guide to filesharing lawsuits.

11:24:35 AM    

We probably shouldn't hold our breath waiting for the civil liberties implications of this to dawn on Gordon, but the complexities and impracticalities of actually doing it will likely come to his attention sooner. How would the check be set up? Would warrants on the police national computer be matched by an automatic flagging of the individual on the NIR? No, because the police don't necessarily want everybody to know who they're looking for, and the 'automagic' linking would be a pig to set up, considering the current state of police systems. What would happen when a fugitive was IDed at POS? Tricky one this - you can't safely alert the checkout operative, or the potentially dangerous terrorist currently buying a kumquat. So it has to be an alert tripped at the NIR level and then a further alert has to go to the police response centre covering the area, then a patrol vehicle has to be alerted... Need we go on? By the time it gets to the response centre you need to have time, location, name and nature of the suspect, and he'll be long gone.

Aside from the obvious technical issues, there's the problem of convincing businesses - what's in it for them? Identity fraud, the Government keeps telling us, is a major concern (but apparently not major enough to warrant the Government measuring it properly) and needs to be fought. Banks, credit card companies and major retailers however aren't automatically going to line up behind 'rock solid ID' at any cost, and nor will their customers. Yes, ID fraud is a cost to business and an inconvenience for the victims, but the costs are bearable, and the more security you have in a system, the more inconvenient it's likely to become. So there's a pretty strong argument that businesses think that they've got just about the right level of security now, and that they can keep losses within boundaries and absorb them as a cost of business. If an ID check at POS didn't take any time and was 100 per cent reliable and didn't require new hardware investment and cost virtually nothing, then maybe they'd see it as useful. Otherwise?

In addition to this, businesses aren't likely to want to trust the accuracy, reliability and security of Government systems. The banks and credit card companies have run customer databases for years, generally fairly effectively and with relatively few security breaches. More recently the supermarkets have got fairly cute at running loyalty schemes, and while these can be vaguely sinister, they're voluntary, and there are limits to what the supermarkets can do with them without triggering massive PR disasters. Government, on the other hand, has shown itself incapable of getting absentee parents to pay for their children's upkeep, while Gordon Brown's own department is the one that gives away money on the Internet after massive ID theft from a Government department. Really, no sensible business that knows what it's doing as regards networks and personal data is going to want to play with these people unless the law forces it to.


11:20:17 AM    

The UK's Total Surveillance. Budenny writes "The Register has a story in its ongoing coverage of the UK ID Card story. This one suggests, with links to a weekend news story, that the Prime Minister in waiting has bought the idea that all electronic transactions in the UK should be linked to a central government/police database. Every cash withdrawal, every credit card purchase, every loyalty card use ... And that data should flow back from the police database to (e.g.) a loyalty card use. So, for example, not only would the government know what books you were buying, but the bookstore would also know if you had an outstanding speeding ticket!" [Slashdot: Your Rights Online]
11:14:01 AM    

Weblogs, Inc. CEO Tells His AOL Bosses To "Not Keep Logs of Search Data".

Jason Calacanis is CEO of blogging network Weblogs, Inc., which AOL bought last year. In light of AOL's disclosure of 658,000 users' search queries, Calacanis publicly denounced this massive privacy violation and gave his bosses one clear message: "Frankly, I want us to NOT KEEP LOGS of our search data" (emphasis, his).

Exactly -- as discussed in our "Best Practices" white paper, online service providers shouldn't be keeping these kinds of logs. Voluntary limits on data retention would help prevent another Data Valdez like AOL's, but Congress should also strengthen and clarify privacy protections.

[EFF: Deep Links]
11:02:12 AM    

Will AOL Flap Help Privacy Awareness?

Might AOL's release of the logs of nearly 20 million web searches documenting three months of activity by 650,000 AOL users serve to raise awareness of the privacy concerns with web search surveillance (that I've been writing about forever)? Seth Finkelstein hopes so, but also warns that the potential abuse of the released data by hackers and big business might be even worse than what we were concerned about when the DOJ asked for it:

AOL has just given us the world's biggest real-world experiment as to whether privacy invasion can be done from search-engine data. Previously, when discussing the Google Search subpoena, all people could do was speculate - the data might have this, it could include that, maybe possibly someone could do this from it. Now we have both a huge amount of data, and many interested geeks playing with it and mining it.

I joked we'll now see a huge distributed reverse-engineering collaborative effort to track down as many anonymous user ID's as possible. At least, I hope that was joke. Maybe it wasn't.

Note this data is being far, far, more widely released than the subpoena data, which would have been under confidentiality agreements and protective orders. Worrying about Big Government can be a distraction over far worse Big Corporations.

[michaelzimmer.org]
10:59:24 AM    


© Copyright 2006 Paul Hardwick.
Last update: 9/2/06; 4:21:33 AM.
