The future of the internet according to Tim Berners-Lee
By Tim Dean
Imagine doing a search on Google for 'ultimate bass'. What will you get back? A description of a hand crafted musical instrument? Or a list of the best fishing locations?
This problem stems from the very way the World Wide Web was designed. HTML, which is the language that underpins the web, has proven a tremendously powerful tool for displaying information, but ultimately it's only a set of rules on how to format information in a way we can read. There are tags for headings, tables, bold text, images and displaying coloured backgrounds, but there's nothing in HTML to indicate what the information it's displaying is actually about.
Now, imagine if there were a way to specify on a web page that the term 'bass' actually referred to musical instruments rather than aquatic animals. And not only that, but the other information on the site, like the materials each instrument is made from, its electronics, and the feedback from readers about their favourite instruments, was all encoded in a way that machines could understand.
Then a search engine wouldn't just scan over the words on the surface of the page; it would delve deeper and start making connections between these different concepts.
Your searches could then get a lot more specific, like 'ultimate 5-string electric bass', or even better, they could be more vague, like 'most popular 5-string bass under $2000 for playing jazz'.
Adding meaning
This is the promise of the semantic web. It's like today's World Wide Web, except instead of only being readable by humans, it also has meaning for machines, thus enabling them to bring their considerable data crunching might to our advantage. It essentially turns the web into one giant, interconnected database that we can query at our leisure.
The pioneer of the semantic web is none other than the creator of the World Wide Web himself, Tim Berners-Lee. He paints a picture in his original 1998 paper, Semantic Web Roadmap:
'The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help. One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and... that the structure of the data is not evident to a robot browsing the web.'
The big question is this: how do you make machines understand what you're talking about?
Putting everything on the web
The first step is giving things a machine-readable name, called a URI (Uniform Resource Identifier). This is because it's easy enough for us to understand what is meant by 'instrument' or 'Fred Bloggs', but to a computer, they're just strings of characters. The URI follows a set syntax and represents the thing, or 'resource', it identifies.
We already use one type of URI regularly, the URL (Uniform Resource Locator). A URL not only identifies a resource, but it also points to where it is, in this case, a web page. The big difference between a URI and a URL is that a URI is simply a machine-readable moniker for something, whether that thing is accessible online or not. As such, a particular musical instrument or Fred Bloggs might have URIs, even though they aren't accessible through the web.
There are still some issues surrounding URIs and their usage, however. One big problem is that because anyone can create a URI about anything, there's a very real risk that many URIs might exist that identify the same object, such as a person or place. The question is how to mediate between these URIs and have the machines realise they actually all point to the same thing.
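One way a machine might reconcile duplicate URIs is for somebody to publish explicit 'same as' statements and for software to follow them to a single agreed name. The Python sketch below shows the idea only; the three example.org-style URIs for Fred Bloggs and the helper name canonical are invented for illustration and are not part of any standard.

```python
# Hypothetical 'same as' statements: each key is declared to identify
# the same thing as its value. All three URIs are made up for this sketch.
SAME_AS = {
    "http://example.com/staff#fred": "http://example.org/people/fred-bloggs",
    "http://example.net/id/fbloggs": "http://example.com/staff#fred",
}

def canonical(uri, same_as):
    """Follow 'same as' links until reaching a URI with no further alias."""
    seen = set()
    while uri in same_as and uri not in seen:  # 'seen' guards against loops
        seen.add(uri)
        uri = same_as[uri]
    return uri

# Both aliases resolve to the same name, so statements about either
# can be pooled as statements about one person.
print(canonical("http://example.net/id/fbloggs", SAME_AS))
print(canonical("http://example.com/staff#fred", SAME_AS))
```

This only works when someone has actually published the links, which is exactly the mediation problem described above.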
Machine reading
Once things have a machine-readable name, the next step is to be able to actually say stuff about these things in a machine-readable way. This is done through that perennial web favourite, XML (Extensible Markup Language). The beauty of XML is that it can be whatever you want it to be, and in the case of the semantic web it's shaped and moulded into the framework for adding meaning to raw data.
One of the easiest ways to give something meaning is to place it with like objects and group them in a category. Thus as soon as you put 'bass' in the category 'musical instrument', it is immediately distinguished from the other 'bass', which might be in the category 'fish'.
XML allows you to create these categories as markup tags -- also known as 'elements'. So, all you need to do is stick the term 'bass' between a pair of XML 'musical instrument' tags (written as one word, because XML element names can't contain spaces), like this:
<musical_instrument>bass</musical_instrument>
And the computer knows that 'bass' belongs to the category 'musical instrument'. The computer will also be able to make sense of the following statement:
I like to play my <musical_instrument>bass</musical_instrument> while eating <fish>bass</fish>.
It's also worth reiterating that all these terms, like 'musical instrument' and 'bass', will have their own URIs to uniquely identify them in a machine-readable way rather than just being text.
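To give a feel for how a program might read such markup, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element names musical_instrument and fish (with an underscore, since XML element names cannot contain spaces), the wrapping sentence element and the helper categories_of are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The article's example sentence, wrapped in a hypothetical <sentence> root
# so that it forms a well-formed XML document.
SENTENCE = (
    "<sentence>I like to play my <musical_instrument>bass</musical_instrument> "
    "while eating <fish>bass</fish>.</sentence>"
)

def categories_of(term, xml_text):
    """Return the tag names (categories) whose text content is `term`."""
    root = ET.fromstring(xml_text)
    return [
        el.tag
        for el in root.iter()
        if el is not root and (el.text or "").strip() == term
    ]

# The same four letters turn up under two different categories.
print(categories_of("bass", SENTENCE))
```

A plain text search would see the word 'bass' twice and nothing more; the markup is what lets the program tell the two apart.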
Once this XML framework is in place, you then need to actually define all these categories, and the relationships between them, such as placing 'bass' in the category 'musical instrument', and '5-string' in the category 'bass'. This is done in an XML namespace, which is a dedicated document specifically for this purpose. Each document on the web then refers back to an XML namespace to have its particular XML markup make sense.
One feature of the semantic web as it's characterised today is that there will be many XML namespaces - as many as there are people willing to create one. This suits the open nature of the web that the W3C wish to preserve, but it also creates the problem of reconciling different namespaces and ironing out inconsistencies and contradictions.
Speaking in URIs
Now that we have all these objects with nice machine-readable URIs, and a comprehensive XML namespace that places them all in relation to each other, we can then start to create machine-readable sentences that actually make sense.
This is done by writing a sentence in XML with three parts, a subject, an object and a predicate linking the two together. Of course, each of these parts would be represented by machine-readable URIs.
Thus we could write the sentence 'Tim plays bass' as:
<http://timdean.name> <http://namespace.com/terms/plays> <http://namespace.com/terms/bass>
This syntax of stringing together URIs in meaningful ways is called RDF (Resource Description Framework), and this is where the semantic web really gets its smarts.
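A three-part statement like this maps neatly onto a tuple, and a pile of them can be queried by pattern. The following Python sketch is a toy, in-memory illustration rather than a real RDF library; the Triple type and the match helper are assumptions, and the three URIs are simply the ones from the example sentence above.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# URIs taken from the 'Tim plays bass' example.
TIM = "http://timdean.name"
PLAYS = "http://namespace.com/terms/plays"
BASS = "http://namespace.com/terms/bass"

store = [Triple(TIM, PLAYS, BASS)]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

# 'What does Tim play?' becomes a pattern with the object left open.
print([t.obj for t in match(store, subject=TIM, predicate=PLAYS)])
```

Leaving a different slot open asks a different question: fixing the predicate and object instead would ask 'who plays bass?'.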
What is ontology?
However, we need just one more piece of the puzzle to make these sentences make sense. The computer has already been told that 'bass' is in the category 'musical instrument', but it needs to understand what a 'category' is itself. This means we need a system that defines and describes the relationship between all these terms like 'category', 'sub-category', 'properties', 'ranges' etc, so that the machine knows how to process the information in the RDF statement.
These systems are called ontologies. There are a number of ontologies that have been developed to date, including DAML (DARPA Agent Markup Language), which was developed by DARPA (the US Defense Advanced Research Projects Agency). This is also often combined with OIL (Ontology Inference Layer) to make DAML+OIL. The W3C is also working on its own ontology based on DAML+OIL called OWL (Web Ontology Language).
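The simplest thing an ontology buys you is inheritance: if a '5-string bass' is a kind of 'bass guitar', and a 'bass guitar' is a kind of 'musical instrument', the machine can work out the rest. The Python sketch below shows just that one idea with an invented hierarchy; the category names and the helpers ancestors and is_a are assumptions, and real ontology languages such as OWL express far more than this (properties, ranges and so on).

```python
# Hypothetical 'is a kind of' hierarchy, narrower term -> broader term.
SUBCLASS_OF = {
    "5-string bass": "bass guitar",
    "bass guitar": "musical instrument",
    "sea bass": "fish",
}

def ancestors(term, subclass_of):
    """Walk up the hierarchy and collect every broader category of `term`."""
    found = []
    while term in subclass_of:
        term = subclass_of[term]
        if term in found:  # guard against a circular hierarchy
            break
        found.append(term)
    return found

def is_a(term, category, subclass_of):
    """True if `category` is somewhere above `term` in the hierarchy."""
    return category in ancestors(term, subclass_of)

# Nobody stated this directly; it follows from the two links above.
print(is_a("5-string bass", "musical instrument", SUBCLASS_OF))
print(is_a("sea bass", "musical instrument", SUBCLASS_OF))
```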
Making sense of it all
So now we have a web populated with objects, each with their own unique URI, structured in RDF statements written in XML, and all bound together by an ontology. Suddenly that old web, which was only readable by us, is now quite transparent to machines as well. And this is where the semantic web really kicks into gear.
Because the semantic web is all in a machine readable format, with a structured syntax, the machine can then use conventional logic to make inferences based on the information it finds. For example, because a search engine can now understand the difference between the two types of 'bass', and it can make sense of 'materials', 'price' etc, it can answer my question of what is the 'most popular 5-string bass under $2000 for playing jazz'.
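Once the data is structured, that question stops being a text search and becomes a filter followed by a ranking. The Python sketch below runs it over a tiny catalogue; every entry in it (names, prices, vote counts) is made-up sample data, and the Instrument fields and the most_popular helper are assumptions chosen for this example, with reader votes standing in for 'popularity'.

```python
from dataclasses import dataclass

@dataclass
class Instrument:
    name: str
    category: str
    strings: int
    price: float
    genres: tuple
    votes: int  # invented reader votes, used as a stand-in for popularity

# Entirely fictional sample data, including one 'bass' of the fishy kind.
CATALOGUE = [
    Instrument("Bass A", "musical instrument", 5, 1800.0, ("jazz", "funk"), 120),
    Instrument("Bass B", "musical instrument", 5, 2500.0, ("jazz",), 300),
    Instrument("Bass C", "musical instrument", 4, 900.0, ("rock",), 500),
    Instrument("Bass D", "musical instrument", 5, 1500.0, ("jazz",), 80),
    Instrument("Bass E", "fish", 0, 30.0, (), 999),
]

def most_popular(items, strings, max_price, genre):
    """Filter on the structured fields, then rank what is left by votes."""
    candidates = [
        i for i in items
        if i.category == "musical instrument"
        and i.strings == strings
        and i.price < max_price
        and genre in i.genres
    ]
    return max(candidates, key=lambda i: i.votes, default=None)

best = most_popular(CATALOGUE, strings=5, max_price=2000, genre="jazz")
print(best.name if best else "no match")
```

Note that the most-voted entries overall are ruled out by price, string count or category before popularity is even considered, which is the kind of reasoning a keyword search can't do.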
However, pity the poor machine when it comes to trying to arbitrate between dozens, if not hundreds, of conflicting XML namespaces and RDF statements. This is where the final piece of the puzzle comes in: trust.
Tim Berners-Lee anticipated the problem with conflicting and contradicting statements grinding things to a halt, so he suggested we create our own 'webs of trust' using digital certificates. In this way, we all digitally sign our online content, and other individuals can then specify to what extent they trust (or distrust) us. Your computer will then take these networks of trust into account when ferreting around for reliable information to answer your queries.
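How a machine would actually weigh trust is left open here, but one simple possibility is to add up how much you trust each signer of a claim and pick the claim with the highest total. The Python sketch below does only that; the signers, their trust scores, the claimed values and the helper most_trusted_value are all invented, and it assumes the digital signatures have already been verified elsewhere.

```python
from collections import defaultdict

# Hypothetical conflicting claims about one fact: (claimed value, signer).
CLAIMS = [("1800", "alice"), ("1800", "bob"), ("950", "mallory")]

# Hypothetical personal trust scores from -1 (distrust) to 1 (full trust).
TRUST = {"alice": 0.9, "bob": 0.6, "mallory": -0.8}

def most_trusted_value(claims, trust):
    """Sum the trust behind each claimed value and return the best backed one."""
    scores = defaultdict(float)
    for value, signer in claims:
        scores[value] += trust.get(signer, 0.0)  # unknown signers count for nothing
    if not scores:
        return None
    return max(scores, key=scores.get)

print(most_trusted_value(CLAIMS, TRUST))
```

Because the scores are personal, two people asking the same question could reasonably get different answers, which is the point of each of us building our own web of trust.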
And there you have it -- an intelligent world wide web.
So, where is it? It's been in development for eight years now, and while the technology is now more or less in place, the semantic web has yet to take off. According to Tim Berners-Lee, it's just a matter of time before people appreciate the potential of the semantic web and start coding in XML and RDF, and it'll eventually reach critical mass, as did the World Wide Web.
When (and if) that happens, though, will take more than a smart machine to guess.
Links
www.w3.org
The World Wide Web Consortium's home page is your first stop for anything to do with Web standards.
www.w3.org/DesignIssues/Semantic.html
Check out the original brief for the semantic web, as penned by Tim Berners-Lee in 1998.
www.w3.org/TR/owl-features/
For the ontologically inclined, the W3C has also published a guide to its Web Ontology Language, OWL.
www.cyc.com
The Cyc project is another engine that attempts to create a database of human knowledge that can be accessed with natural language.