Sunday, May 10, 2009

Internet


The Internet is a global network of interconnected computers, enabling users to share information along multiple channels. Typically, a computer that connects to the Internet can access information from a vast array of available servers and other computers by moving information from them to the computer's local memory. The same connection allows that computer to send information to servers on the network; that information is in turn accessed and potentially modified by a variety of other interconnected computers. A majority of widely accessible information on the Internet consists of inter-linked hypertext documents and other resources of the World Wide Web (WWW). Computer users typically manage sent and received information with web browsers; other software used to interface with computer networks includes specialized programs for electronic mail, online chat, file transfer and file sharing.

The movement of information in the Internet is achieved via a system of interconnected computer networks that share data by packet switching using the standardized Internet Protocol Suite (TCP/IP). It is a "network of networks" that consists of millions of private and public, academic, business, and government networks of local to global scope that are linked by copper wires, fiber-optic cables, wireless connections, and other technologies.

Terminology

The terms Internet and World Wide Web are often used in every-day speech without much distinction. However, the Internet and the World Wide Web are not one and the same. The Internet is a global data communications system. It is a hardware and software infrastructure that provides connectivity between computers. In contrast, the Web is one of the services communicated via the Internet. It is a collection of interconnected documents and other resources, linked by hyperlinks and URLs.

The term Internet, when referring to this global system, has traditionally been treated as a proper noun and written with an initial capital letter. There is a trend to regard it as a generic term or common noun and thus write it as "the internet", without the capital.

Creation and History


The USSR's launch of Sputnik spurred the United States to create the Advanced Research Projects Agency, known as ARPA, in February 1958 to regain a technological lead.[2][3] ARPA created the Information Processing Technology Office (IPTO) to further the research of the Semi Automatic Ground Environment (SAGE) program, which had networked country-wide radar systems together for the first time. J. C. R. Licklider was selected to head the IPTO, and saw universal networking as a potential unifying human revolution.

Licklider moved from the Psycho-Acoustic Laboratory at Harvard University to MIT in 1950, after becoming interested in information technology. At MIT, he served on a committee that established Lincoln Laboratory and worked on the SAGE project. In 1957 he became a Vice President at BBN, where he bought the first production PDP-1 computer and conducted the first public demonstration of time-sharing.

At the IPTO, Licklider got Lawrence Roberts to start a project to make a network, and Roberts based the technology on the work of Paul Baran,[4] who had written an exhaustive study for the U.S. Air Force that recommended packet switching (as opposed to circuit switching) to make a network highly robust and survivable. After much work, the first two nodes of what would become the ARPANET were interconnected between UCLA and SRI (later SRI International) in Menlo Park, California, on October 29, 1969. The ARPANET was one of the "eve" networks of today's Internet.

Following on from the demonstration that packet switching worked on the ARPANET, the British Post Office, Telenet, DATAPAC and TRANSPAC collaborated to create the first international packet-switched network service, referred to in the UK as the International Packet Switched Service (IPSS), in 1978. The collection of X.25-based networks grew from Europe and the US to cover Canada, Hong Kong and Australia by 1981. The X.25 packet switching standard was developed in the CCITT (now called ITU-T) around 1976.

X.25 was independent of the TCP/IP protocols that arose from the experimental work of DARPA on the ARPANET, Packet Radio Net and Packet Satellite Net during the same time period. Vinton Cerf and Robert Kahn developed the first description of the TCP protocols during 1973 and published a paper on the subject in May 1974. Use of the term "Internet" to describe a single global TCP/IP network originated in December 1974 with the publication of RFC 675, the first full specification of TCP that was written by Vinton Cerf, Yogen Dalal and Carl Sunshine, then at Stanford University. During the next nine years, work proceeded to refine the protocols and to implement them on a wide range of operating systems.

The first TCP/IP-based wide-area network was operational by January 1, 1983 when all hosts on the ARPANET were switched over from the older NCP protocols. In 1985, the United States' National Science Foundation (NSF) commissioned the construction of the NSFNET, a university 56 kilobit/second network backbone using computers called "fuzzballs" by their inventor, David L. Mills. The following year, NSF sponsored the conversion to a higher-speed 1.5 megabit/second network. A key decision to use the DARPA TCP/IP protocols was made by Dennis Jennings, then in charge of the Supercomputer program at NSF.

The opening of the network to commercial interests began in 1988. The US Federal Networking Council approved the interconnection of the NSFNET to the commercial MCI Mail system in that year and the link was made in the summer of 1989. Other commercial electronic mail services were soon connected, including OnTyme, Telemail and Compuserve. In that same year, three commercial Internet service providers (ISP) were created: UUNET, PSINet and CERFNET. Important, separate networks that offered gateways into, then later merged with, the Internet include Usenet and BITNET. Various other commercial and educational networks, such as Telenet, Tymnet, Compuserve and JANET, were interconnected with the growing Internet. Telenet (later called Sprintnet) was a large privately funded national computer network with free dial-up access in cities throughout the U.S. that had been in operation since the 1970s. This network was eventually interconnected with the others in the 1980s as the TCP/IP protocol became increasingly popular. The ability of TCP/IP to work over virtually any pre-existing communication network allowed for great ease of growth, although the rapid growth of the Internet was due primarily to the availability of commercial routers from companies such as Cisco Systems, Proteon and Juniper, the availability of commercial Ethernet equipment for local-area networking, and the widespread implementation of TCP/IP on the UNIX operating system.

Growth


Although the basic applications and guidelines that make the Internet possible had existed for almost two decades, the network did not gain a public face until the 1990s. On 6 August 1991, CERN, a pan-European organisation for particle research, publicized the new World Wide Web project. The Web was invented by English scientist Tim Berners-Lee in 1989.

An early popular web browser was ViolaWWW, patterned after HyperCard and built using the X Window System. It was eventually replaced in popularity by the Mosaic web browser. In 1993, the National Center for Supercomputing Applications at the University of Illinois released version 1.0 of Mosaic, and by late 1994 there was growing public interest in the previously academic, technical Internet. By 1996 usage of the word Internet had become commonplace, and consequently, so had its use as a synecdoche in reference to the World Wide Web.

Meanwhile, over the course of the decade, the Internet successfully accommodated the majority of previously existing public computer networks (although some networks, such as FidoNet, have remained separate). During the 1990s, it was estimated that the Internet grew by 100% per year, with a brief period of explosive growth in 1996 and 1997. This growth is often attributed to the lack of central administration, which allows organic growth of the network, as well as the non-proprietary open nature of the Internet protocols, which encourages vendor interoperability and prevents any one company from exerting too much control over the network.

Using various statistics, AMD estimated the population of Internet users to be 1.5 billion as of January 2009.

Internet protocols

The complex communications infrastructure of the Internet consists of its hardware components and a system of software layers that control various aspects of the architecture. While the hardware can often be used to support other software systems, it is the design and the rigorous standardization process of the software architecture that characterizes the Internet.

The responsibility for the architectural design of the Internet software systems has been delegated to the Internet Engineering Task Force (IETF). The IETF conducts standard-setting working groups, open to any individual, on the various aspects of Internet architecture. Resulting discussions and final standards are published in Requests for Comments (RFCs), freely available on the IETF web site.

The principal methods of networking that enable the Internet are contained in a series of RFCs that constitute the Internet Standards. These standards describe a system known as the Internet Protocol Suite. This is a model architecture that divides methods into a layered system of protocols (RFC 1122, RFC 1123). The layers correspond to the environment or scope in which their services operate. At the top is the space (Application Layer) of the software application, e.g., a web browser application, and just below it is the Transport Layer which connects applications on different hosts via the network (e.g., client-server model). The underlying network consists of two layers: the Internet Layer which enables computers to connect to one another via intermediate (transit) networks and thus is the layer that establishes internetworking and the Internet, and lastly, at the bottom, is a software layer that provides connectivity between hosts on the same local link (therefore called the Link Layer), e.g., a local area network (LAN) or a dial-up connection. This model is also known as the TCP/IP model of networking. While other models have been developed, such as the Open Systems Interconnection (OSI) model, they do not match in the details of description or implementation.
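The layering can be made concrete with a short sketch using Python's standard socket module. The one-line "greeting" protocol below is invented purely for illustration; TCP (Transport Layer) and IP (Internet Layer) are supplied by the operating system through the socket API.

```python
# Minimal sketch of the client-server model over TCP/IP on the local host.
# The application protocol (send a name, get back a greeting) is made up;
# everything below the Application Layer is handled by the OS network stack.
import socket
import threading

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))          # Internet Layer address + an ephemeral TCP port
srv.listen(1)
port = srv.getsockname()[1]

def handle_one_client():
    conn, _addr = srv.accept()      # Transport Layer: accept one TCP connection
    with conn:
        name = conn.recv(1024)      # Application Layer: read the request ...
        conn.sendall(b"HELLO " + name)  # ... and send the reply

threading.Thread(target=handle_one_client, daemon=True).start()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect(("127.0.0.1", port))
    cli.sendall(b"WORLD")
    print(cli.recv(1024))           # prints b'HELLO WORLD'

srv.close()
```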

The most prominent component of the Internet model is the Internet Protocol (IP) which provides addressing systems for computers on the Internet and facilitates the internetworking of networks. IP Version 4 (IPv4) is the initial version used on the first generation of today's Internet and is still in dominant use. It was designed to address up to approximately 4.3 billion (2^32) Internet hosts. However, the explosive growth of the Internet has led to IPv4 address exhaustion. A new protocol version, IPv6, was developed which provides vastly larger addressing capabilities and more efficient routing of data traffic. IPv6 is currently in the commercial deployment phase around the world.

IPv6 is not interoperable with IPv4. It essentially establishes a "parallel" version of the Internet not accessible with IPv4 software. This means software upgrades are necessary for every networking device that needs to communicate on the IPv6 Internet. Most modern computer operating systems already support both versions of the Internet Protocol. Network infrastructures, however, are still lagging in this development.
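The difference in address space, and the fact that the two address families are distinct, can be illustrated with a short sketch using Python's standard ipaddress module; the specific addresses below are reserved documentation examples, not real hosts.

```python
import ipaddress

v4 = ipaddress.ip_address("192.0.2.1")      # an IPv4 documentation address
v6 = ipaddress.ip_address("2001:db8::1")    # an IPv6 documentation address

print(2 ** 32)     # IPv4 space: about 4.3 billion addresses
print(2 ** 128)    # IPv6 space: about 3.4e38 addresses

# The two families are separate types; an IPv4-only stack cannot address an
# IPv6 host directly, which is why dual-stack operating systems carry both.
print(type(v4).__name__, type(v6).__name__)  # IPv4Address IPv6Address
```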

Internet structure

There have been many analyses of the Internet and its structure. For example, it has been determined that both the Internet IP routing structure and hypertext links of the World Wide Web are examples of scale-free networks.

Similar to the way the commercial Internet providers connect via Internet exchange points, research networks tend to interconnect into large subnetworks such as the following:

* GEANT
* GLORIAD
* The Internet2 Network (formerly known as the Abilene Network)
* JANET (the UK's national research and education network)

These in turn are built around relatively smaller networks. See also the list of academic computer network organizations.

Computer network diagrams often represent the Internet using a cloud symbol from which network communications pass in and out.

ICANN

The Internet Corporation for Assigned Names and Numbers (ICANN) is the authority that coordinates the assignment of unique identifiers on the Internet, including domain names, Internet Protocol (IP) addresses, and protocol port and parameter numbers. A globally unified namespace (i.e., a system of names in which there is at most one holder for each possible name) is essential for the Internet to function. ICANN is headquartered in Marina del Rey, California, but is overseen by an international board of directors drawn from across the Internet technical, business, academic, and non-commercial communities. The US government continues to have the primary role in approving changes to the root zone file that lies at the heart of the domain name system. Because the Internet is a distributed network comprising many voluntarily interconnected networks, the Internet has no governing body. ICANN's role in coordinating the assignment of unique identifiers distinguishes it as perhaps the only central coordinating body on the global Internet, but the scope of its authority extends only to the Internet's systems of domain names, IP addresses, protocol ports and parameter numbers.

On November 16, 2005, the World Summit on the Information Society, held in Tunis, established the Internet Governance Forum (IGF) to discuss Internet-related issues.

Language

The prevalent language for communication on the Internet is English. This may be a result of the Internet's origins, as well as English's role as a lingua franca. It may also be related to the poor capability of early computers, largely originating in the United States, to handle characters other than those in the English variant of the Latin alphabet.

After English (28.6% of Web visitors) the most requested languages on the World Wide Web are Chinese (20.3%), Spanish (8.2%), Japanese (5.9%), French and Portuguese (4.6%), German (4.1%), Arabic (2.6%), Russian (2.4%), and Korean (2.3%).[12]

By region, 41% of the world's Internet users are based in Asia, 25% in Europe, 16% in North America, 11% in Latin America and the Caribbean, 3% in Africa, 3% in the Middle East and 1% in Australia.[9]

The Internet's technologies have developed enough in recent years, especially in the use of Unicode, that good facilities are available for development and communication in most widely used languages. However, some glitches such as mojibake (incorrect display of foreign language characters, also known as kryakozyabry) still remain.

The Internet viewed on mobile devices

The Internet can now be accessed virtually anywhere by numerous means. Mobile phones, datacards, handheld game consoles and cellular routers allow users to connect to the Internet from anywhere there is a cellular network supporting that device's technology.

Within the limitations imposed by the small screen and other limited facilities of such a pocket-sized device, all the services of the Internet, including email and web browsing, may be available in this way. Service providers may restrict the range of these services and charges for data access may be significant, compared to home usage.

E-mail

The concept of sending electronic text messages between parties in a way analogous to mailing letters or memos predates the creation of the Internet. Even today it can be important to distinguish between Internet and internal e-mail systems. Internet e-mail may travel and be stored unencrypted on many other networks and machines out of both the sender's and the recipient's control. During this time it is quite possible for the content to be read and even tampered with by third parties, if anyone considers it important enough. Purely internal or intranet mail systems, where the information never leaves the corporate or organization's network, are much more secure, although in any organization there will be IT and other personnel whose job may involve monitoring, and occasionally accessing, the e-mail of other employees not addressed to them. Today, pictures and other files can be sent as e-mail attachments, and most e-mail systems allow a single message to be addressed to multiple recipients.

The World Wide Web


Many people use the terms Internet and World Wide Web (or just the Web) interchangeably, but, as discussed above, the two terms are not synonymous.

The World Wide Web is a huge set of interlinked documents, images and other resources, linked by hyperlinks and URLs. These hyperlinks and URLs allow the web servers and other machines that store originals of, and cached copies of, these resources to deliver them as required using HTTP (Hypertext Transfer Protocol). HTTP is only one of the communication protocols used on the Internet.

Web services also use HTTP to allow software systems to communicate in order to share and exchange business logic and data.
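As a rough sketch of what an HTTP exchange looks like from a program's point of view, the following uses Python's standard http.client module to fetch a page; example.com is a name reserved for documentation, and any reachable web server would respond in the same general way.

```python
# Fetch one web resource over HTTP, the protocol used by browsers and
# web services alike.
import http.client

conn = http.client.HTTPConnection("example.com", 80, timeout=10)
conn.request("GET", "/")                  # ask the server for its root document
response = conn.getresponse()
print(response.status, response.reason)   # e.g. 200 OK
body = response.read()                    # the HTML page itself
print(body[:80])                          # first few bytes of the document
conn.close()
```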

Software products that can access the resources of the Web are correctly termed user agents. In normal use, web browsers, such as Internet Explorer, Firefox and Apple Safari, access web pages and allow users to navigate from one to another via hyperlinks. Web documents may contain almost any combination of computer data including graphics, sounds, text, video, multimedia and interactive content including games, office applications and scientific demonstrations.

Through keyword-driven Internet research using search engines like Yahoo! and Google, millions of people worldwide have easy, instant access to a vast and diverse amount of online information. Compared to encyclopedias and traditional libraries, the World Wide Web has enabled a sudden and extreme decentralization of information and data.

Using the Web, it is also easier than ever before for individuals and organisations to publish ideas and information to an extremely large audience. Anyone can find ways to publish a web page, a blog or build a website for very little initial cost. Publishing and maintaining large, professional websites full of attractive, diverse and up-to-date information is still a difficult and expensive proposition, however.

Many individuals and some companies and groups use "web logs" or blogs, which are largely used as easily updatable online diaries. Some commercial organisations encourage staff to fill them with advice on their areas of specialization in the hope that visitors will be impressed by the expert knowledge and free information, and be attracted to the corporation as a result. One example of this practice is Microsoft, whose product developers publish their personal blogs in order to pique the public's interest in their work.

Collections of personal web pages published by large service providers remain popular, and have become increasingly sophisticated. Whereas operations such as Angelfire and GeoCities have existed since the early days of the Web, newer offerings from, for example, Facebook and MySpace currently have large followings. These operations often brand themselves as social network services rather than simply as web page hosts.

Advertising on popular web pages can be lucrative, and e-commerce or the sale of products and services directly via the Web continues to grow.

In the early days, web pages were usually created as sets of complete and isolated HTML text files stored on a web server. More recently, websites are more often created using content management or wiki software with, initially, very little content. Contributors to these systems, who may be paid staff, members of a club or other organisation or members of the public, fill underlying databases with content using editing pages designed for that purpose, while casual visitors view and read this content in its final HTML form. There may or may not be editorial, approval and security systems built into the process of taking newly entered content and making it available to the target visitors.

Remote access

The Internet allows computer users to connect to other computers and information stores easily, wherever they may be across the world. They may do this with or without the use of security, authentication and encryption technologies, depending on the requirements.

This is encouraging new ways of working from home, collaboration and information sharing in many industries. An accountant sitting at home can audit the books of a company based in another country, on a server situated in a third country that is remotely maintained by IT specialists in a fourth. These accounts could have been created by home-working bookkeepers, in other remote locations, based on information e-mailed to them from offices all over the world. Some of these things were possible before the widespread use of the Internet, but the cost of private leased lines would have made many of them infeasible in practice.

An office worker away from his desk, perhaps on the other side of the world on a business trip or a holiday, can open a remote desktop session into his normal office PC using a secure Virtual Private Network (VPN) connection via the Internet. This gives the worker complete access to all of his or her normal files and data, including e-mail and other applications, while away from the office.

This concept is also referred to by some network security people as the Virtual Private Nightmare, because it extends the secure perimeter of a corporate network into its employees' homes.

Collaboration

The low cost and nearly instantaneous sharing of ideas, knowledge, and skills has made collaborative work dramatically easier. Not only can a group cheaply communicate and share ideas, but the wide reach of the Internet allows such groups to easily form in the first place. An example of this is the free software movement, which has produced Linux, Mozilla Firefox, OpenOffice.org etc.

Internet "chat", whether in the form of IRC chat rooms or channels, or via instant messaging systems, allow colleagues to stay in touch in a very convenient way when working at their computers during the day. Messages can be exchanged even more quickly and conveniently than via e-mail. Extensions to these systems may allow files to be exchanged, "whiteboard" drawings to be shared or voice and video contact between team members.

Version control systems allow collaborating teams to work on shared sets of documents without either accidentally overwriting each other's work or having members wait until they get "sent" documents to be able to make their contributions.

Business and project teams can share calendars as well as documents and other information. Such collaboration occurs in a wide variety of areas including scientific research, software development, conference planning, political activism and creative writing.

File sharing

A computer file can be e-mailed to customers, colleagues and friends as an attachment. It can be uploaded to a website or FTP server for easy download by others. It can be put into a "shared location" or onto a file server for instant use by colleagues. The load of bulk downloads to many users can be eased by the use of "mirror" servers or peer-to-peer networks.

In any of these cases, access to the file may be controlled by user authentication, the transit of the file over the Internet may be obscured by encryption, and money may change hands for access to the file. The price can be paid by the remote charging of funds from, for example, a credit card whose details are also passed—hopefully fully encrypted—across the Internet. The origin and authenticity of the file received may be checked by digital signatures or by MD5 or other message digests.
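As a small sketch of the digest check mentioned above, the following computes a file's MD5 hash with Python's standard hashlib module and compares it to a published value; the file name and the expected digest are placeholders for whatever the sender actually distributes.

```python
# Verify a downloaded file against a digest published by its originator.
import hashlib

def md5_of_file(path, chunk_size=64 * 1024):
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)          # hash the file in small pieces
    return digest.hexdigest()

expected = "9e107d9d372bb6826bd81d3542a419d6"   # placeholder published digest
actual = md5_of_file("download.bin")            # placeholder file name
print("OK" if actual == expected else "digest mismatch - possible corruption")
```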

These simple features of the Internet, over a worldwide basis, are changing the production, sale, and distribution of anything that can be reduced to a computer file for transmission. This includes all manner of print publications, software products, news, music, film, video, photography, graphics and the other arts. This in turn has caused seismic shifts in each of the existing industries that previously controlled the production and distribution of these products.

Streaming media

Many existing radio and television broadcasters provide Internet "feeds" of their live audio and video streams (for example, the BBC). They may also allow time-shift viewing or listening such as Preview, Classic Clips and Listen Again features. These providers have been joined by a range of pure Internet "broadcasters" who never had on-air licenses. This means that an Internet-connected device, such as a computer or something more specific, can be used to access on-line media in much the same way as was previously possible only with a television or radio receiver. The range of material is much wider, from pornography to highly specialized, technical webcasts. Podcasting is a variation on this theme, where—usually audio—material is downloaded and played back on a computer or shifted to a portable media player to be listened to on the move. These techniques using simple equipment allow anybody, with little censorship or licensing control, to broadcast audio-visual material on a worldwide basis.

Webcams can be seen as an even lower-budget extension of this phenomenon. While some webcams can give full-frame-rate video, the picture is usually either small or updates slowly. Internet users can watch animals around an African waterhole, ships in the Panama Canal, traffic at a local roundabout or monitor their own premises, live and in real time. Video chat rooms and video conferencing are also popular with many uses being found for personal webcams, with and without two-way sound.

YouTube was founded on 15 February 2005 and is now the leading website for free streaming video with a vast number of users. It uses a flash-based web player to stream and show video files. Registered users may upload an unlimited amount of video and build their own personal profile. YouTube claims that its users watch hundreds of millions, and upload hundreds of thousands, of videos daily.

Internet Telephony (VoIP)

VoIP stands for Voice-over-Internet Protocol, referring to the protocol that underlies all Internet communication. The idea began in the early 1990s with walkie-talkie-like voice applications for personal computers. In recent years many VoIP systems have become as easy to use and as convenient as a normal telephone. The benefit is that, as the Internet carries the voice traffic, VoIP can be free or cost much less than a traditional telephone call, especially over long distances and especially for those with always-on Internet connections such as cable or ADSL.

VoIP is maturing into a competitive alternative to traditional telephone service. Interoperability between different providers has improved and the ability to call or receive a call from a traditional telephone is available. Simple, inexpensive VoIP network adapters are available that eliminate the need for a personal computer.

Voice quality can still vary from call to call but is often equal to and can even exceed that of traditional calls.

Remaining problems for VoIP include emergency telephone number dialling and reliability. Currently, a few VoIP providers offer an emergency service, but it is not universally available. Traditional phones are line-powered and operate during a power failure; VoIP does not do so without a backup power source for the phone equipment and the Internet access devices.

VoIP has also become increasingly popular for gaming applications, as a form of communication between players. Popular VoIP clients for gaming include Ventrilo and Teamspeak. PlayStation 3 and Xbox 360 also offer VoIP chat features.

Internet access

Common methods of home access include dial-up, landline broadband (over coaxial cable, fiber optic or copper wires), Wi-Fi, satellite and 3G technology cell phones.

Public places to use the Internet include libraries and Internet cafes, where computers with Internet connections are available. There are also Internet access points in many public places such as airport halls and coffee shops, in some cases just for brief use while standing. Various terms are used, such as "public Internet kiosk", "public access terminal", and "Web payphone". Many hotels now also have public terminals, though these are usually fee-based. These terminals are widely used for purposes such as ticket booking, banking and online payment. Wi-Fi provides wireless access to computer networks, and therefore can do so to the Internet itself. Hotspots providing such access include Wi-Fi cafes, where would-be users need to bring their own wireless-enabled devices such as a laptop or PDA. These services may be free to all, free to customers only, or fee-based. A hotspot need not be limited to a confined location. A whole campus or park, or even an entire city can be enabled. Grassroots efforts have led to wireless community networks. Commercial Wi-Fi services covering large city areas are in place in London, Vienna, Toronto, San Francisco, Philadelphia, Chicago and Pittsburgh. The Internet can then be accessed from such places as a park bench.

Apart from Wi-Fi, there have been experiments with proprietary mobile wireless networks like Ricochet, various high-speed data services over cellular phone networks, and fixed wireless services.

High-end mobile phones such as smartphones generally come with Internet access through the phone network. Web browsers such as Opera are available on these advanced handsets, which can also run a wide variety of other Internet software. More mobile phones have Internet access than PCs, though this is not as widely used. An Internet access provider and protocol matrix differentiates the methods used to get online.

Saturday, May 9, 2009

Data recovery

Data recovery is the process of salvaging data from damaged, failed, corrupted, or inaccessible secondary storage media when it cannot be accessed normally. Often the data are being salvaged from storage media formats such as hard disk drives, storage tapes, CDs, DVDs, RAID, and other electronics. Recovery may be required due to physical damage to the storage device or logical damage to the file system that prevents it from being mounted by the host operating system.

The most common "data recovery" issue involves an operating system (OS) failure (typically on a single-disk, single-partition, single-OS system), where the goal is to simply copy all wanted files to another disk. This can be easily accomplished with a Live CD, most of which provide a means to 1) mount the system drive, 2) mount and backup disk or media drives, and 3) move the files from the system to the backup with a file manager or optical disc authoring software. Further, such cases can be mitigated by disk partitioning and consistently moving valuable data files to a different partition from the replaceable OS system files.

The second type involves a disk-level failure such as a compromised file system, disk partition, or a hard disk failure, in each of which the data cannot be easily read. Depending on the case, solutions involve repairing the file system, partition table or MBR, or hard disk recovery techniques ranging from software-based recovery of corrupted data to hardware replacement on a physically damaged disk. These last two typically indicate the permanent failure of the disk, thus "recovery" means sufficient repair for a one-time recovery of files.

A third type involves retrieving files that have been deleted from a storage medium. Although there is some confusion as to the term, "data recovery" may also be used to refer to such cases in the context of forensic investigation or espionage.

Recovering data after physical damage

A wide variety of failures can cause physical damage to storage media. CD-ROMs can have their metallic substrate or dye layer scratched off; hard disks can suffer any of several mechanical failures, such as head crashes and failed motors; tapes can simply break. Physical damage always causes at least some data loss, and in many cases the logical structures of the file system are damaged as well. This causes logical damage that must be dealt with before any files can be salvaged from the failed media.

Most physical damage cannot be repaired by end users. For example, opening a hard disk in a normal environment can allow airborne dust to settle on the platter and become caught between the platter and the read/write head, causing new head crashes that further damage the platter and thus compromise the recovery process. Furthermore, end users generally do not have the hardware or technical expertise required to make these repairs. Consequently, costly data recovery companies are often employed to salvage important data. These firms often use "Class 100" / ISO-5 cleanroom facilities to protect the media while repairs are being made. (A data recovery firm without ISO-5 or better cleanroom certification will not be accepted by hard drive manufacturers for warranty purposes.)

Hardware repair


Examples of physical recovery procedures include: removing a damaged PCB (printed circuit board) and replacing it with a matching PCB from a healthy drive; performing a live PCB swap (in which the System Area of the HDD is damaged on the target drive, so it is instead read from the donor drive, whose PCB is then disconnected while still under power and transferred to the target drive); replacing the read/write head assembly with matching parts from a healthy drive; removing the hard disk platters from the original damaged drive and installing them into a healthy drive; and often a combination of these procedures. Some data recovery companies have procedures that are highly technical in nature and are not recommended for an untrained individual. Any of them will almost certainly void the manufacturer's warranty.

Disk imaging

Disk imaging is the process of making a sector-by-sector copy of the source medium. The extracted raw image can be used to reconstruct usable data after any logical damage has been repaired. Once that is complete, the files may be in usable form, although recovery is often incomplete.

Open source tools such as DCFLdd v1.3.4-1 or DOS tools such as HDClone can usually recover data from all but the physically damaged sectors. A 2007 Defense Cyber Crime Institute study shows that DCFLdd v1.3.4-1 installed on a Linux 2.4 kernel system produces extra "bad sectors", resulting in the loss of information that is actually available. The study states that when installed on a FreeBSD kernel system, only the genuinely bad sectors are lost. Another tool that can correctly image damaged media is ILook IXImager, a tool available only to government and law enforcement agencies.

Typically, hard disk drive data recovery imaging has the following abilities: (1) communicating with the hard drive while bypassing the BIOS and operating system, which are very limited in their ability to deal with drives that have "bad sectors" or take a long time to read; (2) reading data from "bad sectors" rather than skipping them (by using various read commands and ECC to recreate damaged data); (3) handling issues caused by unstable drives, such as resetting/repowering the drive when it stops responding or skipping sectors that take too long to read (read instability can be caused by minute mechanical wear and other issues); and (4) pre-configuring drives by disabling certain features, such as SMART and G-List re-mapping, to minimize imaging time and the possibility of further drive degradation.
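In spirit, the imaging loop looks like the simplified sketch below: read the source one sector at a time, and when a sector cannot be read, pad the image and move on. The device path is hypothetical, the code needs root privileges, and real imagers such as ddrescue or the hardware tools described above work at a much lower level, with retries and drive resets.

```python
# Naive sector-by-sector imaging with bad-sector padding (illustration only).
import os

SECTOR = 512
SOURCE = "/dev/sdb"        # placeholder path of the failing drive
IMAGE = "drive.img"        # destination image file

src = os.open(SOURCE, os.O_RDONLY)
size = os.lseek(src, 0, os.SEEK_END)       # total size of the device in bytes

with open(IMAGE, "wb") as out:
    for offset in range(0, size, SECTOR):
        os.lseek(src, offset, os.SEEK_SET)
        try:
            out.write(os.read(src, SECTOR))
        except OSError:                    # unreadable ("bad") sector
            out.write(b"\x00" * SECTOR)    # pad with zeros and keep going
            print(f"bad sector at byte offset {offset}")

os.close(src)
```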

Recovering data after logical damage


Logical damage is primarily caused by power outages that prevent file system structures from being completely written to the storage medium, but problems with hardware (especially RAID controllers) and drivers, as well as system crashes, can have the same effect. The result is that the file system is left in an inconsistent state. This can cause a variety of problems, such as strange behavior (e.g., infinitely recursing directories, drives reporting negative amounts of free space), system crashes, or an actual loss of data. Various programs exist to correct these inconsistencies, and most operating systems come with at least a rudimentary repair tool for their native file systems. Linux, for instance, comes with the fsck utility, Mac OS X has Disk Utility and Microsoft Windows provides chkdsk. Third-party utilities such as The Coroner's Toolkit and The Sleuth Kit are also available, and some can produce superior results by recovering data even when the disk cannot be recognized by the operating system's repair utility. Utilities such as TestDisk can be useful for reconstructing corrupted partition tables.

Some kinds of logical damage can be mistakenly attributed to physical damage. For instance, when a hard drive's read/write head begins to click, most end-users will associate this with internal physical damage. This is not always the case, however. Another possibility is that the firmware of the drive or its controller needs to be rebuilt in order to make the data accessible again.

Preventing logical damage

The increased use of journaling file systems, such as NTFS 5.0, ext3, and XFS, is likely to reduce the incidence of logical damage. These file systems can always be "rolled back" to a consistent state, which means that the only data likely to be lost is what was in the drive's cache at the time of the system failure. However, regular system maintenance should still include the use of a consistency checker. This can protect both against bugs in the file system software and latent incompatibilities in the design of the storage hardware. One such incompatibility is the result of the disk controller reporting that file system structures have been saved to the disk when it has not actually occurred. This can often occur if the drive stores data in its write cache, then claims it has been written to the disk. If power is lost, and this data contains file system structures, the file system may be left in an inconsistent state such that the journal itself is damaged or incomplete. One solution to this problem is to use hardware that does not report data as written until it actually is written. Another is using disk controllers equipped with a battery backup so that the waiting data can be written when power is restored. Finally, the entire system can be equipped with a battery backup that may make it possible to keep the system on in such situations, or at least to give enough time to shut down properly.

Recovery techniques

Two common techniques used to recover data from logical damage are consistency checking and data carving. While most logical damage can be either repaired or worked around using these two techniques, data recovery software can never guarantee that no data loss will occur. For instance, in the FAT file system, when two files claim to share the same allocation unit ("cross-linked"), data loss for one of the files is essentially guaranteed.

Consistency checking

The first, consistency checking, involves scanning the logical structure of the disk and checking to make sure that it is consistent with its specification. For instance, in most file systems, a directory must have at least two entries: a dot (.) entry that points to itself, and a dot-dot (..) entry that points to its parent. A file system repair program can read each directory and make sure that these entries exist and point to the correct directories. If they do not, an error message can be printed and the problem corrected. Both chkdsk and fsck work in this fashion. This strategy suffers from two major problems. First, if the file system is sufficiently damaged, the consistency check can fail completely. In this case, the repair program may crash trying to deal with the mangled input, or it may not recognize the drive as having a valid file system at all. The second issue that arises is the disregard for data files. If chkdsk finds a data file to be out of place or unexplainable, it may delete the file without asking. This is done so that the operating system may run more smoothly, but the files deleted are often important user files which cannot be replaced. Similar issues arise when using system restore disks (often provided with proprietary systems like Dell and Compaq), which restore the operating system by removing the previous installation. This problem can often be avoided by installing the operating system on a separate partition from the user data.
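The dot and dot-dot check can be illustrated with a toy model: a made-up in-memory table mapping directory numbers to their entries, which a small checker walks in the same spirit as fsck or chkdsk. Real checkers of course read the on-disk file system structures directly rather than a Python dictionary.

```python
# Toy consistency check: every directory must have "." pointing to itself and
# ".." pointing to an existing parent that in turn lists this directory.
dirs = {
    1: {".": 1, "..": 1, "home": 2},        # directory 1 is the root
    2: {".": 2, "..": 1, "notes.txt": 7},   # 7 is a file, ignored by the check
}

def check(dirs, root=1):
    problems = []
    for dir_id, entries in dirs.items():
        if entries.get(".") != dir_id:
            problems.append(f"dir {dir_id}: '.' does not point to itself")
        parent = entries.get("..")
        if parent not in dirs:
            problems.append(f"dir {dir_id}: missing or dangling '..' entry")
        elif dir_id != root and dir_id not in dirs[parent].values():
            problems.append(f"dir {dir_id}: not listed in its parent {parent}")
    return problems

print(check(dirs) or "directory structure is consistent")
```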

Data carving

Data carving is a data recovery technique that allows data with no file system allocation information to be extracted by identifying sectors and clusters belonging to the file. Data carving usually searches through raw sectors looking for specific desired file signatures. Because there is no allocation information, the investigator must specify a block size of data to carve out upon finding a matching file signature. The technique assumes that the beginning of the file is still present, and there is (depending on how common the file signature is) a risk of many false hits. Also, data carving requires that the files recovered be located in sequential sectors (rather than fragmented) as there is no allocation information to point to fragmented file portions. This method can be time and resource intensive.
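A bare-bones version of signature-based carving is sketched below: scan a raw image for a known file header and carve a fixed-size block at each hit. The image name, the JPEG start-of-image signature and the 4 MiB carve size are illustrative choices; real carvers also look for end-of-file markers, apply fragmentation heuristics and stream the image instead of loading it whole.

```python
# Carve candidate JPEG files out of a raw disk image by header signature.
IMAGE = "drive.img"            # raw image produced by an earlier imaging step
SIGNATURE = b"\xff\xd8\xff"    # JPEG start-of-image marker
CARVE_SIZE = 4 * 1024 * 1024   # fixed block size to carve per hit (4 MiB)

with open(IMAGE, "rb") as f:
    data = f.read()            # fine for a small demo image; real tools stream

count = 0
pos = data.find(SIGNATURE)
while pos != -1:
    with open(f"carved_{count:04d}.jpg", "wb") as out:
        out.write(data[pos:pos + CARVE_SIZE])
    count += 1
    pos = data.find(SIGNATURE, pos + 1)

print(f"carved {count} candidate files")
```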

Recovering overwritten data

When data have been physically overwritten on a hard disk it is generally assumed that the previous data are no longer possible to recover. In 1996, Peter Gutmann, a respected computer scientist, presented a paper suggesting that overwritten data could be recovered through the use of magnetic force microscopy. In 2001, he presented another paper on a similar topic. Substantial criticism has followed, primarily dealing with the lack of any concrete examples of significant amounts of overwritten data being recovered. To guard against this type of data recovery, he and Colin Plumb designed the Gutmann method, which is used by several disk scrubbing software packages.

Although Gutmann's theory may be correct, there's no practical evidence that overwritten data can be recovered. Moreover, there are good reasons to think that it cannot.

Recovery software

Bootable

Data recovery cannot always be done on a running system. As a result, a boot disk, Live CD, Live USB, or any other type of Live Distro containing a minimal operating system and a set of repair tools is used.

* Knoppix : The original Linux Live CD. It contains many useful utilities for data recovery.
* Ubuntu Rescue Remix : A GNU/Linux live system that runs from CD or USB pen drive that includes free-libre, open source data recovery and forensics tools. [9]
* SystemRescueCD : A Gentoo based Live CD, useful for repairing unbootable computer systems and retrieving data after a system crash
* NeroBackItUp ImageTool : A user-friendly bootable environment for restoring an image created by NeroBackItUp or the NeroBackItUp ImageTool, returning the machine to a consistent state.
* SpinRite : Bootable CD running FreeDOS. Drive recovery software written by Steve Gibson of Gibson Research Corporation. Will run on most PC (x86, non-Apple) hardware regardless of installed operating system.

Consistency checkers

* CHKDSK : A consistency checker for DOS and Windows systems.
* Disk First Aid : A consistency checker for Mac OS 9.
* Disk Utility : A consistency checker for Mac OS X.
* fsck : A consistency checker for UNIX file systems.

File Recovery

* PhotoRec : Multi-platform console program used to recover files.
* SanDisk RescuePRO : Software introduced by SanDisk and distributed through LC Technology International Inc.

Forensics

* The Coroner's Toolkit : A suite of utilities aimed at assisting in forensic analysis of a UNIX system after a break-in.
* The Sleuth Kit : Also known as TSK, The Sleuth Kit is a suite of forensic analysis tools developed by Brian Carrier for UNIX, Linux and Windows systems. TSK includes the Autopsy forensic browser.
* EnCase : A suite of forensic tools developed by Guidance Software that is used for imaging and forensic analysis for UNIX, Linux, and Windows systems.
* FTK (Forensic Toolkit) : A suite of forensic tools by AccessData, used by law enforcement.

Imaging tools
Main article: Disk image

* ddrescue : The GNU tool for imaging failing hard drives.

Friday, May 8, 2009

Personal computer game

A personal computer game (also known as a computer game or simply PC game) is a game played on a personal computer, rather than on a video game console or arcade machine. Computer games have evolved from the simple graphics and gameplay of early titles like Spacewar!, to a wide range of more visually advanced titles.

PC games are created by one or more game developers, often in conjunction with other specialists (such as game artists) and either published independently or through a third party publisher. They may then be distributed on physical media such as DVDs and CDs, as Internet-downloadable shareware, or through online delivery services such as Direct2Drive and Steam. PC games often require specialized hardware in the user's computer in order to play, such as a specific generation of graphics processing unit or an Internet connection for online play, although these system requirements vary from game to game.

Early growth

Although personal computers only became popular with the development of the microprocessor, computer gaming on mainframes and minicomputers has existed since at least the 1960s. One of the first computer games was developed in 1961, when MIT students Martin Graetz and Alan Kotok, with MIT employee Steve Russell, developed Spacewar! on a PDP-1 computer used for statistical calculations.

The first generation of PC games were often text adventures or interactive fiction, in which the player communicated with the computer by entering commands through a keyboard. The first text adventure, Adventure, was developed for the PDP-10 by Will Crowther in 1976, and expanded by Don Woods in 1977. By the 1980s, personal computers had become powerful enough to run games like Adventure, but by this time, graphics were beginning to become an important factor in games. Later games combined textual commands with basic graphics, as seen in the SSI Gold Box games such as Pool of Radiance, or in The Bard's Tale.

By the mid-1970s, games were developed and distributed through hobbyist groups and gaming magazines, such as Creative Computing and later Computer Gaming World. These publications provided game code that could be typed into a computer and played, encouraging readers to submit their own software to competitions.

Microchess was one of the first games for microcomputers which was sold to the public. First sold in 1977, Microchess eventually sold over 50,000 copies on cassette tape.

Industry crash

As the video game market became flooded with poor-quality games created by numerous companies attempting to enter the market, and over-produced, high-profile releases such as the Atari 2600 adaptations of E.T. and Pac-Man grossly underperformed, the popularity of personal computers for education rose dramatically. In 1983, consumer interest in console video games dwindled to historical lows, as interest in computer games rose.

The effects of the crash were largely limited to the console market, as established companies such as Atari posted record losses over subsequent years. Conversely, the home computer market boomed, as sales of low-cost color computers such as the Commodore 64 rose to record highs and developers such as Electronic Arts benefited from increasing interest in the platform.

The console market experienced a resurgence in the United States with the release of the Nintendo Entertainment System. In Europe, computer gaming continued to boom for many years after.

New genres


Increasing adoption of the computer mouse, driven partially by the success of games such as the King's Quest series, and high-resolution bitmap displays allowed the industry to include increasingly high-quality graphical interfaces in new releases. Meanwhile, the Commodore Amiga computer achieved great success in the market from its release in 1985, contributing to the rapid adoption of these new interface technologies.

Further improvements to game artwork were made possible with the introduction of the first sound cards, such as AdLib's Music Synthesizer Card, in 1987. These cards allowed IBM PC compatible computers to produce complex sounds using FM synthesis, where they had previously been limited to simple tones and beeps. However, the rise of the Creative Labs Sound Blaster card, which featured much higher sound quality due to the inclusion of a PCM channel and digital signal processor, led AdLib to file for bankruptcy in 1992.

The year before, id Software had produced one of the first first-person shooter games, Hovertank 3D, which was the company's first in their line of highly influential games in the genre. Other companies also produced first-person shooters, such as Accolade's Day of the Viper in 1989. Id Software went on to develop Wolfenstein 3D in 1992, which helped to popularize the genre, kick-starting what would become one of the highest-selling genres of modern times. The game was originally distributed through the shareware distribution model, allowing players to try a limited part of the game for free but requiring payment to play the rest, and represented one of the first uses of texture mapping graphics in a popular game, along with Ultima Underworld.

While leading Sega and Nintendo console systems kept their CPU speed at 3-7 MHz, the 486 PC processor ran much faster, allowing it to perform many more calculations per second. The 1993 release of Doom on the PC was a breakthrough in 3D graphics, and was soon ported to various game consoles in a general shift toward greater realism. In the same time frame, games such as Myst took advantage of the new CD-ROM delivery format to include many more assets (sound, images, video) for a richer game experience.

Many early PC games included extras such as the peril-sensitive sunglasses that shipped with The Hitchhiker's Guide to the Galaxy. These extras gradually became less common, but many games were still sold in the traditional over-sized boxes that used to hold the extra "feelies". Today, such extras are usually found only in Special Edition versions of games, such as Battlechests from Blizzard.

Contemporary gaming

By 1996, the rise of Microsoft Windows and success of 3D console titles such as Super Mario 64 sparked great interest in hardware accelerated 3D graphics on the PC, and soon resulted in attempts to produce affordable solutions with the ATI Rage, Matrox Mystique and S3 ViRGE. Tomb Raider, which was released in 1996, was one of the first third-person shooter games and was praised for its revolutionary graphics. As 3D graphics libraries such as DirectX and OpenGL matured and knocked proprietary interfaces out of the market, these platforms gained greater acceptance in the market, particularly with their demonstrated benefits in games such as Unreal.[11] However, major changes to the Microsoft Windows operating system, by then the market leader, made many older MS-DOS-based games unplayable on Windows NT, and later, Windows XP (without using an emulator, such as DOSbox).[12]

The faster graphics accelerators and improving CPU technology resulted in increasing levels of realism in computer games. During this time, the improvements introduced with products such as ATI's Radeon R300 and NVidia's GeForce 6 Series have allowed developers to increase the complexity of modern game engines. PC gaming currently tends strongly toward improvements in 3D graphics.

Unlike the generally accepted push for improved graphical performance, the use of physics engines in computer games has become a matter of debate since the announcement and 2005 release of the Ageia PhysX PPU, ostensibly competing with middleware such as the Havok physics engine. Issues such as the difficulty of ensuring consistent experiences for all players, and the uncertain benefit of first-generation PhysX cards in games such as Tom Clancy's Ghost Recon Advanced Warfighter and City of Villains, prompted arguments over the value of such technology.

Similarly, many game publishers began to experiment with new forms of marketing. Chief among these alternative strategies is episodic gaming, an adaptation of the older concept of expansion packs, in which game content is provided in smaller quantities but for a proportionally lower price. Titles such as Half-Life 2: Episode One took advantage of the idea, with mixed results rising from concerns for the amount of content provided for the price.

PC game development

Game development, as with console games, is generally undertaken by one or more game developers using either standardised or proprietary tools. While games could previously be developed by very small groups of people, as in the early example of Wolfenstein 3D, many popular computer games today require large development teams and budgets running into the millions of dollars.

PC games are usually built around a central piece of software, known as a game engine, that simplifies the development process and enables developers to easily port their projects between platforms. Unlike most consoles, which generally only run major engines such as Unreal Engine 3 and RenderWare due to restrictions on homebrew software, personal computers may run games developed using a larger range of software. As such, a number of alternatives to expensive engines have become available, including open source solutions such as Crystal Space, OGRE and DarkPlaces.

User-created modifications

The multi-purpose nature of personal computers often allows users to modify the content of installed games with relative ease. Since console games are generally difficult to modify without a proprietary software development kit, and are often protected by legal and physical barriers against tampering and homebrew software,[20][21] it is generally easier to modify the personal computer version of games using common, easy-to-obtain software. Users can then distribute their customised version of the game (commonly known as a mod) by any means they choose.

The inclusion of map editors such as UnrealEd with the retail versions of many games, and others that have been made available online such as GtkRadiant, allow users to create modifications for games easily, using tools that are maintained by the games' original developers. In addition, companies such as id Software have released the source code to older game engines, enabling the creation of entirely new games and major changes to existing ones.

Modding has allowed much of the community to produce game elements that would not normally be provided by the developer of the game, expanding or modifying normal gameplay to varying degrees. One notable example is the Hot Coffee mod for the PC port of Grand Theft Auto: San Andreas, which enables access to an abandoned sex minigame by modifying the game's data file.

Distribution

Computer games are typically sold on standard storage media, such as compact discs, DVDs, and floppy disks. These were originally passed on to customers through mail order services, although retail distribution has since replaced mail order as the main distribution channel for video games due to higher sales. Different formats of floppy disks were initially the staple storage media of the 1980s and early 1990s, but have fallen out of practical use as the increasing sophistication of computer games raised the overall size of the game's data and program files.

The introduction of complex graphics engines in recent times has resulted in additional storage requirements for modern games, and thus an increasing interest in CDs and DVDs as the next compact storage media for personal computer games. The rising popularity of DVD drives in modern PCs, and the larger capacity of the new media (a single-layer DVD can hold up to 4.7 gigabytes of data, more than five times as much as a single CD), have resulted in their adoption as a format for computer game distribution. To date, CD versions are still offered for most games, while some games offer both the CD and the DVD versions.

Shareware

Shareware marketing, whereby a limited or demonstration version of the full game is released to prospective buyers without charge, has been used to distribute computer games since the early years of the gaming industry; it was used in the early days of Tanarus, among many other titles. Shareware games generally offer only a small part of the gameplay of the retail product, and may be distributed with gaming magazines, in retail stores or on developers' websites free of charge.

In the early 1990s, shareware distribution was common among fledgling game companies such as Apogee Software, Epic Megagames and id Software, and it remains a popular distribution method among smaller game developers. However, shareware has largely fallen out of favour among established game companies in favour of traditional retail marketing, with notable exceptions such as Big Fish Games and PopCap Games continuing to use the model today.

Online delivery

With the increased popularity of the Internet, online distribution of game content has become more common. Retail services such as Direct2Drive and Download.com allow users to purchase and download large games that would otherwise only be distributed on physical media such as DVDs, and also provide cheap distribution of shareware and demonstration games. Other services offer a subscription-based distribution model in which users pay a monthly fee to download and play as many games as they wish.

The Steam system, developed by Valve Corporation, provides an alternative to traditional online services. Instead of allowing the player to download a game and play it immediately, games are made available for "pre-load" in an encrypted form days or weeks before their actual release date. On the official release date, a relatively small component is made available to unlock the game. Steam also ensures that, once bought, a game remains accessible to a customer indefinitely, whereas traditional media such as floppy disks and CD-ROMs are susceptible to unrecoverable damage and misplacement. The user does, however, depend on the Steam servers being online to download their games. According to the terms of service for Steam, Valve has no obligation to keep the servers running; if Valve Corporation shut down, so would the servers.

PC game genres

The real-time strategy genre, which accounts for more than a quarter of all PC games sold, has found very little success on video game consoles, with releases such as Starcraft 64 failing in the marketplace. Strategy games tend to suffer from the design of console controllers, which do not allow fast, accurate movement.

Conversely, action games have found considerable popularity on video game consoles, making up nearly a third of all console video games sold in 2004, compared to just four percent on the computer. Sports games have also found greater support on game consoles compared to personal computers.

Hardware

Modern computer games place great demand on the computer's hardware, often requiring a fast central processing unit (CPU) to function properly. CPU manufacturers historically relied mainly on increasing clock rates to improve the performance of their processors, but had begun to move steadily towards multi-core CPUs by 2005. These processors allow the computer to simultaneously process multiple tasks, called threads, allowing the use of more complex graphics, artificial intelligence and in-game physics.

Similarly, 3D games often rely on a powerful graphics processing unit (GPU), which accelerates the process of drawing complex scenes in real time. GPUs may be an integrated part of the computer's motherboard, the most common solution in laptops, or come packaged on a discrete graphics card with its own dedicated video RAM, connected to the motherboard through either an AGP or PCI-Express port. It is also possible to use multiple GPUs in a single computer, using technologies such as Nvidia's Scalable Link Interface and ATI's CrossFire.

Sound cards are also available to provide improved audio in computer games. These cards offer improved 3D audio and other enhancements that are generally not available with integrated alternatives, at the cost of marginally lower overall performance. The Creative Labs SoundBlaster line was for many years the de facto standard for sound cards, although its popularity dwindled as PC audio became a commodity feature of modern motherboards.

Physics processing units (PPUs), such as the Nvidia PhysX (formerly AGEIA PhysX) card, are also available to accelerate physics simulations in modern computer games. PPUs allow the computer to process more complex interactions among objects than is achievable using only the CPU, potentially allowing players a much greater degree of control over the world in games designed to use the card.[32]

Virtually all personal computers use a keyboard and mouse for user input. Other common gaming peripherals are a headset for faster communication in online games, joysticks for flight simulators, steering wheels for driving games and gamepads for console-style games.

Software

Computer games also rely on third-party software such as an operating system (OS), device drivers and libraries to run. Today, the vast majority of computer games are designed to run on the Microsoft Windows OS. Whereas earlier games written for MS-DOS included code to communicate directly with hardware, today application programming interfaces (APIs) provide an interface between the game and the OS, simplifying game design. Microsoft's DirectX is an API that is widely used by today's computer games to communicate with sound and graphics hardware; OpenGL, a cross-platform API for graphics rendering, is also used. The version of the graphics card's driver installed can often affect game performance and gameplay. It is not unusual for a game company to use a third-party game engine, or third-party libraries for a game's AI or physics.

Local area network gaming

Multiplayer gaming was largely limited to local area networks (LANs) before cost-effective broadband Internet access became available, because LANs typically offered higher bandwidth and lower latency than the dial-up services of the time. These advantages allowed more players to join any given computer game, and they have kept LAN gaming alive today because of the higher latency of most Internet connections and the costs associated with broadband Internet.

LAN gaming typically requires two or more personal computers, a router and sufficient networking cables to connect every computer on the network. Additionally, each computer must have a network card installed or integrated onto its motherboard in order to communicate with other computers on the network. Optionally, any LAN may include an external connection to the Internet.

Online games

Online multiplayer games have achieved popularity largely as a result of increasing broadband adoption among consumers. Affordable high-bandwidth Internet connections allow large numbers of players to play together, and thus have found particular use in massively multiplayer online RPGs, in games such as Tanarus, and in persistent online games such as World War II Online.

Although it is possible to participate in online computer games using dial-up modems, broadband Internet connections are generally considered necessary to reduce the latency between players (commonly known as "lag"). Such connections require a broadband-compatible modem connected to the personal computer through a network interface card (generally integrated onto the computer's motherboard), optionally separated by a router. Online games also require a virtual environment, generally called a "game server", which interconnects players and allows real-time, often fast-paced action. To meet this need, game server providers (GSPs) have become increasingly popular over the last half-decade. While not required for all players, these hosted servers provide a customizable "home" (with additional modifications, settings and so on) that gives players the experience they want. Today there are over 500,000 game servers hosted in North America alone.

Emulation

Emulation software, used to run software without the original hardware, is popular for its ability to play legacy video games without the consoles or operating systems for which they were designed. Emulators such as NESticle and MAME are relatively commonplace, although the complexity of modern consoles such as the Xbox or PlayStation makes them far more difficult to emulate, even for the original manufacturers.

Most emulation software mimics a particular hardware architecture, often to an extremely high degree of accuracy. This is particularly the case with classic home computers such as the Commodore 64, whose software often depends on highly sophisticated low-level programming tricks invented by game programmers and the demoscene.

Computer software

Computer software, or just software, is a general term used to describe a collection of computer programs, procedures and documentation that perform some task on a computer system.

The term includes:

Application software such as word processors, which perform productive tasks for users.
Firmware, which is software stored in electrically programmable memory devices on motherboards or other integrated hardware carriers.
Middleware, which controls and co-ordinates distributed systems.
System software such as operating systems, which interface with hardware to provide the necessary services for application software.
Software testing, a domain independent of development and programming. It consists of various methods for testing and declaring a software product fit before it can be launched for use by an individual or a group. Modern testers run many tests on functionality, performance and appearance with tools such as QTP, LoadRunner and black-box testing techniques to check the developed code against a checklist of requirements. ISTQB is a certification in demand for engineers who want to pursue a career in testing.
Testware, an umbrella term for all the utilities and application software used in combination to test a software package, which may but need not contribute to operational purposes. As such, testware is not a standing configuration but merely a working environment for application software or subsets thereof.
Software also includes websites, programs, video games and so on that are written in programming languages such as C and C++.

"Software" is sometimes used in a broader context to mean anything which is not hardware but which is used with hardware, such as film, tapes and records.

Relationship to computer hardware

Computer software is so called to distinguish it from computer hardware, which encompasses the physical interconnections and devices required to store and execute (or run) the software. At the lowest level, software consists of a machine language specific to an individual processor. A machine language consists of groups of binary values signifying processor instructions, which change the state of the computer from its preceding state; software is thus an ordered sequence of instructions for changing the state of the computer hardware. Software is usually written in high-level programming languages that are easier and more efficient for humans to use (closer to natural language) than machine language. High-level languages are compiled or interpreted into machine language object code. Software may also be written in an assembly language, essentially a mnemonic representation of a machine language using a natural-language alphabet. Assembly language must be assembled into object code via an assembler.
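
As a loose, hedged illustration of the idea that high-level code is translated into lower-level instructions (this shows Python bytecode rather than true machine language, and the exact instruction names vary between interpreter versions), Python's standard dis module can display the instructions produced from a small function:

    import dis

    def add(a, b):
        return a + b

    # Prints the interpreter-level instructions (e.g. LOAD_FAST, RETURN_VALUE)
    # that the high-level "return a + b" compiles down to.
    dis.dis(add)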

The term "software" was first used in this sense by John W. Tukey in 1958.In computer science and software engineering, computer software is all computer programs. The theory that is the basis for most modern software was first proposed by Alan Turing in his 1935 essay Computable numbers with an application to the Entscheidungsproblem.

Types of software

Practical computer systems divide software systems into three major classes: system software, programming software and application software, although the distinction is arbitrary, and often blurred.


System software
System software helps run the computer hardware and computer system. It includes:

device drivers,
operating systems,
servers,
utilities,
windowing systems,
(these things need not be distinct)

The purpose of system software is to unburden the applications programmer from the details of the particular computer complex being used, including such accessory devices as communications links, printers, readers, displays and keyboards, and to partition the computer's resources, such as memory and processor time, in a safe and stable manner.


Programming software
Programming software provides tools that assist a programmer in writing computer programs and other software in various programming languages in a more convenient way. The tools include:

compilers,
debuggers,
interpreters,
linkers,
text editors,
An integrated development environment (IDE) is a single application that attempts to manage all these functions.


Application software
Application software allows end users to accomplish one or more specific (not directly computer development related) tasks. Typical applications include:

industrial automation,
business software,
computer games,
telecommunications (i.e. the Internet and everything that flows over it),
databases,
educational software,
medical software,

Design and implementation

The design and implementation of software vary depending on its complexity. For instance, designing and creating Microsoft Word takes much longer than designing and developing Microsoft Notepad because of the difference in functionality between the two.

Software is usually designed and created (coded/written/programmed) in integrated development environments (IDEs) such as Emacs, XEmacs, Microsoft Visual Studio and Eclipse, which can simplify the process and compile the program. As noted in a different section, software is usually created on top of existing software and the application programming interface (API) that the underlying software provides, such as GTK+, JavaBeans or Swing. Libraries (APIs) are categorized by purpose: for instance, JavaBeans is used for designing enterprise applications, Windows Forms is used for designing graphical user interface (GUI) applications like Microsoft Word, and Windows Communication Foundation is used for designing web services. Underlying concepts in computer programming, such as quicksort, hash tables, arrays and binary trees, can also be useful when creating software. When a program is designed, it relies on the API: for instance, someone designing a Microsoft Windows desktop application might use the .NET Windows Forms library and call its APIs, such as Form1.Close() and Form1.Show(), to close or open the application, writing only the additional operations the application needs. Without these APIs, the programmer would have to write that functionality entirely on their own. Companies such as Sun Microsystems, Novell and Microsoft provide their own APIs, so many applications are written using their software libraries, which usually contain numerous APIs.
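
As a rough analogue of the Windows Forms example above, sketched here with Python's standard tkinter library rather than .NET (the window title and button label are arbitrary), the programmer simply calls the library's ready-made APIs to create, show and close a window instead of writing that machinery from scratch:

    import tkinter as tk

    root = tk.Tk()                 # create the main application window
    root.title("Hello")            # library call to set the window title
    # root.destroy() closes the window, much as Form1.Close() would
    quit_button = tk.Button(root, text="Quit", command=root.destroy)
    quit_button.pack()
    root.mainloop()                # hand control to the library's event loop (shows the window)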

Software has special economic characteristics that make its design, creation, and distribution different from most other economic goods.

A person who creates software may be called a programmer, software engineer or software developer, terms that all have essentially the same meaning; "code monkey" is a more informal label.

Industry and organizations

Software has its own niche industry, called the software industry, made up of the different entities and people that produce software; as a result there are many software companies and programmers in the world. Because software is used in many different areas, such as finance, search, mathematics, space exploration, gaming and mining, software companies and programmers usually specialize in certain areas. For instance, Electronic Arts primarily creates video games.

Selling software can also be quite profitable. For instance, Bill Gates, the founder of Microsoft, was the second-richest man in the world in 2008 largely through sales of the Microsoft Windows and Microsoft Office software, and the same goes for Larry Ellison, largely through his Oracle database software.

There are also many non-profit software organizations, such as the Free Software Foundation, the GNU Project and the Mozilla Foundation, as well as many software standards organizations, such as the W3C and the IETF, that develop standards so that different software can work and interoperate with each other, for example through XML, HTML, HTTP and FTP.

Some of the well known software companies include Microsoft, Apple, IBM, Oracle, Novell, SAP, HP, etc.

Friday, May 1, 2009

File Transfer Protocol

File Transfer Protocol (FTP) is a network protocol used to exchange and manipulate files over a TCP computer network, such as the Internet. An FTP client may connect to an FTP server to manipulate files on that server.

Purpose

The objectives of FTP, as outlined by its RFC, are:

1. To promote sharing of files (computer programs and/or data).
2. To encourage indirect or implicit use of remote computers.
3. To shield a user from variations in file storage systems among different hosts.
4. To transfer data reliably and efficiently.

Connection methods

FTP runs over TCP. It defaults to listening on port 21 for incoming connections from FTP clients. A connection to this port from the FTP client forms the control stream, on which commands are passed from the FTP client to the FTP server and, on occasion, from the FTP server to the FTP client. FTP uses out-of-band control, meaning it uses separate connections for control and data. Thus, for the actual file transfer to take place, a different connection, called the data stream, is required. The process of setting up the data stream differs depending on the transfer mode. In short, port 21 is used for control and port 20 for data (in active mode).
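
A minimal sketch of the control stream, using a raw TCP socket in Python (the host name ftp.example.com is a placeholder, and a real client would read replies more carefully than a single recv() call):

    import socket

    def send_cmd(sock, line):
        sock.sendall((line + "\r\n").encode("ascii"))
        return sock.recv(4096).decode("ascii", errors="replace")

    ctrl = socket.create_connection(("ftp.example.com", 21))   # control connection on port 21
    print(ctrl.recv(4096).decode("ascii", errors="replace"))   # greeting, e.g. "220 Service ready"
    print(send_cmd(ctrl, "USER anonymous"))                    # e.g. "331 Please specify the password"
    print(send_cmd(ctrl, "PASS guest@example.com"))            # e.g. "230 Login successful"
    print(send_cmd(ctrl, "QUIT"))                              # e.g. "221 Goodbye"
    ctrl.close()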

In active mode, the FTP client opens a dynamic port, sends the FTP server the dynamic port number on which it is listening over the control stream and waits for a connection from the FTP server. When the FTP server initiates the data connection to the FTP client it binds the source port to port 20 on the FTP server.

In order to use active mode, the client sends a PORT command, with the IP address and port as its argument. The format of the argument is "h1,h2,h3,h4,p1,p2", where each h field is a decimal representation of 8 bits of the host IP address and the p fields encode the chosen data port. For example, a client with an IP of 192.168.0.1, listening on port 49154 for the data connection, will send the command "PORT 192,168,0,1,192,2". The port fields should be interpreted as p1×256 + p2 = port, or, in this example, 192×256 + 2 = 49154.
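
A small Python helper reproducing that arithmetic (the address and port are the same illustrative values used above):

    def encode_port_argument(ip, port):
        h1, h2, h3, h4 = ip.split(".")
        p1, p2 = port // 256, port % 256          # high and low bytes of the port
        return ",".join([h1, h2, h3, h4, str(p1), str(p2)])

    def decode_port_argument(arg):
        h1, h2, h3, h4, p1, p2 = arg.split(",")
        return ".".join([h1, h2, h3, h4]), int(p1) * 256 + int(p2)

    print(encode_port_argument("192.168.0.1", 49154))   # 192,168,0,1,192,2
    print(decode_port_argument("192,168,0,1,192,2"))    # ('192.168.0.1', 49154)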

In passive mode, the FTP server opens a dynamic port, sends the FTP client the server's IP address to connect to and the port on which it is listening (a 16-bit value broken into a high and low byte, as explained above) over the control stream and waits for a connection from the FTP client. In this case, the FTP client binds the source port of the connection to a dynamic port.

To use passive mode, the client sends the PASV command to which the server would reply with something similar to "227 Entering Passive Mode (127,0,0,1,192,52)". The syntax of the IP address and port are the same as for the argument to the PORT command.

In extended passive mode, the FTP server operates exactly like passive mode; however, it transmits only the port number (not broken into high and low bytes), and the client is to assume that it connects to the same IP address to which it was originally connected. Extended passive mode was added by RFC 2428 in September 1998.

While data is being transferred via the data stream, the control stream sits idle. This can cause problems with large data transfers through firewalls which time out sessions after lengthy periods of idleness. While the file may well be successfully transferred, the control session can be disconnected by the firewall, causing an error to be generated.

The FTP protocol supports resuming interrupted downloads using the REST command. The client passes the number of bytes it has already received as the argument to REST and then restarts the transfer. Some command-line clients, for example, have an often-overlooked but valuable command, "reget" (meaning "get again"), that will cause an interrupted "get" command to be continued, hopefully to completion, after a communications interruption.

Resuming uploads is not as easy. Although the FTP protocol supports the APPE command to append data to a file on the server, the client does not know the exact position at which a transfer got interrupted. It has to obtain the size of the file some other way, for example over a directory listing or using the SIZE command.
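
A hedged sketch of both ideas using Python's standard ftplib module, which exposes REST through the rest= argument of retrbinary() and the SIZE command through size(); the host and file names are placeholders, and a real client would also handle servers that reject REST or SIZE:

    import os
    from ftplib import FTP

    ftp = FTP("ftp.example.com")
    ftp.login()                                   # anonymous login
    local_name = "bigfile.iso"
    offset = os.path.getsize(local_name) if os.path.exists(local_name) else 0

    with open(local_name, "ab") as fh:            # append to whatever was already received
        ftp.retrbinary("RETR bigfile.iso", fh.write, rest=offset)

    remote_size = ftp.size("bigfile.iso")         # SIZE command: one way to check completeness
    print("complete:", os.path.getsize(local_name) == remote_size)
    ftp.quit()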

In ASCII mode (see below), resuming transfers can be troublesome if client and server use different end of line characters.

Security problems

The original FTP specification describes an inherently insecure method of transferring files, because it specifies no way to transfer data in an encrypted fashion. This means that under most network configurations, user names, passwords, FTP commands and transferred files can be captured by anyone on the same network using a packet sniffer. This is a problem common to many Internet protocol specifications written prior to the creation of SSL, such as HTTP, SMTP and Telnet. The common solution is to use either SFTP (SSH File Transfer Protocol) or FTPS (FTP over SSL), which adds SSL or TLS encryption to FTP as specified in RFC 4217.

FTP return codes

FTP server return codes indicate their status by the digits within them. A brief explanation of the various digits' meanings is given below (a small lookup sketch follows the list):

* 1xx: Positive Preliminary reply. The action requested is being initiated but there will be another reply before it begins.
* 2xx: Positive Completion reply. The action requested has been completed. The client may now issue a new command.
* 3xx: Positive Intermediate reply. The command was successful, but a further command is required before the server can act upon the request.
* 4xx: Transient Negative Completion reply. The command was not successful, but the client is free to try the command again as the failure is only temporary.
* 5xx: Permanent Negative Completion reply. The command was not successful and the client should not attempt to repeat it again.
* x0x: The failure was due to a syntax error.
* x1x: This response is a reply to a request for information.
* x2x: This response is a reply relating to connection information.
* x3x: This response is a reply relating to accounting and authorization.
* x4x: Unspecified as yet
* x5x: These responses indicate the status of the Server file system vis-a-vis the requested transfer or other file system action.
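
A tiny Python lookup for the first digit, following the list above (the reply text in the comments is only an example of what a server might send):

    REPLY_CLASSES = {
        "1": "Positive Preliminary reply",
        "2": "Positive Completion reply",
        "3": "Positive Intermediate reply",
        "4": "Transient Negative Completion reply",
        "5": "Permanent Negative Completion reply",
    }

    def classify(code):
        return REPLY_CLASSES.get(str(code)[0], "Unknown reply class")

    print(classify(226))   # Positive Completion reply, e.g. "226 Transfer complete"
    print(classify(425))   # Transient Negative Completion reply, e.g. "425 Can't open data connection"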

Anonymous FTP

A host that provides an FTP service may additionally provide anonymous FTP access. Under this arrangement, users do not strictly need an account on the host; instead, the user typically enters 'anonymous' or 'ftp' when prompted for a username. Although users are commonly asked to send their email address as their password, little to no verification is actually performed on the supplied data.
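
A minimal anonymous session with Python's standard ftplib module (the host name is a placeholder; calling login() with no arguments already defaults to the user "anonymous" and a dummy password):

    from ftplib import FTP

    ftp = FTP("ftp.example.com")
    ftp.login("anonymous", "guest@example.com")   # an email-style password, by convention
    print(ftp.getwelcome())                       # the server's greeting banner
    ftp.retrlines("LIST")                         # print a listing of the current directory
    ftp.quit()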

As modern FTP clients typically hide the anonymous login process from the user, the FTP client will supply dummy data as the password (since the user's email address may not be known to the application). For example, the following FTP user agents specify the listed passwords for anonymous logins:

* Mozilla Firefox (3.0.7) — mozilla@example.com
* KDE Konqueror (3.5) — anonymous@
* wget (1.10.2) — -wget@
* lftp (3.4.4) — lftp@

The Gopher protocol has been suggested as an alternative to anonymous FTP, as well as Trivial File Transfer Protocol and File Service Protocol.

Data format

While transferring data over the network, several data representations can be used. The two most common transfer modes are:

1. ASCII mode
2. Binary mode: In "Binary mode", the sending machine sends each file byte for byte and as such the recipient stores the bytestream as it receives it. (The FTP standard calls this "IMAGE" or "I" mode)

In "ASCII mode", any form of data that is not plain text will be corrupted. When a file is sent using an ASCII-type transfer, the individual letters, numbers, and characters are sent using their ASCII character codes. The receiving machine saves these in a text file in the appropriate format (for example, a Unix machine saves it in a Unix format, a Windows machine saves it in a Windows format). Hence if an ASCII transfer is used it can be assumed plain text is sent, which is stored by the receiving computer in its own format. Translating between text formats might entail substituting the end of line and end of file characters used on the source platform with those on the destination platform, e.g. a Windows machine receiving a file from a Unix machine will replace the line feeds with carriage return-line feed pairs. It might also involve translating characters; for example, when transferring from an IBM mainframe to a system using ASCII, EBCDIC characters used on the mainframe will be translated to their ASCII equivalents, and when transferring from the system using ASCII to the mainframe, ASCII characters will be translated to their EBCDIC equivalents.

By default, most FTP clients use ASCII mode. Some clients try to determine the required transfer-mode by inspecting the file's name or contents, or by determining whether the server is running an operating system with the same text file format.
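
With Python's ftplib, for example, the choice of type is implied by the retrieval call: retrlines() performs an ASCII ("TYPE A") transfer and hands the client one line at a time, while retrbinary() switches to image mode ("TYPE I") and delivers raw bytes (host and file names below are placeholders):

    from ftplib import FTP

    ftp = FTP("ftp.example.com")
    ftp.login()

    with open("readme.txt", "w") as text_file:        # text transfer: line endings are translated
        ftp.retrlines("RETR readme.txt", lambda line: text_file.write(line + "\n"))

    with open("archive.zip", "wb") as binary_file:    # binary transfer: bytes stored exactly as sent
        ftp.retrbinary("RETR archive.zip", binary_file.write)

    ftp.quit()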

The FTP specifications also list the following transfer modes:

1. EBCDIC mode - this transfers text in the same way as ASCII mode, except that the characters are encoded in EBCDIC rather than ASCII.
2. Local mode - this is designed for use with systems that are word-oriented rather than byte-oriented. For example, mode "L 36" can be used to transfer binary data between two 36-bit machines. In L mode, the words are packed into bytes rather than being padded. Given the predominance of byte-oriented hardware nowadays, this mode is rarely used; however, some FTP servers accept "L 8" as being equivalent to "I".

In practice, these additional transfer modes are rarely used. They are however still used by some legacy mainframe systems.

The text (ASCII/EBCDIC) modes can also be qualified with the type of carriage control used (e.g. TELNET NVT carriage control, ASA carriage control), although that is rarely used nowadays.

Note that the terminology "mode" is technically incorrect, although commonly used by FTP clients. "MODE" in RFC 959 refers to the format of the protocol data stream (STREAM, BLOCK or COMPRESSED), as opposed to the format of the underlying file. What is commonly called "mode" is actually the "TYPE", which specifies the format of the file rather than the data stream. FTP also supports specification of the file structure ("STRU"), which can be either FILE (stream-oriented files), RECORD (record-oriented files) or PAGE (special type designed for use with TENEX). PAGE STRU is not really useful for non-TENEX systems, and RFC1123 section 4.1.2.3 recommends that it not be implemented.

FTP and web browsers

Most recent web browsers and file managers can connect to FTP servers, although they may lack support for protocol extensions such as FTPS. This allows manipulation of remote files over FTP through an interface similar to that used for local files. It is done via an FTP URL, which takes the form ftp(s)://ftpserveraddress (e.g., ftp://ftp.gimp.org/). A username and password can optionally be given in the URL, e.g. ftp(s)://login:password@ftpserveraddress:port. Most web browsers require the use of passive mode FTP, which not all FTP servers are capable of handling. Some browsers allow only the downloading of files and offer no way to upload files to the server.
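
Browsers are not the only clients that understand FTP URLs; as a small sketch, Python's standard urllib can fetch one directly (the ftp.gimp.org URL from the text is reused here and may of course change or go away):

    from urllib.request import urlopen

    with urlopen("ftp://ftp.gimp.org/") as response:
        listing = response.read().decode("utf-8", errors="replace")
    print(listing)    # a plain-text directory listing of the server's root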

FTP and NAT devices

The representation of the IP addresses and port numbers in the PORT command and PASV reply poses another challenge for Network address translation (NAT) devices in handling FTP. The NAT device must alter these values, so that they contain the IP address of the NAT-ed client, and a port chosen by the NAT device for the data connection. The new address and port will probably differ in length in their decimal representation from the original address and port. This means that altering the values on the control connection by the NAT device must be done carefully, changing the TCP Sequence and Acknowledgment fields for all subsequent packets. Such translation is not usually performed in most NAT devices, but special application layer gateways exist for this purpose.

FTP over SSH (not SFTP)

FTP over SSH (not SFTP) refers to the practice of tunneling a normal FTP session over an SSH connection.

Because FTP uses multiple TCP connections (unusual for a TCP/IP protocol that is still in use), it is particularly difficult to tunnel over SSH. With many SSH clients, attempting to set up a tunnel for the control channel (the initial client-to-server connection on port 21) will protect only that channel; when data is transferred, the FTP software at either end will set up new TCP connections (data channels) which will bypass the SSH connection, and thus have no confidentiality, integrity protection, etc.

Otherwise, it is necessary for the SSH client software to have specific knowledge of the FTP protocol, and monitor and rewrite FTP control channel messages and autonomously open new forwardings for FTP data channels. Version 3 of SSH Communications Security's software suite, and the GPL licensed FONC are two software packages that support this mode.

FTP over SSH is sometimes referred to as secure FTP; this should not be confused with other methods of securing FTP, such as with SSL/TLS (FTPS). Other methods of transferring files using SSH that are not related to FTP include SFTP and SCP; in each of these, the entire conversation (credentials and data) is always protected by the SSH protocol.

Download manager

A download manager is a computer program dedicated to the task of downloading (and sometimes uploading) possibly unrelated stand-alone files from (and sometimes to) the Internet for storage. This is unlike a web browser, which is mainly intended to browse web pages composed of a multitude of smaller files, and for which error-free transfer of files for permanent storage is of secondary importance (a failed or incomplete web page file rarely ruins the page). The typical download manager, at a minimum, provides a means to recover from errors without losing the work already completed, and can optionally split the file to be downloaded (or uploaded) into two or more segments, which are then moved in parallel, potentially making the process faster within the limits of the available bandwidth. (A few servers are known to block moving files in parallel segments, on the principle that server capacity should be shared equally by all users.) "Multi-source" is the name given to files that are downloaded in parallel.
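
As a hedged sketch of how such error recovery can work over HTTP (the URL is a placeholder, the server must support range requests, and a file that is already complete would make the server answer 416 instead of sending data):

    import os
    from urllib.request import Request, urlopen

    def resume_download(url, local_name):
        # Ask only for the bytes we do not have yet, then append them locally.
        offset = os.path.getsize(local_name) if os.path.exists(local_name) else 0
        request = Request(url, headers={"Range": "bytes=%d-" % offset})
        with urlopen(request) as response, open(local_name, "ab") as fh:
            while True:
                chunk = response.read(64 * 1024)
                if not chunk:
                    break
                fh.write(chunk)

    resume_download("http://example.com/large-file.zip", "large-file.zip")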

Features of download managers

Download managers commonly include one or more of the following features:

* Pausing the downloading of large files.
* Resuming broken or paused downloads (especially for very large files).
* Downloading files on poor connections.
* Downloading several files from a site automatically according to simple rules (file types, updated files, etc. - see also Offline Browser).
* Automatic recursive downloads (mirroring).
* Scheduled downloads (including automatic hang-up and shutdown).
* Searching for mirror sites, and the handling of different connections to download the same file more quickly (segmented downloading).
* Variable bandwidth usage.
* Automatic subfolder generation.

Download managers are useful for very active Internet users. For dial-up users, they can automatically dial the Internet service provider at night, when rates or tariffs are usually much lower, download the specified files, and hang up. They can record which links the user clicks on during the day and queue those files for later download. For broadband users, download managers can help with very large files by resuming broken downloads, by limiting the bandwidth used so that other Internet activities are not slowed and the server is not overloaded, or by automatically navigating a site and downloading pre-specified content (photo galleries, MP3 collections, etc.). This can also include automatically downloading whole sites and regularly updating them (see mirroring).

Many download managers support Metalink, an XML file listing mirrors, checksums, and other information useful for downloading.

Uploading and downloading

In networks, uploading and downloading refer to the two canonical directions (corresponding to sending and receiving, respectively) in which information can move; the term "loading" further indicates that the data is copied and assembled over a period of time to create a complete file. Downloading is distinguished from the related concept of streaming, which indicates a download in which the data is sequentially usable as it arrives, or "streams", and in which (typically) the data is not stored.

To download is to receive data to a local system from a remote system, such as a webserver, FTP server, email server, or other similar systems. A download is any file that is offered for downloading or that has been downloaded.

The inverse operation, uploading, is the sending of data from a local system to a remote system, such as a server, or peer, with the intent that the remote system should save a copy of whatever is being transferred.