File Transfer

Steven Zeil

Last modified: Aug 29, 2023

If you prepare files on one machine but want to use them on another, you need some means of transferring them. For example, if you edit files on your home PC, you may eventually need to get those files onto the CS Department network. On the other hand, you may want to take files your instructor has provided off of that network for use on your home PC.

1 Before You Start: Binary and Text Files

A complicating factor in transferring files from one computer to another is that you must decide whether the files you want to transfer should be treated as “text” or as “binary”. (We’ve talked a bit about this already, but it’s particularly important when transferring files.)

All files are containers of bytes. But in many files, these bytes are intended to represent text.

Such files are referred to as text files. The way in which text is encoded as numbers in a text file is governed by various international standards, which we’ll look at in just a moment. Because so many programs observe these standards, you can safely manipulate text files with a wide range of different text-oriented programs, even ones that the original creator of the text file might not have intended or might not have known about.

In other files, the numbers really are numbers. They encode data in a more complicated fashion, not line by line of text. These are called binary files. The way in which information is encoded into a binary file is entirely determined by the programs (and their programmers) intended for use with the file. Manipulating a binary file with any program not intended for that specific kind of encoding can be disastrous.

Some binary files can only be used on one specific operating system. Compiled, executable programs, for example, can only be run on the operating system for which they were prepared. You cannot run a Windows executable on a Linux machine, nor run a Linux executable on a Windows machine.

Some binary files can be used on multiple operating systems. A Microsoft Word .doc file may start life on a Windows PC, but Word is also available on Apple OS/X and Android, and .doc files can also be processed by the OpenOffice and LibreOffice suites that run on all of those operating systems and on Linux as well. Similarly, .zip and .pdf files can be processed under almost any operating system, even though the exact programs that do so may vary.

1.1 Variations among Text Files

1.1.1 ASCII

 

Since the late 1960s, most text files have been encoded using the ASCII character set. This encoding uses the numbers 0…127 (in hexadecimal, 0…7F) and so fits comfortably into an 8-bit byte (which can actually hold the numbers 0…255).
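To see this encoding for yourself, you can ask the standard od command to print each byte of a string as a decimal number. This is only a small sketch; byte_codes is a made-up helper name, not a standard command.

```shell
# byte_codes is a hypothetical helper: print the decimal value of each byte in a string.
byte_codes() {
    # od -An suppresses the offset column; -tu1 prints one unsigned decimal number per byte.
    # xargs (with no command) just collapses od's padding into single spaces.
    printf '%s' "$1" | od -An -tu1 | xargs
}

byte_codes 'Hi!'    # prints: 72 105 33
```

Every value printed for ordinary English text falls in the range 0…127.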

 

The first 32 ASCII codes (0–31) are control characters. Originally, these control characters were used to “control” output device behavior. For example, CR (carriage return) caused a printer to move its print head to the leftmost column of a page. LF (line feed) caused a printer to move one line down on the page.

In modern usage, only a few of these control characters will appear in a text file. TAB characters are common, and CR and LF are, as we shall see, used as line terminators. FF (form feed) characters were originally used to tell a printer to feed in a new form (page) and are still occasionally used to indicate the start of a new page.

One of the ways to tell if a file is intended to be an ASCII text file is to look for characters that fall outside the normal range of characters. If it has bytes containing numbers 128–255, it definitely is not ASCII text. If it has bytes containing numbers in the range 0–31, other than 9 (TAB), 10 (LF), 12 (FF), or 13 (CR), it probably is not ASCII text.
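That rule of thumb can be sketched as a small shell function. This is an illustration only: looks_like_ascii_text is a made-up name, and the sketch also treats byte 127 (DEL) as suspicious, which is an assumption beyond the rule stated above.

```shell
# looks_like_ascii_text FILE   (hypothetical helper)
# Succeeds when FILE contains only printable ASCII (32-126) plus
# TAB (9), LF (10), FF (12), and CR (13).
looks_like_ascii_text() {
    # Delete every acceptable byte (given in octal); anything left over is suspicious.
    leftover=$(LC_ALL=C tr -d '\11\12\14\15\40-\176' < "$1" | wc -c)
    [ "$leftover" -eq 0 ]
}

tmp=$(mktemp -d)
printf 'plain text\twith a tab\r\n' > "$tmp/text.txt"
printf 'abc\200\001' > "$tmp/data.bin"    # contains bytes 128 and 1

looks_like_ascii_text "$tmp/text.txt" && echo "text.txt: probably ASCII text"
looks_like_ascii_text "$tmp/data.bin" || echo "data.bin: probably not ASCII text"
rm -rf "$tmp"
```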

1.1.2 ASCII Line Termination

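Even if you have ASCII text in a file, there is some variation on how to encode the file. Windows and other operating systems disagree on how to divide an ASCII text file into lines. Unix-like systems, including Linux, end each line with a single LF character (^J), while Windows ends each line with a CR followed by an LF (^M^J).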

This means that ASCII text files created in Windows tend to look, to a Unix program, as if they have extra ^M characters near the end of each line. ASCII files created in Unix, on the other hand, tend to look to a Windows program as if the entire file contains only one line with odd ^J characters sprinkled inside.
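You can see the two conventions byte by byte without needing any particular file. In this small sketch, printf writes the same two lines in each style and od -c displays every byte, showing CR as \r and LF as \n.

```shell
# The same two lines of text, written with each style of line terminator.
printf 'one\r\ntwo\r\n' | od -c    # Windows style: every line ends with \r \n
printf 'one\ntwo\n'     | od -c    # Unix style: every line ends with \n only
```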

Example 1: Try This:
  1. Log into one of our Linux servers.

    Do

    more /home/cs252/Assignments/textFiles/wintext.txt 
    more /home/cs252/Assignments/textFiles/linuxtext.txt 
    more /home/cs252/Assignments/textFiles/mixedtext.txt
    

    You won’t see a difference, because more is smart enough to handle both styles of line termination.

  2. Give the command

    diff /home/cs252/Assignments/textFiles/wintext.txt /home/cs252/Assignments/textFiles/linuxtext.txt 
    

    Remember that diff compares two files and lists the lines that are different. In this case, it lists all of the lines in each file, because every line differs in its line terminator.

  3. Give the command

    diff -b /home/cs252/Assignments/textFiles/wintext.txt /home/cs252/Assignments/textFiles/linuxtext.txt 
    

    The -b option tells diff to ignore blanks and other “whitespace”, i.e., characters that take up space but don’t produce any visible “ink” in their place. Whitespace characters include blanks, tab characters, and carriage returns.

    When ignoring carriage returns, diff considers the files to be identical. It doesn’t list any lines at all.

  4. Give the commands

    file /home/cs252/Assignments/textFiles/wintext.txt 
    file /home/cs252/Assignments/textFiles/linuxtext.txt 
    file /home/cs252/Assignments/textFiles/mixedtext.txt
    

    Observe how the file command tells you about the contents of each.
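If you are not on one of our servers, you can reproduce the diff part of this experiment anywhere. The sketch below builds its own two small files (the names win.txt and linux.txt are made up for the illustration) and relies on diff's exit status, which is zero only when it finds no differences.

```shell
tmp=$(mktemp -d)
printf 'alpha\r\nbeta\r\n' > "$tmp/win.txt"      # CR-LF line endings
printf 'alpha\nbeta\n'     > "$tmp/linux.txt"    # LF line endings

# Plain diff sees the extra CR on every line.
if diff "$tmp/win.txt" "$tmp/linux.txt" > /dev/null; then plain=same; else plain=different; fi

# diff -b ignores the trailing whitespace, including the CR.
if diff -b "$tmp/win.txt" "$tmp/linux.txt" > /dev/null; then ignoring=same; else ignoring=different; fi

echo "diff:    $plain"
echo "diff -b: $ignoring"
rm -rf "$tmp"
```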

In general, it’s hard to predict what Linux commands will do with Windows-style text files. Some, like more, will handle them just fine. Some will convert them to Linux-style. And some will choke on what they regard as illegal input. For this reason, it’s usually safest to convert Windows-style text files to Linux-style, as described in an upcoming section.

1.1.3 Unicode

Although ASCII has been queen of the text encoding world for most of the history of computing, it has limitations. Modern applications need many more than the 95 printable characters available in ASCII. Unicode is an international standard encoding that uses multiple bytes per character to extend ASCII (the 128 ASCII characters are preserved in Unicode at their original numeric values), adding characters from international alphabets, mathematical and musical symbols, simple graphics, and a variety of other “utility” characters.

Unicode actually has multiple ways of encoding this extended character set. One Unicode encoding simply uses two bytes per character, for a total of 65,536 possible characters. But since most text files are still heavily oriented towards ASCII (0–127), this doubles the size of the typical text file, padding it with a lot of zero bytes. So another popular encoding (called “UTF-8”) uses 1 byte to represent each ASCII character and a sequence of 2 to 4 bytes, all with values of 128 or above, to represent each non-ASCII character.
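The variable width of UTF-8 is easy to check with wc -c, which counts bytes. In this sketch the characters are written as octal escapes of their UTF-8 bytes, so the result does not depend on how your terminal or editor is configured; the three characters chosen are just examples.

```shell
# Each printf emits exactly one character, spelled out as its UTF-8 bytes.
printf 'A'            | wc -c    # 1 byte:  U+0041, the same single byte as in ASCII
printf '\303\251'     | wc -c    # 2 bytes: U+00E9, e with an acute accent
printf '\342\202\254' | wc -c    # 3 bytes: U+20AC, the euro sign
```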

Like ASCII text files, lines in Unicode text files can be terminated by LF or by a CR-LF sequence, depending on the operating system. Unicode also introduces its own optional non-ASCII control characters to signal the end of a line and the end of a paragraph. Given this many options, though, it’s generally safe to assume that any program sophisticated enough to handle Unicode will be able to cope with any of the multiple options for line termination a file might employ.

1.2 Identifying the File Contents

How can you tell if a file is text or binary?

Let’s get one thing out of the way right now:

You cannot tell if a file is text or binary by double-clicking on it in an operating system window to open it up. Launching a file in this way simply runs whatever program the operating system believes is most appropriate to that file. That program may very well show you text, but that doesn’t mean the information was encoded as a text file. On the other hand, the program might show you graphics with no text at all, but that does not mean that the graphics were not drawn from a description written in ASCII text.

You can get a hint as to whether a file is text or binary by looking at the file extension (the 2 or 3 letters after the final ‘.’ in a file name).

But that’s only a hint. There are so many programs in the world that some are bound to use the same extensions.

In Linux, the best way to see what kind of data is in a file is to use the file command, e.g.,

file mystery.dat

The file command will print a description of the file contents. This description can be a few lines long. If the file contains ASCII text or Unicode text, this will be stated explicitly as part of the description. If the file is ASCII text but with Windows-style line termination, it will state that as well.

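In Windows, your best way to see if a file is text or binary is to open it in a text editor such as Notepad (but not in a word processor such as Word). A text file will appear as readable characters, while a binary file will mostly appear as gibberish.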

1.3 Converting Text Files

If you find yourself with a Windows-style text file on a Linux machine, or if you have a Linux-style text file that you want to transfer to a Windows machine, you can convert from one form to the other.

Important! Make sure that the file you are working on really is text before trying any of these conversions. If you use any of these on a binary file, you will corrupt the file past the point of possible repair.

That’s precisely why we spent so much time, above, explaining how to tell if a file is binary or text.

1.3.1 Windows to Linux

To convert a Windows-style text file to Linux, use the command tr:

tr -d '\r' < file1 > file2

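This produces a new file file2 from file1 by deleting every carriage return (CR) character, which leaves Unix-style line endings. (Note that file1 and file2 cannot be the same file, because the shell empties the output file before tr starts reading the input.)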
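A quick way to convince yourself that the conversion worked is to count the CR characters before and after. The sketch below does this on a small throwaway file; count_cr is a made-up helper name, and file1 and file2 are created in a temporary directory rather than being real course files.

```shell
# count_cr is a hypothetical helper: keep only the CR bytes of a file, then count them.
count_cr() { tr -cd '\r' < "$1" | wc -c | tr -d ' '; }

tmp=$(mktemp -d)
printf 'first\r\nsecond\r\n' > "$tmp/file1"    # a small Windows-style file
tr -d '\r' < "$tmp/file1" > "$tmp/file2"       # the conversion described above

echo "CRs in file1: $(count_cr "$tmp/file1")"    # 2
echo "CRs in file2: $(count_cr "$tmp/file2")"    # 0
rm -rf "$tmp"
```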

On most (not all) Linux systems, you could also use the command dos2unix, which converts the file in place:

dos2unix file1

1.3.2 Linux to Windows

You can also prepare a text file for transfer to a Windows system with unix2dos:

unix2dos file1

Be sure to check your file with the file command before using dos2unix or unix2dos. If the file is not, in truth, ASCII text, these commands will likely leave you with a badly corrupted file.
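If unix2dos is not installed, the same conversion can be approximated with awk, which is available on nearly every Unix system. This is a sketch rather than a replacement for unix2dos: to_windows_endings is a made-up name, it writes to standard output instead of converting in place, and it should only be given Linux-style input, because a file that already has CR-LF endings would end up with doubled CRs.

```shell
# to_windows_endings FILE   (hypothetical helper)
# Write FILE to standard output with each line terminated by CR-LF.
to_windows_endings() {
    awk '{ printf "%s\r\n", $0 }' "$1"
}

tmp=$(mktemp -d)
printf 'first\nsecond\n' > "$tmp/unix.txt"
to_windows_endings "$tmp/unix.txt" > "$tmp/windows.txt"
od -c "$tmp/windows.txt"    # each line now ends with \r \n
rm -rf "$tmp"
```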

2 Transferring Files Across the ODU Local Network: Samba

If you are sitting at a Windows PC that is part of either the CS Dept’s own local network (e.g., in the CS Dept. labs or connected to the CS Dept’s wireless network) or part of the ODU ITS local network (most ODU computer labs on the Norfolk, Virginia Beach, and Peninsula Graduate Center campuses), then you can access your Unix account directories directly from within Windows. This is because the CS Dept. Unix file servers run a service called “Samba”, a program that mediates file access between UNIX and other systems.

Again, let me emphasize that Samba only works on a local network. If you are connecting to the campus via the Internet, forget Samba.

To use Samba, you might not need to do anything at all. If you are logged in to a CS Dept PC and you have a Z: drive mapped, that is actually a Samba connection to your Unix home directory.

If you have no such drive, use the Windows “Start->Run” button to run

\\userdata.cs.odu.edu\undergrad\your-login-name

(Graduate students would use “grad” instead of “undergrad” in the path above.) You may be prompted for a password, or a login name and password. Supply your CS Unix login/password and, if all is well, a Windows Explorer window should open displaying the contents of your Unix home directory. You can now manipulate files in this window just as you would in any Windows directory/folder, but the changes are occurring in your Unix directory.

Now that we know that it works, we can make this whole process more convenient by mapping a Windows drive letter to your Unix account, giving you a “fake” disk drive that actually accesses your Unix files. From inside any Windows Explorer or “My Computer” window, select “Tools” (or right-click on the “My Computer” icon) and select “Map Network Drive…”. Select an unused drive letter, and enter that same address/command string as in the last step for the “Folder”. Make sure the “Reconnect at logon” box is checked. Finally, if your login name for logging into the PC is different from your CS Unix login name, look for a “Connect using a different user name” link, click on that and supply your Unix login information. Click on OK/Finish and within a few seconds, you should have a new drive available that actually maps onto your Unix account.

Two things to keep in mind when using Samba to access your Unix files from Windows:

3 Transferring Files Across the Internet: ftp

FTP (File Transfer Protocol) is one of the oldest mechanisms for transferring files over the Internet. Browsers long provided some support for FTP, although most current browsers have removed it, and even then they usually only permitted downloads (from the remote machine to your local PC) and access to public repositories, not to password-protected accounts.

From a modern perspective, FTP has a number of limitations.

Consequently, FTP has been largely supplanted in practice by SFTP (the Secure File Transfer Protocol), which leverages the built-in encryption of SSH and a special ability of SSH to set up secure, router- and firewall-friendly “tunnels” through which other protocols can work.

Still, you may occasionally see a download link from a public (no-login) repository with an ftp://… URL. And many programs that support SFTP will also offer an FTP option that, in most cases, you should simply ignore.

4 Secure File Transfer: sftp and scp

SFTP (Secure File Transfer Protocol) is a more modern file transfer protocol that encrypts your entire session. It is built upon the secure ssh service, and therefore shares ssh’s ability to tunnel out through most reasonably configured firewalls.

Most SSH servers are also SFTP servers, so if you have an account on a machine that lets you use an SSH client program to open command sessions, you can probably use an SFTP client program to open file transfer sessions on that same server.

You will need an SFTP client program on your local machine to do this.

We’ll look at both styles of SFTP client: text-based and GUI.

4.1 SFTP via a Text-based Interface

Text-based sftp clients are widely available. Linux and OS/X machines will have one as part of the standard OS distribution. Windows does not come with one, but if you have installed the OpenSSH client for Windows 10, that will have included a text-based sftp command.

Typically, a text-based client is launched from your local machine with the command

sftp yourLoginName@serverName

The one potentially confusing thing about working with the sftp command is that you need to be constantly aware of what directory you are working with on two different machines: the local machine you are issuing the commands from and the remote machine that you have connected to.

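The commands provided by a typical text-based SFTP client include cd, ls, and pwd (change, list, and print the current directory on the remote machine); lcd, lls, and lpwd (the same operations on the local machine); get and put (download a file from, or upload a file to, the remote machine); and quit.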

Here is a sample SFTP session:

In this session, the user types the initial sftp command and whatever follows each sftp> prompt. Every other line is output from the program.

sftp yourName@linux.cs.odu.edu  ➀ 
Connected to linux.cs.odu.edu. 
yourName@linux.cs.odu.edu's password: 
sftp> lcd courseData/temp                ➁
sftp> cd data 
sftp>  ls
file1.txt  file2.dat  file3.txt  file4.dat   ➂
sftp>  get file1.txt               ➃
Fetching /home/yourName/data/file1.txt to file1.txt 
/home/yourName/data/file1.txt                     100%   20KB  19.6KB/s   00:0 226 Transfer complete 
sftp>  get *.dat                   ➄
Fetching /home/yourName/data/file2.dat to file2.dat 
/home/yourName/data/file2.dat                     100% 1001 1.0KB/s   00:
Fetching /home/yourName/data/file4.dat to file4.dat 
/home/yourName/data/file4.dat 100% 2123     2.1KB/s   0 226 Transfer complete 
sftp> quit 
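The same transfers can be scripted rather than typed. The following is a minimal sketch, assuming an OpenSSH sftp client (whose -b option reads commands from a file) and key-based login, since batch mode cannot prompt for a password. The server, directory, and file names are simply the placeholders from the session above, and the sftp line is commented out so that nothing is contacted until you fill in real names.

```shell
batch=$(mktemp)    # temporary file to hold the sftp commands

# The same commands the user typed at the sftp> prompts above.
cat > "$batch" <<'EOF'
lcd courseData/temp
cd data
get file1.txt
get *.dat
quit
EOF

cat "$batch"    # review the commands before running them
# sftp -b "$batch" yourName@linux.cs.odu.edu
```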

4.2 SFTP via GUI Interface

If the text-based interface does not appeal (or is not available on your local machine), you can get a free GUI-based SFTP client. See the Resources page for suggestions.

GUI-based SFTP clients differ considerably in the details of how you operate them. If you use one of these, you will have to rely on its own built-in help or its source website to learn how to use it. Here, we will have to settle for an example of one such client.

 

One that I can recommend for Windows users is WinSCP, shown here. It is fairly typical of GUI-based SFTP clients, showing two window “panes”, side by side. One shows directories and files on your local machine. The other shows directories and files on a remote machine to which you have connected. You can transfer files from one machine to the other by dragging and dropping file icons from one pane to the other.

WinSCP is available on the CS Dept’s Windows PCs, including the Virtual PC lab.

 

Another option, FileZilla, is shown here. As is typical of such file transfer clients, the interface is dominated by two window “panes”, side by side. One shows directories and files on your local machine. The other shows directories and files on a remote machine to which you have connected. You can transfer files from one machine to the other by dragging and dropping file icons from one pane to the other.

FileZilla can be installed on Windows, OS/X, and Linux machines.

 

With both of these clients, you will want to be sure that you have selected SFTP and not the older FTP as your transfer mode. Again, consult their documentation and/or built-in help for details.

Students have reported that FileZilla may time out when connecting to some ODU machines. (This can also happen with WinSCP, but is less common because WinSCP immediately offers to keep trying for another 60 seconds before giving up.)

This can be caused by…

  1. an internal setting of FileZilla that sets the timeout to 20 seconds. You can change this from the FileZilla Settings... entry under the Edit menu. Try bumping this up to 60.
  2. entering incorrect login or password information or misspelling the host/server name.
  3. attempting to connect via the (default) FTP method rather than SFTP.

4.3 Try It Out

Install an sftp client on your local PC, if you don’t have one already. Consult the Resources page for options.

Whichever form of sftp client you have decided to use, text-based or GUI, …

Example 2: Try This:
  1. Using your SSH client, log in to one of our Unix servers. Create a ~/playing/transfers directory. Log off.

  2. You should still have the files wintext.txt and linuxtext.txt on your local machine from the earlier Try This exercise. If not, download them again.

  3. On your local PC, use a command line or GUI-based SFTP client to connect to one of our Linux servers. Transfer the wintext.txt and linuxtext.txt files from your local PC to your ~/playing/transfers directory on the Linux server.

  4. Using your SSH client, log in to one of our Unix servers. cd to your ~/playing/transfers directory and examine its contents. You should find the two files you tried to transfer in there.

  5. Give the commands

    file wintext.txt
    file linuxtext.txt
    

    Take note of what this tells you about the line termination.

4.4 scp

Another secure transfer approach is SCP, which you can think of as an attempt to extend the normal Unix cp command to work across networks. SCP runs over the same underlying SSH service as SFTP, so usually any SSH server will support both SFTP and SCP. The clients, however, are usually quite different. SCP is, by its nature, generally done via a text-mode interface by issuing an scp command.

The basic format of an scp command is

scp loginName1@machine1:file1 loginName2@machine2:file2

to copy a file from one machine to another. For a file on the machine where you are typing the command, leave off the loginName@machine: prefix.

For example, from my home Linux machine, if I wanted to grab a copy of my .emacs file from my home directory on linux.cs.odu.edu, I might say:

scp zeil@linux.cs.odu.edu:/home/zeil/.emacs myLinux.emacs

Personally, I seldom use command-line scp because the paths on the remote machine tend to get long and, unlike paths on your local machine, you cannot use the Tab-key to complete file and directory names after typing the first few characters. I generally use sftp instead. Many command-line sftp clients do tab completion on the remote files. Even if they do not, the built-in ls command to list the current directory on the remote machine makes it easy to copy-and-paste long file names.

5 Fetching Files from Across the Internet: wget

Finally, there is a convenient way to get a copy of a file that is provided via a web server. The wget command accepts a URL (a web address), fetches the file at that URL, and deposits a copy of that file in your current directory.

Example 3: Try This:

Earlier, you downloaded a file linuxtext.txt onto a Windows PC and then transferred it to your Linux account.

Now let’s fetch that file directly onto the Linux server.

  1. Log in to one of our Linux servers.
  2. cd to any convenient directory that does not already contain that file.
  3. Right click on the linuxtext.txt link above and select the option to copy the link address to your clipboard.
  4. In your Linux command session, type “wget”, a space, and then paste the URL you just copied. Hit Enter/Return to run the command.
  5. Examine your directory. You should find a copy of linuxtext.txt.