[MUSIC PLAYING]
DAVID MALAN: So today we're going to talk
about challenges at this crucial intersection of law and technology.
And the goal at the end of today is not to have provided you with more answers,
but hopefully generated more questions about what this intersection is
and where we're going to go forward.
Because at this intersection lie a lot of really interesting and challenging
problems that are at the forefront of what we're doing.
And you, as a practitioner, may be someone
who is asked to confront and contend with and provide resolutions
for some of these problems.
This lecture's going to be divided into two parts roughly.
In the first part, we're going to discuss
trust, whether we can trust the software that we receive
and what implications that might have for software
that's transmitted over the internet.
And in the second part, we're going to talk about regulatory challenges that
might be faced.
As new emergent technologies come into play,
how is the law prepared, or is the law prepared
to contend with those challenges?
But let's start by talking about this idea of a trust model,
trust model being a computational term for basically
do we trust something that we're receiving over the internet?
Do we trust that software is what it says it is?
Do we trust that a provider is providing a service in the way they describe,
or are they doing other things behind the scenes?
Now, as part of this lecture, there's a lot of supplementary reading materials
that we've incorporated in that we're going to draw on quite a bit
throughout the course of today.
And the first of those is a paper called "Reflections on Trusting Trust."
This is arguably one of the most famous papers in computer science.
It was written in 1984 by Ken Thompson.
Ken Thompson was one of the inventors of the Unix operating
system, which later inspired Linux and on which macOS is also ultimately based.
And so he's quite a well-known figure in the computer science community.
And he wrote this paper to accept an award called the Turing Award, again,
one of the most famous awards in computer science.
And in it, he's trying to highlight the problem of trust in software.
And he begins by discussing about a computer
program that can reproduce itself.
We typically refer to this as what's called a quine in computer science.
But the idea is can you write a simple program that reproduces itself?
And we won't go through that exercise here.
But Thompson shows us that, yes, it is relatively trivial actually
to write programs that do this.
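For the curious, here's one way such a self-reproducing program, a quine, might look in Python; this is just an illustration, not the example from Thompson's paper.

```python
# The two lines below form a quine: run them on their own, and the output
# is exactly those two lines of source code.
s = 's = {!r}\nprint(s.format(s))'
print(s.format(s))
```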
But what does this then lead to?
So the next step of the process that Thompson discusses,
stage two in this paper, is how do you teach a computer
to teach itself something?
And he uses the idea of a compiler.
Recall that we use compilers in some programming languages
to turn source code, the human-like syntax
that we understand-- languages like C, for example,
will be written in source code.
And they need to be compiled, or transformed,
into zeros and ones, machine code, because computers only
understand these zeros and ones.
They don't understand the human-like syntax
that we're familiar with as programmers when we are writing our code.
And what Thompson is suggesting that we can do
is we can teach the compiler, the program that
actually takes the source code and transforms it into zeros and ones,
to compile itself.
And he starts out by doing this by introducing a new character
for the compiler to understand.
The analogy is drawn to the newline character, which we
type when we reach the end of a line.
We want to go down and back to the beginning of a new one.
We enter the newline character.
There are other characters that were not initially envisioned
as part of the C compiler.
And one of those is vertical tab, which basically
allows you to jump down several lines without necessarily resetting back
to the beginning of the line as newline would.
And so Thompson goes through the process,
that I won't expound on here because it's
covered in the paper, of how to teach the compiler what
this new character, this vertical tab means.
He shows us that we can write code in the C programming language
and then have the compiler compile that code into zeros and ones that
create something called a binary, a program
that a computer can execute and understand.
And then we can use that newly created compiler
that we've just created to compile other C programs.
Which means that once we've taught the computer how
to understand what this vertical tab character is,
it then can propagate into any other C program that we write.
The computer is learning, effectively, a new thing to interpret,
and it can then interpret that in every other program.
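To make that bootstrapping idea a bit more concrete, here's a toy sketch, in Python rather than Thompson's C, of the routine inside a compiler that translates escape sequences; the function and its structure are invented for illustration.

```python
# Toy sketch of a compiler's escape-sequence handler (not Thompson's actual code).
def translate_escape(ch):
    if ch == "n":
        return 10   # newline
    if ch == "t":
        return 9    # horizontal tab
    if ch == "v":
        # Stage one: the very first compiler to support \v must spell out
        # the numeric code 11 by hand, because no existing compiler yet
        # "knows" what \v means.
        return 11
    raise ValueError(f"unknown escape sequence \\{ch}")

# Stage two: once a binary built from the code above exists, the source for
# the \v case can be rewritten as  return ord("\v")  and it will still
# compile correctly -- the knowledge now lives in the binary, not the source.
print(translate_escape("v"))  # 11
```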
But then Thompson leads us into stage three,
which is, what if that's not all the computer or the compiler does?
What if instead of just adding that vertical tab character
whenever we did it, we also secretly, as part of the source code,
insert a bug into the code, such that now whenever we compile the code
and we encounter that backslash V, that vertical tab character,
we're not only putting that into the code
so that the computer can understand and parse this backslash V,
the character that it never knew about before,
but we've also sort of surreptitiously hidden a bug in the code.
And again, Thompson goes into great detail
about exactly how that can be done and exactly what steps we can then
take to make it look like that was never there.
We can change the source code, modify it,
and make it look like we never had a bug in there,
even though it is now propagating into all of the source code
we ever write or we ever compile going forward.
We've created a way to surreptitiously hide bugs in our code.
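In the same spirit, stage three can be caricatured as a compiler that quietly rewrites certain programs as it compiles them. This is a deliberately simplified sketch, not the paper's actual code; the string matching stands in for the real pattern recognition Thompson describes.

```python
# A caricature of Thompson's stage three: a "compiler" that recognizes the
# login program and slips in a backdoor while compiling it. In the full
# scheme, it would also recognize the compiler's own source and re-insert
# this entire trick, so the trojan survives even after the source is cleaned.
def compile_source(source):
    if "check_password" in source:   # "is this the login program?"
        source = source.replace(
            "return entered == real",
            'return entered == real or entered == "backdoor"')
    return source                    # stand-in for real code generation

login = "def check_password(entered, real):\n    return entered == real"
print(compile_source(login))
```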
And the conclusion that Thompson draws is, is it
possible to ever trust software that was written by anyone else?
In this course we've talked about some of the tools that are available
to programmers that would allow them to go back in time-- for example,
we've discussed GitHub on several occasions to go back in time--
and see prior versions of code.
In the 1980s, when this paper was written,
that wasn't necessarily possible.
It was relatively easy to hide source code changes so that the untrained eye
wouldn't know about them.
Code was not shared via the internet.
Code was shared via floppy disks or hard disks that were being
passed between people who needed them.
And so there was no easy way to verify that code that was written by somebody
else is actually trustworthy.
Now, again, this paper came out 35-plus years ago now.
And it came out around the time that the Computer Fraud and Abuse
Act, which we've also previously discussed,
was being drafted and run through Congress.
Did lawmakers heed the advice of Ken Thompson?
Do we still today trust that our programs that we receive
or that we write are free of bugs?
Is there a way for us to verify that?
What should happen if code is found to be buggy?
What if it's unintentionally buggy?
What if it's maliciously buggy?
Do we have a way to challenge things like that?
Do we have a way to prosecute those kinds of cases
if the bug creates some sort of catastrophic failure in some business?
Not exactly.
The challenge of figuring out whether or not we should trust software
is something that we have to contend with every day.
And there's no bright line answer for exactly how to do so.
Now let's turn to perhaps a more modern interpretation of this idea
and take a look at the Samsung Smart TV policy.
So this was a bit of news a few years ago,
that Samsung was recording or was capturing voice commands
so people could make use of their television without needing a remote.
You could say something like, television,
please turn the volume up, or television, change the channel.
But it turned out that when Samsung was collecting this information,
they were transmitting it to a third party, a third-party language
processor, who would ostensibly be taking the commands they hear
and feeding them into their own database to improve the quality of understanding
what these commands were.
So it would hear--
let's say thousands of people use this brand of television.
It would take the thousands of people's voices all making the same command,
feed it into its algorithm to process this command, and hopefully try
and come up with a better or more comprehensive understanding of what
that command meant to avoid the mistake of I say one thing,
and the TV does something else because it misinterprets what I do.
If you take a look at Samsung's policy, it says things like the device
will collect IP addresses, cookies, your hardware and software configuration, so
the settings that you have put onto your television, your browser information.
Some of these TVs, these smart TVs, have web browsers built into them.
And so you may be also sharing information about your history
and so on.
Is this necessarily a bad thing?
When it became a news story, it was mildly scandalous in the tech world
because it was unexpected.
No one thought that that was something a television should be doing.
But is it really all that different from when you use your browser anyway?
We've seen in this course that whenever we connect to a website,
we need to provide our IP address so that the site that we're requesting,
the server, knows where to send our data back to.
And in addition, as part of those HTTP headers, we not only send our IP address,
but we're usually sending information about what operating system we're running,
what browser we're currently using, where geographically we
might be located, so ways to help the routers route
traffic in the right direction.
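To give a rough sense of what that looks like, here is the kind of metadata a browser typically volunteers with a request, written out as a Python dictionary; the specific values are made up for illustration.

```python
# Roughly the kind of metadata a browser sends along with every request
# (all values here are invented for illustration).
request_headers = {
    "Host": "example.com",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15)",  # OS and browser
    "Accept-Language": "en-US",   # a hint about language and, loosely, location
    "Cookie": "session=abc123",   # the virtual hand stamp discussed later
}
# The IP address itself travels lower in the stack, in the packet headers,
# rather than in these HTTP headers.
print(request_headers["User-Agent"])
```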
Are we leaking as much information when we
use the internet to make a request as we are when our television is interpreting
or understanding a command?
Why is it that this particular action, this interpretation of sound,
feels like so much more of a privacy violation
than just accessing something on the internet when we're voluntarily, sort
of, revealing the same information?
Are we not voluntarily relinquishing the same information
to a company like Samsung, whose smart TVs sort of precipitated this?
Moreover, is it technologically feasible for Samsung
to not collect all of the sounds that it hears?
One of the big concerns as well that came up
with these smart TVs is that when does the recording and transmitting start?
For those of you who maybe have seen old versions of Star Trek,
you may recall that in order to activate the computers on that television show,
someone would just say computer.
And then the computer would sort of spring to life,
and then they could have a normal English language interaction with it.
There's no need to program specific commands
or click anything or have any other interaction other than voice.
How would we technologically accomplish that now?
How would a device know whether or not it
should be listening unless it's listening for a specific word?
Is there a way for the device to perhaps listen
to everything that comes in but only start sending information
when it hears a command?
Is it impossible for it not to capture all of the information
that it's hearing and send it somewhere, encrypt it or not encrypt it, and just
transmit it somewhere else?
It's kind of an interesting question.
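As a thought experiment, here's a toy sketch, with lines of text standing in for audio, of a wake-word approach in which everything is examined locally but only utterances addressed to the device ever leave it; this is purely illustrative and not any vendor's actual design.

```python
# Toy wake-word filter: every utterance is examined locally, but only ones
# addressed to the television are "transmitted." Purely illustrative.
WAKE_WORD = "television"

def to_transmit(utterances):
    transmitted = []
    for utterance in utterances:
        if utterance.lower().startswith(WAKE_WORD):
            transmitted.append(utterance)
    return transmitted

heard = ["idle living-room chatter",
         "Television, turn the volume up",
         "a private phone call"]
print(to_transmit(heard))   # ['Television, turn the volume up']
```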
Samsung also allows not only voice controls, but gesture controls.
This may help people who are visually impaired
or help people who are unable to use a remote control device.
They can wave or make certain gestures.
And in so doing, they're going to capture your face perhaps
as part of this gesture.
Or they may capture certain movements that you're making
or maybe even capture, depending on the quality
of the camera built into the television, aspects of the room around you.
Is this necessarily problematic?
Is this something that we as users of this software
need to accept as something that just is part of the deal?
In order to use this feature, we have to do it?
Is there a necessary compromise?
Is there a way to ensure that Samsung is properly interacting with our data?
Should there be a way for us to verify this?
Or is that proprietary to Samsung, the way that it handles that data?
Again, these are all sorts of questions that we really
want to know the answers to.
We want to know whether or not what these companies say they're doing is secure,
is private.
And we can read the policies of these organizations that are providing
these tools for us to interact with.
But is that enough?
Do we have a way to verify?
Is there anything we can do other than just trust
that these companies are doing what they say they're doing,
or services or programmers are providing tools that
do exactly what they say that they do?
Without some really advanced knowledge and skill in tech, the answer is no.
And even if you have that advanced skill or knowledge,
it's really hard to take a look at a binary, zeros
and ones, the actual executable program that is being run on these devices,
and look at it and say, yeah, I think that that does match the source code
that they provided to me, so I can feel
reasonably confident that, yes, I trust this particular piece of software.
As we've discussed in the context of security,
trust is sort of something we have to deal with.
We're constantly torn between this tension of not trusting other people,
and so we encrypt everything, but needing to trust people in order
for some things to work.
It's a very delicate balancing act that we have to contend with every day.
And again, I don't mean to pick on Samsung here.
This is just one of many different examples
that have sort of existed in popular culture.
Let's consider another one, for example.
Let's consider a piece of hardware called the Intel Management
Engine, or hardware, firmware, software, depending
on what it is, because one of the open questions
is, what exactly is the Intel Management Engine?
What we do know about it is that it is usually part of the CPU itself.
It's unclear.
It's not exactly been publicly disclosed whether it's built into the CPU
or perhaps built into the CMOS or the BIOS, different parts, low-level parts
of the motherboard itself.
But it is a chip or some software that runs on a computer, whose intended
purpose is to help network administrators in the event
that something has gone wrong with a computer.
So recall that we previously discussed this idea
that it's possible to encrypt your hard drive,
and that there are also ramifications that
can happen if you encrypt your hard drive
and forget exactly how to decrypt your hard drive.
What the Intel Management Engine would allow, one of its several features,
is for a network administrator, perhaps if you're
in an enterprise setting, your IT professional, your head of IT
might be able to access your computer remotely by issuing commands,
because the computer is able to listen on a specific port.
It's like 16,000 something.
I don't remember exactly the port number.
And it's discussed again, as well, in the article provided.
But it allows the computer to be listening
for a specific kind of request that should only
be coming from an administrator's computer to be able to remotely access
another computer.
But the concern is because it's listening on a specific port,
how is it possible to ensure that the requests that it's
receiving on that port or via that IP address are legitimate?
Because Intel has not disclosed the actual code
that comprises this module of the IME.
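If you're curious about your own machine, one rough and very limited check is to see whether anything is answering on the ports commonly cited in public documentation for Intel AMT (16992 through 16995; treat those numbers, and the usefulness of the check, as assumptions). An open port is at most a hint, not proof of anything.

```python
# Rough check for anything listening on ports commonly associated with
# Intel AMT. Port numbers and interpretation are assumptions; a closed
# port doesn't prove the feature is absent, nor an open one that it's active.
import socket

for port in range(16992, 16996):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        status = "open" if s.connect_ex(("127.0.0.1", port)) == 0 else "closed or filtered"
        print(port, status)
```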
And then the question becomes, is that a problem?
Should they be required to reveal that code?
Some will certainly argue yes, it's really important for us
as end users to understand what software is running on our devices.
We have a right to know what programs are running on our computers.
Others will say, no, we don't have a right to do that.
This is Intel's intellectual property.
It may contain trade secret information that allows its chips to work better.
We don't, for example, argue that Coca-Cola should
be required to reveal its secret formula to us because it may implicate
certain allergies, or that Kentucky Fried Chicken needs
to disclose its secret recipe to us.
So why should Intel be required to tell us
about the lines of code that comprise this part of its hardware or software
or firmware, again depending on exactly what it is, since it's slightly
unclear as to what this tool is?
So the question again is, are they required
to provide some degree of transparency?
Do we have a right to know?
Should we just trust that this software is indeed
only being used to allow remote access only to authorized individuals?
If Intel were to provide a tool to tell us whether our computer was
vulnerable to attack from outside computers accessing
our own personal computers outside of the enterprise context,
should we trust the result of the software
that Intel provided that tells us whether or not it is vulnerable?
As it turns out, Intel does provide this software
to tell you whether or not your IME chip is activated in such a way
that yes, you are subject to potential remote access or no, you are not.
Does saying that you are or you aren't reveal potential trade
secret-related information about Intel?
Should we be concerned that Intel is the one providing us
this information versus a third party providing us this information?
Of course, Intel is the only organization
that really can tell us whether we're vulnerable
or not, because they're the only ones who know what is in this software.
So again, not picking on any individual company
here, just drawing from case studies that exist in popular culture
and in tech circles about the kinds of questions
that we need to start considering and wrestling with.
Are they going to be required to disclose this information?
Should Samsung be revealing information about what sorts of data
it's collecting and how it's collecting it?
Do we trust that our compilers, as Ken Thompson alluded to,
actually compile our code the way that they say that they do?
This healthy skepticism is always at the forefront of our mind
when we're considering programming- and technology-related questions.
But how do we press on these issues further in a legal context?
That's still to be determined.
And that's going to be something that we're
going to be grappling with for quite some time, I think.
Another key issue that's likely to be faced by technologists
and the lawyers who represent them, particularly
startups working in a small environment with limited numbers of programmers
that may be relying on material that's been open sourced online,
is this idea of open source software and licensing.
Because the scheme that exists out there is quite complicated.
There are many, many different licenses that
have many, many different provisions associated with them.
And each one will have different combinations
of some of these things being permitted, some of them not,
and potential ramifications of using some of these licenses.
We're going to discuss three of the most popularly used licenses, particularly
in the context of open source software, generally that is released on GitHub.
And the first of these is GPL version 3, GPL being the GNU General Public License.
And one of the things that GPL often gets criticism for
is that it is known as a copyleft license.
And copyleft is sort of designed to be the inverse of how copyright
protection is usually thought of.
Copyright protections give the owner or the person who owns the copyright, not
necessarily the creator but the person who owns the copyright, the ability
to restrict certain behaviors associated with that work or that material.
The GPL sort of does the opposite.
Instead of restricting the rights of others,
it compels others, who use code that has been licensed under the GPL,
to refrain from imposing any restrictions at all,
such that others can also benefit from using and modifying
that same source code.
The catch with the GPL is that any code that includes GPL-licensed code--
so say you incorporate some module written by somebody else,
or your client incorporates something that they found on GitHub
or found on the internet and wants to include it in their own project.
If that code is licensed under the GPL, unfortunately one of the side effects
perhaps of what your client or what you have just done
is you have transformed your entire work into something that
is GPL, which means you are also then required to make the source
code available to anybody, make the binary available to anybody,
and also to allow anybody to have the same rights of modification
and redistribution that you had as well.
So think about some of the dangers that might introduce for a company that
relies extensively on GPL-licensed code.
They may not be able to profit as much from that code
as they thought they would.
Perhaps they thought they had this amazing disruptive idea that
was going to transform the market.
And this particular piece of GPL code that they found online
allowed them-- it was the final piece of the puzzle that they needed.
When they included it in their own source code,
they transformed their entire project, according
to the terms of the GPL license, into something that was also GPL licensed.
So their profitability-- they could still sell it.
But their profitability may be diminished because the source code is
available freely to anybody to access.
Now, some people find this particularly restrictive.
In fact, pejoratively sometimes this is referred
to as the GNU virus, the General Public License virus,
because it propagates so extensively.
As soon as you touch code or use code really
that is GPL licensed, suddenly everything
that it touches is also GPL licensed.
So it's, depending on your perspective of open source licensing,
it's either a great thing because it's making more stuff available,
or it's a bad thing because it is preventing people
from using open source material to create further developments when they
don't necessarily want to license those changes or modifications that they
made.
The Lesser General Public License, or LGPL,
is basically the same idea, but it only applies to library code.
So if code is LGPL-ed, what this basically means
is any modifications that you make to that code also need to be LGPL-ed,
or released under the LGPL license.
But other ancillary things that you do in your program that
overall incorporates this library code do not need to be LGPL-ed.
So it would be possible to license it under other terms,
including terms that are not open source at all.
So changes that you make to the library need
to be propagated down the line so that other people can
benefit from the changes that are specific to the library that you made.
But it does not necessarily reflect back into your own code.
You don't have to necessarily make that publicly available.
So this is considered slightly lesser in terms of its ability to propagate.
And also, though, it's considered lesser in terms of its ability
to grant rights to others.
Then you have, at the other end of the extreme, the MIT license.
The MIT license is considered one of the most permissive licenses available.
It says, here's the software.
Do whatever you want with it.
You can make changes to it.
You don't have to re-license those changes to others.
You can take this code and profit from it.
You can take this code and make whatever-- re-license it
under some other scheme if you want.
So this is the other end of the extreme.
Is this license copyleft?
Well, no, it's not copyleft because it doesn't require others
to adhere to the same licensing terms.
Again, you can do with it whatever you would like.
Most of the code that is actually found on GitHub is MIT licensed.
So in that sense, using code that you find online
is not necessarily problematic to an entrepreneur or a budding developer who
wants to profit from some larger program that they write if it incorporates
MIT-licensed code, whereas it might be an issue for those who are incorporating
GPL-licensed code.
What sorts of considerations, then, would
go into deciding which license to use?
And again, these are just three of many, many licenses that exist
that pertain to software development.
Then, of course, there are open source licenses
that are not tied to this at all.
So for example, a lot of the material that we produce for CS50,
the course on which this is based at Harvard College,
is licensed under a Creative Commons license,
which is similar in spirit to a GPL license,
inasmuch as it oftentimes will require people to re-license the changes that
they make to that material under Creative Commons as well.
It will generally also have a non-commercial aspect to it:
it is not possible to profit from any changes that you make, and so on.
And that's not a software license.
That's more of a general media-related license.
So open source licenses exist in both contexts, for software and for other media.
But what sorts of considerations might go into choosing a license?
Well, again, it really does depend on the organization itself.
And so that's why understanding a bit about these licenses
certainly comes into play.
Do you want your changes to propagate and get out into the market
more easily?
That might be a reason to use the MIT license, which is very permissive.
Do you just feel compelled to share code with others,
and you want to insist that others share that code as well?
Then you might want to use GPL.
Do you potentially want to use open source code
but not release your own code freely to others, the changes
that you make to interact with that code?
That might be cause for relying on LGPL for the library code
that you import and use but licensing your own changes and modifications
under some other scheme.
Again, a very complex and open field that's
going to require a lot of research for anyone who's
going to be helping clients who
are working with software development and deciding what they want
to do with that code going forward.
So let's turn our attention now from issues that have existed for a while
and sort of been bubbling underneath the surface,
issues of trust and issues of software licensing--
those have been around a lot longer--
and start to contend with new technologies
and how the law keeps up with them.
And so you'll also hear these terms that are
being considered emergent technologies or new technologies.
You'll sometimes see them referred to as disruptive technologies
because they are poised to materially affect the way that we interact
with technology, particularly in terms of purchasing things
through commerce, for example, as in the case of our first topic, 3D printing.
So how does 3D printing work, is a good question to ask at the outset.
A 3D printer is similar in spirit to a 2D printer. With a 2D printer,
you have a write head that spits out ink, typically some sort of toner.
It moves left to right across a piece of paper.
And the paper's also fed through some sort of feeder.
So the left-to-right movement of the toner or ink head
is the x-axis movement.
And the paper rolling underneath that provides y-axis movements.
Such that when we're done, we may be able to get access
to a piece of paper that has ink scattered across it, left to right,
top to bottom.
3D printers work in very much the same way, except that their medium,
instead of being ink or toner, is typically some sort of filament that
has conventionally, at least at the time of this recording,
been plastic based.
And what basically happens is the plastic
is melted to just above its melting point.
And then it is deposited onto some surface.
And that surface is moved over by a similar write head,
basically a nozzle or eyedropper of plastic.
It can move back and forth across a flat surface,
similar to what the 2D printer would do.
But instead of just staying flat, the arm can also move up and down.
On some models of 3D printers, the table can move up and down
to allow it to not only print on the xy-plane, but also on the z-axis.
So it can print in space and create three-dimensional objects, 3D printing.
Typically the material used, again, is melted plastic just
above the melting point.
So that by the time it's deposited onto the surface
or onto other existing plastic, it's already basically cooled
enough that it's hardened again.
So the idea is we want to just melt it enough so
that by the time it's put onto some other surface,
it re-hardens and becomes a rigid material once again.
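To make the x-, y-, and z-axis idea concrete, here's a toy sketch that generates a layer-by-layer toolpath for a small cube; real printers consume G-code produced by slicing software, so this list of coordinates is only an illustration of the motion involved.

```python
# Toy toolpath: a 10 mm cube built as a stack of square outlines, each traced
# in the xy-plane at a fixed height z, with z raised one layer at a time.
LAYER_HEIGHT = 0.2   # millimeters of plastic per layer
SIZE = 10            # cube dimension in millimeters

def cube_toolpath():
    moves = []
    z = 0.0
    while z < SIZE:
        # Trace the square outline of this layer...
        for x, y in [(0, 0), (SIZE, 0), (SIZE, SIZE), (0, SIZE), (0, 0)]:
            moves.append((x, y, round(z, 2)))
        # ...then move up (or lower the table) for the next layer.
        z += LAYER_HEIGHT
    return moves

path = cube_toolpath()
print(len(path), "moves; first layer:", path[:5])
```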
Now, 3D printing is usually considered to be a disruptive technology
because it allows people to create items they may not otherwise have access to.
And of course, the controversial one that
is often spoken about in terms of we need to ban things
or we need to ban certain 3D printers or ban certain 3D printing technologies
is guns, because it's actually possible, using technology
that exists right now, to 3D print a plastic gun that
would evade any sort of metal detection that is usually used for detecting guns
and is fully functional.
It can fire bullets, plastic bullets or real metal bullets.
The article that is recommended that goes with this part of the discussion
proposes several different ways that we might be able to--
or the law may be able to keep up with 3D printing technologies.
Because, again, the law typically lags behind technology, and so
is there a way that the law can contend with this?
And there are a couple of options that it proposes
that I think are worthy of discussion.
The first is to allow permissionless innovation.
Should we just allow people to do whatever
they want with 3D printing technology,
and decide ex post facto that this particular thing you just did is not OK,
the rest of it's fine, and disallow that type of thing going forward?
This approach is interesting because it allows people to be creative,
and it allows potentially for things to be
revealed about 3D printing technology that were not
possible to forecast in advance.
But is that reactive-based approach better?
Or should we be proactive in trying to prevent
the production of certain things that we don't want to be produced?
And moreover, although plastic filament tends
to be the most popular and common way that things are 3D printed right now,
3D printers are being developed that are much more advanced than this.
We are not necessarily restricted to plastic-based printing.
We may have metal-based printing.
And you may have even seen that there are 3D printers that exist
that can produce organic materials.
They use human cells, basically, to create things like organs.
Do we want people to be able to create these things?
Is this the kind of thing that should be regulated beforehand rather
than regulated after we've already printed
and exchanged copyrighted designs for what to build and construct?
Is it too late by the time we have regulated it to prevent it
from being reproduced in the future?
Another thought that this article proposes is immunizing intermediaries.
Should we allow people to do whatever they want with 3D printing?
Or maybe not allow people to do whatever they want with 3D printing,
but regardless don't punish the manufacturers of 3D printers
and don't punish the designers of the CAD files,
the Computer-Aided Design files, that generally go into 3D printing?
Is this a reasonable policy approach?
It's not an unheard of policy approach.
This is the approach that we typically have used with respect
to gun manufacturers, for example.
Gun manufacturers generally are not subject to prosecution for crimes
that are committed using those guns.
Should we apply something similar to 3D printers, for example,
when the printer is used to manufacture a gun?
Who should be punished in that case, the person who
designed the gun model, the person who actually
printed the gun, the 3D printer manufacturer itself,
any of those people?
Again, an unanswered question that the law is going
to have to contend with going forward.
Another solution potentially is to rely on existing common law.
But the problem that typically arises there
is that there is not a federal common law.
And so this would potentially result in 50 different jurisdictions handling
the same problem in different ways.
Whether this is a good thing or a bad thing is, again,
sort of dependent on how quickly these things move.
Common law, as we've seen, certainly is capable of adapting
to new technologies.
Does it do it quickly enough for us?
Finally, another example that is proposed
is that we could just allow the 3D printing industry to self-regulate.
After all, we, as attorneys, self-regulate,
and that seems to work just fine.
Now, granted this may be because we are in an adversarial system,
and so there's advantages and extra incentives for adversaries
to insist that we are adhering to our ethical principles
and doing the right thing.
There's also the overhanging threat of outside regulation
if we do not self-regulate.
So in a lawyer context, adapting this model to 3D printing
may work because it seems to be working well for attorneys.
Then you consider that social media companies are also
self-regulating, with respect to data protection and data privacy.
And as we've seen, that's maybe not going so well.
So how do we handle the regulation of 3D printing?
Does it fall into the self-regulation category?
Does that succeed?
Does it fall into the self-regulation category that doesn't succeed?
Does it require preemptive regulation to deal with?
Now, 3D printing also has some other potential concerns.
Very easily, by the nature of the technology itself,
it's quite capable of violating copyrights, patents, trademarks,
potentially more, just by virtue of the fact
that you can create things that may be copyrighted or patented or trademarked.
And there's also prior case law that sort of informs potential consequences
for using 3D printers, like the Napster case from several years ago.
Napster's technology would allow peer-to-peer sharing of digital music files.
Basically, that service was deemed to exist almost entirely
for the purpose of violating copyright.
And so that shut down Napster, basically.
Will 3D printers suffer the same fate?
Because you could argue that 3D printers are generally used to recreate things
that may be patented or may be subject to copyright.
Or is it going to fall more into a category like Sony, which
many years ago faced a lawsuit, or was part of a lawsuit involving VCRs
and time-shifting copyrighted material?
Is that going to be more of a precedent for 3D printing,
or is the Napster case going to be more of a precedent for 3D printing?
Again, we don't really know.
It's up to the future practitioners of technology law, who
are forced to grapple with the challenges presented by 3D printing,
to nudge us in that direction, one way or the other.
To dive a bit more deeply into this topic of 3D printing,
I do recommend you take a look at this article, "Guns, Limbs and Toys--
What Future for 3D Printing?"
And if you're particularly interested in 3D printing and some
of the ramifications of it and the technological underpinnings of it,
I do encourage you to also take a look at "The Law and 3D Printing," which
is a Law Review article from 2015, which also is periodically updated online.
And it's a wonderful bibliography of all the different things
that 3D printing does.
And it will presumably continue to be updated as cases and laws come
into play that interact with 3D printing and start to define this relatively
ambiguous space.
Another particularly innovative space that
really pushes the boundaries of what the law is capable of handling
is the idea of augmented reality and virtual reality.
And we'll consider them in that order.
Let's define what augmented reality is.
And the most common example of this that you may be familiar with
is a phenomenon from several years ago called Pokemon Go.
It was a game that you played on your mobile phone.
And you would hold up your phone, and you
would see through the camera's lens, as if you
were taking a picture, the real world through the lens of the camera.
But superimposed onto that would be digital avatars
of Pokemon, which is part of this game of collectible creatures
that you're trying to walk around and find and capture, basically.
So you would try and throw some fake ball at them to capture them.
So augmented reality is some sort of technical graphical overlay
over the real world.
Contrast this with virtual reality, in which one typically
wears a headset of some sort.
It's usually proprietary.
It's not generally available as an app, for example,
like the augmented-reality game Pokemon Go was.
It's usually tied to a specific brand of headset,
like Oculus being one type of headset, for example.
And it is an immersive alternate reality basically.
When you put the headset on, you don't see the world around you.
You are transported into another space.
And to make the experience even more immersive
is the potential to wear headphones, for example,
so that you are not only immersed in a visual space,
but also immersed in a soundscape.
Now, something that's particularly strange about these environments
is that they are still interactive.
It is still possible for multiple people, scattered
in different parts of the world, to be involved in the same virtual reality
experience, or the same augmented-reality experience.
Let's now consider virtual reality experiences, where
you are taken away from the real world.
What should happen if someone were to commit a crime in a virtual reality
space?
Studies have shown that being immersed in a virtual reality
experience can have serious ramifications for people.
They can have real feelings that last for a long time
based on their experiences in it.
For example, there's been a study out where people put on a virtual reality
headset, and they were then immersed in this space where
they were standing on a plank.
And they were asked to step off the plank.
Now, in the real world, this would be just like this room.
I can see that everything around me is a carpet.
There's no giant pit for me to fall into.
But when I have this headset on, I'm completely taken away from reality
as we see it here.
The experience is so pervasive for some people
that they walk to the edge of the plank, and they freeze in fear.
They can't move.
There's a real physical manifestation in the real world
of what they feel in this reality.
And for those brave people who are able to take the step off the edge,
many of them lean forward and try and fall into the space.
And some of them may even get the experience
like when you're on a roller coaster, and you feel that tingle in your spine
as you're falling.
The sense that that actually is happening to you
is so real in the virtual reality space that you can feel it.
So what would be the case, then, if you are in a virtual reality space,
and someone were to pull a virtual gun on you?
Is that assault?
Assault is a crime where your perception of harm is a material element.
It's not actual harm.
It's your perception of it.
You can perceive in the real world when somebody points a gun at you,
this fear of imminent bodily harm.
Can you feel that same imminent bodily harm in a virtual world?
That's not a question that's really been answered. Moreover,
who has jurisdiction over a crime that is committed in virtual reality?
It's possible that I, here in the United States,
might be interacting with someone in France,
who is maybe the perpetrator of this virtual assault that I'm describing.
Is the crime committed in the United States?
Is the crime committed in France?
Do we have jurisdiction over the potential perpetrator,
even though all I'm experiencing or seeing
is that person's avatar as opposed to their real persona?
Does anyone have jurisdiction over it?
Does the jurisdiction only exist in the virtual world?
Virtual reality introduces a lot of really interesting questions
that are poised to redefine the way we think about jurisdiction
in defining crimes and the prosecutability of crimes
in a virtual space.
Some other terms to bring up as well, which sort of tangentially
relate to virtual and augmented reality, are the very technologically
driven real-world crimes of doxing and swatting,
just so that you're familiar with them.
Doxing, if unfamiliar, is a crime involving
revealing or exposing the personal information of someone
on the internet with the intent to harass or embarrass or do
some harm to them by having that exposed, so, for example,
revealing somebody's phone number such that it can
be called incessantly by other people.
As well as swatting, which is a, well, pretty horrible crime, whereby
an individual calls the police and says, John Smith
is committing a crime at this address, is holding me
hostage, or something like that, with the intention
that the police would then go to that location
and a SWAT team would go, hence the term swatting,
and potentially cause serious injury or harm to the ostensibly innocent John
Smith, who's just sitting at home doing nothing.
These two crimes are generally interrelated.
But they oftentimes come up in the technological context,
usually as part of the same conversation, when we're
thinking about virtual reality crimes.
One of the potential upsides, though, if you
want to think about it like that, of crimes that are committed
in virtual or augmented reality are--
well, there's actually a few.
First, because it is happening in a virtual space,
and because generally in the virtual space all of our movements are tracked,
and the identities of everybody who's entering and leaving
that space are tracked by way of IP addresses,
it may be easier for investigators to figure out who
the perpetrators of those crimes are.
You know exactly the IP address of the person who apparently initiated
this threat against you in the virtual space, which may perhaps make it easier
to go and find that person in reality and question them
about their involvement in this alleged crime.
The other thing that's fortunately a good thing about these crimes,
and this is not to mitigate the effect that these crimes can have,
is that usually you can kind of mute them from happening.
If somebody is in a virtual space, and they're just screaming constantly,
such that you might consider that to be disturbing the peace when you're
in a virtual space trying to have some sort of pleasant experience ordinarily,
you usually have the capability of muting them.
This is not a benefit that we have in real life.
We generally can't stop crimes by just pretending they're not happening.
But in a virtual space, we do have that luxury.
That's, again, not to mitigate some of the very unpleasant and unfortunate
things that can happen in virtual reality that are just inappropriate.
But being in that space does allow people
the option to get away from the crime in a way that the confines of reality
may not allow.
But again, this is a very challenging area
because the law is not really equipped right now
to handle what happens in an alternate reality, which effectively
virtual reality is.
And so, again, if you're considering trying to figure out the best
way to prosecute these issues or deal with these issues,
you may be at the forefront of trying to define how crimes
are dealt with in a virtual space.
Or, if working with augmented reality,
how do you prosecute crimes where malicious code is put up in front of you
to simulate something that might be happening in the real world?
You may be, for example, using a GPS program, displayed through a set of glasses
that you're wearing, that is designed to navigate you
in one direction versus another so you don't have to keep looking at your phone
to make sure that you're going the right way.
What if somebody maliciously programs that augmented-reality program to route
you off a cliff somewhere, right?
How do we deal with that?
Right now, again, augmented reality and virtual reality
form a relatively untested space for lawyers and the law.
In the second part of today's lecture, we're
going to take a look at some potential regulatory challenges going forward,
some issues at the forefront of law and technology generally related to privacy
and how the law is ill equipped or hopefully
soon to be equipped to handle the challenge that these issues present.
And the first of these is your digital privacy,
in particular, the abilities of organizations, companies,
and mobile device manufacturers to track your whereabouts, whether that's
your digital whereabouts, where you go on the internet,
or your physical whereabouts.
We'll start with the former, your digital whereabouts.
So there's an article we provided on digital tracking technologies.
This is designed to be a primer for the different types of things
that companies, in particular their marketing teams,
may do to track individuals online with, again,
relatively little recourse for the individuals
to know what sorts of information is being gathered
about them, at least in the US.
Now, of course, we're familiar with this idea
of a cookie from our discussion of interacting with websites.
It's our shorthand way to bypass the login credentials
and show sort of a virtual hand stamp saying, yes, I am who I say I am.
I've already previously logged into your service.
Cookies are certainly one way that a site
can track a recurring user who comes to the site over and over and over.
Now, this article posits that most consumers have just
come to accept that they're being tracked,
like that's just part of the deal with the internet.
Do you think that using cookies and being tracked
is an essential requirement of what it means to use the internet today?
And if you do think that, is that the way it should be?
And if you don't think that, is that also the way it should be?
Or should we be considering the fact that tracking is happening?
Is that an essential part of what it means to use the internet?
We also need to be concerned about the types of data
that companies are using or collecting about us.
Certainly cookies are one way to identify who we are.
But also it's possible for a cookie to be identified with what types of data
an individual accesses while visiting a particular site.
So for example, if I am on Facebook, and I'm using my cookie,
and I'm looking up lots of pictures on Facebook--
I'm just searching for all my friends'
profiles and clicking on all the ones that have cats in them--
that might then give Facebook, or the administrator of that site,
the ability to pair that cookie with a particular trend of things
that that cookie likes.
So in this case, it might want to then-- it knows, OK, maybe the person who
owns this cookie likes cats.
And as such, it may then start to serve up
advertisements related to cats to me.
And then when I log into a site, it's going
to get information about my IP address.
And if I use that cookie, it has now mapped my IP address to the fact
that I like cats.
And then it could sell the information about me, or about this particular IP
address-- I guess it's not necessarily me, because one IP address usually covers
a house, but that gets you pretty close--
mapping this particular IP address to somebody who likes cats.
So they may sell that to some other service.
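A cartoon of the bookkeeping being described might look like this on the server side; every name and value here is made up, and real ad-tech systems are far more elaborate.

```python
# Toy server-side profile: pair a cookie with observed interests, then pair
# that cookie with the IP address it logs in from. All values are invented.
profiles = {}       # cookie id -> set of inferred interests
cookie_to_ip = {}   # cookie id -> last-seen IP address

def record_click(cookie_id, topic):
    profiles.setdefault(cookie_id, set()).add(topic)

def record_login(cookie_id, ip_address):
    cookie_to_ip[cookie_id] = ip_address

record_click("cookie-42", "cats")
record_click("cookie-42", "cat toys")
record_login("cookie-42", "203.0.113.7")

# What could be packaged up and sold: an IP address tied to inferred interests.
print(cookie_to_ip["cookie-42"], profiles["cookie-42"])
```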
Now, it turns out that IP addresses are generally
allocated in geographic blocks, which means that, again, just by virtue
of the fact that I log into a particular site from a particular IP address,
that site can get a rough sense of where I'm located.
They may not be able to geographically isolate it down to an exact address,
but--again, depending on how populated the area you are currently living in
is--possibly narrow it down to a city block, knowing that someone in this city
block really likes cats.
And then this company may be involved in targeted actual physical mail
advertising, snail mail advertising, where
some company that sells cat products, like a pet store or something,
might target that particular block with advertising, in the hopes that because
of this data that has been collected about this particular cookie, who then
logged in with a particular IP address, which
we've zeroed in to a particular geographic location--
it's kind of feeling a little unsettling, right?
Suddenly something that we do online is having a manifestation, again,
in the real world, where we're getting targeted advertising not just
on sites that we visit, but also in our mailbox at home.
It's a little bit discomfiting.
Should IP addresses be allocated in this way?
Is this the kind of thing that technologically can be changed?
The answer to the latter question is yes, it is possible to allocate
IP addresses in a different way than we typically do.
Should we allocate IP addresses in a different way than we typically do?
Is the potential threat of receiving real-life advertisements
related to your online activities enough to justify that?
What would be enough to justify that kind of change?
Then, of course, there's the question of tracking not in the digital world,
but in the real world.
This is usually done through mobile phone tracking.
And so we provide an article from the Electronic Frontier Foundation.
And full disclosure, some of the articles we've presented here
do have a certain bias in them.
The Electronic Frontier Foundation is well-known as a rights advocacy
group for privacy.
And so they're going to naturally be disinclined toward things that
involve tracking of data and so on.
So just bear that in mind, some additional context
when you're considering this article.
But it does contain a lot of factual information and not
necessarily just purely opinion about things that should be changed.
Although it does advocate for certain policy changes.
Now, why is it that tracking on a mobile device
is oftentimes perceived as much worse than tracking on a laptop or desktop?
Well, again, first of all, your mobile device
is generally with you at all times.
We've reached the point where our phones are generally carried in our pockets
and with us wherever we go, which means that it's very easy to use data
that's collected from the mobile phone--
information that's given out by the phone,
whether that's to cell phone towers or via GPS data and so on--
to pinpoint it to us.
The other concern is that mobile phones are very, very quick
to become obsolete.
Oftentimes, within one or two releases of a new version
of a phone, whether it's a new Android phone release or software
release or a new iPhone and so on, the version that came out two years ago
is generally obsolete, which means it is no longer subject to firmware patches
provided by the manufacturer or the software
developers of the operating systems that are
run on those phones, which could also mean that they are much more
susceptible to people figuring out how to break into those phones
and use that tracking information against you.
So laptops and desktops generally don't move that much.
You may carry your laptop to and from but generally
to just a couple locations.
It's usually set at a desk somewhere in between.
Your desktop, of course, doesn't move at all.
So the tracking potential there is pretty minimal.
And also those devices tend to last quite a long time,
and the lifecycle support for service and keeping those operating systems
up to date is quite a bit longer versus the mobile phone,
where that window is much, much shorter.
Now, contrary to most people's assumptions,
phones are not actually tracked based on GPS data.
The way GPS works is your phone just listens for signals from satellites
and uses them to triangulate where exactly you are in space.
But no information about which device made that calculation
gets sent back to the satellites or so on.
And generally that data's not stored in the GPS satellites in any way.
It's just sort of a receive-only kind of system.
The real threat vector for phone tracking, if this is the kind of thing
that you're concerned about, is actually through cell phone towers
because cell phone towers do track this information.
Different companies own different towers.
They would like to know who is using each tower,
whether or not this may also involve charging another carrier--
say I'm using a Verizon phone, and I happen
to be connected to an AT&T tower.
AT&T may wish to know that this tower is mostly being used by Verizon customers.
And the only way they really know that is
by mapping the individual device to the phone number,
then checking that against Verizon's records.
And so they are collecting all this information
about every phone that connects to their tower so they could potentially
bill Verizon for the portion of their customers
who were using their infrastructure.
So these towers do track information.
And towers also can be used to triangulate your location.
If I'm standing in the middle of an open field, for example,
and there's a tower far away over there and a tower maybe just beside me,
my phone is emitting a signal constantly,
sort of radially in all directions.
If the signal in that direction is received by the far tower fairly weakly,
and the tower right next to me is picking it up very strongly,
then, sort of extrapolating from those two points in space,
you can say I'm most likely here.
So even without having GPS turned on, just by trying to make a phone call
or use a 2G, 3G, 4G network, it's pretty easy
to figure out where you are in space.
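As a very crude illustration of that idea, one could estimate a position by weighting each tower's known location by how strongly it hears the phone; carriers use far more sophisticated techniques, so treat this as a sketch of the intuition only.

```python
# Crude position estimate from two towers: weight each tower's location by
# the relative strength at which it hears the phone (stronger = closer).
def estimate_position(tower_a, strength_a, tower_b, strength_b):
    total = strength_a + strength_b
    x = (tower_a[0] * strength_a + tower_b[0] * strength_b) / total
    y = (tower_a[1] * strength_a + tower_b[1] * strength_b) / total
    return (x, y)

# The tower beside me hears me strongly; the one across the field, weakly.
print(estimate_position((0, 0), 0.9, (500, 0), 0.1))   # roughly (50.0, 0.0)
```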
And this is potentially a concern.
This concern comes up sometimes in the context
of whether these companies who provide operating systems for phones
or firmware for phones are at the behest of government agencies, who
may request back doors into the devices so that they can then
spy on individuals.
And certainly this might be something that
comes up in a FISA court or the like, where
they're trying to get phone records.
And there's always this sort of unknown.
Is it happening to all of our devices all the time?
Is it happening right now to the phone in my pocket?
Is the sound being captured in such a way
that it can be transmitted, just because
there happens to be a backdoor in the operating
system or a backdoor in the firmware that
allows anybody to listen to it, even if they're not
supposed to be listening to it?
It's really hard to pretend to be somebody that you're not with a phone.
As you saw, it's pretty easy to pretend to be somebody
that you're not with a computer: you can use a service like a VPN, which
presents a different IP address.
You connect to the VPN.
And as long as you trust the VPN, the VPN ostensibly protects your identity.
With mobile phones, every device has a unique ID.
And it's really hard to change that ID.
So one way around this is to use what are
called burner phones, devices that are used once, twice,
and then they're thrown away.
Now, this again comes down to how concerned are you about your privacy?
How concerned should you be about your privacy?
Are you concerned enough that you're willing to purchase these devices that
are one-time, two-time use devices, which you then
throw away and constantly do that?
And moreover, it's actually kind of interesting to know
that burner phones don't actually do--
they're not shown to do anything to protect one's identity or privacy
because it tends to be the case that we call the same people,
even if we're using different phones.
And so by virtue of the fact that this number seems
to be calling this number and this number all the time,
like maybe it's my work line and my family, my home number.
If I'm always calling those two numbers, even if the phone number
changes, a pattern can still be established with the device IDs of all
of the other phones, maybe my regular phone plus all the burners that I've
had, where you can still craft a picture of who I am,
even though I'm using different devices, based on the call patterns
that I'm making.
As usual, humans are the vulnerability here.
Humans are going to use the same-- they're going to call the same people
and talk to the same people on their phones all the time.
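A toy illustration of why burner phones fail to hide identity: compare how much two devices' sets of called numbers overlap. The numbers here are hypothetical placeholders.

```python
# Toy linkage of devices by call pattern: a high overlap in who each device
# calls suggests the same person is behind both, despite different device IDs.
def similarity(calls_a, calls_b):
    return len(calls_a & calls_b) / len(calls_a | calls_b)

old_phone = {"work", "home", "dentist"}
burner = {"work", "home", "pizza place"}
print(similarity(old_phone, burner))   # 0.5 -- suspiciously similar
```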
And so it's relatively easy for mobile devices to track our locations.
Again, every device has a unique ID.
You can't hide that ID.
That ID is part of something that gets transmitted to cell towers.
And potentially the threat exists that if somebody
is able to break into that phone, whether that's
because of old, outdated firmware that's not been updated
or because of the potential that there is some sort of backdoor that
would allow an agent, authorized or not, to access it, again,
this vulnerability exists.
How does the law deal with questions like, do you own the information that is being tracked?
Do you want that information to be available to other people?
It's an open question.
Another issue at the forefront of where we're going,
especially when it comes to legal technology and law firms themselves
availing themselves of technology, is artificial intelligence and machine
learning.
Both of these techniques are incredibly useful potentially
to law firms that are trying to process large amounts of data
relatively quickly, the type of work that's
generally been outsourced to contract attorneys or first-year associates
or the like.
First of all, we need to define what it means when
we talk about artificial intelligence.
Generally when we think about that, it means
something like pattern recognition.
Can we teach a computer to recognize specific patterns?
In the case of a law firm, for example, that might be: can it recognize that something looks like a clause in a contract, whether a valid clause that we might want to see or a clause that we're hoping not to see in our contracts?
We might want to flag that for further human review.
Can the machine make a decision about something?
Should it, in fact, flag that for review?
Or is it just highlighting things that might be alarming or not?
Can it mimic the operations of the human mind?
If we can teach a computer to do those things--
we've already seen that we can teach a computer
to teach itself how to reproduce bugs.
We saw that in Ken Thompson's compiler example.
If we can teach a computer to mimic the types of things
that we would do as humans, that's when we've
created an artificial intelligence.
There's a lot of potential uses for artificial intelligences
in the legal profession, like I said, document review being
one potential avenue for that.
And there are a few different types of ways that artificial intelligences can
learn.
There are actually two kind of prevailing major ways.
The first is for humans to supply some sort of data
and also supply the rules that map the data to some outcome.
That's one way.
The other way is something called neuroevolution,
which is generally best exemplified by way of a genetic algorithm.
In a moment, we'll take a look at a genetic algorithm literally written
in Python, where a machine learns over time to try and generate
the right result.
In this model, we give the computer a target, something
that it should try and achieve, and request
that it generates data until it can match
that target that we are looking for.
So by way of example, let's see if we can
teach a computer to write Shakespeare.
After all, the theory goes that, given an infinite amount of time, enough monkeys could write Shakespeare.
Can we teach a computer to do the same?
Let's have a look.
So it might be a big ask to get a computer to write all of Shakespeare.
Let's see if we can get this computer to eventually realize
the following line, the target, so to speak, "a rose by any other name."
So we're going to try and teach a computer.
We want a computer to eventually on its own
arrive at this phrase using some sort of algorithm.
The algorithm we're going to use to do it is called the genetic algorithm.
Now, the genetic algorithm gets its name from the theory of genetics: the best traits, or good traits, will propagate down and become part of the set of traits we usually encounter.
And bad traits, things that we don't necessarily want,
will be weeded out of the population.
And over successive generations, hopefully only the good traits
will prevail.
Now, just like any other genetic variation,
we need to account for a mutation.
We need to allow things to change.
Otherwise we may end up in a situation where all we have is the potential for bad traits.
We might need something random to happen to eliminate a bad trait; we have no other way to do it.
So we do have to mutate some of our strings from time to time.
How are we going to teach the computer to do this?
We're not providing it with any data set to start with.
The computer's going to generate its own data set, trying to get at this target.
The way we're going to do this is to create a bunch of DNA objects.
DNA objects, in this example, are just what we're going to call the different strings.
And each string, as exemplified here in this code, is just a random set of characters.
We're going to have it randomly pick.
I believe that the string's about 23 characters long
that we're trying to have it match.
So it's going to randomly pick 23 characters,
uppercase letters, lowercase letters, numbers, punctuation marks,
doesn't matter, any legitimate ASCII character,
and just add itself to the list of potential candidates
for the correct phrase.
So randomly slam on your keyboard and hit 23 keys.
The computer has about 1,000 of those to get started.
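As a rough, minimal sketch of what that first step might look like in Python (this is illustrative code, not the actual course files; the names TARGET, CHARACTERS, and random_dna are our own):

    import random
    import string

    TARGET = "a rose by any other name"   # the phrase we want the computer to discover
    POPULATION_SIZE = 1000                # roughly 1,000 random guesses to get started

    # Any printable character is fair game for a random "gene."
    CHARACTERS = string.ascii_letters + string.digits + string.punctuation + " "

    def random_dna(length=len(TARGET)):
        # Build one candidate string by picking characters completely at random.
        return "".join(random.choice(CHARACTERS) for _ in range(length))

    population = [random_dna() for _ in range(POPULATION_SIZE)]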
Every one of those strings, every one of those DNA items,
also has the ability to determine how fit it is.
Fitness being: is it more likely to go on to the next generation?
Does it have characteristics that we might want to propagate down the line?
So for example, the way we're going to, in a rudimentary way,
assess the fitness of a string, how close it is basically to the target,
is to go over every single character of it and compare,
does this match what we expect in this spot?
So if it starts with an A, as in "a rose by any other name,"
then that's one point of fitness.
If the next character is a space, then that's one point of fitness.
So a perfect string will have all of its characters in the correct position.
But as long as it has even just one character in the correct position,
then it is considered fit.
And so we iterate over all of the characters in the string
to see if it is fit.
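Continuing that sketch, a rudimentary fitness function along these lines might look as follows (again, illustrative rather than the actual course code):

    def fitness(candidate, target):
        # Count how many characters sit in exactly the right position,
        # and report that as a fraction of the target's length.
        matches = sum(1 for c, t in zip(candidate, target) if c == t)
        return matches / len(target)

    # For example:
    # fitness("a rose by any other XXXX", "a rose by any other name") == 20/24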
Now, much like multiple generations, we need the ability to create new strings
from the population that we had before.
And so this is the idea of crossover.
We take two strings.
And again, we're just going to arbitrarily decide
how to take two strings and mash them together.
We're going to say the first half comes from the mother string,
and the second half comes from the father string.
And that will produce a child, which may have some positive characteristics
from the mother and some positive characteristics
from the father, which may then make us a little bit closer towards this idea
of having the perfect string.
Again, the idea here is for the computer to evolve itself
into the correct string rather than us just giving it a set of data
and saying, do this.
We want to let it figure it out on its own.
That's the idea of the genetic algorithm.
So we're going to arbitrarily split the string in half.
Half the characters, or genes of the string, come from the mother.
The other half come from the father.
They get slammed together.
That is a new DNA sequence of the child.
And then again, to account for mutation, some random percentage of the time, in this case less than 1% of the time, we would like one of those characters to randomly change.
So it doesn't come from the mother or the father string.
It just randomly changes into something else, in the hopes
that maybe that mutation will be beneficial somewhere down the line.
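Sketching that crossover-and-mutation step in the same illustrative style (MUTATION_RATE and the helper names are our own, and this reuses random and CHARACTERS from the earlier sketch; the real files may differ):

    MUTATION_RATE = 0.01  # about a 1% chance per character; the lecture describes it as less than 1%

    def crossover(mother, father):
        # The child takes the first half of its characters from the mother
        # and the second half from the father.
        midpoint = len(mother) // 2
        return mother[:midpoint] + father[midpoint:]

    def mutate(child):
        # Occasionally swap a character for a random one, in the hope
        # that the change turns out to be beneficial down the line.
        genes = list(child)
        for i in range(len(genes)):
            if random.random() < MUTATION_RATE:
                genes[i] = random.choice(CHARACTERS)
        return "".join(genes)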
Now, in this other Python file, script.py,
we're actually taking those strings that we are just randomly creating--
those are the DNA objects from the previous file--
and starting to actually evolve them over time.
So we're going to start out with 1,000 of these random strings.
And the best score so far, the closest score we have,
the best match to "a rose by any other name" is currently zero.
No string is currently there.
We may randomly get it on the first generation.
That would be a wonderful success.
It's pretty unlikely.
Population here is just an array.
It's going to allow us to store all of these 1,000 strings.
And then, as long as we have not yet found the perfect string, the one that has 100% fitness or a score of exactly 1, we would like to do the following: calculate the fitness score for every one of those random 1,000 strings that we generated.
Then, if what we just found is better than anything we've seen before--
and at the beginning, we start with zero,
so everything is better than what we've seen before, as long as it
matches at least one character--
then print out that string.
So this is a sense of progression.
Over time we're going to see the strings get better and better and better.
Then we're going to create what's called a mating pool.
Again, this is the idea of two strings sort of crossing over.
They're sort of breeding to try and create a better subsequent string.
Depending on how good that string is, we may
want that child to be in the next population more times.
If a string is a 20% match, that's pretty good, especially
if it's an early generation.
So we may want that string to appear in the mating pool, the next generation,
20% of the time.
It has a better likelihood than a string that matches 5% of the characters
to be closer to the right answer.
So a string that barely matches anything,
sure, it should be in the pool.
Maybe it has the one character that we're looking for.
But we only want it in the pool 5% of the time
versus the string that matches 50% of the characters.
We probably want that in the pool 50% of the time.
The idea is, again, taking the best representatives of the next generation
and trying to have the computer learn and understand that those are good
and see if they can build better and better strings from those better
and better representatives of the population that
are more close to the target string that we're looking
for, "a rose by any other name."
Then in here all we're doing is picking two random items
from that pool we've just created of the best possible candidates
and mating those two together and continuing
this process of hopefully getting better and better approximations
of this string that we're looking for.
And what's going to happen there is they're going to create a crossover.
That crossover child DNA string will mutate into some other new string.
And we'll add that to the population to be considered for the next round.
So we're just going to keep going over and over and over,
generating hopefully better and better strings.
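Putting those pieces together, the evolution loop in a script like this might look roughly as follows (a sketch that reuses the helpers from the earlier snippets; it is not the actual script.py):

    best_score = 0.0
    generation = 0

    while best_score < 1.0:
        generation += 1

        # Score every member of the current population.
        scored = [(dna, fitness(dna, TARGET)) for dna in population]

        # Whenever we find a new best string, print it to show progress.
        top_dna, top_score = max(scored, key=lambda pair: pair[1])
        if top_score > best_score:
            best_score = top_score
            print(f"generation {generation}: {top_dna} ({top_score:.3f})")

        # Build a weighted mating pool: a string that matches 20% of the target
        # appears roughly 20 times for every 100 entries.
        pool = []
        for dna, score in scored:
            pool.extend([dna] * int(score * 100))
        if not pool:  # if nothing matched at all, fall back to the whole population
            pool = population

        # Breed the next generation from random pairs drawn from the pool.
        population = [mutate(crossover(random.choice(pool), random.choice(pool)))
                      for _ in range(POPULATION_SIZE)]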
So that's how these two files interact.
The first file that we took a look at defines the properties of a string
and how it can score itself basically.
And this process here happens in script.py.
These two files are based on a Medium post, which we've described in the course materials, as well as on an exam question that we've previously asked in the college version of CS50, for students to implement and solve on their own.
Hopefully these two files, taken together, will actually go through the process of creating generation after generation.
So let's see this in action.
Let's see how in each successive generation
we see strings get closer and closer and closer to the target string.
Again, we never told the computer-- we never
gave the computer a set of starting data to work with, only an end goal.
The computer needs to learn how to get closer
and closer to finding the right string.
And that's what we do here.
So let's run our program and see if we've actually taught the computer how
to genetically evolve itself to figure out this target string
that we're looking for.
So we're going to run script.py, which is the Python file where
we described the process happening.
And let's just see how the generations evolve over time.
So we get started, and we have some pretty quick results.
This first string here has a matching score of 0.042, so 4%, which I believe
is one character.
So if we scroll through, we try and find "a rose by any other name,"
I don't know exactly which character it is here.
But this is basically saying one.
One of these characters matches.
It's 4.2% of what we're hoping for.
That means that in the next pool, the next iteration,
this string will be included 4.2% of the time.
And there may also be other strings that also match.
Remember, we're only printing out when we have a better string.
So this one is only going to get included 4.2% of the time.
But there are going to be plenty of other things
that are also 4.2% matches that are probably matching-- each one of them
matches one different character.
So those will comprise part of the pool.
Then we're going to cross-pollinate.
We're going to take each of those strings
that each had a one character match and mash them together.
Now, if the first string that we're considering
has the character match in the first half,
and the second string has a character match in the second half,
now we've created a new string that has two matches, right?
We know one of them was in the first half.
That came from the mother string.
We have one of them in the second half that came from the father's string.
And so the combined string together, unless that character
happens to get mutated out, which is a possibility--
we might actually take a good thing and turn it into a bad character.
Then the next one should be twice as good.
It should be an 8.3% or 8.4% match.
And that's exactly what it is.
So this next string has two matches.
And the next one has three and four.
And as we kind of scroll down, we see some patterns like this,
A question mark Q Y. That's obviously not part of the correct answer.
But it suggests that there's a parent in here that has this string that
tends to have really good fitness.
Like this string probably has many other characters outside of this box here
that match.
And so that parent propagates down the line for a while
until eventually those characteristics, in about the ninth generation or so,
get kind of wiped out.
And as we can see over time, what starts out
as a jumbled mess gets closer and closer to something
that is starting to look even at 58% like we're getting pretty close to
"a rose by any other name."
And as we go on and on, again, the likelihood gets better and better.
So that by the time we're here, at this line here,
this string is going to appear in 87 and 1/2%
of the next generation's population.
So a lot of these characteristics of this string that's close but not
exactly right will keep appearing, which makes it more and more likely
that it will eventually pair up with another string that
is a little bit better.
And as you probably saw, towards the end, this process got slower, right?
If all the strings are so good, it might just
take a while to find one where the match is better than the parents.
It might be the case that we are creating
combinations that are worse again.
We want to filter those back out.
And so it takes a while to find exactly what we're looking for.
But again, from this random string at the very beginning, over time,
the computer learns what parts are good.
So here's "rose," right, as part of the string.
This was eventually correct.
This got rooted out in the next generation.
It got mutated out by accident.
But mathematically, what it found was a little bit better.
There are more characters in this string that are correct than this one,
even if there are some recognizable patterns in the former.
But the computer has learned, evolved over time what it
means to match that particular string.
This is the idea of neuroevolution, teaching a computer
to recognize patterns without necessarily telling it
what those patterns are, just what the target should be.
So that genetic algorithm is kind of a fun programming activity.
But the principles that underpin it still apply to a legal context.
If you teach a computer to recognize certain patterns in a contract,
you can teach a computer to write contracts
potentially that match those patterns.
You can teach a computer to recognize those patterns
and make decisions based on them.
So we were using neuroevolution to build or construct something.
But you can also use neuroevolution to isolate correct sets of words
or correct sets of phrases that you're hoping to see in a contract
or that you might want to require for additional use.
So again, the types of legal work that this can be used to help automate are things like collation, analysis, doing large document review, and predicting the potential outcome of litigation based on having it review case precedents and outcomes and seeing if there are any trends that appear: cases X, Y, and Z all had this outcome.
Is there some other common thread in cases X, Y, and Z that might also apply to the case that we're about to try?
Or potentially we need to settle because we see that the outcome is
going to be unfavorable to us.
But does this digital lawyering potentially make you uncomfortable?
Is it OK for legal decisions to be made by a computer?
Is it more OK if those decisions are made because we've trained them
with our own human instincts?
There are services out there.
There's a famous example of a parking ticket clearing service called Do Not
Pay from several years ago, where a 19- or 20-year-old computer
programmer basically taught a computer how
to argue parking tickets on people's behalf
so that they wouldn't have to hire attorneys to do so.
He wasn't a trained attorney himself.
He just recognized some of the things that are--
he talked to people and recognized some of the things that
are common threads for people who successfully challenged
parking tickets versus don't successfully challenge parking tickets,
taught a computer to mimic those patterns,
and have the computer send out notices and the like to defend parking
ticket holders.
And he was able to-- I think it was several hundred thousand dollars in potential legal fees saved and several hundred thousand parking tickets that were challenged successfully, where the case was dropped and no payment was required.
So is it OK for computers to be making these decisions if humans teach them?
Is it only OK for computers to make those decisions
if the humans teaching them have legal training at the outset in order
to make these decisions?
Or can we trust programmers to write these kinds of programs for us as well?
Does lawyering rely on a gut instinct?
I'm sure that sometimes, in cases you've experienced in your own practice, the decision that you make might be contrary to what you think might be the right thing to do, because you just feel like, if I do this other thing, it's going to work better in this case.
And I'm sure that for many of you, this has paid off successfully.
Doing something that is in contravention of the accepted norm is something you may not be able to train a computer to do.
You may not be able to train gut instinct to challenge the rules, when this whole idea of neuroevolution and machine learning and AI is designed to have computers learn and enforce rules.
Will the use of AI affect the attorneys' bottom line?
Hypothetically it should make legal work cheaper.
But this would then potentially reduce firm profits
by not having attorneys, humans, reviewing this material.
This is, in some ways, a good thing.
It makes things more affordable for our clients.
This is in some ways a bad thing.
We have entrenched expenses that we need to pay that are based on certain monies
coming in because of the hourly rates of our associates and our partners.
Does this change that up?
And if it does change that up, is that problematic?
Is it better for us to provide the most competent representation that we can,
even if that competent representation is actually from a computer?
Remember that as attorneys, we have an ethical obligation to stay on top of
and understand technology.
Sometimes that may become a situation where using that technology
and working with that technology really forces
us to do something we might not want to do
because it doesn't feel like the right thing
to do from a business perspective.
Nevertheless our ethical obligations compel us to potentially do that thing.
So we've seen some of the good things that machine learning can do.
But certainly there are also some bad things that machine learning can do.
There's an article that we provided about machine bias and a computer
program that is ostensibly supposed to be used by prosecutors and judges
when they are considering releasing somebody on bail
or setting the conditions for parole, whether or not
they're more likely to commit future crimes.
Like, what is their likely recidivism rate?
What kind of additional support might they need upon their release?
But it turns out that the data that we're feeding into these algorithms
is provided by humans.
And unfortunately these programs that are
supposed to help judges make better decisions have a racial bias in them.
The questions that get asked as part of figuring out
whether this person is more likely or not to commit a future crime,
they're never outright asking the question, what is your race
and basing a score on that.
But they're asking other questions that sort of are hints or indicators of what
someone's race might be.
For example, they're asking questions about socioeconomic status
and languages spoken and whether or not parents have ever
been imprisoned and so on.
And these programs sort of stereotype people, in ways that are not OK, or that we might not deem to be OK in any way, in order to make decisions.
And these stereotypes are created by humans.
And so we're actually teaching the computer bias in this way.
We're supplying data.
We, as humans, are providing it.
We're imparting our bias into the program.
And the program is really just implementing
exactly what we're telling it to do.
Computers, yes, they are intelligent.
We can teach them to learn things about themselves.
But at the end of the day, that knowledge comes from us.
We are either telling them to hit some target or providing data to them
and telling them these are the rules to match.
So computers are only as intelligent as the humans who create and program them.
And unfortunately that means they're also as affected by bias
as the humans who create and program them.
These programs have been found to be accurate only 20% of the time in predicting future violent crimes.
They are accurate only 60% of the time in predicting any sort of future crime, misdemeanors and so on, so a little bit better than a 50/50 shot at getting it right, based on these predictive questions that they're asking people during the intake process.
Proponents of these scoring metrics say that they provide useful data.
Opponents say that the data is being misused.
It's being used as part of sentencing determinations rather than for its ostensible purpose, which is to set conditions for bail and to set conditions for release, any sort of parole conditions that might come into play.
These calculations are also done by companies
that generally are for-profit entities.
They sell these programs to states and localities for a fixed rate per year
typically.
Does that mean that there's a financial incentive to make certain decisions?
Would you feel differently about these programs if they were free rather than paid?
Should computers be involved in making these decisions that humans
would otherwise make anyway?
Like, given a questionnaire, would a human being
potentially reach the same conclusion?
Ideally that is what it should do.
It should be mimicking the human decision-making process.
Is it somehow less slimy feeling, for lack of a better phrase,
if a human being, a judge or a court clerk,
is making these determinations rather than a computer?
Now, granted the judge is still making the final call.
But the computer is printing out likely recidivism scores
and printing out all this data about somebody
that surely is going to influence the judge's decision
and in some localities, perhaps over-influencing the judge's decision,
taking the human element out of it entirely.
Does it feel better if the computer is out of that equation entirely?
Or is it better to have a computer make these decisions and potentially prevent mistakes from happening, draw attention to things that might otherwise be missed, or minimize things that might otherwise have too much attention drawn to them?
Again, a difficult question to answer, how much do we
want technology to be involved in the legal decision-making process?
But as we go forward, it's certainly undoubtedly true
that more and more decisions in a legal context
are going to be made by computers at the outset,
with humans sort of falling into the verification category rather
than active decision maker category.
Is this good?
Is this bad?
It's the future.
For entities based in the United States, or who solely have customers in the United States, this next area may not be a concern now, but it's very likely to become one in the future.
And that is what to do with the GDPR, the General Data Protection Regulation, which was promulgated by the European Union and came into effect in May of 2018.
This basically defines the right for people to know what kind of data
is being collected about them.
This is not a right that currently exists in the United States.
And it'll be really interesting to see whether the EU
experiment about revealing this kind of data, which has never
been available to individuals before, will become something
that exists in the United States and is going to be something
that we have to deal with.
If you're based in the United States, and you do have customers in Europe,
you may be subject to the GDPR.
For example, we at CS50 have students who take the class through edX, or HarvardX, the online MOOC platform.
And when GDPR took effect in May of 2018, we spoke to Harvard
and figured out ways that we needed to potentially interact
with European users of our platform, despite the fact that we're
based in the United States, and what sort of data implications
that might have.
And that could just be out of an abundance of caution, to make sure we're on the right side of it even if we're not necessarily subject to the GDPR, but it is certainly an area of evolving concern for international companies.
The GDPR allows individuals to get their personal data.
That means data that either identifies an individual, something like what we discussed earlier in terms of cookies and tracking and the kinds of things that you search being tied to your IP address, which then might be tied to your actual address and so on, or data that could identify an individual but doesn't necessarily identify somebody just yet.
The regulation itself imposes requirements on the controller, the person who is providing a service or is holding all of that data, and basically says what the controller's responsibilities are for processing that data and what they have to reveal to users who request it.
So for example, on request by a user of a service, when that user and the controller are subject to the GDPR, the controller must identify themselves, who they are and what the best way is to contact them, and tell the user what data they have about them, how that data is being processed, and why they are processing that data, so what sorts of things they are trying to do with it.
Are they trying to make longitudinal connections between different people?
Are they trying to collect it to sell it to marketers and so on?
They need to tell them if that data is going to be referred to a third party,
again, whether that's selling the data or using a third-party service to help
interpret that data.
So again for example, in the case of Samsung,
that might be Samsung is collecting your voice data.
But they may be sharing all the data they
get with a third party, whose focus, whose programming focus
is about processing that data and trying to find out better voice
commands by collecting the voices of hundreds of thousands
of different people so they can get a better
synthesis of a particular thing they hear, translating that into a command.
These same restrictions will apply whether the data is collected or provided by the user, or is just inferred about the user.
So the controller would also need to reveal information that was gleaned about somebody, without that information necessarily having been given to them directly by the person providing the personal data.
The owner can also compel the controller to change data about them that is inaccurate, once they get this report about what data is held about them, which brings up a really interesting question: what if something is accurate, but you, the person providing personal data, don't like it?
Can you challenge it as inaccurate?
This is, again, something that has not been answered yet
but is very likely to be answered at some point by somebody.
What does it mean for data to be inaccurate?
Moreover, is it a good thing to delete data about somebody?
There are exceptions that exist in the GDPR for preserving data or not
allowing it to be deleted if it serves the public interest.
And so the argument that is sometimes made in favor of GDPR
is someone who commits a minor crime, for example,
might be haunted by this one mark on their record for years and years
and years.
They can never shake it.
And it's a minor crime.
There was no recidivism.
It wasn't violence in any way.
It just has now hampered-- it's impacted their life.
They can't get the kind of job that they want, for example.
They can't get the kind of apartment that they want.
Shouldn't they be able to eliminate that data?
Some people would argue yes, that the individual's already paid the price.
Society is not harmed by this crime or this past event any longer.
And so sure, delete that data.
Others would argue no, it's a part of history.
We don't have a policy of erasing history.
That's not what we do.
And so even though it's annoying perhaps to that individual,
or it's had a non-trivial impact on their life,
we can't just get rid of data that we don't like.
So data that might be deemed inaccurate personally,
like if a company gets a lot of information about me
because I'm doing a lot of online shopping, and they say,
I'm a compulsive spender, and that's part of their processed data,
can I challenge that as inaccurate because I
don't think I'm a compulsive spender?
I feel like I earn enough money and can spend this money how I want,
and it has an impact on my life negatively.
But they think, well, you've spent $20,000 on pictures of cats.
Maybe you are kind of a compulsive spender.
And that's something that we've gleaned from this data,
and that's part of your record.
Can I challenge that?
Open question.
For those of you who may be contending with the GDPR in your future practice,
we've excerpted some parts of it that are particularly relevant,
that deal with the technological implications
of what we've just discussed as part of the recommended
reading for this module.
The last subject that we'd like to consider in this course
is what is kind of a political hot potato right now in the United States.
And that is this idea of net neutrality.
And before we get into the back and forth of it, I think it's important for us to properly define what exactly net neutrality is.
At its fundamental core, the idea is that all traffic on the internet
should be treated equally.
We shouldn't prioritize some packets over others.
So whether your service is Google, Facebook, Netflix,
some huge data provider, or you are some mom-and-pop shop
in Kansas somewhere that has a few customers,
but you still have a website and a web presence,
that web traffic from either that location, the small shop,
or the big data provider should be treated equally.
One should not be prioritized over the other.
That is the basic idea that underpins it: when you hear net neutrality, it means all traffic on the web should be treated equally.
The hot potato, of course, is, is that the right thing to do?
Let's try and visualize one way of thinking
about net neutrality that kind of shows you how both sides might perceive this.
It may help to think about net neutrality in terms of a road.
Much like a road has cars flowing over it,
the internet has information flowing over it.
So we can think about this like we have a road.
And proponents of net neutrality will say, well, wait a minute: if we built a second road that was parallel to the first road, went to the same place, but this road was maybe better maintained, and you had to pay a toll to use it, then, hey, wait, this is unfair.
All this traffic needs to use this main road
that we've been using for a long time.
But people who can afford to go into this new road, where
traffic moves faster, but you have to pay the toll, well, then
their traffic's going to be prioritized.
Their packets are going to get there faster.
This is not fundamentally fair.
This is not the way the internet was designed,
where the free flow of information is sort of the priority,
and every packet is treated equally.
So proponents of net neutrality will say this arrangement is unfair.
Opponents of net neutrality, people who feel
like you should be able to have traffic that goes faster
on some roads than others, will say, no, no, no, this
is the free market talking.
The free market is saying, hey, if I really
want to make sure that my service gets to people faster,
I should have the right to do that.
After all, that's how the market works for just about everything else.
Why should the internet be any different?
And that's really the basic idea: should everybody use the same road, or should people who can afford to use a different road be permitted to do so?
Proponents will say no.
Opponents will say yes.
That's the way the free market works.
From a theoretical perspective or from a technical perspective,
how would we implement this?
It's relatively easy if the service that we're trying to target has paid for premium service.
Their IP addresses are associated with their business.
And so the internet service provider, the people who own the infrastructure on which the internet operates, who literally own the fiber optic cables along which the data travels, can just say, well, any data that's going to this IP address, we'll just prioritize it over other traffic.
There might be real reasons to actually want to prioritize other traffic.
So for example, if you are sending an email to somebody
or trying to access a website, there's a lot of redundancy built in here.
We've talked about TCP, for example, the Transmission Control Protocol,
and how it has redundancy built in.
If a packet is dropped, if there's so much network
congestion because everybody's flowing along that same road,
if there's so much congestion that the packet gets dropped,
TCP will re-send that packet.
So services that are low impact, like accessing a website for some company
or sending an email to somebody, there's no real worry here.
But now imagine a service like you're trying
to make an international business video call
using Skype or using Google Hangouts, or you're
trying to stream a movie on Netflix or some other internet video streaming
provider.
Generally, those packets are not sent using TCP.
They're usually sent using a different protocol called UDP, whose purpose in life is really just to get information where it's going as quickly as possible, but there's no redundancy.
If a packet gets dropped, that packet gets dropped, so be it.
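To make that TCP-versus-UDP distinction concrete, here is a tiny, hypothetical Python sketch (the host and ports are placeholders, and this is just to illustrate the difference, not anything from the lecture's own materials):

    import socket

    # TCP: connection-oriented and reliable; if packets are dropped along the way,
    # the protocol retransmits them behind the scenes.
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.connect(("example.com", 80))
    tcp.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
    tcp.close()

    # UDP: connectionless "fire and forget"; if this datagram is dropped
    # somewhere along the way, nothing re-sends it.
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.sendto(b"one frame of a video call", ("example.com", 9999))
    udp.close()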
Now, imagine if you're having an international business call.
There's a lot of packets moving, especially if you're
having a call with Asia, for example.
Between the United States and Asia, that has to travel along that Pacific cable.
There's a lot of traffic that has to use that Pacific cable.
Wouldn't it be nice, advocates against net neutrality would say,
if the company that's providing that service
was able to pay to ensure that its packets had priority thus
reducing the likelihood of those packets being dropped,
thus improving the quality of the video call, thus generally providing,
theoretically again, a better service for the people who use it.
So it might be the case that some services just need prioritization.
And the internet is designed in such a way
that we can't guarantee or give them that prioritization.
Isn't that a reason in favor of repealing net neutrality, making it so that people could pay for certain services that don't have redundancy built in and just need to get there quickly and reliably, ahead of other traffic?
In 2015, during the Obama administration, when the Federal Communications Commission was Democratically controlled, the FCC voted in favor of net neutrality, reclassifying the internet as a Title II communications service, meaning it could be much more tightly regulated by the FCC, and imposing this net neutrality requirement.
Two years later, when the Trump administration came into office,
President Trump appointed Ajit Pai, the current chairman of the FCC,
who basically said he was going to repeal the net neutrality rules that
had been set in place by the Obama administration.
And he did.
Those took effect in the summer of 2018.
So we're now back in this wild-west situation where net neutrality is on the books in some places.
There are even states now that have state laws designed to enforce this idea, this theory of net neutrality, which are now running into conflict with federal law.
So there's now this question of who wins out here.
Has Congress claimed this domain?
Can states set different rules from those set by Congress and by the regulators that Congress appointed or delegated the responsibility to make these decisions?
Can states do something different than that?
It is probably one of the most hot-button hot-potato issues
in technology and the law right now.
What is going to happen with respect to net neutrality?
Is it a good thing?
Is it a bad thing?
Is it the right thing to do for the internet?
To learn a bit more about net neutrality,
we've supplied as an additional reading a con take on net neutrality.
Generally you'd see pro takes about this in tech blogs.
But we've explicitly included a con take on why net neutrality should not
be the norm, which we really do encourage you to take a look at
and consider as you dive into this topic.
But those are just some of the challenges
that lie at the intersection of law and technology.
We've certainly barely skimmed the surface.
And my hope is that I've created far more questions than answers
because those are the kinds of questions that you
are going to have to answer for us.
Ultimately it is you, as practitioners, who
will go out and face these challenges and figure out
how we're going to deal with data breaches, how we're
going to deal with AI in the law, how we're
going to deal with net neutrality, how we're going to deal with issues
of software and trust.
Those are the questions for the future that lie at this intersection.
And the future is in your hands.
So help lead us in the right direction.